Discussion about this post

Dakara

Yes, this is the new battlefield. Previously, there was a lot of talk about jailbreaking models, but seeding the internet with trojan data is the next frontier.

We already see it happening with research papers, as I mentioned recently here:

https://www.mindprison.cc/p/the-ai-hacking-wars-begin-trojan-data

And there is no solution, as you can't ensure reliable behavior of LLMs. There is no such thing as "AI Safety" when the attack surface is essentially the entirety of human language.

mcswell

Just to be clear, the Pravda this article talks about is the Russian one. There is also a Ukrainian Pravda (https://www.pravda.com.ua), which is presumably more accurate.
