mcswell
9h

Just to be clear, the Pravda that this article talks about is the Russian one. There is a Ukrainian Pravda (https://www.pravda.com.ua), which is presumably more accurate.

Dakara
9h

Yes, this is the new battlefield. Previously, there was a lot of talk about jailbreaking models, but seeding the internet with trojan data is the next frontier.

We're already seeing this happen with research papers, as I mentioned here recently:

https://www.mindprison.cc/p/the-ai-hacking-wars-begin-trojan-data

And there is no solution, as you can't ensure reliable behavior of LLMs. There is no such thing as "AI Safety" when the attack surface is essentially the entirety of human language.

