33 Comments
Dakara:

Yes, this is the new battlefield. Previously, there was a lot of talk about jailbreaking models, but seeding the internet with trojan data is the next frontier.

We already see it happening with research papers, as I mentioned recently here:

https://www.mindprison.cc/p/the-ai-hacking-wars-begin-trojan-data

And there is no solution, as you can't ensure reliable behavior of LLMs. There is no such thing as "AI Safety" when the attack surface is essentially the entirety of human language.

mcswell:

Just to be clear, the Pravda that this article talks about is the Russian one. There is a Ukrainian Pravda (https://www.pravda.com.ua), which is presumably more accurate.

Gerben Wierda:

“Even with that knowledge, it nevertheless often repeats propaganda from Pravda.” — that is of course because the models are *text* models, not *language* models, let alone *knowledge* models.

James Rice:

And some of the bad actors are the AI executives themselves, as in Grok 4 looking to see what Elon thinks before answering.

Tim Nguyen:

I was wondering when Grok 4 and Elon would inevitably appear in this conversation. I thought about mentioning them myself, but I personally didn't want to give Elon more attention than he already has.

Larry Jewett:

Incidentally, speaking of Grok and Elon making an appearance.

Have we ever seen Elon and Grok in the same room together?

Roberto Argentina:

Thank you so much for this work.

Andy:

Humans (the 'gold standard of cognition') are spectacularly bad at this exact task. Millions of people read and share articles from known propaganda outlets every single day. The entire field of media literacy exists because the average person doesn’t naturally make the reasoning leap:

(A) This source is biased + (B) This article comes from that source ⇒ (C) I should be highly skeptical of this content.

Humans also struggle to distinguish satire from real news - as shown by the frequent sharing of Onion articles as fact. Conspiracy theories thrive precisely because people fail to evaluate sources and apply consistent reasoning.

So, while Gary frames the LLM’s failure as a uniquely artificial and dangerous flaw, it’s actually one of the most pervasive and dangerous flaws in human cognition.

xine:

I wonder if models are particularly prone to being tripped up by Pravda because pravda means "truth" in Russian.

Larry Jewett:

“Truth” (fake or real) does not enter into the equation of LLM relativity

Output = LLMc^2

Sanjay Mehta:

Pravda is propaganda and CNN/NYT/WSJ/BBC are not?

Okay then.

Digitaurus:

That sounds about right. The BBC tries hard to get its reporting as accurate as possible. It doesn't always succeed, and gets hauled over the coals in public spaces when it fails, but it tries.

Oaktown:

And therein lies the difference: When newspapers or journalists print mistakes, they're held accountable and subject to lawsuits and public scrutiny.

Sanjay Mehta:

Not true. The BBC always puts a negative spin on stories coming out of countries which don’t toe the Anglo-Saxon line anymore. Not as blatant as the US rags, but very untrustworthy.

Larry Jewett:

The Brits are still sore about losing India and the rest of their empire.

BBC negative spin is just their pathetic way of getting back.

Digitaurus:

Interesting idea. I agree that the BBC has a particular world viewpoint, if that's what you mean. I think the BBC has probably reduced its local journalist coverage over the years, which is going to lead it to make more mistakes, but I believe the organisation remains committed to giving a level-headed analysis of the world's events. Can you give me an example that illustrates your point?

Digitaurus:

This really doesn’t help your case but I understand better where you are coming from. Thank you.

P Szymkowiak:

I'm reminded of my first big exposure to LLM grooming: Kevin Roose's sensationalised 2023 "Bing’s A.I. Chat: ‘I Want to Be Alive. 😈’" going viral.

To clarify - you can think of this as LLM *Prompt* grooming, as distinct from the LLM *Data* grooming discussed in this current post. LLM Prompt grooming is a problem in its own right that might be caused either intentionally or unintentionally (the latter through our inherent biases, much as in Roose's example).

While the main takeaway that led to the viral exposure of Roose's experience was "shock and concern" at the A.I. responses, *my* main reaction was to be "creeped out" by Roose's apparent proficiency in the use of LLM-prompt-grooming techniques.

Reading the article and conversation transcript, I was most shocked by Roose's use of conversation patterns that reeked of the grooming attacks used online by adults against naive, unsuspecting minors.

As a mode of calculated attack against an LLM / RNN, LLM-prompt-grooming makes a lot of sense: what I couldn't fathom at the time was Roose's performative shock and concern, when to me the LLM engine was simply providing reasonable / anticipatable responses to well-established patterns of conversational grooming.

Don:

I saw something like this effect for historical material, too. ChatGPT didn't mention the trials or war crimes for 5 of 14 defendants in one of the Nuremberg Trials: https://blog.zgp.org/llms-and-reputation-management/

AKcidentalwriter:

This is not surprising! We have synthetic data. It was all inevitable. I remember my high school electronic engineering teacher telling me 35 years ago: G.I.G.O. = garbage in, garbage out. Not surprising to me.

Larry Jewett:

AIIGO = AI In, Garbage Out

Aaron Turner:

I now have an image of LLMs eating their own crap which I can't get out of my head...

Stan:

LLMs are no better than humans when it comes to evaluating online sources; they have no concept of which websites produce disinformation. What you are proposing is that LLMs should be censored by blacklisting certain information, which is 100% doable for online search. You can also train the model to be more aligned with U.S. national security.

Online search is definitely a problem for the industrial censorship complex. On the one hand, it can cite and share propaganda from the national state; on the other hand, it can do the same for propaganda by U.S. adversaries.

The solution is easy: compel U.S. AI companies to integrate with the censorship complex so that functionalities like deep search are constantly updated with information they should censor. Consider also aligning AI models with U.S. national security. That's how you win.
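
For what it's worth, the blocklist half of that proposal is technically trivial. A minimal sketch of filtering search results against a disinformation domain list before they reach the model's context might look like the following; the domain list, result shape, and function names are all invented for illustration, not a description of any real product.

```python
# Hypothetical sketch: drop web-search results whose domain appears on a
# blocklist of known disinformation outlets before grounding the model on them.
# BLOCKLISTED_DOMAINS and SearchResult are placeholders for this example.
from dataclasses import dataclass
from urllib.parse import urlparse

BLOCKLISTED_DOMAINS = {"example-propaganda-network.ru"}  # placeholder entries

@dataclass
class SearchResult:
    url: str
    snippet: str

def is_blocklisted(url: str) -> bool:
    """True if the URL's host matches, or is a subdomain of, a blocklisted domain."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in BLOCKLISTED_DOMAINS)

def filter_results(results: list[SearchResult]) -> list[SearchResult]:
    """Keep only results from domains that are not on the blocklist."""
    return [r for r in results if not is_blocklisted(r.url)]
```

The hard part, of course, is not the filter itself but deciding what goes on the list, keeping it current, and deciding who gets to decide.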

Digitaurus:

The challenge is that a current-generation base model, like 4o, when asked to provide a response without undertaking a web search, will frequently (but unpredictably) hallucinate the answer for any question that has been rarely discussed in its data set or where the answer is going to have changed over time.

For example, when asked shortly after the Air India crash whether the Ram Air Turbine on a Boeing 787-8 could be manually deployed by the crew, ChatGPT4o stated "confidently" in response to my prompt that it could not, when it replied without a web search; it replied that it could, when responding with a web search included. The latter is the correct response, in this case.

In short, web-search supported outputs (RAG) are really always required to avoid unpredictably made-up answers. But, as you point out, that makes them vulnerable to ingesting low-quality sources and treating them as accurate.

As you point out, little effort seems to have been made so far by OpenAI to curate the quality of the sources delivering the web search results. This is arguably down to the quality of the underlying search engine (Bing?), though humans are presumably also subject to undue influence from unreliable sources. Either way, "critical thinking" will be hard to instil in these machines, but it is necessary if we are to rely on their opinions.
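
To make the curation point concrete, here is one hedged sketch of what "curating the sources" could mean in a retrieval step: score each retrieved source for reliability and only ground the answer on snippets above a threshold, refusing rather than guessing when nothing trustworthy comes back. The trust scores, threshold, and prompt wording below are assumptions made up for the example, not how ChatGPT or Bing actually work.

```python
# Illustrative sketch, not any vendor's actual pipeline: weight each retrieved
# snippet by a per-domain trust score and only pass well-sourced material into
# the prompt; if nothing clears the bar, instruct the model to say it does not
# know instead of hallucinating an answer.
from urllib.parse import urlparse

SOURCE_TRUST = {"faa.gov": 0.95, "boeing.com": 0.9, "bbc.co.uk": 0.8}  # made-up scores
DEFAULT_TRUST = 0.3    # unknown domains are treated as low-trust by default
TRUST_THRESHOLD = 0.6

def trust_score(url: str) -> float:
    """Look up a crude reliability score for the URL's domain."""
    host = (urlparse(url).hostname or "").lower().removeprefix("www.")
    return SOURCE_TRUST.get(host, DEFAULT_TRUST)

def build_grounded_prompt(question: str, results: list[dict]) -> str:
    """results: list of {'url': ..., 'snippet': ...} from some web-search step."""
    trusted = [
        f"- {r['snippet']} (source: {r['url']})"
        for r in results
        if trust_score(r["url"]) >= TRUST_THRESHOLD
    ]
    if not trusted:
        return (f"Question: {question}\n"
                "No sufficiently reliable sources were found. Reply that you do not know.")
    sources = "\n".join(trusted)
    return f"Answer using ONLY the sources below.\n{sources}\n\nQuestion: {question}"
```

The interesting design question, as the Air India example shows, is where those trust scores come from and who maintains them; that is an editorial judgement, not an engineering one.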

Matt A:

Of course LLMs can be groomed. Truth claims have a moral dimension, one which an LLM cannot “know”. An LLM cannot evaluate a truth claim. How could it “know” that the earth is round? It cannot observe the real world or evaluate that claim in any sense. On what basis would it assign credibility to the claim that the earth is round over the flat earther’s claim that it is not? The reason live testimony by witnesses is required in courts (at least in the U.S.) is so that the judge or jury can evaluate the witness’s credibility, which is something an LLM cannot do.
