The information grey goo

I’m broadly positive about the future of LLMs and AI, but no one should pretend there won’t be difficulties, or that the transition to using these tools isn’t going to pose plenty of challenges.

Some scenarios, though, are profoundly dangerous, not just for the publishing and creative industries, but for society as a whole. 

When we discuss the threat of AI, many people imagine rampant machine intelligences with big guns hunting us all down in a post-apocalyptic wasteland (thank you, James Cameron). I doubt that’s likely. But one consequence I can see us sleepwalking into is the informational equivalent of an apocalypse first imagined over thirty years ago: the “grey goo” scenario.

“Grey goo” was a concept which emerged when nanotechnology was the hot new thing. First put forward by Eric Drexler in his 1986 book Engines of Creation, it is the idea that self-replicating nanobots could go out of control and consume all the resources on Earth, turning everything into a grey mass of nanomachines.

Few people worry about a nanotech apocalypse now, but arguably we should be worried about AI having a very similar effect on the internet. 

Nowhere is safe

If you have been paying attention, you will have noticed that the amount of content created by LLMs is increasing at a vast rate. No one knows exactly how much is being generated, but SEOs – whose job it is to understand content on the internet – are concerned. Less ethical SEOs have used a combination of scraping and generative AI to quickly create low-quality sites with tens of thousands of pages, reaping short-term rewards in traffic from Google.

The problem for Google is that creating a site like that is the work of perhaps a week – and probably a lot less if it can be automated – while it takes the search engine months to spot that it’s a low-quality site. With more automated approaches, it will become trivial to create spammy sites far faster than Google can combat them. It’s a game of whack-a-mole in which new moles appear at an exponential rate.
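To see why that asymmetry is so hard to beat, here is a toy back-of-the-envelope simulation. Every number in it is an illustrative assumption of mine, not a measurement: suppose spam-site creation grows 10% a week, and each new site survives around three months before being caught.

```python
# Toy model of the spam whack-a-mole asymmetry.
# Every number here is an illustrative assumption, not measured data.

CREATION_GROWTH = 1.10  # assumed: spam-site creation grows 10% per week
DETECTION_LAG = 12      # assumed: a new spam site survives ~12 weeks

new_per_week = 100.0    # assumed: sites launched in week 0
cohorts = []            # creation count per week, so old cohorts can expire
live = 0.0              # spam sites currently sitting in the index

for week in range(53):
    cohorts.append(new_per_week)
    live += new_per_week
    if week >= DETECTION_LAG:
        # Sites created DETECTION_LAG weeks ago are finally caught.
        live -= cohorts[week - DETECTION_LAG]
    new_per_week *= CREATION_GROWTH
    if week % 13 == 0:
        print(f"week {week:2d}: ~{live:>9,.0f} undetected spam sites live")
```

Even with a constant detection lag, the number of live spam sites grows exactly as fast as the creation rate does. The only way to make headway is to detect faster than the spammers create – which is precisely the race described above.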

And Google isn’t the only platform which AI is threatening to turn to mush. Amazon has an issue with fake reviews generated by AI. And although it claims it is working on solutions, it appears to be incapable of even spotting fake AI-generated product names.

But what about human-to-human social networks? They have already been flooded with AI-generated responses. And it will only get worse as companies create tools which let brands respond automatically to posts, triggered by keywords and written by AI. Sooner or later, saying something which suggests you are in the market for a new car will get you spammed with responses from Ford, Skoda, VW, Tesla, every car dealer in your area, every private second-hand seller… you get the picture. Good luck trying to find the real people.

It is obvious that anywhere content can be created will ultimately be flooded with AI-generated words and pictures. And the pace could accelerate over the coming years, as the tools for using LLMs programmatically become more sophisticated.

For example, think about reviews on Amazon. It will be possible to create a program which says: “Find all my products on Amazon. Where the product rating drops below 5, add unique AI-generated reviews until the rating reaches 5 again. Continue monitoring this and adding reviews.”
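Sketched out as code, that really is the whole of the logic. To be clear, the sketch below is a toy simulation against a fake catalogue: every function and data structure in it is hypothetical (Amazon exposes no API for any of this, which is why bad actors use scrapers and bot accounts instead), and generate_review() merely stands in for a real LLM call.

```python
# A toy sketch of the review-topping-up loop described above.
# Everything is hypothetical: the catalogue is fake, and
# generate_review() stands in for a call to an LLM.

TARGET = 4.5  # "reaches 5 again", allowing for rounding on the star display

catalogue = {                # hypothetical: product -> list of star ratings
    "widget-a": [5, 4, 2],
    "widget-b": [5, 5, 3, 1],
}

def rating(product):
    """Current average star rating for a product."""
    reviews = catalogue[product]
    return sum(reviews) / len(reviews)

def generate_review(product):
    """Stand-in for an LLM writing a unique, plausible 5-star review."""
    return 5  # a real bot would generate the review text as well

for product in catalogue:
    # "Where the product rating drops below 5, add unique AI-generated
    # reviews until the rating reaches 5 again."
    while rating(product) < TARGET:
        catalogue[product].append(generate_review(product))
    print(f"{product}: now {rating(product):.2f} stars "
          f"from {len(catalogue[product])} reviews")
```

Wrap that loop in a scheduler and you have “continue monitoring and adding reviews”. Nothing in it is specific to Amazon, which is why the same dozen lines threaten every rating system.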

We are already at the point where you can use natural language to create specialist GPTs. The ability to create these kinds of programs is ultimately going to end up in the hands of everyone. And this applies to every rating system, all surveys, all polls, all user reviews – and similar approaches can be devised for any kind of content.

Can Google, Amazon and the rest fight back? Yes – but at great cost. And it’s not clear that even the likes of Google have the resources to fight millions of AI users creating billions of low-quality pages at an ever-accelerating rate.

Model collapse

A side-by-side comparison of content created from the same prompt with GPT-3 versus GPT-4 Turbo will show you the difference. And humans are getting better at writing prompts and giving AI models the information they need to do a better job. So surely this is just a short-term problem, and AI content will get “good enough” not to flood the internet with crap.

The issue is that there is a counterbalancing force at play. As more and more AI-generated content floods the public internet, more and more of that content will end up as training data for AI. Exacerbating this, quality publications are largely blocking AI bots, for entirely understandable reasons, which means less and less high-quality content is being used to train the next generation of models.

For example, researchers have noted that the LAION-5B dataset, used to train Stable Diffusion and many other models, already contains synthetic images created by earlier AI models. This is the equivalent of a child learning to draw solely by copying the images made by younger children – not a scenario which is likely to improve quality.

In fact, researchers already have a name for the inevitable bad outcome: “model collapse” – the point at which the content generated by AIs stops improving and starts to get worse.
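You don’t need a neural network to watch this happen. The researchers who coined the term (Shumailov et al., in “The Curse of Recursion”) demonstrate it on simple statistical models, and a toy version is easy to reproduce: fit a Gaussian to some data, sample synthetic “training data” from the fit, fit the next generation on that, and repeat. The sketch below is my own simplification of that idea, not their code.

```python
import random
import statistics

def run_generations(n_samples=20, n_generations=50):
    """Each generation is a Gaussian fitted to samples drawn from the
    previous generation's fit -- models training on model output."""
    mu, sigma = 0.0, 1.0  # generation 0 is fitted to "real" data
    for _ in range(n_generations):
        # The next model only ever sees synthetic data...
        synthetic = [random.gauss(mu, sigma) for _ in range(n_samples)]
        # ...and every fit is slightly biased and slightly noisy.
        mu = statistics.fmean(synthetic)
        sigma = statistics.pstdev(synthetic)
    return sigma

random.seed(0)
finals = [run_generations() for _ in range(200)]
print(f"median spread after 50 generations: {statistics.median(finals):.3f}")
print("(it started at 1.0 -- the tails of the distribution have gone)")
```

Each generation loses a little of the variance of the one before, and the losses compound: the rare, surprising material in the tails is exactly what disappears first.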

The Information Grey Goo

This is the AI Grey Goo scenario: an internet choked with low-quality content which never improves, where it is almost impossible to locate reliable public sources of information because the tools we have relied on in the past – Google, social media – can never keep up with the scale of new content being created. Where the volume of content being produced overwhelms human and algorithmic abilities to sift through it quickly and find the high-quality stuff.

The social and political consequences of this are huge. We have grown so used to information abundance, the greatest gift of the internet, that having it disrupted would be a major upheaval for the whole of society.

It would be a challenge for civic participation and democracy: citizens and activists would no longer be able to reliably find online information, opinions, debates, or campaigns about social and political issues.

With reliable information locked behind paywalls, anyone unwilling or unable to pay will be faced with picking through a rubbish heap of disinformation, scams, and low-quality nonsense. 

In 2022, talking about the retreat behind paywalls, Jeff Jarvis asked “when disinformation is free, how can we restrict quality information to the privileged who choose to afford it?” If the AI-driven information grey goo scenario comes to pass, things would be much, much worse.

Ian Betteridge @ianbetteridge