- Generative AI is flooding the internet with content that can be convincingly wrong.
- Online sources could become increasingly AI-generated, risking the spread of misinformation.
- AI-generated books, including on critical subjects like mushroom foraging, can be dangerous.
Generative AI, whether you love it or hate it, is here to stay. That means a huge amount of content is being generated by these software systems and flooding the internet. The web has never been the most reliable source of information, but if things keep going the way they are, printed paper sources from before the AI flood may become a priceless resource.
AI Makes Up Way Too Much Stuff
Why is generative AI a threat to information? The short answer is that generative AI "hallucinates." This is when the software comes up with a totally plausible-sounding, confident answer that's completely wrong. AI systems like ChatGPT can spit out thousands and thousands of words every minute, and unless that output is quality-controlled by an expert, some percentage of it will be convincing nonsense.
Of course, human beings are just as good at putting convincing nonsense on the internet, but there’s a limit to how much text humans can generate. I’m a professional online writer, and at best I can knock out 5000 words a day. An LLM could do ten times that without breaking a virtual sweat.
The Internet Is Being Flooded by AI Content
People have taken to generative AI technology like wildfire to a dry field. Originality AI is running an ongoing study to estimate how many of the results in a Google Search are AI-generated, and so far this number has peaked at just under 14%. I have little doubt that over the next few years, more and more of what you read on the net will be AI-generated without a human expert reviewing the text. I also suspect that current estimates undercount reality, since AI text detection simply doesn't work reliably.
What makes this even worse is that even if you have an expert human to edit, proof, and review what comes out of an LLM, those experts have to rely on sources. Looking at my own job, I have to use online sources because I can't know or remember everything. If those sources are tainted by nonsense, it means running the risk of unknowingly spreading misinformation.
AI-Generated Books Are a Scourge
Writing is a tough and time-consuming job. Something that takes you a few minutes or hours to read takes hours, days, months, or years for a human being to write. With the advent of generative AI, however, virtually anyone can pump out writing at literally inhuman rates. So it should be no surprise that AI-generated books are flooding online stores like Amazon. It's not just annoying; it can be legitimately dangerous. For example, AI-generated books on mushroom foraging have allegedly (though by no means confirmed) led to people eating poisonous mushrooms. Even scientific journals aren't immune.
Pre-ChatGPT Printed Material Could Be Worth Its Weight in Gold
Which leads me to a surprising conclusion—pre-AI books are potentially worth their weight in gold. These are permanent, physical records of human knowledge before it could have been tainted by AI text generators. A gold standard against which to measure the claims of LLMs and human authors alike. In particular, any writing on scientific, medical, engineering, or other similar subjects would be critically important. Perhaps just as important are historical records. Can you imagine a future where we believe a historical timeline invented by sloppy AI systems?
Online Preservation Matters Just as Much
While paper might be a safeguard of our most precious information, we can do the same with electronic copies of pre-AI media. It's simple to create a hash of a "master" copy of these documents to ensure that not a single word or letter has been changed. That way, whatever errors remain in these texts are human ones, and we can know the provenance and editorial process behind each of them.
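For the curious, here's a minimal sketch of what that looks like in practice, using Python's built-in hashlib module (the filename is just a placeholder). If even a single character of the file changes, the digest changes completely, so comparing digests tells you whether a copy still matches the original master.

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Record the hash of the "master" copy once (hypothetical filename)...
master = Path("encyclopedia_vol1.pdf")
recorded_hash = sha256_of(master)

# ...then later, verify that the copy you're reading hasn't been altered.
if sha256_of(master) != recorded_hash:
    print("Warning: this copy no longer matches the master!")
```

Publishing the recorded digest alongside the archive is what makes this useful: anyone with the file can recompute the hash and confirm the text hasn't been silently edited.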
The same goes for AI-generated images and video. Physical discs may become critical to preserve, and the same digital safeguards can be applied to them.
I’m the last person to be against the potential benefits various AI technologies can offer us. At the same time, I’d hate for the sum of human knowledge we’ve spent thousands of years creating to be corrupted by half-baked AI in the blink of an eye. So hold on to that encyclopedia set!