- The Wayback Machine is under threat from AI once more
- The AI boom has tripled the price of the large hard disks needed for this expansive archive of the web
- This compounds an existing problem for the Wayback Machine: news sites are blocking its web crawler, again because of AI
It’s an increasingly desperate time for those trying to keep a record of the history of the web, as AI is again proving a serious stumbling block to the efforts made by the likes of the Internet Archive — and this time it’s about soaring hard drive prices.
You may recall that last month, we covered another angle on the difficulties AI has been causing the Internet Archive’s Wayback Machine, the non-profit organization’s history of the web. The problem there is that online news sites, as part of measures designed to stop AI scraping their content, are increasingly blocking the web crawler the Internet Archive uses to compile the snapshots of web pages that make up the archive.
And now, 404 Media reports (via Tom’s Hardware) that the Internet Archive is suffering due to the hard drive shortage brought on by AI (as more large drives are needed in data centers for AI workloads).
Yes, the AI boom is not just about LLMs (Large Language Models) eating your RAM and SSDs, but also hard drives (as well as indirect effects on other components).
The huge hard disks — on the order of 30TB — that the Internet Archive needs to host the Wayback Machine’s historical record are now up to three times more expensive, or indeed completely out of stock. In this way, the AI boom is now a “very real issue costing us time and money,” the Internet Archive’s founder Brewster Kahle commented to 404 Media.
With some 210 petabytes (210,000TB) of web page snapshots in its library, which is expanding by 100TB daily, you can appreciate the scope of the web archiving that’s going on here.
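For a rough sense of scale, those figures can be turned into drive counts. This is a back-of-the-envelope sketch only: it assumes every drive is a 30TB model and ignores redundancy, failed-drive replacements, and compression, none of which the report details. The function names and the baseline price argument are hypothetical, and no actual drive price is given in the reporting.

```python
import math

# Figures quoted in the article
ARCHIVE_TB = 210_000      # roughly 210 petabytes of snapshots
DAILY_GROWTH_TB = 100     # growth per day
DRIVE_TB = 30             # "on the order of 30TB" per drive

def drives_needed(capacity_tb, drive_tb=DRIVE_TB):
    """Whole drives required to hold capacity_tb of raw data (no redundancy)."""
    return math.ceil(capacity_tb / drive_tb)

def extra_cost(n_drives, base_price, multiplier=3.0):
    """Added spend when each drive costs `multiplier` times a baseline price.

    base_price is a hypothetical placeholder; the article only says prices
    are up to three times higher, not what the baseline was.
    """
    return n_drives * base_price * (multiplier - 1)

existing_drives = drives_needed(ARCHIVE_TB)
yearly_growth_tb = DAILY_GROWTH_TB * 365
yearly_drives = drives_needed(yearly_growth_tb)

print(f"Drives for the current archive: {existing_drives}")
print(f"Growth per year: {yearly_growth_tb} TB, or {yearly_drives} drives")
```

On these assumptions, the existing library corresponds to about 7,000 drives, and a year of growth to roughly 1,200 more, which is why a tripling of the per-drive price matters so much.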
Wikipedia’s parent non-profit, the Wikimedia Foundation, is reportedly facing similar struggles, as you’d imagine. It has some 65 million articles to host, which takes up a lot of drive space. A Wikimedia Foundation spokesperson told 404 Media that the main problems are the “purchase of memory and hard drives”, but also lead times on server deliveries.
Analysis: workarounds aplenty — but what about tape?
So, is the Wayback Machine really in danger? Are we going to see the wheels start to come off the ‘living history of the internet’? Well, there’s no immediate peril, as apparently donors and the community around the Wayback Machine are pulling together to work around the issue of spiralling drive costs.
Still, this is clearly a concern going forward — and the blocking of the Internet Archive’s web crawler is even more so. The problem there is that the news sites are blocking AI scraping, but those blocks can be circumvented if the owner of the AI targets the content via the Wayback Machine instead. It’s a thorny issue, but talks are ongoing, and hopefully both sides can come to some kind of resolution.
And on the drive front, if you’re wondering why the Internet Archive can’t switch to tape as a storage medium, the catch there is that it’s a ‘living’ archive of the web — as in it’s online, for people to access those web page snapshots on demand. As such, hard drives are needed for that access to be responsive. Tape simply isn’t up to snuff performance-wise in this case.
The Internet Archive does use tape, mind, for longer-term backups of content, but it’s only part of the puzzle in that respect. Hard drives are vital for the actual day-to-day functioning of the Wayback Machine as we know it, in terms of being able to quickly serve users the content they need online.

