The race to save the first draft of coronavirus history from internet oblivion

Within a week, Blair’s tweet got the attention of 200 participants worldwide, along with academics who either wanted to help or wanted advice on how to conduct their own diary studies during quarantine.

Projects like Blair’s and Jaouad’s will preserve portions of our lived experience during the pandemic. But getting the internet to archive as much as possible about this moment is a monumental, ongoing task.

Mark Graham is the director of the Wayback Machine at the Internet Archive, a group that is now part of the race to archive important content related to the covid-19 pandemic. The novel coronavirus collection project, launched on February 13 with the International Internet Preservation Consortium, is collecting and archiving pages and resources connected to the pandemic.

“Archiving has never been about saving everything. It’s about trying to save a representation,” says Graham.

According to Brewster Kahle, the Internet Archive’s founder, his organization is already collecting about 1 billion URLs a day across the web. Archiving the pandemic means relying on a network of library professionals and members of the public to identify and collect the pages its ordinary efforts might otherwise overlook: local and international public health pages, petitions, resources for medical professionals trying to fight covid-19, and accounts from those who have had the virus. It’s not easy. “The average life of a web page is only 100 days before it’s changed or deleted,” he says.

Archives have shaped how we understand our past. During the Great Depression in the 1930s, there was a massive effort to document aspects of American life: the Farm Security Administration sent photographers across the country on assignment to document specific topics and ideas. The resulting work, 175,000 photographic negatives, is a valuable pictorial record of life during the Depression. But the internet operates on a much bigger scale, and everyone who posts is potentially their own documentarian and curator. Capturing the covid pandemic online isn’t just about saving a URL; it’s about saving the right URLs over and over again, to show how things have changed over time.

“You don’t know quite what’s going to be useful until you’ve not done it, and then you have the head-slap moment,” says Kahle. And so it becomes vital to just do as much collecting as you can. Let history tell us what was important and what was not.

That’s how the Library of Congress’s web archiving team is approaching this moment. “We don’t really have a collection defined yet for this. We’re kind of seeing how this evolves,” says Abbie Grotke, the team’s lead. “We’re going to make sense of it in a few months when we have time to breathe.”

The Library of Congress and the Internet Archive both know they’re going to miss broad parts of the covid pandemic playing out online. The LOC has to seek permission from site owners to collect and provide public access to an archived version of a domain, and the Internet Archive is up against a web that might shift more quickly than it is able to capture.

April 21, 2020