Internet Archive settles suit over Wayback Machine

There are times when we would all like to turn the clock back a few months (or …

The Wayback Machine is cool, not just because you can go back and see what Ars looked like in May 1999, but because its 1+ petabyte archive contains a treasure trove of data for researchers. Last year, the Internet Archive, which runs the Wayback Machine, was sued by Healthcare Advocates after the attorneys for another company used the Wayback Machine to access information that might be helpful in an ongoing legal action.

Healthcare Advocates and the Internet Archive have finally resolved their differences, reaching an undisclosed out-of-court settlement. In some ways, that's disappointing news for onlookers who were hoping to see how a court would sift through the complex issues facing Internet archives, caching systems, and more. More on that later.

Here's the backstory. Healthcare Advocates found itself embroiled in a trademark dispute with Philadelphia-based Health Advocate. The latter company was represented by Harding Earley Follmer & Frailey, which used the Wayback Machine to access Healthcare Advocates' web pages dating back to 1999 in an attempt to find information that would bolster its client's case. Healthcare Advocates then sued both Harding Earley and the Internet Archive, alleging, among other things, violations of the DMCA.

Operated by the Internet Archive, the Wayback Machine dates back to 1996 and archives web sites using Alexa's crawler. Like many other crawlers, Alexa respects the Robots Exclusion Standard (RES), a voluntary protocol that lets site owners ask robot crawlers to stay out of all or part of a website. In this case, the lawsuit between the two Advocate companies was filed on June 26, 2003. On July 8, 2003, Healthcare Advocates added a robots.txt file to its site to invoke the RES so that crawlers would stop spidering it.
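For readers who haven't seen the RES in action, here is a minimal sketch in Python of how a well-behaved crawler might consult a robots.txt file before fetching a page. The robots.txt contents, the crawler name `example-crawler`, and the URL are all hypothetical stand-ins for illustration; this is not the actual file Healthcare Advocates posted, nor Alexa's real code.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that asks every crawler to stay out of the whole site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines; no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # A compliant crawler would skip the page whenever this prints False.
    print(may_fetch(ROBOTS_TXT, "example-crawler", "http://www.example.com/index.html"))
```

Note that the check happens entirely on the crawler's side: nothing in the protocol forces a robot to run it, which is why the standard is described as voluntary.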

The Internet Archive voluntarily observes the RES and, in fact, will retroactively make archived data off-limits once such a robots.txt file is encountered. So in the case of Healthcare Advocates, the Wayback Machine should have made everything from the site archived prior to July 8, 2003 unavailable. Despite that, Harding Earley successfully accessed older data around 110 times in almost 850 attempts.

Those attempts led to accusations that the law firm violated the DMCA and that the Internet Archive was negligent in allowing the data to remain available to searchers. The archived "snapshots" have since been pulled from the Wayback Machine and the Archive is in the clear, but the lawsuit against Harding Earley will go forward.

In a way, it's unfortunate that the suit against the Internet Archive didn't go forward. At the heart of the dispute was Healthcare Advocates' attempt to turn back the clock to a time before data that proved harmful to the company's interests was posted. That's the funny thing about the Internet: once something is posted there, anyone can read it and do just about anything else they want with it. Even safeguards like the RES are completely voluntary; there is nothing to stop an interested party from completely archiving web sites.

Legal threats can help move some data out of reach, but once data has been published on the Internet, the damage has been done and you can't go back in time. The only thing the courts can do is determine who is liable and for how much, and that's a decision this settlement postpones.