Archiving the Internet the German Way…

This past week the German Bundestag (parliament) published a law (it was passed in 2006, but doesn’t go into effect until it is published in its final form) that mandates all German websites deliver a copy of all digital content (text, photos, sound, and any other multimedia content) to the National Library in Leipzig, the German equivalent of the U.S. Library of Congress. German companies protested the law throughout the legislative process, arguing that it placed an undue burden on them and would result in enormous financial costs. Not only is the law itself interesting – mandating the state to archive the internet – but so is how the library intends to preserve that content. The library has asked that all website content be submitted in one of two formats: either as a PDF file or, if the content stretches over multiple pages (such as a multi-page HTML website), as a ZIP archive containing all of the related files. This last bit alone raises so many questions – which files need to be included, and how often do companies need to resubmit their content? Every time the website is updated?
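Just to make the compliance question concrete, here is a minimal sketch (in Python, with an assumed directory layout and file names – the law itself says nothing about tooling) of what “submit a multi-page site as a ZIP archive of all related files” might look like for a small static site:

```python
# Hypothetical sketch: packaging a local copy of a static, multi-page HTML site
# into a single ZIP archive for deposit. Paths and names are assumptions.
import zipfile
from pathlib import Path

site_root = Path("www.example.de")        # assumed local copy of the site
deposit = Path("deposit_example_de.zip")  # assumed name of the deposit file

with zipfile.ZipFile(deposit, "w", zipfile.ZIP_DEFLATED) as zf:
    files = [p for p in site_root.rglob("*") if p.is_file()]
    for path in files:
        # store paths relative to the site root so the archive unpacks cleanly
        zf.write(path, path.relative_to(site_root))

print(f"Packaged {len(files)} files into {deposit}")
```

Even this trivial sketch begs the questions above: which files count as “related,” and does the archive need to be rebuilt and resubmitted after every update?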

One exception to the law is content that is generated by private citizens for private use. But this raises a whole other set of questions, such as what counts as “private” on the Internet, a space that is by design “public.” Pundits have been quick to point to the gray area of weblogs – are they private or are they public? If companies are supposed to archive all of their content, then what happens to “private” blogs that are hosted by for-profit companies like Blogger or even Facebook? Theoretically, companies that don’t comply will be served a letter of warning followed by a fine of up to 10,000 euros for each act of non-compliance. For the moment, the National Library has stated that it will not enforce the statute until it has fully assessed its ability to store all of the data that will be flowing its way.

In a related story out of Europe this week, the European Union has decided to take on the Google Book Project by digitizing the contents of Europe’s largest libraries, museums, archives, and film studios and placing this content online. The first incarnation of “Europeana” should launch on November 20th. The European Commission, which is coordinating, but not actually carrying out, the project, hopes that the new website will become a clearinghouse for access to European civilization. Those are lofty goals indeed, and competing with Google might be an even loftier one, but it will be a wonderful (free) resource for those of us living and teaching outside of Europe. At the same time, as per the EU’s goals, digitizing these cultural objects will also serve to preserve them in a digital age. For more on this topic, see this English-language article from Der Spiegel.

Both of these recent examples highlight several of the themes addressed in this week’s readings. As all three readings suggest, finding a digital medium that can hold its own over time is probably going to be the greatest challenge for digital archivists. I see a great hurdle being created here by the German National Library – submitting one’s site as a PDF or a ZIP file is probably only a temporary fix and does not actually ensure any sort of preservation. With the PDF format, the library is basically asking the owners of the sites to “print” out their sites and submit that copy to the library in what is, at the moment, the ubiquitous e-book or e-paper format. But will it remain so in the near and distant future? Already there are competing formats, most of which actually rely on less sophisticated coding – using ASCII text and a style sheet instead of embedded formatting. The ZIP file format seems even more problematic – who is going to guarantee that what is submitted can actually be accessed? So much of the rich multimedia content on the web depends on specific server-side technologies, which would make a stand-alone version relatively useless. Instead, maybe the National Library should consult with the people at the Internet Archive about better ways to archive the content it desires…
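To illustrate that limitation, here is a minimal sketch (assuming an illustrative URL and output folder of my own choosing) of the kind of “flat” snapshot a file-based deposit preserves – the HTML as it happened to be served at one moment, with none of the machinery that produced it:

```python
# Minimal sketch of a flat, one-page snapshot. Only the rendered HTML that the
# server returned at this moment is saved; the databases, scripts, and
# personalization behind it are not captured at all.
from pathlib import Path
from urllib.request import urlopen

url = "https://www.example.org/"    # assumed URL, for illustration only
snapshot_dir = Path("snapshot")
snapshot_dir.mkdir(exist_ok=True)

with urlopen(url) as response:
    html = response.read()

(snapshot_dir / "index.html").write_bytes(html)
print(f"Saved {len(html)} bytes of rendered HTML; the server-side logic is gone")
```

The Internet Archive’s crawlers face the same problem, but at least they record each captured response, along with its headers and timestamps, in the WARC container format that was designed for exactly this kind of preservation.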

The Europeana project sounds more feasible, as it aims not to digitize everything out there, but to gather together the various digitization projects in Europe and place them under one roof for easy access and cross-referencing. Europeana is in effect attempting to build a multimedia encyclopedia out of the content that has been or is being created in Europe. One of the criticisms raised in the Rosenzweig article was that archivists complain that they cannot archive everything and that someone (i.e., historians) needs to help determine what should be preserved and what can be discarded. In some ways, Europeana is performing exactly this function (at least partially) – it is selecting those aspects of European culture that have been deemed the most important for inclusion, which in turn will guide other archivists and curators to gather more content and further enhance the collection as it grows over time.

The immense job of a digital archivist is far from enviable, especially when important digital historical documents have been carelessly or even purposefully deleted. One of my favorite bloggers is Dan Froomkin of the Washington Post. He writes a blog called White House Watch, which analyzes not just the White House but also the White House Press Corps. Starting in April 2007, Froomkin wrote a series of posts on the deletion of White House emails and how it could affect future historical accounts of the Bush White House. The first article in the series is here and is worth a read.

Copyright and Copyleft: Balancing Fair Use and Creative Rights

This week’s assignment for the course had us look more closely at the issue of copyright as it pertains to digital history. The issue of “property” as raised in the chapter from Lessig’s book Free Culture offers an interesting historical perspective on how the concept of property has evolved in the United States and how “cultural property” has now taken on a meaning similar to “personal property” – two concepts that the framers of the US Constitution initially treated in very different ways.

Cohen and Rosenzweig apply Lessig’s teachings to the realm of digital history and point out several interesting and important test cases that have shaped the way digital historians can use historical material under the concept of “fair use.” Yet they also alert the reader that this is a very “grey” area of law that is in constant flux.

Thus, the question remains – how is a historian supposed to operate within the new context of digital scholarship and still respect the lawful rights of copyright holders? This question needs to be an important element in the initial stages of planning a digital history project. If, for example, you are thinking of creating a database of primary sources, then you need to make sure that all of your sources are either in the public domain or within the realm of fair use. For anything that falls outside of these two categories, you will need to seek out the copyright holders and negotiate the terms. Good planning in your grant writing will help cover at least some rights purchases, which should be included in any preliminary budget that you create. Of course, not all digital history projects are driven by primary sources. Your site might be an educational or interpretive site, meaning that most, or even all, of the content is original work. In that case, you yourself hold the copyright to the teaching materials.

The less practical and more theoretical issue raised by this week’s readings is the idea that the extension of copyright to the author’s life plus seventy years has a negative impact on the creativity of society and the growth of scholarship in general. This raises an interesting (if daunting) claim: that we can only be creative and build on the work of others if their works are freely accessible. I’m not sure I completely agree – as historians we are used to scouring archives and libraries to find the sources that we need. It appears that advocates of open access want less to gain access to new areas of knowledge than to take advantage of the digital age to access that knowledge more quickly and in a digital format. We can all see the advantages of accessing information digitally, but does the lack of such access really restrict our creative abilities? This line of argument might have more weight when dealing with images, sounds, and film, where current copyright laws restrict even the concept of quoting such information in non-textual formats. Here, I would agree with the authors that current copyright law asserts too much control over the use of these formats.

Overall, the issue of copyright is a very complicated one. As Cohen and Rosenzweig rightly note, however, digital historians should be aware of these issues but not focus on them so much that it stunts their creativity in exploring the possibilities of digital history.

Born Digital

This week I asked you to take a look at a few different websites that function as virtual archives for documents and information that have been created digitally – there is no paper backup for this information. All three of these digital archives are very interesting in that they were set up almost immediately after the events in question, in an effort to collect and document what was happening in real time. Yet the archives have remained open, and people continue to contribute to them. Part of each of these projects involved uploading “real” documents and pictures, but other aspects of these sites were designed to capture people’s feelings and memories.

I think there is also an interesting parallel here with oral history, though it is still different. By opening up a forum for people to post their “raw” memories, we see (and preserve) what they personally felt was most important. There is no historian here to prompt or flesh out different aspects of that memory. Nor are there any means to filter or fact-check those memories. As someone who primarily works on collective rather than personal memories, I have a hard time processing these individual impressions and extrapolating a larger meaning.

The Flickr Commons is not necessarily an “active” digital archive like the other three examples, but both the larger Flickr site and the Commons component have created a fascinating repository for digital photographs. The Commons area in particular is very interesting. The first large donor to the Commons was the Library of Congress, which submitted thousands of photographs so that users could comment on them and help the curators at the library identify who was pictured and where the photographs were taken. This raises the issue of “shared authority” that all public historians need to deal with. Here, however, we see a great example of the public having a body of knowledge that curators lack and providing better, more accurate accounts and descriptions than the professionals. I really like how people have marked up the various photos and commented with their own experiences (or the memories they have of parents or grandparents talking about an event). This sort of community building and social networking is a very interesting aspect of these “born digital” archives – the objects on display might once have been analog, but now that they are digital they are generating new digital addenda and annotations.

Let me also raise a few other questions for the class (some of which stem from my own reading of Dan Cohen’s article on the “Future of Preserving the Past”). First, how do we control for authenticity with these “active archives”? Of course preserving the thoughts and memories of the “common man” is important, but how do we authenticate (or do we need to?) whether a memory is real or constructed? Second, by relying on digital contributions to these digital archives, what do we do with all of the potential analog media that might be ignored only because it is analog (of course, things can be converted…)? I also wonder about the opt-in nature of such archives. What if someone doesn’t want to contribute their material to a digital archive, but that material is of great importance to the project at hand? Are we creating an artificial bias toward digital contributions and artifacts? Of course, we have this difficulty with traditional archives as well – some are public, others are private. There are laws to protect privacy, and in most Western countries items can be closed for 30+ years unless court papers are filed to force their release.

On a more positive note, these new digital archives greatly increase the repository of information for future scholars to use. They allow for a searchability that traditional archives cannot even begin to match. Finally, the biggest issue with archives that are born digital is preservation, but we will address that in two weeks, so I won’t get into it here, even though it is a related issue.