Behind the Scenes: Describing Archived Websites

On May 22, I participated in an Archive-It training webinar on describing archived websites. The following is a summary of my short presentation on the Wilson Special Collections Library’s approach to describing archived websites in finding aids.

Special Collections has been archiving websites with Archive-It since 2013. Our Archive-It account is split into collections that reflect our five main collecting units, plus one collection for the UNC at Chapel Hill Art Library. Some of our collecting units use catalog records to describe archived websites, but my presentation focused on the finding aid side of the house and used examples from the University Archives collection.

What makes describing websites unique?

In many ways, our approach to archived website description lines up with existing archival finding aid practices. However, archived websites differ from other materials in some ways. Dates, for example, can be tricky. Do we describe the date we archived the website or try to assign some kind of creation date? Our technical services team opted to record the date we started archiving a website rather than trying to assign the website a date of use or creation. Other challenges include the recurring nature of “crawling” websites, frequently changing content, URL changes and redirects, the differing frequencies at which we archive different websites in our collections, and the technical limitations and incompleteness of some archived websites.

Case Studies

We have some consistency in our approach, but we don’t have written documentation yet. The following examples are representative of our approach and include a couple of newer things we have tried recently.

Archive-It collection-level description

  • The first example is a finding aid for the University Archives’ Archive-It collection. The finding aid was created in 2013 and serves as a blanket entry point and general description of all URLs in the collection. I think this is a helpful finding aid to have, but the University Archives collection has grown a lot since 2013. One improvement might be adding series to this finding aid that describe groups of related URLs in the collection. The additional description would help the finding aid show up in more searches. It would also give users more access points, rather than transporting them directly to the entire (very long) list of URLs in our collection.
Screenshot of a portion of a finding aid describing the University Archives archived website collection.
http://finding-aids.lib.unc.edu/40417/

URL-level description

  • The second example is adding description of individual URLs to finding aids. This style of description is pretty standard across manuscript collecting units and was implemented broadly by our technical services team in 2013-14. Typically, these URLs were selected for archiving because we already had a collection for the person or organization. When adding individual archived websites to finding aids, we link to the Archive-It “calendar page” that shows each of the dates we archived the URL. The description also provides the URL, the first crawl date by month and year, and a brief description of the live website.
  • This approach works well. One way I’d like to iterate on it is to figure out how best to represent the incomplete nature of archived websites in the finding aid. The description covers the features and content of the live website, but the archived version may differ depending on how often we archive it, or it may have elements missing due to the technical limitations of web crawlers.
  • Example:
Screenshot of a finding aid section describing the Carolina Black Caucus archived website.
http://finding-aids.lib.unc.edu/40363/
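For readers who think in code, here is a minimal sketch of the four pieces a URL-level entry carries: the URL, the first crawl date by month and year, a brief description, and a link to the Archive-It calendar page. It is an illustration only, not how our finding aids are actually produced. The class name, field names, example organization, and collection number are all hypothetical, and the calendar page address pattern is an assumption.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ArchivedWebsiteEntry:
    """One archived website as it might be listed in a finding aid (hypothetical model)."""
    title: str
    url: str            # the live URL that was crawled
    collection_id: int  # hypothetical Archive-It collection number
    first_crawl: date   # date of the first capture
    description: str    # brief description of the live website

    def calendar_url(self) -> str:
        # Assumed address pattern for an Archive-It calendar page; the "*"
        # takes the place of a capture timestamp and lists every capture.
        return f"https://wayback.archive-it.org/{self.collection_id}/*/{self.url}"

    def render(self) -> str:
        # The first crawl date is reported by month and year only.
        crawled = self.first_crawl.strftime("%B %Y")
        return (
            f"{self.title}: {self.url}\n"
            f"Archived beginning {crawled}. {self.description}\n"
            f"Archived versions: {self.calendar_url()}"
        )


# Hypothetical example values, not a real entry from our collections.
entry = ArchivedWebsiteEntry(
    title="Example Student Organization website",
    url="https://example.org/",
    collection_id=1234,
    first_crawl=date(2013, 5, 1),
    description="Website of a hypothetical campus organization.",
)
print(entry.render())
```

A model like this has no place to record how complete a capture is, which is the gap described above. An extra field for known limitations of the archived copy would be one way to address it.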

Group of related URLs description

  • A third way we’ve represented archived websites is by creator group, which is a slightly newer approach for us. Instead of listing individual websites on this finding aid, we added one link to the group of URLs created by the student organization. We could have described the URLs at the item level, and that might allow for better description given that each is quite different (e.g. a Facebook event page vs. an email newsletter vs. a website). But linking to a group of URLs fits more closely with traditional archival description practices, which focus on aggregates rather than items. We’ll have to continue to think about how to handle the donation or selection of several websites by one creator in our descriptions.
  • Example:
Screenshot of part of finding aid showing a group of URLs archived for the Asian Student Association collection.
http://finding-aids.lib.unc.edu/40486/

Intersection of legacy media and websites

  • The last example is really different from our other archived websites. Last year I worked on a project with a colleague to deal with website directories given to UA on optical media (I wrote about it on the blog here). These sites are no longer live on the web. We essentially re-hosted the website, gave it an artificial URL, and crawled it with Archive-It.
  • One of the questions we had was how best to describe these websites. To re-host and archive the sites with Archive-It, we had to use an artificial URL, and the crawl date is very different from the dates when the site was created and used. Additionally, the directory of files from the DVD had already been ingested into the repository a couple of years earlier. We needed to make some connections between these factors.
  • We decided to keep a link to the repository, note the DVD identification number, link to Archive-It, and explain a bit about the process to re-host the site.
Screenshot of finding aid section describing archived website given to us on DVD
http://finding-aids.lib.unc.edu/40296/#contentslist

Next Steps

Our staff last talked about this work in 2013-14 when we first started using Archive-It, so our best next step is to revisit this topic as a group and figure out how we can iterate on our current approaches to meet the unique description challenges posed by archived websites. I had the pleasure of participating in the OCLC Web Archiving Metadata (WAM) Working Group in 2016-17, and the guidelines produced by the group will be a helpful resource in this discussion. Documentation of our practices for describing websites will be an important addition to our existing documentation for description of born-digital materials in archival finding aids. I’d also like to use more metadata in the Archive-It access interface. The OCLC WAM guidelines can help with that as well.

You can use and explore our archived website collections online through our Archive-It access portal.

C-A-R-O-L-I-N-A: www.unc.edu circa 1997

The UNC Libraries started a web archiving project in January 2013 (read more about that here), but the Internet Archive has been saving websites for much, much longer. In fact, they have saved over 366 BILLION web pages since 1996, accessible through the Wayback Machine.

In the Wayback Machine you can see an archive of UNC.edu since 1997, not to mention tons of other websites. Take a moment to search for some of your favorite websites and see what they looked like 10 (or more!) years ago. Not surprisingly, the Web has changed quite a bit since then.
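If you would rather build a Wayback Machine link than click through the calendar, the short sketch below shows one way to do it. The function name is our own invention, and the sketch assumes the address pattern the Wayback Machine uses publicly: a timestamp followed by the site address. Asking for a date does not guarantee a capture on that exact date; the Wayback Machine shows the closest one it has.

```python
def wayback_url(site: str, timestamp: str = "") -> str:
    """Build a Wayback Machine address for `site` (hypothetical helper).

    `timestamp` is a string of digits in YYYYMMDDhhmmss order. A shorter
    prefix such as "1997" or "19970427" also works. With no timestamp,
    "*" asks for the calendar of all captures.
    """
    if timestamp and not (timestamp.isdigit() and len(timestamp) <= 14):
        raise ValueError("timestamp must be up to 14 digits (YYYYMMDDhhmmss)")
    return f"https://web.archive.org/web/{timestamp or '*'}/{site}"


# A link to the capture of unc.edu closest to April 27, 1997.
print(wayback_url("unc.edu", "19970427"))

# A link to the calendar of every capture of unc.edu.
print(wayback_url("unc.edu"))
```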

Here is a snapshot of UNC’s homepage from April 27, 1997, featuring a very creative and informative acrostic linking to University departments and offices.

Screenshot of the UNC homepage as archived on April 27, 1997.

Does anyone else think we should bring back the acrostic? What would your acrostic be?

Web archiving fulfills RM needs, too

A few weeks ago, we posted about UARMS’ web archiving program and the work we’re doing to collect and preserve University websites. As archivists, we see websites as important documents that are a fundamental part of today’s culture. Many websites have enduring historical value, and we believe future researchers will be interested in accessing web archives for their unique and rich content.

Our web archives also serve a more immediate purpose: records management and content recovery. This is relevant to University employees in their day-to-day work, especially records management liaisons and web content managers. As records managers, we see websites as documents that are actively created and used in the course of the work done at the University. Many websites are business records, and as such, previous versions sometimes need to be easily accessed and retrieved for reference.

For example, just a few weeks ago we received an inquiry from a department on campus asking if we could retrieve content that “vanished” from their website after migrating to a new content management system.

Luckily, the web documents that went missing had been archived and preserved in our web archives. The department was able to use these to patch up what the migration wasn’t able to transfer and to update its new site.

In today’s technology landscape, everything is changing all the time. By providing a repository where websites are preserved for the long term, we are not only creating a body of documentation that will be useful to future scholars; we hope that we are also helping UNC employees feel more confident as they change, update, and yes, even delete their office’s web pages and content.

If you manage your office’s website, please let us know. We’d love to add it to our archive, and thus help you better manage and preserve the rich content it contains.

Also, if you are looking for documents (analog or digital) that you think may have been transferred to the Archives, let us know. We’re happy to help you search.

Saving UNC’s Slice of the Web

Wayback banner
If you have ever stumbled across a webpage with this banner across the top of it, you’ve encountered the Wayback Machine. The Internet Archive began archiving the web in 1996 and launched the Wayback Machine in 2001 to give the public access to what it had saved. Since 1996 it has collected around 240 billion web pages.

In 2006 the Internet Archive launched Archive-It, which is a hosted service that allows institutions to create their own web archives.

In January of 2013, the UNC Libraries began archiving websites in five different collections. These collections support existing collecting areas in the Libraries.

You can browse all of our collections through Archive-It, and individual websites have been cataloged for access through the UNC Libraries’ catalog.

Additionally, websites that are part of existing archival collections are described in that collection’s finding aid. For example, you can see description of and get access to an archived version of the North Carolina Literary Festival’s 2009 website from the finding aid for the records of the North Carolina Literary Festival.

Here’s a snippet from that website, showing the banner that Archive-It uses to let the viewer know that they’re looking at an archived web page.

Screenshot of the archived North Carolina Literary Festival website, showing the Archive-It banner.

What are we missing? Are there any web pages you’d like to see in our collections?