born digital – For the Record

Behind the Scenes: Describing Archived Websites

On May 22, I participated in an Archive-It training webinar on describing archived websites. The following is a summary of my short presentation on the Wilson Special Collections Library’s approach to describing archived websites in finding aids.

Special Collections has been archiving websites with Archive-It since 2013. Our Archive-It account is split into collections that reflect our five main collecting units as well as one collection for the UNC at Chapel Hill Art Library. Some of our collecting units use catalog records to describe archived websites, but my presentation focused on the finding aid side of the house and used examples from the University Archives collection.

What makes describing websites unique?

In many ways, our approach to archived website description lines up with existing archival finding aid practices. However, archived websites differ from other materials in some ways. For example, dates can be tricky. Do we describe the date we archived the website or try to assign some kind of creation date? Our technical services team opted to describe the date we started archiving a website rather than trying to assign the website a date of use or creation. Other challenges are the recurring nature of “crawling” websites, frequently changing content, URL changes and redirects, the differing frequencies used to archive different websites in our collections, and the technical limitations and incompleteness of some archived websites.

Case Studies

We have some consistency in our approach, but we don’t have written documentation yet. The following examples are representative of our approach and include a couple of newer things we have tried recently.

Archive-It Collection level description

The first example is a finding aid for the University Archives’ Archive-It collection. The finding aid was created in 2013 and serves as a blanket entry point and general description of all URLs in the collection. I think this is a helpful finding aid to have, but the University Archives collection has grown a lot since 2013. One improvement might be adding series to this finding aid that describe groups of related URLs in the collection. The additional description would help the finding aid show up in more searches. It would also provide users with more access points rather than sending them directly to the entire (very long) list of URLs in our collection.

Screenshot of a portion of a finding aid describing the University Archives archived website collection. — http://finding-aids.lib.unc.edu/40417/

URL level description

The second example is adding description of individual URLs to finding aids. This style of description is pretty standard across manuscript collecting units and was implemented broadly by our technical services team in 2013-14. Typically, these URLs were selected for archiving because we already had a collection for the person or organization. When adding individual archived websites to finding aids, we link to the Archive-It “calendar page” that shows each of the dates we archived the URL. The description also provides the URL, the first crawl date by month and year, and a brief description of the live website.
This approach works well. One way I’d like to iterate on this approach is to figure out how best to represent the incomplete nature of archived websites in the finding aid. The description of the site describes the live website features and content, but the archived version may be different based on how often we archive it or it may have elements missing due to technical limitations of web crawlers.
Example:

Screenshot of a finding aid section describing the Carolina Black Caucus archived website. — http://finding-aids.lib.unc.edu/40363/
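
The elements of a URL-level entry described above (the URL, the first crawl date by month and year, a brief description of the live site, and a link to the Archive-It calendar page) can be sketched as a small data structure. This is only an illustration in Python, not the tooling used to produce the finding aids. The class and field names, the collection number 1234, the example.org address, and the calendar link pattern are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ArchivedSiteEntry:
    """One archived website as it might be described in a finding aid (hypothetical model)."""
    title: str
    url: str            # URL of the live website
    first_crawl: date   # date of the first crawl
    description: str    # brief description of the live website
    collection_id: int  # Archive-It collection number (made up in the usage below)

    def calendar_link(self) -> str:
        # Assumed pattern for the calendar page that lists every capture date.
        return f"https://wayback.archive-it.org/{self.collection_id}/*/{self.url}"

    def first_crawl_label(self) -> str:
        # Only the month and year of the first crawl are recorded.
        return self.first_crawl.strftime("%B %Y")

    def to_text(self) -> str:
        # Assemble the four elements into a plain-text entry.
        return (
            f"{self.title}: {self.url}\n"
            f"Archived beginning {self.first_crawl_label()}\n"
            f"{self.description}\n"
            f"Archived versions: {self.calendar_link()}"
        )


entry = ArchivedSiteEntry(
    title="Example Student Group",
    url="http://example.org/",
    first_crawl=date(2013, 1, 15),
    description="Website of a hypothetical student group.",
    collection_id=1234,
)
print(entry.to_text())
```

Keeping the first crawl date as a full date and formatting it only when the entry is written out leaves room to change the level of date detail later without re-entering data.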

Group of related URLs description

A third way we’ve represented archived websites is by creator group, which is a slightly newer approach for us. Instead of listing individual websites on this finding aid, we added one link to the group of URLs created by the student organization. We could have described the URLs at the item level, and that might allow for better description given that each is quite different (e.g. a Facebook event page vs. an email newsletter vs. a website). But linking to a group of URLs fits more closely with traditional archival description practices, which focus on aggregates rather than items. We’ll have to continue to think about how to handle the donation or selection of several websites by one creator in our descriptions.
Example:

Screenshot of part of finding aid showing a group of URLs archived for the Asian Student Association collection. — http://finding-aids.lib.unc.edu/40486/

Intersection of legacy media and websites

The last example is really different from our other archived websites. Last year I worked on a project with a colleague to deal with website directories given to UA on optical media (I wrote about it on the blog here). These sites are no longer live on the web. We essentially re-hosted the website, gave it an artificial URL, and crawled it with Archive-It.
One of the questions we had was how best to describe these websites. In order to re-host and archive the sites with Archive-It, we had to use an artificial URL, and the crawl date is very different from the dates when the site was created and used. Additionally, the directory of files from the DVD had already been ingested into the repository a couple of years ago. We needed to make some connections between these factors.
We decided to keep a link to the repository, note the DVD identification number, link to Archive-It, and explain a bit about the process to re-host the site.

Screenshot of finding aid section describing archived website given to us on DVD — http://finding-aids.lib.unc.edu/40296/#contentslist

Next Steps

Our staff last talked about this work in 2013-14 when we first started using Archive-It, so our best next step is to revisit this topic as a group and figure out how we can iterate on our current approaches to meet the unique description challenges posed by archived websites. I had the pleasure of participating in the OCLC Web Archives Description working group in 2016-17 and the guidelines produced by the group will be a helpful resource in this discussion. Documentation of our practices for describing websites will be an important addition to our existing documentation for description of born-digital materials in archival finding aids. I’d also like to use more metadata in the Archive-It access interface. The OCLC WAM guidelines can help with that as well.

You can use and explore our archived website collections online through our Archive-It access portal.

The Legacy Digital Media Project

For about a year and a half now, we have been developing a project at Wilson to acquire material from digital storage media in processed collections and make it available in the Carolina Digital Repository. Having finished the research, workflow development, and testing phases, we have started to implement the project and are making some of these materials available in the CDR as we work through the processed collections. This project aims both to preserve the material safely (instead of on storage media that can be fragile) and to provide easier access to the material for researchers in the collections at Wilson Library.

The material we’re working with is stored on media including 3 1/2 inch, 5 1/4 inch, and 8 inch floppy disks, zip disks, and optical disks like CDs and DVDs. Some of the material, such as the material on the CDs and DVDs, has been accessible via listening or viewing copies, but the material on the other formats will be made available for the first time. We are very excited about getting this material to users, so please keep an eye out for further posts as we make new collections available in the CDR.

October 10th is Electronic Records Day

Today is Electronic Records Day! The Council of State Archivists (CoSA) started the tradition of Electronic Records Day three years ago, and has flyers available on personal electronic records, on government agencies working with electronic records, and on why electronic records may need special attention.

Here at University Archives, we follow our retention schedule for all records regardless of format. We know that sometimes electronic records present special challenges, though! Please see our guidelines page for information about records retention at UNC, including email. The North Carolina State Archives also has helpful guidelines for electronic records. Finally, our FAQ page offers guidance for electronic records issues at UNC.

Of course, every day is Electronic Records Day for us, and we are here to support you if you have electronic records questions!

NARA’s Capstone Email Initiative: A Virtual Discussion

Last week, Electronic Records Archivist Meg Tuomala participated in a virtual discussion about the National Archives and Records Administration’s (NARA) Capstone Email Initiative, which gives guidance on a new way for federal agencies to manage email records. The discussion was led by Arian Ravanbakhsh and Beth Cron, both records management policy analysts in the Office of the Chief Records Officer at NARA.

The discussion was hosted by the Society of American Archivists’ Records Management Roundtable, and a video recording is available here. Arian and Beth give a great overview of the Initiative and weigh in on several questions and considerations surrounding it, not just for federal agencies but for state governments, universities, and private organizations too.

If you’re at all interested in the records management side of UARMS’ work, we hope that you can take some time to view the recording.

UARMS is very interested in applying the Capstone method of capturing and archiving email of enduring value generated at UNC. As discussed in the recording and addressed in the Initiative, it’s not a perfect solution, but it could be a practical way for us to make strides towards preserving email, a format that has become integral to our work over the past 20 years and thus serves to document the history of the University in the 21st century.

Saving UNC’s Slice of the Web

If you have ever stumbled across a webpage with this banner across the top of it, you’ve encountered the Wayback Machine. The Wayback Machine was developed by the Internet Archive in 1996 to start archiving the web, and since then it has collected around 240 billion web pages.

In 2006 the Internet Archive launched Archive-It, which is a hosted service that allows institutions to create their own web archives.

In January of 2013, the UNC Libraries began archiving websites in five different collections. These collections support existing collecting areas in the Libraries and include

Digital Artists’ File curated by the Sloane Art Library,
North Carolina Collection Web Archives,
Rare Book Collection Web Archives,
Southern Historical Collection Web Archives, and
University Archives Web Archives.

You can browse all of our collections through Archive-It, and individual websites have been cataloged for access through the UNC Libraries’ catalog.

Additionally, websites that are part of existing archival collections are described in that collection’s finding aid. For example, you can see description of and get access to an archived version of the North Carolina Literary Festival’s 2009 website from the finding aid for the records of the North Carolina Literary Festival.

Here’s a snippet from that website, showing the banner that Archive-It uses to let the viewer know that they’re looking at an archived web page.

Cleaning House

Recently UNC Libraries launched a new, redesigned website. As any archivist should, we took this opportunity to look at some of the older, somewhat outdated content of the previous website and flag materials for archiving.

Amongst other items, we decided to save a bunch of photographs, some of which were taken by a library employee during the renovation of the Robert B. House Undergraduate Library (the UL).

Here you can see the evolution of a favorite UL study spot, the new books reading room.

…construction continues…

…the grand re-opening…

…after the renovations!

Special thanks to Kim Vassiliadis, head of User Experience, who alerted us to these cool photographs before they were deleted from our web servers.

Construction photos taken by Fred Stipe, head of the Library’s Digital Production Center, during the UL renovations (1999-2001).

Acquiring Born-digital University Records

Most departments have moved from creating and managing paper records to handling files in digital formats. University Archives is now receiving records of permanent value that are born-digital. We are developing our skills and tools to handle these digital files.

Here are some highlights:

We have a digital repository, the Carolina Digital Repository, where files can be safely stored and preserved.

We use write blockers like the Tableau T35es and Tableau T8-R2 to ensure that materials are not altered during the transfer process.

We have a great tool called Curator’s Workbench that prepares the materials for the Carolina Digital Repository. It’s free, open-source software that other repositories can use for handling digital materials.

If you are part of a unit on campus that has digital materials that you’d like to transfer to University Archives, please contact us.