Community Archives as Data: Reflections on Oral History Text Analysis within the Eastern Kentucky African American Migration Project Archive

Between September 2020 and January 2021, I completed an EBSCO-funded internship, administered through the Association of Research Libraries, that focused on a Python text analysis project using oral history interview transcripts. I worked with the Community-Driven Archives team, which partnered with communities across the American South to amplify histories that have been silenced or marginalized in traditional archives. The purpose of this project was to explore the possibility of using computational methods on oral history data. I was interested in how computational methods can build upon digitization, which makes historical records searchable on the web, to make community archives more accessible to their respective communities.

My project focused on oral histories created by the Eastern Kentucky African American Migration Project (EKAAMP), a public history and community archival project centered on the stories of Black former coal mining families in Eastern Kentucky. The Community-Driven Archives team collaborated with EKAAMP to support the creation of its collection, some of which is housed in the Southern Historical Collection at University Libraries at UNC-Chapel Hill, along with a series of traveling exhibitions. EKAAMP honors the place of Black Americans in Appalachia.

Learning Text Analysis in Python

Python is a general-purpose programming language. Beyond its extensive use in software and web development, it is widely used to apply computational methods to humanities and social sciences data because of its data modeling and natural language processing libraries. One such application is text analysis, in which a body of textual data is processed and analyzed. When done well, text analysis can reveal patterns in topics and sentiments across large quantities of text.

I started this project being very new to the world of text analysis and to Python as a programming language. I used a variety of resources in my self-directed and explorative learning process, both on Python and on text analysis methodologies. Here are some of the resources that guided my project and helped me respond to challenges along the way:

I found the following text analysis projects and papers informative and relevant to this project:

I learned that most resources and available projects using text analysis deal with bodies of text that differ from oral histories in both content and structure. For instance, the conversational format of the interview transcripts meant that common text analysis techniques developed for other kinds of texts would not yield meaningful results. Based on what I learned in my research, I explored two main text analysis methods, Term Frequency-Inverse Document Frequency (TF-IDF) and Latent Dirichlet Allocation (LDA), to categorize the main topics discussed in the interviews.
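To make the first of these methods more concrete, here is a minimal sketch of TF-IDF written with only the Python standard library. It is an illustration, not the code used in this project: the three short "transcripts," the stopword list, and the function names (tokenize, tf_idf, top_terms) are all hypothetical, and the scoring uses the plain tf * log(N / df) form rather than any particular library's variant. LDA is not shown because it typically relies on a dedicated topic-modeling library.

```python
import math
import re
from collections import Counter

# Hypothetical toy "transcripts" for illustration; not drawn from the EKAAMP collection.
TRANSCRIPTS = {
    "interview_a": "Interviewer: Tell me about the reunion. "
                   "Narrator: The family reunion brought everybody home every summer.",
    "interview_b": "Interviewer: Tell me about school. "
                   "Narrator: The school integrated when I was in the ninth grade.",
    "interview_c": "Interviewer: Tell me about the mine. "
                   "Narrator: My father worked in the mine until the accident.",
}

# Assumed stopword list: speaker labels and filler words that appear in
# almost every conversational transcript and carry little topical meaning.
STOPWORDS = {
    "interviewer", "narrator", "the", "a", "about", "me", "tell",
    "i", "in", "my", "was", "when", "until", "every",
}


def tokenize(text):
    """Lowercase the text and keep word tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOPWORDS]


def tf_idf(docs):
    """Return {doc_id: {term: score}} where score = tf * log(N / df)."""
    tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokens)
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for toks in tokens.values():
        df.update(set(toks))
    scores = {}
    for doc_id, toks in tokens.items():
        counts = Counter(toks)
        total = len(toks)
        scores[doc_id] = {
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
    return scores


def top_terms(scores, doc_id, k=3):
    """Return the k highest-scoring terms for one document (ties sorted A-Z)."""
    ranked = sorted(scores[doc_id].items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


if __name__ == "__main__":
    all_scores = tf_idf(TRANSCRIPTS)
    for doc_id in TRANSCRIPTS:
        print(doc_id, top_terms(all_scores, doc_id))
```

Removing the speaker labels before scoring is one simple way to account for the conversational format: words such as "Interviewer" appear in every transcript, so they would otherwise add noise without saying anything about the topic of a given interview.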

The preliminary results from the LDA method aligned with the main topics covered in EKAAMP and in Dr. Karida Brown’s research on these histories. Some of the main topics that emerged from the text analysis were family reunions, school integration, and mining accidents.

Archives as Data: Computational Methods for Community Archives

In support of the larger work of the Community-Driven Archives project, I wanted to explore the research value of archival collections, such as oral history transcripts, as data that can be analyzed and visualized through computational methods. My goal was to use text analysis, an automated process for mapping common topics and visualizing patterns in conversations, to gain deeper insight into a large collection of oral history interviews. Such computational techniques could then be used to extract a variety of data about the collection as a whole and about individual interviews. This would support community researchers’ discovery and identification of these histories.

This model can then be built upon and implemented in the future for metadata generation for collections with similar size and scope. High quality metadata that provides descriptive information about an oral history collection not only facilitates better discovery and identification, but also creates exciting possibilities for presenting and analyzing the research data, such as data visualization of migration paths among Black former coal mining families represented in EKAAMP oral histories.

Perhaps this project can serve as an example of using computational tools and techniques for unlocking data and gaining insight into large oral history collections. Community leaders in charge of similar public history projects can use such tools to reimagine discovery, management, and description of their oral history archives.

I am in the final phases of developing a website to share my code as open source for the public to use and build upon. The website will include a brief collection description, a brief discussion of challenges, and snippets of Python code that can be run in a web browser.

Data analysis methods and techniques can transform large quantities of non-machine-readable content, such as oral histories, into machine-readable content. This transformation would enable community leaders to enhance their archives by creating robust research data sets for computational research. The new data would then allow community researchers to present, visualize, and analyze oral histories in new and dynamic ways.
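As one small illustration of what "machine-readable" could look like for a transcript, the sketch below splits plain text into speaker turns and prints them as JSON. It rests on an assumption that is not taken from the EKAAMP transcripts: that each turn begins with a label in the form "Speaker: text." The sample text, the pattern, and the function name parse_turns are hypothetical, and real transcripts would likely need a more careful pattern.

```python
import json
import re

# Assumed turn format: a capitalized speaker label, a colon, then the utterance.
# Real transcripts may label speakers differently (initials, timestamps, etc.),
# and a sentence such as "He said: ..." would be misread as a new speaker.
TURN_PATTERN = re.compile(r"^([A-Z][A-Za-z .'-]*):\s*(.*)$")

# Hypothetical sample text for illustration only.
SAMPLE = """\
Interviewer: Where did your family move after the mines closed?
Narrator: We moved north, but we came back
every year for the reunion.
"""


def parse_turns(transcript):
    """Split a transcript into a list of {"speaker", "text"} dictionaries."""
    turns = []
    for line in transcript.splitlines():
        line = line.strip()
        if not line:
            continue
        match = TURN_PATTERN.match(line)
        if match:
            turns.append({"speaker": match.group(1), "text": match.group(2)})
        elif turns:
            # A line without a speaker label continues the previous turn.
            turns[-1]["text"] += " " + line
        # Unlabeled text before the first speaker label is skipped.
    return turns


if __name__ == "__main__":
    print(json.dumps(parse_turns(SAMPLE), indent=2))
```

Once turns are stored this way, the narrator's words can be separated from the interviewer's questions before any topic analysis or visualization is attempted.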

Copyright and Community-Driven Archives

When it comes to protecting intellectual property that is part of your or your community’s history, it helps to understand what legal rights apply to your materials. 

Community-based archives are a pathway for groups of people to exercise self-determination over the collection and interpretation of their histories. Historically marginalized communities draw on community-archival methods to preserve and share stories that are often missing from institutional archives and dominant historical narratives.

It is especially important to many of our partner history keepers through our Community-Driven Archives initiative to know what rights they and their community collaborators have over their stories and historical records. This requires an understanding of copyright and how it works.

What is copyright?

According to Anne Gilliland, Scholarly Communications Officer with UNC Libraries, copyright is your legal right to determine the permitted uses of your tangible expressions of creative work. What does that mean and what kinds of things amount to “tangible expressions of creative work”?

This is not an exhaustive list, but it does give you a sense of what kinds of things are legally under copyright:

    • Musical compositions
    • Films
    • Artwork/media
    • Oral histories
    • Photographs

One big takeaway is that copyright does not cover non-recorded stories and ideas.

Many of our collaborators are rightfully concerned about their control over future uses of their shared stories and materials. Many have heard about or know of an example of someone’s story making its way to Hollywood or on the radio or even featured on a city-sponsored project without the knowledge of that person or their descendants.

While acknowledging on one hand the gaps, omissions, and injustices of U.S. laws, our goal as a Community-Driven Archives Team is to help history keepers get familiar with a few best practices for making use of the legal protections that are available. We also want to help groups and institutions who work with oral histories and other people’s historical materials take the proper steps before making use of someone’s story or creative work.

Copyright Best Practices

Best Practice #1: Assume that every creative work is under copyright until you know that it is not.

the Old Well at UNC-Chapel Hill surrounded by Spring flowers
An example of an image in the public domain featuring UNC-Chapel Hill. Credit: Jack a lanier, CC BY-SA 4.0, via Wikimedia Commons

Most creative works are automatically under copyright unless the copyright holder (the creator or their designated heirs) explicitly gives away their copyright or the record goes into the public domain, which usually takes about a century.

Just because you found it online does not mean that you are free to share it. Most online materials are under copyright.

Look for ways to seek permission to share or reuse the item in question. Sometimes, a simple web search will clue you in on permission requirements; other times, you may need to take the time to track down heirs and make phone calls to descendants for consent. If you are working with an institutional archive, staff members can help you track down creators for permission. If you cannot find someone to provide consent, then you can investigate fair use, which is a framework to help you assess whether you can fairly justify the use of copyrighted materials without the permission of the creator or someone authorized to provide consent.

The item may also be free to use because it is in the public domain. This applies to many items, including those created by the federal government and those that date back to the early 20th century or earlier. To learn what groups of historical and cultural materials have passed into the public domain, you can check out this chart updated each year by Cornell University.

A black and white image of four white male-presenting people in front of the Old Well at UNC
This 19th-century photograph of the Old Well at UNC-Chapel Hill is another example of an image in the public domain, this time because it is over a century old. From the North Carolina Collection at the Wilson Special Collections Library at UNC-Chapel Hill

Best Practice #2: For oral histories, interviewers should always ask their interviewees for their consent and their terms of reuse.

According to our lawyer-in-residence, Anne Gilliland, oral histories are considered a joint creation between the interviewer and the interviewee.

For interviewers:

Bernetiae leans over a group of seated African American women to assist them during a training
CDA Team member Bernetiae Reed leads an oral history training in San Antonio, TX, November 2017. Courtesy UNC CDAT

If you are a community archivist wanting to preserve and/or share oral histories you have collected, you should create a consent form where your interviewee gives you permission to record their story. This form should outline the allowed uses for the recorded interview. Consent forms also ask about additional restrictions, if any, that interviewees require for the sharing of their interview. If it applies, interviewees should also be informed about the institutional repository (e.g. archive, library, museum, etc.) to which their materials will be donated.

A license is a way of communicating the terms for allowed uses of creative works (such uses include: display, distribution, performance, reproduction, derivative works, and audio transmission). For example, a license can state that someone’s interview should be used only for educational and/or nonprofit purposes, or only if the original format is not altered (i.e. no derivative works can be adapted from the interview). Creative Commons licenses are popular and give creators standardized language for their terms of reuse.

For interviewees:

Unless the form you sign says so explicitly, signing a consent form does not mean that you are giving away your copyright. Creators maintain their copyright for at least the duration of their lifetime, unless they formally agree to end their copyright. If you are being interviewed, it is important that you feel comfortable with the terms of the interview. Take the time to read through the consent form to make sure you agree with the license laid out there. Read the section above for more information about creating a license.

Best Practice #3: Be upfront about your mission and goals with your audience and your collaborators.

Why are you making your works or materials available to members of the public? Make it clear to potential audiences. For example, if you want to share your creative works with public audiences for educational purposes, that tells you something about your mission. Perhaps your mission is to inspire people in Chapel Hill, NC to take action for environmental justice through sharing nature photographs from the 1970s and 80s with web users. Write up that mission and share it on your website. If you are concerned that people might use your photographs for purposes outside the scope of your mission, make sure your license for reuse is somewhere prominent and easy to find on your site.

If you are asking someone to sign a consent form that would allow you to share their digitized image, oral history, or creative work with public audiences, be upfront with them about the mission and goals of your project. This helps build trust. If your collaborator likes your project and appreciates your intended use of their materials, they will be less likely to require additional restrictions be placed on the material, which will make it easier for you and others to use and share it over time. Again, it is important to make sure you and your collaborator agree on the terms of use for their materials, and that the related license is easily accessible with the terms of use clearly presented to public audiences.

Best Practice #4: For sensitive materials, consider alternative ways of sharing them with selected audiences.

If you are concerned with how members of the public will share or use your materials, think about limiting your terms of use.

For digitized items (physical papers or photographs that are scanned and made into a digital file), consider creating a private space online to share them only with select members of your community. Or share them widely but upload a version of the file that is stamped with a watermark to prevent unintended uses. Signing a consent form to share your digitized materials with any history keeper or institutional partner does not mean you are giving away your copyright.

If you are sending items to a repository (e.g. institutional archive, library, museum, etc.), make sure you are also clear with that institution on your terms of use. Review all forms they ask you to sign to ensure that you retain your copyright and ask that your preferred license be included (a.k.a. your terms of use). Let the institution know if you intend for your materials to be a loan or a permanent gift. If it is a loan, indicate when and under which conditions materials should be returned to their owner.

If you are worried about any unintended uses of digitized materials shared with a repository, consider asking your institutional partner to keep your materials off the internet or to share them selectively, as outlined above.

Additional Resources

For more about community-based archives and considerations for project partnership on the Southern Sources blog:

What’s In an Archive? Deciding Where Your Historical Materials Will Live

The Community-Driven Archives Project at UNC-Chapel Hill is supported by a grant from the Andrew W. Mellon Foundation.

Follow us on Twitter: @SoHistColl_1930 #CommunityDrivenArchives #CDAT #SHC

Storytelling through Community-Driven Archives

Our approach to archival workflows is one thing that sets community-driven archives apart from mainstream archival methods. Traditionally, archivists stick to access and preservation and leave interpretation and storytelling to the researchers. But what happens when we listen to what our audiences want? We find ways to help them tell meaningful stories about their communities’ history.

Our Core Audiences and Understanding What Matters to Them

Within our community-driven archives (CDA) project, audience means a lot to us. This has been true since the beginning of our grant project, and it is getting clearer as we head towards wrapping it up and sharing what we have learned.

Our project strives to support and amplify historical projects by and for communities underrepresented in institutional archives. Our priority audience is community-based archive projects: groups of people who are interested in creating an archival project documenting their own community. We also work with individual history keepers: people, like family genealogists and community organizers, wanting to document and share histories currently missing from dominant archives and narratives due to legacies of injustice.

While brainstorming for what will go on our new project website (coming soon), our team looked at all the tools and resources that we have created with our project partners since the start of the grant. We asked ourselves, what kinds of resources are most useful to our core audiences?

Title page of Storytelling webinar with UNC logo
Learn about documentary storytelling in this CDA webinar with Theo Moore of Hiztorical Vision Productions.

We noticed that one of the most common resource requests that we receive from our community collaborators is for more tools about storytelling. In response, we have created new resources on topics like “the art of storytelling” and “how to create an exhibition.”

But what do we mean by storytelling and why should archives professionals care?

Making a Case for Storytelling in Community Archives Projects

Archives have historically prioritized the access and preservation of historical records over the interpretation of history, leaving the latter to researchers. For community archives projects, we believe it must be different.

Community archives projects address gaps in the dominant historical record and complicate mainstream historical narratives. Communities want their stories told on their own terms. Through building an archive, the collecting of history supports the (re)telling of it.

In many cases, the stories uncovered through community-based collections are not otherwise known. Due to legacies of racism, settler colonialism, patriarchy, and other forms of oppression, histories by and for Black, Indigenous, people of color, women, and LGBTIQ people have been hidden or silenced. As a result, beyond building their collection, many history keepers also want to broadcast the stories they’ve researched and curated through their archive. Many history keepers want to control how their community’s stories get told, rightfully questioning outside researchers’ and institutions’ motivations.

a museum exhibit with a backdrop of a church featuring men's and women's clothing on two mannequins near a table of historical artifacts
One section of the exhibition on the Eastern KY Social Club by EKAAMP, one of our pilot partners, 2018.

Our collaborators and partners share their communities’ stories through a variety of methods: exhibitions, public programs, websites, social media and blog posts, documentaries, short videos, and more.

We believe that archival professionals like ourselves working with community archives projects must consider the importance of storytelling. Through community collaboration, we have the opportunity to support both the safeguarding and sharing of stories.

Ideas for Archival Institutions

Since the beginning of our community-driven archives project, our work has extended to training and resourcing history keepers to share the stories they uncover through developing a collection.

Oral histories easily lend themselves to exhibitions and other vehicles for sharing stories with visitors. Members of our team have trained local history keepers to conduct oral histories as a path to preserving memories. From there, we have worked with our community partners to incorporate oral history clips and collections materials into physical and digital exhibitions and short documentary videos that narrate important stories. One of our pilot partners, the Appalachian Student Health Coalition, created a web-based storytelling project that draws on video interviews and data visualizations to share its members’ contributions to rural healthcare in Appalachia.

Through our Archival Seedlings program, we help our ten resident Seedlings develop an archival collection and share their historical project with chosen audiences. For some Seedlings, the websites, videos, blog posts, and exhibits they create to highlight their collection are the only places their audiences can find those histories. While some Seedlings are working with traditional institutions and repositories to preserve and share their finished projects, others are choosing to keep their collections with history keepers in their own communities and to take on project promotion and outreach themselves.

A Black person seated in front of a sign in the background reading "Swift Memorial Jr. College Reunion"
Stella Gudger, Founder of the Swift Museum in Rogersville, TN, from Archival Seedling William Isom II’s interview with her.

One Seedlings participant, William Isom II, is compiling a collection of video interviews with alumni from the historically Black Swift Memorial Institute in Rogersville, TN into a video that will be on view in the museum located on Swift’s historic campus. In addition, one of our pilot partners, the Eastern Kentucky African American Migration Project (EKAAMP), has recently launched an exhibition sharing the oral histories and archival materials it collected through community-based research. These are great examples of how a collection can be “put to work” to share stories.

Storytelling Resources

Storytelling resources are available along with other related tools and trainings on our website.

For more about community-based archives on the Southern Sources blog:

What’s in an Archive? Deciding Where Your Historical Materials Will Live


A Rare Gateway to an Untouchable Past: Oral Histories of Carrboro Mill Families

Between 1974 and 1978, the Chapel Hill Historical Society conducted interviews with men and women who had lived and worked in and around Chapel Hill and Carrboro during the early twentieth century. One of their first projects, “Generations of Carrboro Mill Families,” consisted of 117 interviews with Carrboro residents and textile mill workers. The interviews were in response to the Carrboro Board of Aldermen’s decision to tear down the original Carr Mill building. For a rather complicated and long-winded reason, the Southern Historical Collection holds 40 of the 117 interviews conducted, both the audio cassette tapes and their 30-50 page typed transcripts. Question topics run the gamut, and there was a clear effort on the part of the Chapel Hill Historical Society interviewers to gather information about “everyday life.”

“Textile Mill, Greensboro” in the Bayard Morgan Wootten Photographic Collection #P0011, North Carolina Collection, University of North Carolina at Chapel Hill Library.
This image of a textile mill in Greensboro, NC shows a carding room ca. 1904-1954, probably similar to the one the interviewees describe from the mills in Carrboro.

Some of this work is captured in Valerie Quinney’s article “Mill Village Memories,” published in Southern Exposure in Fall 1980. Quinney was one of the interviewers from the Chapel Hill Historical Society in the 1970s. She offers a meaningful overview of the oral history collection and provides supportive context. Although she includes direct quotes, there’s value in the raw format of the interview collection that is worth pursuing.

Four activists to be honored in Chapel Hill, SHC preserves documentation of their legacy

This Sunday, August 28, 2011, four names will be added to a plaque at Chapel Hill’s “Peace and Justice Plaza.” Yonni Chapman, Rebecca Clark, Rev. Charles M. Jones and Dan Pollitt will all be honored posthumously for their contributions to civil rights, social justice and equality in the Chapel Hill community. The ceremony will begin at 3pm in front of the Historic Chapel Hill Post Office on Franklin Street, just across the street from UNC’s McCorkle Place. For the full story, see the article, “Four Honored for Activism,” from the Chapel Hill News.

The Southern Historical Collection is proud to preserve a large body of material that documents the lives and legacies of these four activists, including:

Charles Miles Jones Papers – The collection includes correspondence, church documents and publications, clippings, and other items reflecting Jones’s ministry and concern for civil rights. Materials generally focus on his public rather than personal life with a special emphasis on the 1952-1953 investigation of his Chapel Hill Presbyterian Church ministry. General correspondence includes letters from supporters (among them Frank Porter Graham) and detractors, commenting on the investigation, Jones’s sermons, and several well-publicized actions in support of social justice causes.

Oral history interview with Rebecca Clark (1 interview available online via DocSouth’s Oral Histories of the American South project) – In this interview, Rebecca Clark recalls living and working in segregated North Carolina. She finished her schooling in all-black schools, so the bulk of her experience with white people in a segregated context took place in the work world. There she experienced economic discrimination in a variety of forms, and despite her claims that many black people kept quiet in the face of racial discrimination at the time, she often agitated for, and won, better pay. Along with offering some information about school desegregation, this interview provides a look into the constricted economic lives of black Americans living under Jim Crow.

John K. Chapman Papers (available Fall 2011) – This collection documents Yonni Chapman’s social activism and academic achievements, and offers an account of nearly four decades of progressive racial, social, and economic justice struggles in the central North Carolina region. Organizational materials, including correspondence, notes, newsletters and reports, document the activities of the Communist Workers’ Party, the Federation for Progress, the Orange County Rainbow Coalition of Conscience, the New Democratic Movement, the Freedom Legacy Project, and the Campaign for Historical Accuracy and Truth, among other organizations on the UNC-Chapel Hill campus, in Chapel Hill, N.C., Durham, N.C., Raleigh, N.C., and Greensboro, N.C. Workers’ rights and racial justice campaigns and commemorations, including those of the Greensboro Massacre and the campaign to end the Cornelia Phillips Spencer Bell Award on the UNC-Chapel Hill campus, are documented in paper, audio, visual, and photographic formats.

Daniel H. Pollitt Papers (available Fall 2012) – This collection documents Dan Pollitt’s distinguished career as an attorney, professor in the University of North Carolina Law School, and civil rights activist in the American South. The collection documents Pollitt’s activities with a number of organizations, including: the National Labor Relations Board, the National Sharecroppers Fund, the NAACP, the North Carolina Civil Liberties Union, the American Association of University Professors, the Rural Advancement Fund, and other organizations. Material also covers Pollitt’s involvement with the Speaker Ban controversy at the University of North Carolina, his opposition to the death penalty in North Carolina, issues of congressional misconduct, and many other legal and ethical matters.

Oral history interviews with Daniel H. Pollitt (13 interviews, many of which are available online via DocSouth’s Oral Histories of the American South project)

New SOHP Database

The Southern Historical Collection is pleased to present the new Southern Oral History Program (SOHP) interview database, at http://www.lib.unc.edu/dc/sohp/?CISOROOT=/sohp.

The new site provides users with even greater search capabilities and functionality. Most importantly, we now have the ability to deliver digital content on the Web. In addition to the 500+ interviews already delivered digitally through Oral Histories of the American South, the IMLS-funded digital collection from Documenting the American South, users can now access another 330 digital transcripts as well as approximately 290 digital audio interviews from the new site. These numbers will only continue to grow.

The new site includes a number of browse pages (Interviewee, Interviewer, Project, Occupation, Subject, and Ethnicity), as well as the old site’s keyword searches. A powerful advanced search is available from the main Libraries digital collections search page as well.

We invite comments and feedback on the new database.