Project Description | The Audio-Aligned and Parsed Corpus of Appalachian English (AAPCAppE)

The AAPCAppE is a corpus of one million words of Appalachian speech.

Though often socially stigmatized, Appalachian English is historically central to the development of American English from its British origins, and this corpus serves as a resource unprecedented in scope and in public accessibility for cultural, historical, and linguistic research on the English of Appalachia.

The AAPCAppE (approx. 1 million words) is based on existing oral history projects housed at institutions around the Appalachian region; some of these (approx. 400k words) were originally vetted, transcribed, and organized by M. Montgomery, with the transcriptions further modified by the AAPCAppE authors. The goals of this project included (a) digitizing the recordings, (b) time-aligning the digitized sound files with the transcripts, (c) annotating the transcripts with detailed grammatical information, also known as “part-of-speech tagging” and “parsing,” and (d) making all of this available to the public through a web interface.

Digitizing the recordings that had not yet been digitized has preserved this valuable cultural resource for future generations, and time-aligning the digitized recordings with the transcripts allows researchers to rapidly find desired parts of the speech signal by searching the transcribed text. The grammatical annotation allows in-depth analyses of particular constructions that are specific to Appalachian English, or typical of vernacular American English more generally, as well as comparisons of Appalachian English with other vernacular Englishes (see the CoNYCE for possibilities) and with earlier stages of the language (see the PPCHE for possibilities).

Because the corpus is large, publicly available, and searchable online with standard, freely accessible, user-friendly computational tools, it will foster replicability, thereby contributing to increased empirical rigor in linguistic research. These same properties also make it possible to use the corpus as a teaching tool at the primary and secondary education levels, as well as at college and graduate levels. On a more general level, this corpus will deepen our understanding of America’s linguistic heritage and promote a scientifically informed appreciation of regional language and culture.

This entry is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.