Speech Data Management

Software and tools for sociolinguistic data management and analysis

The LVC Lab hosts our efforts to develop and maintain a number of tools for enhancing sociolinguistic research and for data management, sharing, and preservation. For instance, we develop and maintain the Sociolinguistic Archive and Analysis Project (in partnership with the NCLLP and NCSU Libraries; more about SLAAP below), as well as the NORM website and the vowels.R package for vowel normalization, among other software and projects.

Most recently, we’ve been working with Christina Tortora at CUNY to help build tools for the Appalachian Syntax Project’s annotated corpus of Appalachian English. We’ve also been working with Kirk Hazen at West Virginia University on his research group’s acoustic analysis of consonantal variation in West Virginia. TK recently participated in a workshop at NSF on “The Role of Speech Science in Developing Robust Speech Processing Applications” (see his slides there). Our individual web pages (e.g., http://pages.uoregon.edu/tsk/affiliations.html) describe other collaborations between the Lab and other research groups.

Links and references to several papers and presentations on the subject of data management are below.

SLAAP screenshot collage (from Kendall 2007)

The Sociolinguistic Archive and Analysis Project

The Sociolinguistic Archive and Analysis Project (SLAAP), at North Carolina State University, is an interactive web-based archive of sociolinguistic recordings, with integrated media playing and annotation features, as well as phonetic analysis and corpus analysis tools designed to enable and improve empirical linguistic inquiry. SLAAP is off-line for much of the summer of 2015 while we make a number of architectural improvements and move the server to a new hosting environment. SLAAP hosts over 3,500 interviews (over 3,000 hours of spoken language audio). Over 100 hours have time-aligned transcripts (over 1 million words of orthographically transcribed speech).

SLAAP is not a corpus but rather a speech data management system that archives and provides access to a diverse set of sociolinguistic datasets. Many of the collections housed in SLAAP are indexed in the language resource catalog maintained by the Open Language Archives Community (OLAC). To find information about many of the collections in SLAAP, you can view SLAAP’s entries in the OLAC catalog here and SLAAP’s main entry at OLAC here.

Papers and presentations