Speech Data Management

Software and tools for sociolinguistic data management and analysis

The LVC Lab hosts our efforts to develop and maintain tools that support sociolinguistic research as well as data management, sharing, and preservation. For instance, we develop and maintain the Sociolinguistic Archive and Analysis Project (in partnership with the NCLLP and NCSU Libraries; more about SLAAP below), the NORM website, and the vowels.R package for vowel normalization, among other software programs and projects. Dr. Kendall is also a co-PI on the international project “Speech Across Dialects of English (SPADE): Large-scale digital analysis of a spoken language across space and time,” funded by the Trans-Atlantic Partnership Digging into Data.

Recently, we’ve also worked with Christina Tortora at CUNY to develop tools for building the Appalachian Syntax Project’s annotated corpus of Appalachian English. Dr. Kendall participated in a 2015 NSF workshop on “The Role of Speech Science in Developing Robust Speech Processing Applications” (see his slides there).

Our individual web pages (e.g. http://pages.uoregon.edu/tsk/affiliations.html) describe further collaborations between the Lab and other research groups.

Links and references to several papers and presentations on data management are listed below.

SLAAP screenshot collage (from Kendall 2007)

The Sociolinguistic Archive and Analysis Project

The Sociolinguistic Archive and Analysis Project (SLAAP), at North Carolina State University, is an interactive web-based archive of sociolinguistic recordings. It integrates media playback and annotation features with phonetic and corpus analysis tools designed to support and improve empirical linguistic inquiry. SLAAP hosts over 4,300 interviews (over 3,700 hours of spoken language audio). Over 180 hours have time-aligned transcripts, amounting to more than 1.75 million words of orthographically transcribed speech.

SLAAP is not a corpus but a speech data management system that archives and provides access to a diverse set of sociolinguistic datasets. Many of the collections housed in SLAAP are indexed in the language resource catalog maintained by the Open Language Archives Community (OLAC). For information about these collections, you can view SLAAP’s entries in the OLAC catalog as well as SLAAP’s main entry at OLAC.

Papers and presentations