Speech Data Management | LVC Lab @ UOregon

Software and tools for sociolinguistic data management and analysis

The LVC Lab develops and maintains a number of tools that support sociolinguistic research and the management, sharing, and preservation of sociolinguistic data. These include the Sociolinguistic Archive and Analysis Project (developed in partnership with the NCLLP and NCSU Libraries; more about SLAAP below) and the NORM website and vowels.R package for vowel normalization, among other software programs and projects. Dr. Kendall is also a co-PI on the international project “Speech Across Dialects of English (SPADE): Large-scale digital analysis of a spoken language across space and time,” funded through the Trans-Atlantic Platform’s Digging into Data Challenge.

Recently, we’ve also worked with Christina Tortora at CUNY to develop tools for building the Appalachian Syntax Project’s annotated corpus of Appalachian English. Dr. Kendall participated in a 2015 NSF workshop on “The Role of Speech Science in Developing Robust Speech Processing Applications” (his slides are available from the workshop site).

Our individual web pages (e.g., http://pages.uoregon.edu/tsk/affiliations.html) describe further collaborations between the Lab and other research groups.

Links and references to several papers and presentations on the subject of data management are below.

SLAAP screenshot collage (from Kendall 2007)

The Sociolinguistic Archive and Analysis Project

The Sociolinguistic Archive and Analysis Project (SLAAP), at North Carolina State University, is an interactive web-based archive of sociolinguistic recordings with integrated media playback and annotation features, as well as phonetic and corpus analysis tools designed to enable and improve empirical linguistic inquiry. SLAAP hosts over 4,300 interviews (over 3,700 hours of spoken language audio). Over 180 hours have time-aligned transcripts, amounting to more than 1.75 million words of orthographically transcribed speech.

SLAAP is not a corpus but rather a speech data management system, archiving and providing access to a diverse set of sociolinguistic datasets. Many of the collections housed in SLAAP are indexed in the language resource catalog maintained by the Open Language Archives Community (OLAC). Information about these collections can be found in SLAAP’s entries in the OLAC catalog and in SLAAP’s main entry at OLAC.

Papers and presentations

Kendall, Tyler (2014). Archiving and Managing Sociolinguistic Data: The Problems of Portability, Access and Security, and Discoverability and Relevance. Language and Linguistics Compass, 8.11: 495-504. Special issue on archiving sociolinguistic data.
Kendall, Tyler (2013c). Data in the Study of Variation and Change. In J. K. Chambers and Natalie Schilling (Eds.), The Handbook of Language Variation and Change, 2nd edition, 38-56. Malden, MA/Oxford: Wiley-Blackwell.
Kendall, Tyler (2013b). Data Preservation and Access. In Christine Mallinson, Becky Childs, and Gerard Van Herk (Eds.), Data Collection in Sociolinguistics: Methods and Applications, 195-205. New York: Routledge.
Kendall, Tyler (2013a). Speech Rate, Pause, and Sociolinguistic Variation: Studies in Corpus Sociophonetics. Palgrave Macmillan. [ Book at Palgrave Macmillan | Google Books ]
Kendall, Tyler (2011). Corpora from a Sociolinguistic Perspective (Corpora sob uma perspectiva sociolinguística). In Stefan Th. Gries (Ed.), Corpus Studies: Future Directions, Special Issue of Revista Brasileira de Linguística Aplicada, 11.2, 361-389.
Kendall, Tyler and Gerard Van Herk (Eds.) (2011). Corpus Linguistics and Sociolinguistic Inquiry, Special Issue of Corpus Linguistics and Linguistic Theory 7.1.
Kendall, Tyler and Ann R. Bradlow (2011). Mobilizing Smaller Datasets for Large-Scale Phonetic Analysis: Web-Databases and Semi-Automatic Analyses. New Tools and Methods for Very-Large-Scale Phonetics Research. Philadelphia, PA: University of Pennsylvania. January 2011. [ PDF ]
Kendall, Tyler (2010b). Considering the Storage, Management, and Processing of Spontaneous Speech Corpora: New Methods and New Findings. Special Panel: Production and Perception of Spontaneous Speech (organized by Ann Bradlow and Valerie Hazan). 2nd Pan American/Iberian Meeting on Acoustics/160th Meeting of the Acoustical Society of America (ASA160): Cancun, Mexico. November 2010.
Kendall, Tyler (2010a). Developing Web Interfaces to Spoken Language Data Collections. Proceedings of the Chicago Colloquium on Digital Humanities and Computer Science, 1.2. University of Chicago.
Kendall, Tyler (2009). The Value of Relational Databases for Time-Aligned Annotation. American Association of Corpus Linguistics (AACL) 2009: Edmonton, Alberta, Canada. October 2009. [ Slides PDF ]
Kendall, Tyler (2008). On the History and Future of Sociolinguistic Data. Language and Linguistics Compass, 2.2: 332-351. Blackwell Publishing. [ PDF ]
Kendall, Tyler (2007). Enhancing Sociolinguistic Data Collections: The North Carolina Sociolinguistic Archive and Analysis Project. Penn Working Papers in Linguistics 13.2: 15-26. Philadelphia: University of Pennsylvania. [ PDF ]