ORAAL and CORAAL

The newest project at the LVC Lab, funded by NSF grant BCS-1358724, is developing public resources and tools for improving research on and education about African American Language (AAL). A large component of this project is the creation of the Online Repository of African American Language (ORAAL) website. This work builds on Dr. Kendall’s extensive experience designing and developing software tools and web-based repositories for linguistic data. In particular, the project builds on the framework and success of the Sociolinguistic Archive and Analysis Project (SLAAP) at North Carolina State University, which Dr. Kendall has been leading in collaboration with Dr. Walt Wolfram since 2005. The ORAAL website will have a more engaging interface designed to appeal to public users (such as K-12 students, families, and other non-linguists) in addition to researchers, with supporting contextual and educational information about AAL. Over the last forty years an information gap has separated academics and the general public, so that many myths about AAL persist in the public domain to this day. The ORAAL website aims to dispel many of these myths by acting as a clearinghouse for primary data, educational materials, and best practices for teaching and learning about language variation in general and about the systematic nature and sociohistorical context of African American Language in particular.

Members of the CORAAL team presenting at NWAV 44 in Toronto

The second component of this project, the Corpus of Regional African American Language (CORAAL), seeks to promote the wider availability of primary AAL data. These data are being collected with human subjects, participant, and copyright permissions that allow for wide public release and sharing, and they will be made available as a public corpus under a Creative Commons license (such as the Attribution-NonCommercial-ShareAlike 3.0 License used by the TalkBank Project). CORAAL will have a core component composed of both legacy (Fasold 1972) and current sociolinguistic interview recordings of African Americans born and raised in Washington DC, representing diverse ages, several social class backgrounds, and both sexes. In addition to this core component, we aim to include sociolinguistic interviews from several other locales around the U.S. (e.g. California, Michigan, North Carolina) to add regional diversity. We expect CORAAL to contain about 150 sociolinguistic interviews, which are being orthographically transcribed in a time-aligned format (Bird & Liberman 2001, Kendall 2007). Our development of the main corpus currently focuses on providing accurate time-aligned transcription of the speech, but we anticipate including additional linguistic annotation, such as part-of-speech tags and syntactic parsing (see Tortora, Santorini, & Blanchette, in progress; http://csivc.csi.cuny.edu/aapcappe/).