This site is for NowPhon 1, 2015.  For NowPhon 2, 2016, please go here: https://blogs.uoregon.edu/nowphon2016/

June 4, 5:30-7:30, EMU Gumwood Room

Barth, Danielle (UO) – Function word production in child speech, child-directed speech and inter-adult speech

This poster summarizes work on the production of homophonous verbs and auxiliaries in inter-adult speech, child-directed speech, and child speech. Using data drawn from a larger project on the development of prosody in school-aged children (cf. Redford, 2014) and from the Buckeye Corpus (Pitt et al., 2007), I examine word reduction strategies in the speech of children and adults. Caregivers in the present study are speaking to school-aged children (5-10 years old), and their speech therefore lacks many features of traditional child-directed speech. Despite this, caregivers use more simplified semantic content (as measured by text entropy) and relatively longer function words with younger children than with older children. Younger children use more simplified semantic content, have proportionally longer function words, and speak at a much slower rate (Redford, 2014) than older children. All adults and children shorten their function words in predictable contexts and lengthen them before disfluencies. In adult speech, function words also receive less intonational prominence (as measured by fundamental frequency); children, however, put greater intonational prominence on some function words even while shortening their durations. Adults shorten vowels to shorten whole words, but children’s vowel durations do not correspond to their word durations. Adults follow one of two main information compression strategies: they either contract function words often or shorten them greatly. Older children also follow this pattern to some degree, but younger children (under 7 years old) either both reduce word length and contract words often, or do neither. Taken together, this research indicates that although children have the tools for word reduction, they do not use these strategies fully in concert with one another, even at a relatively late age.
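The text-entropy measure of semantic content mentioned above can be illustrated with a small sketch. This is a generic Shannon-entropy-over-word-frequencies calculation, not the author's actual pipeline, and the example utterances are invented:

```python
import math
from collections import Counter

def text_entropy(tokens):
    """Shannon entropy (bits per word) of a token sequence; lower values
    indicate more repetitive, semantically simpler text."""
    counts = Counter(tokens)
    n = len(tokens)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# A repetitive, child-directed-style utterance scores lower than a varied one.
simple = "look at the dog look at the dog".split()
varied = "the quick brown fox jumps over a lazy dog".split()
assert text_entropy(simple) < text_entropy(varied)
```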

Chan, Queenie (SFU) – Opacity effects in the affixation of the intransitive/active voice marker in the Nanwang dialect of Puyuma

The current study examines the phonological processes related to the three allomorphs of the intransitive/active voice marker in the Nanwang dialect of Puyuma, an Austronesian language spoken by an aboriginal group in southeastern Taiwan. The marker is underlyingly /m/, and its three alternants, [m], [me] and [em], result from interactions among three phonological processes: (i) [m] prefixation or infixation, (ii) labial dissimilation, and (iii) vowel epenthesis. This paper gives an OT-CC account of the data, which demonstrate counterbleeding interactions between [m] affixation and vowel epenthesis and between labial dissimilation and vowel epenthesis. The two required precedence constraints reveal that DEP is the least preferred repair strategy for both counterbleeding interactions; why this should be so remains a question for future research.

Chow, Una Y., & Stephen J. Winters (Calgary) – Perception of statements and questions in English, Mandarin and Cantonese

In this work, we present the preliminary results of a perception task from a larger project aimed at developing an exemplar-based computational model of intonation perception. We tested native-speaker discrimination of statements and yes/no questions in three languages that distinguish these sentence types with different intonation patterns: English, Mandarin and Cantonese. Mandarin and Cantonese are both tone languages, but Mandarin signals yes/no questions with an elevated F0 range throughout an utterance, while Cantonese uses an F0 rise at just the end of the utterance. English also uses a final F0 rise to signal yes/no questions, but without the complication of lexical tones. The different ways in which these three languages cue the sentence-type distinction will provide a strong test of our model’s flexibility. Our goal for this work-in-progress is to have 20 listeners from each language group identify the intonational category of utterances in a modified gating task, in which they hear increasingly large segments of an utterance, either from beginning to end or from end to beginning. Listeners respond to these utterances in a speeded classification task and then rate how strongly they perceive each utterance as either a statement or a question. Preliminary results indicate that, as expected, the loss of utterance-final information makes the task harder for Cantonese listeners than for Mandarin listeners, but that Cantonese and English listeners fare better when presented with just the final two syllables of an utterance. There is also a general bias towards statement responses in listeners of all three languages; however, this bias reverses when listeners hear only the final two syllables of each utterance.
After further analysis, we hope that these data will not only provide a clearer picture of how listeners of each language identify the intonational content of an utterance, but also provide a meaningful baseline against which to compare the performance of our planned computational model.

Currie Hall, Kathleen, Blake Allen, Michael Fry, Scott Mackie, & Michael McAuliffe (UBC) – Phonological CorpusTools

In this poster, we present Phonological CorpusTools (PCT), newly developed, freely available software for phonological analysis of transcribed corpora. There is ever-increasing interest in exploring the roles of frequency and usage in understanding phonological phenomena, and corpora give us a way of making generalizations across wide swaths of such usage, exploring patterns in under-documented languages, and creating balanced stimuli for experiments. Many corpora and existing corpus-analysis software tools, however, are focused on dialogue- and sentence-level analysis, and/or the computational skills needed to efficiently handle large corpora can be daunting to learn. PCT is designed with the phonologist in mind and has an easy-to-use graphical user interface that requires no programming knowledge. It is intended specifically for phonological analysis, such as feature-based searches and the calculation of phonotactic probability, neighbourhood density, functional load, predictability of distribution, and mutual information. It can also calculate global acoustic similarity measures between sound files. We will explain the various kinds of corpora that can be examined, introduce the different analyses that can be run, and give examples of research projects to which PCT has been applied.
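To give a flavor of the kinds of measures PCT computes, here is a minimal sketch of one of them, neighborhood density, counted as the number of lexicon entries one segment edit away from a target word. The toy lexicon is invented, and PCT's actual implementation (and its feature-based options) will differ:

```python
def neighborhood_density(word, lexicon):
    """Number of lexicon entries exactly one segment edit (substitution,
    deletion, or insertion) away from `word`. Words are tuples of segments."""
    def one_edit_apart(a, b):
        if a == b or abs(len(a) - len(b)) > 1:
            return False
        if len(a) == len(b):  # single substitution
            return sum(x != y for x, y in zip(a, b)) == 1
        short, long_ = sorted((a, b), key=len)  # single deletion/insertion
        return any(long_[:i] + long_[i + 1:] == short for i in range(len(long_)))
    return sum(one_edit_apart(word, w) for w in lexicon)

toy_lexicon = [("k", "æ", "t"), ("b", "æ", "t"), ("k", "ɑ", "t"),
               ("k", "æ"), ("k", "æ", "t", "s")]
# "kæt" has four neighbors here: bæt, kɑt, kæ, and kæts.
assert neighborhood_density(("k", "æ", "t"), toy_lexicon) == 4
```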

Farrington, Charlie, Tyler Kendall, & Valerie Fridland (UO & U Nevada, Reno) – Inherent spectral change in the Southern Vowel Shift

Southern varieties are well known to be affected by spectral shifts (the Southern Vowel Shift, or SVS) that alter the positional relationship between front tense and lax vowels. However, previous work on the SVS generally limits its focus to steady-state formant measures, and possible links between these shifts and durational and dynamic trajectory distinctions have gone largely unexplored despite common mentions of Southern “breaking” or “drawl”. Recent sociophonetic work has highlighted the importance of vowel inherent spectral change in dialectal differences (Fox & Jacewicz 2009), and several studies examining duration in Southern dialects in particular have noted significantly longer lax vowels than in other regional varieties (Clopper et al. 2005; Jacewicz et al. 2007). In this poster, we examine production data from speakers in three states across the South: Tennessee (N=18), North Carolina (N=8), and Virginia (N=8). We focus here on front vowel production (/i, ɪ, e, ɛ, æ/), and ask: (1) To what extent does spectral onset position (the typical measure of SVS participation) correlate with inherent spectral trajectory for these speakers? (2) Does vowel inherent spectral change differ by field site in the South? And (3) What quantitative measures best capture vowel inherent spectral change for the SVS? We first compare vowel-nucleus spectral position to measures of vector length, trajectory length, and spectral rate of change (Fox & Jacewicz 2009). Additionally, we calculate angle measures for vowels (treating each vowel plotted in F1/F2 space as a triangle with vertices at the 20%, 50% and 80% points of each vowel’s duration) to examine whether the angles between dynamic vowel components correlate with SVS participation. Preliminary results indicate that Southern shifted speakers produce similar spectral trajectories, showing more diphthongal characteristics than non-shifted speakers, who maintain more of a single steady-state spectral shape.
Thus, positional shift and vowel internal change may in fact work in tandem.
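The angle measure described above can be sketched as follows. This is one plausible reading of the description (the angle at the 50% vertex of the triangle in F1/F2 space), not necessarily the authors' exact formula, and the formant values are invented:

```python
import math

def trajectory_angle(p20, p50, p80):
    """Angle in degrees at the 50% point of a vowel's (F2, F1) trajectory,
    treating the 20%, 50%, and 80% measurements as triangle vertices.
    Angles near 180 indicate a near-linear, monophthong-like trajectory;
    smaller angles indicate a bent, more diphthongal one."""
    v1 = (p20[0] - p50[0], p20[1] - p50[1])
    v2 = (p80[0] - p50[0], p80[1] - p50[1])
    cos = (v1[0] * v2[0] + v1[1] * v2[1]) / (math.hypot(*v1) * math.hypot(*v2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))  # clamp rounding error

# Invented (F2, F1) values in Hz: a collinear trajectory gives 180 degrees.
assert round(trajectory_angle((2000, 400), (2100, 450), (2200, 500))) == 180
```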

Fry, Michael (UBC) – Statistical learning based speech segmentation: A cross-linguistic, corpus-based perspective

With no consistent acoustic cues demarcating words in natural speech (Lehiste, 1960; Cole & Jakimik, 1980), the mechanisms underpinning word segmentation in infants are not fully known. Statistical learning, which relies on infants’ ability to extract statistical regularities from stimuli and was developed in Saffran, Aslin and Newport’s (1996) seminal work, is one possible mechanism. Support for statistical learning comes largely from artificial language learning experiments, which has led some researchers to question its effectiveness with more natural language (Johnson and Tyler, 2010; Yang, 2004; Johnson and Jusczyk, 2001). To address this question, the current work investigates the statistical regularities present in natural speech corpora and reports on the separability of within-word and between-word transitions; the separability of these two transition types is critical for statistical learning to be effective. Three metrics previously proposed in the literature (Forward Transitional Probability, Backward Transitional Probability, and Mutual Information) are employed to encapsulate the statistical regularities that are thought to enable the separation of these transition types. Spontaneous speech corpora from English, Cantonese, Japanese and Tunisian Arabic are analyzed, with results providing evidence that statistical separability of within-word and between-word transitions does exist, to varying degrees, cross-linguistically. Further, while no one metric consistently affords the most separability, Mutual Information is generally the most robust.
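The three metrics can be illustrated over a toy syllable stream. The corpora and word-boundary annotations in the actual study are far richer; this sketch simply computes forward TP, backward TP, and pointwise mutual information for each adjacent syllable pair:

```python
import math
from collections import Counter

def bigram_stats(syllables):
    """Forward TP, backward TP, and pointwise MI for each syllable bigram."""
    uni = Counter(syllables)
    bi = Counter(zip(syllables, syllables[1:]))
    n = len(syllables)
    n_bi = sum(bi.values())
    stats = {}
    for (x, y), c in bi.items():
        ftp = c / uni[x]   # P(y | x): forward transitional probability
        btp = c / uni[y]   # P(x | y): backward transitional probability
        pmi = math.log2((c / n_bi) / ((uni[x] / n) * (uni[y] / n)))
        stats[(x, y)] = (ftp, btp, pmi)
    return stats

# "pretty baby pretty doggy": the within-word transition pre->tty is fully
# predictable, while the between-word transition tty->ba is not.
s = bigram_stats(["pre", "tty", "ba", "by", "pre", "tty", "do", "ggy"])
assert s[("pre", "tty")][0] > s[("tty", "ba")][0]
```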

Jones, Jacqueline, & Stephen Winters (Calgary)  – I Bag Your Pardon: The influences of modality, context, grammar and identity on the Albertan [æ]/[ɛ] Vowel Shift

In Alberta English, as in other North American dialects (Zeller, 1997), the vowel [æ] is currently shifting higher and fronter in the vowel space before [g], resulting in an apparent merger with the vowel [ɛ] in this context. In this study, we attempted to: (1) describe the extent and direction of this change; (2) identify speaker traits that may signal propagators and resistors of language change; and (3) examine the effects of modality of stimulus presentation on listener productions of [æ] and [ɛ], as a window into the influence of “self grammars” and “community grammars” on this merger. In a production task, we asked speakers to produce [æ] and [ɛ] in words and non-words, preceding both [g] and other consonants, from prompts in three different modalities: auditory, orthographic, and pictorial. We also collected background data from speakers on a wide variety of personal and demographic dimensions. All subjects shifted [æ] higher and fronter in the vowel space when preceding [g], but the extent of this shift depended greatly on the individual speaker. Only half of the participants actually merged the vowels, with speakers who had lived outside of Alberta being more likely to merge. Somewhat unexpectedly, all subjects also shifted [ɛ] lower and farther back in their vowel space when preceding [k], perhaps as a result of the “Canadian Shift” (Clarke et al., 1995). The modality of stimulus presentation also affected production. Vowels produced from pictorial prompts, which we assumed best reflected the speaker’s internalized “self grammar”, were closest to those produced in calibration. On the other hand, vowels shifted most in response to auditory prompts, especially for speakers who merged the vowels, thus reflecting the influence of the external “community grammar”.
Overall, these results indicate that this merger is still in progress in Alberta, and that it is being spearheaded by those speakers who are more likely to mimic auditory cues, and thus spread sound change to a greater degree.

Melguy, Yevgeniy (Reed) – Hearing across languages: Bilinguals’ perception of (not-so)-non-native stop contrasts

It is well known that monolingual speakers often have difficulty discriminating phonological contrasts (sounds that speakers of a language perceive to belong to different speech categories) that do not exist in their native language (Best & Strange, 1992; Hallé, Best, & Levitt, 1999; Werker, Gilbert, & Humphrey, 1981). While research has shown that bilinguals are subject to similar perceptual constraints (Antoniou, Best, & Tyler, 2013; Sebastián-Gallés & Soto-Faraco, 1999), the question of whether bilinguals have simultaneous access to both their L1 and L2 phonologies in discriminating a non-native contrast has not been systematically examined. This study attempted to do so by comparing the ability of Chinese-English and Spanish-English bilinguals to discriminate non-native phonological contrasts consisting of sounds that exist in either their L1 or their L2 (but not in both). Findings showed that while bilinguals were sensitive to phonetic differences between such sounds, they discriminated them no better than a monolingual English control group.

Olejarczuk, Paul, Vsevolod Kapatsinski, & Melissa A. Redford (UO) – Cognitive limitations affect category breadth in perceptual learning

In recent work, we extended work on perceptual category learning (Gibson & Gibson 1955, Posner & Keele 1968) to the acquisition of intonation contour categories (Anonymous, in revision). We used a flat prototype (—–), a final-fall prototype (˜˜˜˜\), and a two-peaked prototype (/\_/\_), and created distortions of these prototypes to model within-category variability. We presented adult native English speakers, as well as 9- to 11-year-old children learning English as their L1, with examples of each contour category. Training examples were minor perturbations of the prototype, averaging out to the prototype. This created well-separated categories with multiple partially redundant, acoustically variable, but necessary features. This kind of category structure has been argued to be typical of intonation contours (Pierrehumbert 2000). We hypothesized that an adult would require all of the necessary features of a contour category to be present before classifying a novel contour into the same category, in the same way that an adult requires all phonological features of a word like [blæk] to have been intended by the speaker in order to perceive it as that word. A one-feature intentional deviation, as in [blæɡ], is bad enough to block categorization into the same category, e.g. eliminating repetition priming in adults (Stockall & Marantz 2006; Darcy et al. 2009). Despite the fact that [blæɡ] is more similar to [blæk] than to any other word, and the features of [blæ[Dorsal]] are sufficient for rejecting all words that compete for recognition with [blæk], [blæɡ] is not perceived as a realization of [blæk] without contextual evidence that it is a mispronunciation thereof (and thus that a [k] was actually intended). The voicelessness of [k] in [blæk] is a necessary feature. Similarly, while the contour /\_/\_ can be distinguished from all other contours in the experiment by, say, the initial rise, we hypothesized that adults would require all necessary features of /\_/\_ to be present.
However, paying attention to all of the necessary features of a contour presents a working memory challenge. Children, whose working memory capacity is not yet mature (Luna et al. 2004), may not be able to keep track of all of the necessary features, and may thus require only some of them to be present (Ward & Scott 1987, Thompson 1994). On each test trial, participants heard an exemplar from one category and categorized it as something said by a trained creature or, crucially, ‘One of these other guys’ (represented by a picture of many different novel creatures). Choosing ‘One of these other guys’ was classified as a rejection, similar to perceiving [blæɡ] as a pseudoword. As expected, adults were less accepting, indicating that /\___ could not have been said by the creature that said /\_/\_, whereas children accepted /\___. However, surprisingly, both children and adults accepted __/\_. In the present experiment, we re-instantiated the contours over shorter syllable sequences (7 rather than 16 syllables). With these shorter, less memory-demanding stimuli, adults rejected __/\_. We argue that the results support tracing the child-adult differences in acceptance of high-level variability to the memory demands posed by keeping track of all of the necessary features of a temporally extended contour.

Teo, Amos, & Linda Konnerth (UO) – Evidence of mismatch between tonal production and perception in Karbi

This paper provides preliminary evidence for a mismatch between the production and perception of tones in Karbi, a Tibeto-Burman language spoken in Assam. It represents one of the first phonetic studies of an under-described minority language of India. In a previous description of Karbi, Grüßner (1978) described three tones: low, mid and high, which are all distinguished by pitch height. He further noted that mid tones in word-final position are accompanied by syllable-final glottalization. However, given technical difficulties associated with doing fieldwork in the area at the time, this description was necessarily based on researcher-centered impressionistic evidence. More recently, Anonymous (submitted) conducted an acoustic study of monosyllabic stems in carrier phrases and found evidence to support Grüßner’s claim that the low and high tones in Karbi are distinguished in production by pitch height. On the other hand, of the two speakers included in the study, only Speaker 2 produced pitch patterns that significantly distinguished a low, a mid and a high tone. Both speakers did produce word-final mid tone stems with syllable-final glottalization, as described by Grüßner, but this was lost when suffixes were added to the stem. An initial hypothesis was that the contrast between the mid and high tones was neutralized on suffixed stems for Speaker 1 but not Speaker 2, who continued to distinguish them by pitch height in production. This hypothesis was tested in a perception experiment involving six listeners who were asked to identify both word-final and suffixed monosyllabic stems, as spoken in carrier phrases by Speakers 1 and 2. In addition, Speaker 2 listened to his own recordings to identify the stems. The results of the perception study showed that, contrary to expectations, the difference in pitch height produced by Speaker 2 was not found to assist listeners in differentiating mid and high tones on suffixed stems. 
Even more strikingly, it was found that Speaker 2 himself was unable to reliably identify suffixed stems, despite producing a significant difference in pitch height in his own recordings. Given that listeners were still generally able to identify monosyllabic stems in word-final position as produced by both Speakers 1 and 2, it is likely that syllable-final glottalization, not contrastive pitch height, is the main perceptual cue for what has been called the ‘mid tone’ in Karbi. The significance of these findings, particularly the mismatch between tone production and perception by Speaker 2, will be discussed in the talk. We consider the possibility of a near-merger situation (as per Labov et al., 1991) in Karbi, whereby pitch is not a reliable perceptual cue for the mid-high distinction within the language community, even for speakers who produce the distinction, perhaps because these speakers are exposed to speech where the distinction is not made. In addition, these findings raise the question of what it means to continue referring to Karbi as a ‘tone’ language.

Teruya, Hideko, & Vsevolod Kapatsinski (UO) – The emergence of /r/ epenthesis in L2 learners of a rhotic dialect

We investigated /r/ epenthesis after non-high vowels by Japanese learners of a rhotic dialect of American English. We asked learners to produce stories containing many vowel-final words. All of the L2 speakers exhibited /r/ epenthesis, even though rhotic English does not have this process. /r/ epenthesis may be the result of an acquired constraint against words ending in lax vowels or, more specifically, schwa (Harris, 1994), or of a constraint in favor of words ending in /r/. If the latter hypothesis is correct, /r/ epenthesis can emerge in dialects that do not have /r/ deletion, i.e., rhotic dialects (hyper-rhoticity; Britton, 2007; Wells, 1982). The present paper suggests that this prediction holds. Computational modeling of the rhotic dialect lexicon using the Maximum Entropy phonotactic learner (MaxEnt; Hayes & Wilson, 2008) shows that words ending in one or more vowels, especially non-high vowels, are disfavored compared to words ending in /r/. Although the resulting phonotactic constraints do not drive alternations in native speakers of rhotic English, perhaps due to a higher weight on faithfulness constraints, L2 learners, for whom faithfulness might still be weighted low, may be sensitive to them and produce /r/ epenthesis in L2 English even if exposed only to a rhotic dialect. We suggest that /r/ epenthesis is acquired from an abundance of /r/-final words in rhotic English. More generally, the results of the present study support the existence of phonetically unmotivated phonotactic constraints or schemas in the mental grammar (Bybee 2001; Hayes & Wilson 2008) and a synchronic link between phonotactics and alternations (Pater & Tessier 2006).
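The logic of a MaxEnt grammar, in which a candidate's probability falls off exponentially with its weighted constraint violations, can be sketched in a few lines. The constraints, weights, and candidate forms below are hypothetical illustrations, not the study's fitted model:

```python
import math

def maxent_probs(candidates, weights):
    """MaxEnt grammar in the spirit of Hayes & Wilson (2008): each candidate's
    probability is proportional to exp(-harmony), where harmony is the
    weighted sum of its constraint violations."""
    def harmony(violations):
        return sum(weights[c] * v for c, v in violations.items())
    scores = {form: math.exp(-harmony(v)) for form, v in candidates.items()}
    z = sum(scores.values())
    return {form: s / z for form, s in scores.items()}

# Hypothetical constraints and weights: a heavy penalty on non-high-vowel-final
# words (*V#) and a light one on epenthesis (DEP) favor the /r/-final form.
weights = {"*V#": 3.0, "DEP": 0.5}
probs = maxent_probs({"idea": {"*V#": 1, "DEP": 0},
                      "idear": {"*V#": 0, "DEP": 1}}, weights)
assert probs["idear"] > probs["idea"]
```

For an L2 learner with a low weight on DEP (faithfulness), the epenthesized form wins; raising that weight, as in a native speaker, would reverse the preference.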

Vajrabhaya, Prakaiwan, & Vsevolod Kapatsinski (UO) – First time’s the charm: First-mention lengthening as an automated act

Words are longer when they are mentioned for the first time within a discourse and shorter in subsequent mentions. This Repetition Effect is usually attributed to the fact that, with repetition, the word becomes more accessible to either the speaker or the listener. We argue that the Repetition Effect is better analyzed as lengthening of words that are mentioned for the first time within a discourse (Bell et al., 2009). Furthermore, we propose that this first-mention lengthening is an automatic behavior triggered by discourse structure, rather than a reflection of online changes in word accessibility for either interlocutor. In support of this proposal, we show that words are always longer when they are mentioned for the first time within a coherent stretch of discourse, even when they have previously been mentioned by the speaker to the same listener and are thus highly accessible to both interlocutors.