This annotated list is getting a little out of date

For a full chronological list, see here

General theory

Kapatsinski, V. (2018). Changing minds changing tools: From learning theory to language acquisition to language change. Cambridge, MA: MIT Press.

Cover blurb: In this book, Vsevolod Kapatsinski argues that language acquisition—often approached as an isolated domain, subject to its own laws and mechanisms—is simply learning, subject to the same laws as learning in other domains and well described by associative models. Synthesizing research in domain-general learning theory as it relates to language acquisition, Kapatsinski argues that the way minds change as a result of experience can help explain how languages change over time and can predict the likely directions of language change—which in turn predicts what kinds of structures we find in the languages of the world. What we know about how we learn (the core question of learning theory) can help us understand why languages are the way they are (the core question of theoretical linguistics).

Taking a dynamic, usage-based perspective, Kapatsinski focuses on diachronic universals, recurrent pathways of language change, rather than synchronic universals, properties that all languages share. Topics include associative approaches to learning and the neural implementation of the proposed mechanisms; selective attention; units of language; a comparison of associative and Bayesian approaches to learning; representation in the mind of visual and auditory experience; the production of new words and new forms of words; and automatization of repeated action sequences. This approach brings us closer to understanding why languages are the way they are than approaches premised on innate knowledge of language universals and the language acquisition device.

 

Kapatsinski, V. (2014). What is grammar like? A usage-based constructionist perspective. Linguistic Issues in Language Technology.

Main points: This is a review paper arguing the following points. You need grammar (generalization) and not just exemplars. Grammar includes constructions (form-meaning pairings) but also syntagmatic form-form associations and paradigmatic construction-construction associations. Grammar is non-parametric: there is no limit to the number of constructions in a language. Grammar is partially redundant: Any feature is predictable from many other features (see also Winter 2014). There are many routes to get from a form to a meaning or from a meaning to a form that are used in parallel (see esp. Beekhuizen, Bod & Zuidema 2013 as well as Kapatsinski 2010). There is no one way to produce something in language. Different people might do the same thing in different ways (behavioral uniformity together with representational diversity). There is no single mental grammar shared by all members of a community. Multimodel inference helps infer the ensemble of grammars that might underlie the observed behavior (see also Barth & Kapatsinski 2014).

See also:

Some book-length related works: Barðdal (2008), Bybee (2001, 1985), Dabrowska (2004), Goldberg (2006), Langacker (1987), Nesset (2008)

For multimodel inference: Barth, D., & V. Kapatsinski. (2014). A multimodel inference approach to categorical variant choice: construction, priming and frequency effects on the choice between full and contracted forms of am, are and is. Corpus Linguistics & Linguistic Theory. And also Kuperman & Bresnan (2012).

There is related work on categorization that I was not aware of at the time of writing, which is summarized here (see pp.23-25).

Morphophonology, productivity, sublexical constructions

Kapatsinski, V. Towards a Learning-Theoretic morphophonology.

What’s in it: An attempt to model productive morphophonology using the apparatus of associationist learning theory (Rescorla & Wagner, 1972; Baayen et al., 2011). The specific data being modeled come from my dissertation.
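
Not the model from the paper itself, but for concreteness, here is a minimal sketch of the Rescorla-Wagner update rule that this kind of associationist model builds on; the cue and outcome labels and the learning-rate values are invented for illustration.

```python
import numpy as np

# Minimal Rescorla-Wagner sketch (illustrative only; not the paper's model or data).
# Cues are features of the known singular; outcomes are features of the plural.
cues = ["stem=buk", "final=k"]                 # hypothetical cue names
outcomes = ["plural=...tSi", "plural=...i"]    # hypothetical outcome names
V = np.zeros((len(cues), len(outcomes)))       # cue-to-outcome associative strengths

def rw_update(V, present_cues, present_outcomes, alpha_beta=0.1, lam=1.0):
    """One learning trial: all present cues share one prediction error per outcome."""
    for j in range(len(outcomes)):
        v_total = V[present_cues, j].sum()              # summed prediction for outcome j
        target = lam if j in present_outcomes else 0.0
        V[present_cues, j] += alpha_beta * (target - v_total)  # error-driven cue competition
    return V

# E.g., repeatedly hearing singular buk alongside plural butSi:
for _ in range(50):
    V = rw_update(V, present_cues=[0, 1], present_outcomes=[0])
print(np.round(V, 2))
```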

Smolek, A., & V. Kapatsinski. Why not to change (the stem): A production-internal account of paradigm uniformity.

What’s in it: Presents experimental data supporting the claim that errors like bup –> buptSi and, more generally, the preference against stem changes result from motor perseveration in deriving novel forms of known words and the difficulty with associating articulatorily dissimilar units (Rescorla, 1973; Kapatsinski, 2011 ICPhS). The tendency to level stem changes is stronger for articulatorily ‘bigger’ changes (like p–>tS vs. k–>tS). Learners trained on p>tSa overgeneralize to t>tSa and k>tSa but not vice versa. So, those exposed to p>tSa, t>ta, k>ka learn to palatalize everything or to palatalize nothing. In contrast, those exposed to  t>tSa, p>pa, k>ka or k>tSa, p>pa, t>ta learn to palatalize the right thing. Note that there is no reason to palatalize any consonant before [a], so this is not a bias in favor of the natural rule (change in context) but in favor of the smaller change. This asymmetry is present in both production (here is a new singular, make a plural) and judgment (is this the right plural for this singular?) but is significantly stronger in production than in judgment. We think judgment sometimes involves simulating production (“would I say that?”), so the effect is still there in judgment but is weaker than in real production, attesting to its articulatory nature. For the opposing position, see Steriade (2001) and White (2014). For related data, see Do & Albright (Submitted), Kerkhoff (2007), Krajewski, Theakston, Lieven & Tomasello (2011), and White (2014).

See also:

Stave, M., A. Smolek, & V. Kapatsinski. (2013). Inductive bias against stem changes as perseveration: Experimental evidence for an articulatory approach to output-output faithfulness. Proceedings of the 35th Annual Conference of the Cognitive Science Society, 3454-59. Austin, TX: Cognitive Science Society.

Kapatsinski, V. (2013). Conspiring to mean: Experimental and computational evidence for a usage-based harmonic approach to morphophonology. Language, 89(1), 110-48.

Main findings: If you expose human learners to a language with an alternation, like singular buk / plural butSi, singular mak / plural matSi, etc., they learn the alternation better if they also encounter a lot of pairs like singular blitS / plural blitSi. However, they learn the alternation worse if they encounter a lot of pairs like singular blip / plural blipi or singular blit / plural bliti. Learners also often produce blutSi from blut and bluptSi from blup.

Theory: Learners are acquiring sublexical constructions like ‘plural = …tSi’. These constructions gradually become more specific: learners start out with ‘plural = …i’, which is satisfied with producing bluki from bluk; then acquire ‘plural = …tSi’, which is satisfied with producing bluktSi from bluk; then acquire ‘plural = …VtSi’, which requires bluk to be changed into blutSi. The constructions compete with a tendency to repeat the articulatory gestures comprising the known, singular form. So, the [k] of bluk wants to surface in the plural. A strong and specific construction is required to override [k]’s desire to be expressed. This is how you get bluptSi from blup, which satisfies both ‘plural = …tSi’ and the perseveratory pressure to keep the [p] of blup. Reusing as much of the known form as you can is argued to be functional (improving performance of children as well as connectionist models).
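
To make the competition described above concrete, here is a toy harmonic-style scoring of plural candidates for a hypothetical singular blup; the constraint weights are invented, not fitted values from the paper.

```python
# Toy harmonic scoring (invented weights; not the paper's fitted model).
candidates = ["blupi", "bluptSi", "blutSi"]    # possible plurals of singular "blup"

def score(cand, singular="blup", w_tSi=2.0, w_i=1.0, w_keep=0.8):
    """Higher = more harmonic. Constructions reward matching endings;
    perseveration rewards keeping the segments of the known singular."""
    s = 0.0
    if cand.endswith("tSi"):
        s += w_tSi                                       # satisfies 'plural = ...tSi'
    if cand.endswith("i"):
        s += w_i                                         # satisfies 'plural = ...i'
    s += w_keep * sum(seg in cand for seg in singular)   # crude perseveratory pressure
    return s

for c in candidates:
    print(c, round(score(c), 2))
# With these weights, bluptSi wins: it satisfies both constructions AND keeps
# the [p] of blup, mirroring the overgeneralization errors described above.
```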

See also:

Kapatsinski, V. (2013). Morphological schema induction by means of conditional inference trees. In B. Cartoni, D. Bernhard, & D. Tribout, eds. TACMO Workshop. Theoretical and Computational Morphology: New Trends and Synergies, 11-14. Geneva. What’s in it? Discusses advantages and limitations of the conditional inference tree implementation of sublexical construction learning used in the main paper above.

Kapatsinski, V. (2012). What statistics do learners track? Rules, constraints or schemas in (artificial) grammar learning. In Gries, S. Th., & D. Divjak, eds. Frequency effects in language: Learning and processing, 53-82. Mouton de Gruyter. What’s in it? The effect of type of training (presenting singulars and plurals that share the same stem next to each other or not) on the acquired grammar. (The same main results above are obtained either way.) Also argues that, in order to support learning, the output of speech perception should not be simply the identity of the most likely phoneme or word but a probability distribution over possible phonemes or words.

Kapatsinski, V. (2010). Velar palatalization in Russian and artificial grammar: Constraints on models of morphophonology. Journal of Laboratory Phonology, 1(2), 361-393. What’s in it? As also shown in the 2013 Language paper, if you expose human learners to a language with an alternation, like singular buk / plural butSi, singular mak / plural matSi, etc., they learn the alternation worse if they encounter a lot of pairs like singular blip / plural blipi or singular blit / plural bliti. The diachronic prediction is the following: If a suffix (-i here) that triggers an alternation is usually used with stems that cannot undergo an alternation, it is predicted to be a bad trigger for the alternation. In other words, the alternation should be(come) unproductive. This is shown to have happened in Russian, where the alternation k–>tS is productive before -ok and -ek, diminutive suffixes that tend to attach to [k]-final stems, but unproductive before -i, a stem extension that usually attaches to non-velars, and before -ik, a diminutive that rarely attaches to velars. The Russian data also provide evidence against the traditional notion that the suffix is chosen before the decision to alternate is made. For a short version, see Kapatsinski, V. (2010). Rethinking rule reliability: Why an exceptionless rule can fail. Chicago Linguistic Society, 44(2), 277-291.

Some related work from other theoretical perspectives: Albright & Hayes (2003), Becker & Fainleib (2009), Gouskova & Becker (2013), Hayes & Wilson (2008), Labov (1969)

Kapatsinski, V. (2005). Characteristics of a rule-based default are dissociable: Evidence against the Dual Mechanism Model. In S. Franks, F. Y. Gladney, and M. Tasseva-Kurktchieva, eds. Formal Approaches to Slavic Linguistics 13: The South Carolina Meeting, 136-146. Ann Arbor, MI: Michigan Slavic Publications. What’s in it? Elicited production evidence from Russian showing that there are morphological systems in which there is no default expression for a certain meaning. Argues against Pinker & Prince’s Dual Mechanism Model. There is highly related work making the same point for Polish: Dabrowska (2001, 2004)

Statistical learning, perceptual learning, inductive bias

Kapatsinski, V., P. Olejarczuk, & M. A. Redford. (2014). Perceptual learning of intonation contour categories in adults and 9 to 11-year-old children: Adults are more narrow-minded. Under review.

What’s in it: Extends work on perceptual category learning to categories of temporally extended patterns. Previous work on linguistic patterns looks at short patterns like vowels, consonants, tones and pitch accents. Previous work on non-linguistic category learning looks at simple visual patterns that can be perceived simultaneously. Examining temporally extended patterns is important to understand the roles of short-term/working memory, local/global processing biases and temporal integration in categorization. We show that intonation contour category representations in both adults and 9 to 11-year-olds are relatively abstract: novel and familiar exemplars are accepted into the category equally often if distance to the category center is controlled. This is inconsistent with exemplar models of categorization but is quite in line with decision-bound (Ashby & Gott 1988) and window-based (Keating 1990) models. We also show that adults are less accepting, rejecting high distortions of the category prototypes that children accept. These high distortions are categorized as not belonging to any of the observed contour categories, since they are generally not highly confusable with examples of the other categories. We discuss possible reasons for children’s greater tolerance to deviation from prior experience. We are currently extending this work to adults with different L1 experiences and children with autism.
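
To make the contrast between the model classes concrete, here is a toy sketch of exemplar vs. prototype-plus-window categorization; the "contours", distortion levels, and thresholds are invented and are not our stimuli or fitted parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for contour categorization (invented numbers, not the actual stimuli).
prototype = np.sin(np.linspace(0, np.pi, 10))              # category prototype "contour"
exemplars = prototype + rng.normal(0, 0.1, size=(20, 10))  # stored training exemplars

def exemplar_evidence(x, c=2.0):
    """Exemplar (GCM-style) evidence: summed similarity to stored exemplars.
    Tends to favor old items over equally distant novel ones."""
    d = np.linalg.norm(exemplars - x, axis=1)
    return np.exp(-c * d).sum()

def window_accept(x, width=1.2):
    """Window / decision-bound style rule: accept anything close enough
    to the prototype, old or new."""
    return np.linalg.norm(x - prototype) <= width

old_item = exemplars[0]
new_item = prototype + rng.normal(0, 0.1, size=10)          # matched distortion level
high_distortion = prototype + rng.normal(0, 0.6, size=10)

for name, item in [("old", old_item), ("new", new_item), ("high", high_distortion)]:
    print(name, round(exemplar_evidence(item), 2), window_accept(item))
```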

See also:

For starting small and starting big, see Newport (1990), Elman (1993), Arnon & Ramscar (2012) and Tomasello (2003)

On perceptual learning, good places to start are Gibson & Gibson (1955), Goldstone (1998), Maye, Werker & Gerken (2002), Bertelson, Vroomen & De Gelder (2003), Norris, McQueen & Cutler (2003), and Idemaru & Holt (2011)

On category learning, and the prototype distortion paradigm, see work citing Posner & Keele (1968). See J. Smith & Minda (2000) on limitations of the literature.

On developmental differences in category learning, I found the following very useful: Aslin & L. Smith (1988), L. Smith (1989), Thompson (1994), Ward et al. (1990), & Wills et al. (2013)

On learning categories of temporally extended patterns, see Berger & Hatwell (1996) and Schwarzer (1997)

For perceptual learning in language, the most extended patterns examined are probably pitch accent and stress patterns, which require comparing two adjacent syllables: Shport (2011) and Reinisch & Weber (2012)

Kapatsinski, V. (2013). Conspiring to mean: Experimental and computational evidence for a usage-based harmonic approach to morphophonology. Language, 89(1), 110-48.

What’s in it: Examines the acquisition of patterns that allow one to derive a novel form of a known word, and argues that the patterns acquired most easily are constructions (at least from perceptual experience). See above for more information.

See also: Kapatsinski, V. (2012). What statistics do learners track? Rules, constraints or schemas in (artificial) grammar learning. In Gries, S. Th., & D. Divjak, eds. Frequency effects in language: Learning and processing, 53-82. Mouton de Gruyter. Kapatsinski, V. (2010). Velar palatalization in Russian and artificial grammar: Constraints on models of morphophonology. Journal of Laboratory Phonology, 1(2), 361-393.

Stave, M., A. Smolek, & V. Kapatsinski. (2013). Inductive bias against stem changes as perseveration: Experimental evidence for an articulatory approach to output-output faithfulness. Proceedings of the 35th Annual Conference of the Cognitive Science Society, 3454-59. Austin, TX: Cognitive Science Society.

What’s in it: Learners exposed to a language in which /p/ alternates with /tS/ before -a but /t/ and /k/ do not (labial palatalization in an unnatural context) learn either that all stops become /tS/ or that nothing does. In contrast, learners exposed to a language in which either /t/ or /k/ becomes /tS/ but /p/ does not learn to palatalize only /t/ or /k/ without overgeneralizing to /p/. We argue that this is a production bias based on finding that it is attenuated in a judgment task compared to elicited production. See above for more details.

Idemaru, K., L. Holt, & V. Kapatsinski. (2012). The time-course of dimension-based statistical learning. Paper presented at AMLaP, Riva del Garda, Italy.

What’s in it: Idemaru & Holt (2011) have shown that learners can implicitly unlearn language-specific correlations between acoustic parameters, like VOT and F0. In this work, we examine the timecourse of this unlearning using visual world eyetracking. It is very fast! The great speed is consistent with the findings that the learning is speaker- and sound-specific (Idemaru & Holt 2012).

Johnston, L. H., & V. Kapatsinski. (2011). In the beginning there were the weird: A phonotactic novelty preference in adult word learning. Proceedings of the 17th International Congress of Phonetic Sciences, 1022-1025.

What’s in it: A cross-situational word learning study (using the paradigm developed by Yu & L. Smith 2007), showing that learners are more accurate if the words being learned are phonotactically illegal. We have replicated this with words being mapped onto pictures of tools rather than alien creatures, so it’s unlikely that this effect is just a bias to select the most science-fictiony words. It could be that the illegal clusters are cues to these words having been previously encountered during the experiment, though. As part of this study we also found that even our adult learners would accept one-feature deviations from the trained words (which did not make it into the paper due to space constraints), as shown previously for infants (e.g. Stager & Werker 1997, Swingley 2005). Since we find this for adults, the acceptance of minor deviations from unfamiliar words does not appear to be about infants’ speech perception abilities being immature. Rather, it is likely that representations of word forms (and construction forms more generally) become more specific with experience (Kapatsinski 2013).

Kapatsinski, V. (2011). Modularity in the channel: The link between separability of features and learnability of dependencies between them. Proceedings of the 17th International Congress of Phonetic Sciences, 1022-1025.

What’s in it: Moreton (2008) introduced a distinction between channel bias and inductive bias. Channel bias involves imperfect transmission from the speaker to the listener because of noise in the ‘channel’ (speech production mechanism, air, hearing, speech perception mechanism). Inductive bias of the language learner is localized in the prior over grammars used by the (hypothetically Bayesian) language acquisition device. Moreton argued that most biases affecting acquisition of language identified so far could be thought of as channel biases. (I concur.) In that paper, Moreton argued that one type of bias, which he terms ‘modularity bias’, is an inductive bias, caused by the higher prior probability of simpler grammars. The modularity bias refers to the finding that dependencies involving a consonant feature and a vowel feature are harder to learn than those involving two vowel features or two consonant features. This paper presents evidence that the bias also manifests itself in perception (using the Garner interference paradigm of Garner & Felfoldy 1970). Following Warker, Dell, Whalen & Gereg (2008), I argue that the modularity bias is a channel bias too. I suggest that the reason for it is that consonants and vowels form distinct clusters of representations in the brain (recently demonstrated by Mesgarani, Cheung, Johnson & Chang 2014). Learning associations between arbitrary vowel and consonant features is then expected to involve modifying more connections than learning associations between vowel features or associations between consonant features. More generally, this discussion can be seen as an instance of the debate on the division of labor between Bayesian computational vs. mechanistic (connectionist and agent-based) models in cognitive science. (The latter appear best suited for simulating the sources of channel bias.) See also McClelland, Botvinick, Noelle, Plaut, Rogers, Seidenberg & L. Smith (2010) vs. Griffiths, Chater, Kemp, Perfors & Tenenbaum (2010).

Kapatsinski, V. (2009). Testing theories of linguistic constituency with configural learning: The case of the English syllable. Language, 85(2),  248-277.

What’s in it: Linguistic constituents are usually represented as nodes in trees. However, dependency grammar represents them as connections between the parts. This paper argues that, iff constituents are nodes, then they should be able to acquire associations that their parts do not have. This is similar to an argument Bybee (2006) uses to argue that originally compositional words or phrases must be stored in order to become non-compositional, through association with ‘special’ phonology, semantics or morphosyntax. This paper reports that at least some rimes (phonological constituents) in English (like the /ag/ in /gag/) can be associated with either prefixes or suffixes (Cag-num or num-Cag) without the parts of the rime becoming associated with the same prefix or suffix. In contrast bodies, like the /ga/ in /gag/, cannot become associated with prefixes or suffixes without sharing those associations with their parts. These results suggest that constituency is more than mere dependency. A constituent has a node that can then be associated with other nodes. A non-constituent is just its parts, so it cannot have associations that its parts do not have.

See also: 

Kapatsinski, V. (2007). Implementing and testing theories of linguistic constituency I: English syllable structure. Research on Spoken Language Processing Progress Report No.28, 241-276. Indiana University Speech Research Lab. What’s in it: Implementations of the competing models of constituency as associative networks.

Kapatsinski, V., & D. B. Pisoni. (2008). The role of phonetic detail in associating phonological units. Poster presented at Laboratory Phonology XI, Wellington, New Zealand. What’s in it: The same vowel varies in its realization  across different instances of the same body more than it varies across different instances of the same rime due to coarticulation. It might be that the test vowels are harder to identify in the body condition than in the rime condition. This poster shows that the rime/body distinction persists if difficulty of recognizing test tokens of the trained vowels is taken into account.

Goldstone (2000) and Saavedra (1975) for other uses of the same kind of training paradigm to argue for representations of wholes as distinct from representations of their parts in other domains.

 

 

Statistical Methods

Barth, D., & V. Kapatsinski. (2017). A multimodel inference approach to categorical variant choice: construction, priming and frequency effects on the choice between full and contracted forms of am, are and is. Corpus Linguistics & Linguistic Theory. Please e-mail for offprints.

What’s in it: Applies multimodel inference (model averaging) to quantitative analysis of corpus data. Discusses the difference between E-Language and I-Language (Chomsky 1986) goals of corpus analysis. We argue that the realistic goal for corpus analysis is to discover a grammar (for us, statistical model) that is good at predicting characteristics of future language samples from the speech community that is represented by the corpus (E-Language goal). The I-Language goal of discovering the grammar that generated the corpus is likely not achievable: there is no single best grammar because there are many idiolects within a community. Furthermore, if the I-Language grammar is as complex as neurology would lead us to believe, its parameters are likely not discoverable from the corpus (see Hockett 1965 and Householder 1966 for similar ideas). Attempting to fit all of the many parameters of a complex grammar reduces the grammar’s predictiveness. So the true generating model, even if it existed, would likely be bad at predicting future samples from the same community. Fortunately, we show that we do not have to select the single best grammar to achieve the E-Language goal. Instead we infer a whole ensemble of plausible grammars and make predictions by weighting each grammar by its predictiveness. We note that, in contrast to selecting the single best model, this approach properly takes into account model selection uncertainty. Because we do not have to select the best, or true, model, we also don’t need as much data as in traditional model selection approaches, which makes this a promising approach for underdocumented languages. We also argue that inferring an ensemble of grammars that underlie the corpus is a better approach for approximating the I-Language goal: because language is redundant, the same behavioral output (and hence corpus data) can be achieved by many different internal grammars. The method is applied to data on contraction in English auxiliaries.
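
For readers who want the mechanics, here is a minimal sketch of information-theoretic model averaging with Akaike weights (Burnham & Anderson 2002); the AIC values and per-model predictions below are invented for illustration.

```python
import numpy as np

# Akaike weights: relative plausibility of each candidate model given the data.
aic = np.array([100.0, 101.5, 104.0])       # hypothetical AIC values, one per model
delta = aic - aic.min()
weights = np.exp(-0.5 * delta)
weights /= weights.sum()

# Hypothetical predicted probabilities of contraction from each model for the
# same future token; the ensemble prediction is their weighted average.
preds = np.array([0.72, 0.65, 0.55])
averaged = float(np.dot(weights, preds))

print(np.round(weights, 3), round(averaged, 3))
```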

See also

Burnham & Anderson (2002), who developed multimodel inference in the information-theoretic framework employed here

Kuperman & Bresnan (2012), which introduced multimodel inference to the field

The MuMIn package in R (Barton 2013) can be used to do multimodel inference.

Some related ideas are discussed in Divjak & Arppe (2013), Baayen, Janda, Nesset, Dickey & Makarova (2013), and Tagliamonte & Baayen (2012).

An application to assessing predictor importance is here: Kapatsinski, V. (2013). Towards a de-ranged study of variation: Estimating predictor importance with multimodel inference. In preparation.

Barth, D., & V. Kapatsinski. (2012). Evaluating logistic mixed-effects models of corpus data. Under review.

What’s in it: Discusses some issues in evaluating how well a mixed-effects model fits a dataset. Argues for cross-validating the model on a new dataset. Uses Monte Carlo simulations to show that the standard approach at the time (C Score on fit data, as in Baayen 2008) fails to discover the difference between a model with a true predictor and a model in which the values of that predictor have been randomly scrambled.
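
A minimal sketch of the scrambling check on held-out data, using plain logistic regression on simulated data as a stand-in for the mixed-effects models evaluated in the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
n = 2000
x = rng.normal(size=(n, 1))                                   # a genuine predictor
y = (rng.random(n) < 1 / (1 + np.exp(-1.5 * x[:, 0]))).astype(int)
train, test = slice(0, 1000), slice(1000, None)

def heldout_c(predictor):
    """Fit on the first half, compute the C score (ROC AUC) on the second half."""
    m = LogisticRegression().fit(predictor[train], y[train])
    return roc_auc_score(y[test], m.predict_proba(predictor[test])[:, 1])

x_scrambled = rng.permutation(x)                              # break the predictor-outcome link

print("true predictor, held-out C:", round(heldout_c(x), 3))
print("scrambled,      held-out C:", round(heldout_c(x_scrambled), 3))
# Cross-validation separates the two clearly; the paper's point is that a C score
# computed on the fitting data of a mixed-effects model may not.
```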

See also:

Nakagawa & Schielzeth (2013) for a simpler approach within MuMIn (pointed out to me by Dan Johnson).

For mixed-effects models in linguistics, see e.g. Baayen 2008, Bresnan, Cueni, Nikitina & Baayen (2007), Dixon (2008), and other papers in the same special issue of JML, Johnson (2009), and Barr, Levy, Scheepers & Tily (2013)

For comparisons between regression-based (as in this paper) and tree-based approaches, see Baayen, Janda, Nesset, Dickey & Makarova (2013), Tagliamonte & Baayen (2012), Kapatsinski (2013a) and, from an explanatory adequacy perspective, Kapatsinski (2013b)

Kapatsinski, V. (2014). What is grammar like? A usage-based constructionist perspective. Linguistic Issues in Language Technology.

What’s in it: Some general characteristics that I believe a plausible statistical/computational model of grammar should have. (Non-parametric, sensitive to complex non-crossover interactions, not committed to a single model for every speaker, etc.) See above for more information. Lots of references inside.

Kapatsinski, V. (2014). Sound change and hierarchical inference. What is being inferred? Effects of words, phones and frequency. Under review.

What’s in it: A view of sound change as involving articulatory reduction (happens every time a word is used) and hierarchical statistical inference (reduction is attributed to both words and sounds). A preliminary test of the idea with American English flapping is reported (a follow-up trying to make learners believe novel words to be slang and thus prone to reduction is ongoing).

Kapatsinski, V. (2010). What is it I am writing? Lexical frequency effects in spelling Russian prefixes: Uncertainty and competition in an apparently regular system. Corpus Linguistics and Linguistic Theory, 6(2), 157-215.

What’s in it: For the purposes of this section, an example of Monte Carlo simulations to assess the significance of a negative correlation between word frequency and error rate on a word (errors/word frequency). See below for the linguistic results.
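
A sketch of the Monte Carlo logic with simulated counts (not the Russian spelling data): because frequency appears in the denominator of the error rate, the null distribution of the correlation is built by simulation rather than taken from a standard test.

```python
import numpy as np

rng = np.random.default_rng(2)
freq = rng.integers(1, 500, size=300)                           # hypothetical word frequencies
errors = rng.binomial(freq, 0.04 * (freq.min() / freq) ** 0.3)  # toy counts with a built-in effect
observed_r = np.corrcoef(freq, errors / freq)[0, 1]

# Null hypothesis: every word has the same underlying error probability.
p_null = errors.sum() / freq.sum()
null_rs = np.array([np.corrcoef(freq, rng.binomial(freq, p_null) / freq)[0, 1]
                    for _ in range(2000)])
p_value = np.mean(null_rs <= observed_r)          # how often chance alone is this negative
print(round(float(observed_r), 3), round(float(p_value), 3))
```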

Kapatsinski, V. (2008). Principal components of sound systems: An exercise in multivariate statistical typology. Indiana University Linguistics Club Working Papers Online, 08-08.

What’s in it: An application of principal components analysis to uncover what information about language relatedness is available in phoneme inventories.

Kapatsinski, V. (2006). Sound similarity relations in the mental lexicon: Modeling the lexicon as a complex network. Research on Spoken Language Processing Progress Report No.27, 133-152. Indiana University Speech Research Lab.

What’s in it: Assessing the large-scale structure of the phonological wordform lexicon under different definitions of what it means for two words to be neighbors of each other (similar enough to compete during word recognition). Shows that normalizing similarity by word length improves predictiveness of neighborhood density for word recognition times in a megastudy of English lexical decision and naming (the English Lexicon Project, Balota et al. 2007). More broadly speaking, argues that words become more similar if they share a lot, rather than only becoming less similar if they differ by much (existing measures of similarity take into account only how much two words differ, e.g. Luce & Pisoni 1998).
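
A toy sketch of length-normalized similarity; the words, the normalization and the cutoff are illustrative, not the exact measures tested in the paper.

```python
def edit_distance(a, b):
    """Plain Levenshtein distance."""
    d = [[i + j if i * j == 0 else 0 for j in range(len(b) + 1)]
         for i in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + (a[i - 1] != b[j - 1]))
    return d[len(a)][len(b)]

def similarity(a, b):
    """Normalize by length: long words that share a lot count as similar,
    not just short words that differ by little."""
    return 1 - edit_distance(a, b) / max(len(a), len(b))

print(round(similarity("katapult", "katapulps"), 2))  # 2 edits, long pair -> 0.78
print(round(similarity("kat", "kab"), 2))             # 1 edit, short pair -> 0.67
# The long pair shares more and comes out as more similar even though it differs
# by more edits, unlike a fixed one-edit neighbor criterion.
```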

See also:

For the English Lexicon Project, Balota et al. 2007

For other megastudies on the mental lexicon, see the work being done at Ghent by Brysbaert and colleagues: http://crr.ugent.be/programs-data/lexicon-projects and http://crr.ugent.be/programs-data/subtitle-frequencies and http://crr.ugent.be/archives/1602

For measures of phonological similarity, see the section below

For other examples of complex network analysis applied to the phonological wordform lexicon, see Altieri, Gruenenfelder & Pisoni (2010), Arbesman, Strogatz & Vitevitch (2010), Chan & Vitevitch (2010), Gruenenfelder & Pisoni (2006), Vitevitch (2008), Vitevitch, Chan & Goldstein (2014), and Vitevitch, Chan & Roodenrys (2012). My general concern regarding this literature (and the reason I have not pursued complex network analyses beyond the 2006 paper) is that ‘phonological neighbor’ is a very ill-defined notion. We are unlikely to ever find a task-independent measure of between-word phonological similarity that would work across perception, production, recall etc. Task-specific measures are in their infancy. Furthermore, there would likely be no cutoff value of similarity, such that words that are less similar than that value do not interact during word recognition, production or recall. It therefore appears problematic to model lexicon structure by choosing a cutoff on some arbitrary measure of similarity and assuming that words that are more similar than that value have links between them, and words that are less similar do not.

For applications of complex networks to the semantic lexicon, see Steyvers & Tenenbaum (2005), and the large body of work resulting from that paper. I find Gruenenfelder & Pisoni (2009) and Jones & Gruenenfelder (2011) particularly interesting in trying to address the question of what it means for two words to be semantically similar.

For a general review of the landscape of studies on network structures in language, see Choudhury & Mukherjee (2009)

Constituency, chunking, the units of language

Generally speaking, this line of work argues for there being local (node-like) representations for many frequent yet compositional linguistic units. As noted by Langacker (1987), this does not mean that there are not also smaller units within these. There are many routes from form to meaning or from meaning to form (Baayen, Dijkstra & Schreuder 1997). You may take a route going through one set of units on one occasion, and a route going through another set of units on another. Having a node allocated to a unit allows that unit to acquire associations that its parts do not have (Bybee 2006) and to compete with other units for selection or at least association strength (Kapatsinski 2007, Oppenheim, Dell & Schwartz 2010).

Kapatsinski, V., & J. Radicke. (2009). Frequency and the emergence of prefabs: Evidence from monitoring. In R. Corrigan, E. Moravcsik, H. Ouali, & K. Wheatley, eds.  Formulaic Language. Vol. II: Acquisition, loss, psychological reality, functional explanations, 499-520. Amsterdam: John Benjamins. (Typological Studies in Language 83).

What’s in it:

We argue that, as long as the whole and its parts are both lexical (linked to meaning), the whole competes with its parts for recognition. (See also Bybee & Brewer 1980, Hay 2001, Healy 1976, 1994, Sosa & MacFarlane 2002. For the opposing view, see McClelland & Rumelhart 1981; Rumelhart & McClelland 1982).

We use a monitoring task, in which participants monitor spoken sentences for occurrence of /Λp/ (up), whether it is a word, a morpheme, or just a sequence of sounds inside a word. The subjects press a button as soon as they hear /Λp/. In this, we follow Sosa & MacFarlane 2002, who monitored for of but only when it was a separate word.

We find that /Λp/ (up) is easier to detect (in terms of reaction time) when it is a morpheme than when it is not, and when it is a syllabic constituent (rime) than when it is not. This supports the notion that unithood makes the unit easier to detect, as well as providing evidence for rimes as units in English. (See also Kapatsinski 2009, Lee & Goldrick 2008, Olejarczuk & Kapatsinski 2013 and Treiman & Danis 1988). The influence of syllable structure on monitoring was not observed for English by Cutler, Mehler, Norris & Segui (1986) and Bradley, Sanchez-Casas & Garcia-Albea (1993), which led to claims that syllables (or rimes) are not perceptual units in English (e.g. Cutler, McQueen, Norris & Somejuan 2001). We argue that these previous results are due to the fact that stimuli deemed to have a CV.CVC structure had a liquid intervocalic consonant in these previous studies, along with a lax first-syllable vowel and first-syllable stress. With such words, the intervocalic consonant is usually parsed into the first syllable (Derwing 1992), so the structure is actually CVC.VC. Detecting a CVC in CVC.VC is no harder than in CVC.CVC. In our stimuli, the intervocalic consonant /p/ is a stop, which is parsed into the second syllable (as in upon). This makes up hard to detect in such words. See also Morgan & Wheeldon (2003). Note that we do not claim that the rime (or the syllable) is the unit of perception. There is no single unit of perception in a multiroute model. We just claim that it is a unit you sometimes use in perception.

More interestingly, there was a robust effect of word frequency: /Λp/ was harder to detect in high-frequency words. The effect was monotonic: it was observable throughout the frequency range. As word frequency increased, /Λp/ became harder and harder to detect. We take this as evidence that words were competing for recognition with up, with frequent words being stronger competitors than rare words (Bybee & Brewer 1980, Hay 2001, Healy 1976, 1994). I suspect that this is because /Λp/ is a meaningful unit in the lexicon, and if we were to pick a meaningless phoneme we would not find this, but this is untested, as far as I know.

We did not find this inhibitory effect of frequency for verb+up phrases, except, perhaps, for the very top of the frequency range. In fact, we found a facilitatory effect of phrase frequency throughout most of the range: up was easier to detect in moderately frequent bigrams like walk up than in rare ones like eke up. This is expected if up is predicted on the basis of the preceding context. The lack of an inhibitory frequency effect throughout most of the phrase frequency range suggests that phrases do not compete with their parts for recognition, unless they are super frequent, and so may not be stored as units except in case of very high frequency. The difference in unithood between words and phrases is also supported by the observation (Harmon & Kapatsinski 2014) that, following a disfluency, production never restarts from a word-internal position (I had a similar, uh, similar health plan. vs. *I had a similar, uh, -milar health plan. or even *It is darkest, uh, -est before sunrise.) Production does sometimes restart from phrase internal position, even for very frequent phrases (This is a kind of, uh, of a health insurance plan. It came up, uh, up during the meeting).

The conclusion that phrases are not stored as units unless very frequent is debated by Cappelle, Shtyrov & Pulvermüller (2010) based on the finding that the MMN ERP response to up is reduced in frequent verb+up phrases (that are not as frequent as our super-frequent phrases). Given the differences in methodology, it is not clear how to interpret this discrepancy. For example, we have a large continuous range of frequencies but do not include ungrammatical word combinations. Cappelle et al. test only two verbs (heat up and cool down) and contrast them with ungrammatical combinations (heat down and cool up). One interpretation of the difference is that heat up and cool down activate richer semantic networks (as Cappelle et al. suggest) than does ungrammatical word salad; then the same result would be obtained by contrasting grammatical and ungrammatical uses of, say, eke up. In this case, the results do not speak directly to the question of whether cool down and heat up are units. To address this question, frequent and infrequent interpretable particle verbs need to be contrasted, as in Tremblay et al. (2014), where Generalized Additive Mixed Models are used to look for non-linearities in the effect of phrase frequency on EEG signals. The results appear to be consistent with our position. An alternative interpretation is that there is an early effect of storage (detected by Cappelle et al.) that is then overridden by the effect of predictability (which they claim to be postlexical). Cappelle et al. suggest that the MMN at the phrase level seems to not be sensitive to between-word probabilities. Monitoring latency certainly is. However, for me, it does not make much sense for the effect of predictability to be a postlexical effect if it is to be helpful for everyday word recognition in context.

We did find that up was harder to detect in the most frequent verb-particle combinations than in medium-frequency ones. We interpreted this result as indicating that the highest-frequency phrases like come up are stored as units and compete with their parts for recognition. We should caution, though, that there are very few such units (Zipf’s Law strikes again). Thus, other causes of the differences having to do with idiosyncrasies of semantics or phonology cannot be ruled out with great certainty without replications. Our paper itself is a follow-up on Sosa & MacFarlane 2002 where of was found to be harder to detect in frequent phrases like kind of but the phonological confounds are even greater there. Some encouraging results were, however, recently reported by Tremblay et al. (2014).

Harmon, Z., & V. Kapatsinski. (2014). Determinants of lengths of repetition disfluencies: Probabilistic syntactic constituency in speech production. Chicago Linguistic Society 50.

What’s in it:  Examines repetition disfluencies as a window on between-word cohesion and constituency. We argue that in producing sequences like I had a similar- + a similar health plan speakers tend not to restart production from inside a cohesive unit. For example, speakers never restart from inside a word, as in *I had a similar + -milar health plan. Two influences on cohesion above the word level are syntactic constituency (Levelt 1983) and co-occurrence. Speakers tend to restart from the nearest major constituent boundary, unless it’s high in backwards transitional probability. Note, however, that even the most cohesive word sequences are not as cohesive as the least cohesive words: you can restart from the beginning of lot or of in the very frequent a lot of but not from the beginning of -ness in the infrequent word shallowness. This argues that prefabs are not quite ‘big words’ (as also argued in Kapatsinski & Radicke 2009).
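
For concreteness, backwards transitional probability can be computed from bigram and unigram counts in the standard way; the counts below are invented, not the corpus figures from the paper.

```python
from collections import Counter

bigrams = Counter({("a", "lot"): 900, ("the", "lot"): 40, ("parking", "lot"): 60})
unigrams = Counter({"lot": 1000, "a": 5000, "the": 20000, "parking": 80})

def backward_tp(w1, w2):
    """P(w1 | w2): how predictable the earlier word is from the later one."""
    return bigrams[(w1, w2)] / unigrams[w2]

print(round(backward_tp("a", "lot"), 2))        # 0.9: a highly cohesive sequence
print(round(backward_tp("parking", "lot"), 2))  # 0.06: much less cohesive
# High backward TP across a boundary makes the sequence more cohesive, so the
# nearest constituent boundary is less likely to be chosen as the restart point.
```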

See also:

Kapatsinski, V. (2005). Measuring the relationship of structure to use: Determinants of the extent of recycle in repetition repair. Berkeley Linguistics Society 30, 481-492.

Kapatsinski, V. (2010). Frequency of use leads to automaticity of production: Evidence from repair in conversationLanguage and Speech, 53(1), 71-105.

What’s in it: Examines replacement repairs, as in I used to listen to the newsp- + the radio in the morning. Documents that the likelihood of interrupting a to-be-replaced word (newspaper in this case) is related to the frequency of the word. Rare words are more likely to be interrupted than frequent words, even controlling for the fact that frequent words tend to be shorter. This provides evidence for the hypothesis that words are units of speech execution: the more frequent a word, the more cohesive it is, and the more automatized and ‘ballistic’ its production, making it harder to interrupt. Note that, unlike the categorical restriction on restarting from the middle of a word, the tendency not to stop the production of a word before it is complete is gradient. I would argue that this is because words consist of smaller units that can be monitored and suppressed (accounting for the results of Tilsen & Goldstein 2012) but that nonetheless the word forms a whole (contra Tilsen & Goldstein 2012), so production always starts from some word boundary.

For experimental work documenting frequency effects on how difficult it is to stop production of a word, see Logan (1982)

For other work on repair, disfluencies and cohesion, see Goldman-Eisler (1957), Clark & Wasow (1998), Levelt 1983, Maclay & Osgood (1959), Plug & Carter (2013), Schnadt (2009), Schneider (2014) and Tannenbaum et al. (1965)

Kapatsinski, V. (2014). Sound change and hierarchical inference. What is being inferred? Effects of words, phones and frequency. Under review.

What’s in it: For the purposes of this section, argues that both words and ‘sounds’ (sublexical structures, be they phonemes or gestures) are units of sound change. ‘Blame’ for a pronunciation of a sound can be split between the word, the sound and contextual factors in a principled way using hierarchical inference.

Kapatsinski, V. (2009). Testing theories of linguistic constituency with configural learning: The case of the English syllable. Language, 85(2),  248-277.

What’s in it: Linguistic constituents are usually represented as nodes in trees. However, dependency grammar represents them as connections between the parts. This paper argues that, iff constituents are nodes, then they should be able to acquire associations that their parts do not have. This is similar to an argument Bybee (2006) uses to argue that originally compositional words or phrases must be stored in order to become non-compositional, through association with ‘special’ phonology, semantics or morphosyntax. This paper reports that at least some rimes (phonological constituents) in English (like the /ag/ in /gag/) can be associated with either prefixes or suffixes (Cag-num or num-Cag) without the parts of the rime becoming associated with the same prefix or suffix. In contrast bodies, like the /ga/ in /gag/, cannot become associated with prefixes or suffixes without sharing those associations with their parts. These results suggest that constituency is more than mere dependency. A constituent has a node that can then be associated with other nodes. A non-constituent is just its parts, so it cannot have associations that its parts do not have.

See also: 

Kapatsinski, V. (2007). Implementing and testing theories of linguistic constituency I: English syllable structure. Research on Spoken Language Processing Progress Report No.28, 241-276. Indiana University Speech Research Lab. What’s in it: Implementations of the competing models of constituency as associative networks.

Kapatsinski, V., & D. B. Pisoni. (2008). The role of phonetic detail in associating phonological units. Poster presented at Laboratory Phonology XI, Wellington, New Zealand. What’s in it: The same vowel varies in its realization  across different instances of the same body more than it varies across different instances of the same rime due to coarticulation. It might be that the test vowels are harder to identify in the body condition than in the rime condition. This poster shows that the rime/body distinction persists if difficulty of recognizing test tokens of the trained vowels is taken into account.

Kapatsinski, V. (2008). Constituents can exhibit partial overlap: Experimental evidence for an exemplar approach to the mental lexicon. In R. L. Edwards, P. J. Midtlyng, C. L. Sprague, and K. G. Stensrud, eds. CLS 41: The Panels, 227-242. Chicago: Chicago Linguistic Society. What’s in it? Argues, based on a wug test, that both rimes and bodies of Russian verb roots are associated with suffixes.

Olejarczuk, P., & V. Kapatsinski. (2013). The syllabification of medial clusters: evidence from stress assignment. Poster presented at Linguistic Society of America Annual Meeting, Boston, MA.

What’s in it: Documents the relationship between stress and syllable structure in English: competing tendencies to stress the initial syllable vs. to stress the heavy penultimate syllable, in proportion to the type frequencies of the competing patterns in the lexicon. Current work is examining how these baseline frequencies of stressing initial light vs. penultimate heavy syllables can be affected by additional exposure to the competing patterns.

See also:

For related work on stress placement, see Guion et al. (2003), Domahs, Plag & Carroll (2014) and Ryan (2014)

For some recent work arguing for or against probability matching in extending linguistic patterns, see Becker, Ketrez & Nevins (2011), Ernestus & Baayen (2003), Hayes, Siptar, Zuraw & Londe (2009), and Kapatsinski (2010)

For the role of sonority in dealing with novel clusters, see Albright (2007), Berent, Steriade, Lennertz & Vaknin (2007), Daland et al. (2011), Hayes (2011), and Redford (2008)

 

Sound change and lexical diffusion

Kapatsinski, V. (2014). Sound change and hierarchical inference. What is being inferred? Effects of words, phones and frequency. In revision.

What’s in it: Argues that pronunciation of a phone(me) is attributed both to the phone(me) and the word it is in using hierarchical inference. Models articulatorily-motivated sound change with this assumption in mind. Shows that — with the assumption that words are units of production that are subject to automatization (not necessarily the only units of production) and the assumption of hierarchical inference — exceptionally unreduced words should be of medium, rather than very low, frequency. Tests this prediction with American English flapping, with somewhat discouraging results for the model so far: novel words (super low frequency) appear to be reduced as little as medium-frequency words that tend to occur in formal contexts. I hypothesize that the reason is that college students assume novel words to be academic words, so they assume novel words should behave like words that tend to occur in formal contexts. Current work attempts to put the words into highly informal contexts where they should instead be perceived as slang and therefore reduced (i.e. flapped).
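
A minimal sketch of the partial-pooling intuition behind the hierarchical-inference half of the account; the counts and the simple shrinkage estimator are stand-ins, not the model actually fit in the paper.

```python
# Each word's inferred flapping rate is pulled toward the phone-level rate,
# more strongly the less evidence there is about the word (toy numbers).
overall_rate = 0.8     # assumed rate of flapping for /t/ in this context
k = 20.0               # pseudo-count controlling how strongly rates are pooled

words = {"well-attested": (500, 460), "medium": (40, 20), "novel": (2, 1)}
for word, (n_tokens, n_flapped) in words.items():
    word_rate = (n_flapped + k * overall_rate) / (n_tokens + k)
    print(word, round(word_rate, 2))
# Low-frequency words contribute little evidence of their own, so their inferred
# rates stay near the phone-level rate; only reasonably well-attested words can be
# credited with an exceptional rate, which is why exceptionally unreduced words are
# predicted to be of medium rather than very low frequency.
```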

Lee, O., & V. Kapatsinski. (2014). Frequency effects in morphologization of Korean /n/-epenthesis. In revision.

What’s in it: Examines the disappearance of /n/ epenthesis in Korean. Documents that /n/ epenthesis is more likely to happen in frequent compounds than in rare compounds, and is categorically banned from Sino-Korean words and morpheme boundaries in non-compounds. Within native Korean compounds, certain second morphemes are associated with /n/ epenthesis. Argues that /n/ epenthesis is morphologized (triggered by specific morphemes) and is losing productivity (hence it is more common in frequent words that can be retrieved from the lexicon directly rather than composed using the grammar). The specific morphemes associated with /n/ epenthesis are argued to be good triggers because they tend to occur in native compounds, an environment favoring /n/ epenthesis.

Kapatsinski, V. (2010). Frequency of use leads to automaticity of production: Evidence from repair in conversation. Language and Speech, 53(1), 71-105.

What’s in it: Provides support for the proposal that words are units of production (more specifically, execution), and that frequent words are produced more automatically. This assumption underlies Bybee’s (2002) explanation for why articulatorily-motivated (reductive) sound change starts in high-frequency words.

See also: Kapatsinski, V. (2007). Does high frequency lead to automaticity? A corpus study. Poster presented at the Workshop on Variation, Gradience and Frequency in Phonology, Stanford, CA, July 6-8. What’s in it: Data from repetition disfluencies (rather than replacement repair, as in the main paper above).

Frequency, familiarity, repetition

Vajrabhaya, P., & V. Kapatsinski. (2014). First time’s the charm: First-mention lengthening as an automated act. In revision.

What’s in it: Words are longer when mentioned for the first time within a story, even if that story has just been told to the same listener. We argue that this behavior is conventionalized as the way to tell a story.

See also:

Vajrabhaya, P., & E. Pederson. (In prep.) for evidence that this is NOT what happens with gestures: they continue to reduce with repetition.

Vajrabhaya, P., & V. Kapatsinski. (2011). There is more to the story: First-mention lengthening in Thai interactive discourse. Proceedings of the 17th International Congress of Phonetic Sciences, 2050-2053. What’s in it: The same finding in Thai.

Kapatsinski, V., & R. Janda. (2011). It’s around here: Residential history and the meaning of ‘Midwest’. Proceedings of the 33rd Annual Conference of the Cognitive Science Society, 2983-2988.

What’s in it: Where is the Midwest? It depends where you are from. For Midwesterners, it’s around where they’ve lived: the Midwest appears to be anchored in the familiar exemplars of Midwestern locations. For outsiders, it’s a  mix: some think it’s the middle of the west, some go on an official definition.  For more data on this, see http://fivethirtyeight.com/datalab/what-states-are-in-the-midwest/, and http://fivethirtyeight.com/datalab/more-data-analysts-went-looking-for-the-south-and-midwest-and-heres-what-they-found/

Kapatsinski, V. (2010). What is it I am writing? Lexical frequency effects in spelling Russian prefixes: Uncertainty and competition in an apparently regular system. Corpus Linguistics and Linguistic Theory, 6(2), 157-215.

What’s in it: Whole-word frequency effects are found for a regular spelling rule, but only if the word’s spelling is difficult because the word has a differently-spelled morphological relative.

Kapatsinski, V. (2010). Frequency of use leads to automaticity of production: Evidence from repair in conversation. Language and Speech, 53(1), 71-105.

What’s in it: Word frequency is argued to result in automatized production of the word: even when the speaker intends to stop production, in order to replace the word with another, stopping is delayed when the word is frequent.

Kapatsinski, V. (2007). Frequency, neighborhood density, age-of-acquisition, lexicon size, and speed of processing: Towards a domain-general, single-mechanism account. In S. Buescher, K. Holley, E. Ashworth, C. Beckner, B. Jones, and C. Shank, eds. Proceedings of the 6th Annual High Desert Linguistics Society Conference, 121-40. Albuquerque, NM: High Desert Linguistics Society.

What’s in it: Links together findings on priming, word recognition and associative learning, to argue that frequency, density, AoA and lexicon size effects can be captured by a simple model in which there are type and token nodes, all types are connected to each other, and links compete for a limited supply of spreading activation.
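
A toy sketch of the "links compete for a limited supply of activation" idea: each node passes on a fixed amount of activation, divided among its links in proportion to their weights. The miniature network and weights are invented, not the model from the paper.

```python
# Invented miniature network: hypothetical type-to-type link weights.
weights = {
    "cat": {"cap": 1.0, "bat": 1.0, "hat": 1.0},   # a dense neighborhood
    "orange": {"porridge": 1.0},                   # a sparse one
}

def spread(source, amount=1.0):
    """Split a fixed amount of activation among the source's links,
    so adding links dilutes what any single link can deliver."""
    links = weights[source]
    total = sum(links.values())
    return {target: amount * w / total for target, w in links.items()}

print(spread("cat"))      # each of three neighbors gets 1/3
print(spread("orange"))   # the lone neighbor gets the full amount
```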

See also:

Kapatsinski, V. (2006). Towards a single-mechanism account of frequency effects. The LACUS Forum 32: Networks, 325-335. (A shorter version).

Kapatsinski, V. (2005). LAST: A single-mechanism account of type and token frequency effects and their relatives. Speech Research Laboratory Lab Meeting Talk, Bloomington, IN. (In presentation format)

Kapatsinski, V. (2006). Having something common in common is not the same as sharing something special: Evidence from sound similarity judgments. Paper presented at the LSA Annual Meeting, Albuquerque, NM.

What’s in it: Argues that perceived similarity is increased when the shared part is something rare. Discusses implications for models of lexical representation.

 

Similarity

Teruya, H., & V. Kapatsinski. (2012). Sharing the beginning vs. the end: Spoken word recognition in the visual world paradigm in Japanese. Paper presented at the Linguistic Society of America Annual Meeting, Portland, OR.

Kapatsinski, V. (2006). Having something common in common is not the same as sharing something special: Evidence from sound similarity judgments. Paper presented at the LSA Annual Meeting, Albuquerque, NM.

Kapatsinski, V. (2006). Sound similarity relations in the mental lexicon: Modeling the lexicon as a complex network. Research on Spoken Language Processing Progress Report No.27, 133-152. Indiana University Speech Research Lab.

Kapatsinski, V. (2004). Phonological similarity relations: Network organization of the lexicon and phonology, VIII Encuentro Internacional de Linguistica en el Noroeste, Hermosillo, Sonora, Mexico (Published in the proceedings of Encuentro).

 

Language Variation & Change

Barth, D., & V. Kapatsinski. (2014). A multimodel inference approach to categorical variant choice: construction, priming and frequency effects on the choice between full and contracted forms of am, are and is. Corpus Linguistics & Linguistic Theory. Please e-mail for offprints.

Barth, D., & V. Kapatsinski. (2012). Evaluating logistic mixed-effects models of corpus data. Under review.

Kapatsinski, V. (2014). Sound change and hierarchical inference. What is being inferred? Effects of words, phones and frequency. Under review.

Lee, O., & V. Kapatsinski. (2014). Frequency effects in morphologization of Korean /n/-epenthesis. Under review.

Kapatsinski, V. (2013). Towards a de-ranged study of variation: Estimating predictor importance with multimodel inference. In preparation.

Kapatsinski, V., & C. M. Vakareliyska. (2013). [N[N]] compounds in Russian: A growing family of constructions. Constructions & Frames, 5(1), 69-87.

Jing-Schmidt, Z., & V. Kapatsinski. (2012). The apprehensive: Fear as endophoric evidence and its pragmatics in English, Mandarin, and Russian. Journal of Pragmatics, 44(4), 346-373.

Kapatsinski, V., & R. Janda. (2011). It’s around here: Residential history and the meaning of ‘Midwest’. Proceedings of the 33rd Annual Conference of the Cognitive Science Society, 2983-2988.

Kapatsinski, V. (2009). Adversative conjunction choice in Russian: Semantic and syntactic influences on lexical selection. Language Variation and Change, 21(2), 157-173.
