This annotated list regularly gets out of date.

For a full chronological list, see here

General theory

Kapatsinski, V. (2018). Changing minds changing tools: From learning theory to language acquisition to language change. Cambridge, MA: MIT Press.

Cover blurb: In this book, Vsevolod Kapatsinski argues that language acquisition—often approached as an isolated domain, subject to its own laws and mechanisms—is simply learning, subject to the same laws as learning in other domains and well described by associative models. Synthesizing research in domain-general learning theory as it relates to language acquisition, Kapatsinski argues that the way minds change as a result of experience can help explain how languages change over time and can predict the likely directions of language change—which in turn predicts what kinds of structures we find in the languages of the world. What we know about how we learn (the core question of learning theory) can help us understand why languages are the way they are (the core question of theoretical linguistics).

Taking a dynamic, usage-based perspective, Kapatsinski focuses on diachronic universals, recurrent pathways of language change, rather than synchronic universals, properties that all languages share. Topics include associative approaches to learning and the neural implementation of the proposed mechanisms; selective attention; units of language; a comparison of associative and Bayesian approaches to learning; representation in the mind of visual and auditory experience; the production of new words and new forms of words; and automatization of repeated action sequences. This approach brings us closer to understanding why languages are the way they are than approaches premised on innate knowledge of language universals and the language acquisition device.

Kapatsinski, V. (2014). What is grammar like? A usage-based constructionist perspective. Linguistic Issues in Language Technology.

Main points: This is a review paper arguing the following points. You need grammar (generalization) and not just exemplars. Grammar includes constructions (form-meaning pairings) but also syntagmatic form-form associations and paradigmatic construction-construction associations. Grammar is non-parametric: there is no limit to the number of constructions in a language. Grammar is partially redundant: Any feature is predictable from many other features (see also Winter 2014). There are many routes to get from a form to a meaning or from a meaning to a form that are used in parallel (see esp. Beekhuizen, Bod & Zuidema 2013 as well as Kapatsinski 2010). There is no one way to produce something in language. Different people might do the same thing in different ways (behavioral uniformity together with representational diversity). There is no single mental grammar shared by all members of a community. Multimodel inference helps infer the ensemble of grammars that might underlie the observed behavior (see also Barth & Kapatsinski 2014).

See also:

Some book-length related works: Barðdal (2008), Bybee (2001, 1985), Dabrowska (2004), Goldberg (2006), Langacker (1987), Nesset (2008)

For multimodel inference: Barth, D., & V. Kapatsinski. (2014). A multimodel inference approach to categorical variant choice: construction, priming and frequency effects on the choice between full and contracted forms of am, are and is. Corpus Linguistics & Linguistic Theory. See also Kuperman & Bresnan (2012).

There is related work on categorization that I was not aware of at the time of writing, which is summarized here (see pp.23-25).

Kapatsinski, V. (2023). Defragmenting learning. Cognitive Science, 47(6), e13301. Unformatted version

Main point: We need to bring the sciences of learning back together, since we keep reinventing the wheel.

 

Learning Theory

Work in this vein tests alternative domain-general learning algorithms by designing experiments on language learning for which they make different predictions about what should be learned. Much of this work uses the Rescorla-Wagner model / delta rule — often called the ‘standard theory’ of error-driven learning — as one of the contender algorithms.

Discoveries in learning theory:

Kapatsinski, V. (2023). Learning fast while avoiding spurious excitement and overcoming cue competition requires setting unachievable goals: Reasons for using the logistic activation function in learning to predict categorical outcomes. Language, Cognition & Neuroscience, 38(4), 575-596.

What’s in it: Argues that outcomes are (often) truly categorical, having a probability but not a magnitude, based on the finding that form-meaning mappings are learned more accurately by a logistic perceptron (which treats outcomes as categorical) than by the Rescorla-Wagner model (which treats them as continuous). Shows that there are contingency structures for which the Rescorla-Wagner model is guaranteed to learn an association between things that never co-occur, whereas the logistic perceptron does not (and neither do humans) — the Spurious Excitement Effect. Also shows that the logistic perceptron has the advantages of 1) predicting that cue competition effects are transient, 2) accounting for the fact that learning never stops, and 3) accounting for apparent ‘aha moments’ without a causal ‘aha’.

See also: Caballero & Kapatsinski (2022), which shows the Spurious Excitement Effect in Rescorla-Wagner learning of form-meaning mappings in the morphology of a natural language, demonstrating that the results above generalize to natural languages.
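
To make the contrast above concrete, here is a minimal sketch of the two learners being compared, assuming binary cues and a single categorical outcome (all names and parameter values are mine, chosen for illustration):

```python
import numpy as np

def error_driven_step(w, cues, outcome, rate=0.1, logistic=False):
    """One trial of error-driven (delta rule) learning.
    w: vector of cue-outcome association weights
    cues: binary vector marking which cues are present on this trial
    outcome: 1 if the outcome occurred on this trial, else 0"""
    net = cues @ w                        # summed support from present cues
    # Rescorla-Wagner treats the prediction as a continuous magnitude;
    # the logistic perceptron squashes it into a probability, so the
    # targets 0 and 1 are never quite reachable and learning never stops.
    pred = 1 / (1 + np.exp(-net)) if logistic else net
    error = outcome - pred                # prediction error drives learning
    return w + rate * error * cues        # only present cues are updated

# Toy run: cue 0 predicts the outcome perfectly; cue 1 is always present.
rng = np.random.default_rng(0)
w_rw, w_log = np.zeros(2), np.zeros(2)
for _ in range(200):
    c0 = rng.integers(0, 2)
    cues = np.array([c0, 1])
    w_rw = error_driven_step(w_rw, cues, c0, logistic=False)
    w_log = error_driven_step(w_log, cues, c0, logistic=True)
print(w_rw.round(2), w_log.round(2))
```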

Mujezinović, E., Kapatsinski, V., & van de Vijver, R. (2024). Learning to unlearn: The role of negative evidence in morphophonological learning. Cognitive Science, 48(5), e13450.

Main point: Presents experimental data showing that weakening a form-meaning association between a present cue and an absent outcome is accompanied by strengthening associations between absent cues and the same outcome.

Harmon, Z., Idemaru, K., & Kapatsinski, V. (2019). Learning mechanisms in cue reweighting. Cognition, 189, 76-88.

What’s in it: Shows that the Rescorla-Wagner / delta rule algorithm predicts that a phonetic cue will always be downweighted if it is no longer reliable. Shows that humans only downweight the cue if there is another cue that they can rely on instead. Develops an alternative model of phonetic learning based on selective attention theory and reinforcement learning: learners reallocate attention across phonetic dimensions if doing so would improve prediction accuracy, by maintaining and comparing alternative attention allocation policies.
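
One way to cash out ‘maintaining and comparing alternative attention allocation policies’ in code; the softmax selection and running-accuracy tracking below are my assumptions rather than the published model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical attention policies over two phonetic dimensions
# (e.g., VOT and f0): attend mostly to dimension 0, or mostly to dimension 1.
policies = np.array([[0.9, 0.1],
                     [0.1, 0.9]])
accuracy = np.array([0.5, 0.5])     # running accuracy estimate per policy

def pick_policy(accuracy, temp=0.1):
    """Softmax selection: policies that have predicted outcomes more
    accurately in the past are sampled more often."""
    p = np.exp(accuracy / temp)
    return rng.choice(len(accuracy), p=p / p.sum())

for trial in range(500):
    k = pick_policy(accuracy)
    # Hypothetical world: dimension 1 is now the reliable cue, so policy 1
    # classifies correctly 90% of the time and policy 0 only 60%.
    correct = rng.random() < (0.9 if k == 1 else 0.6)
    # Nudge the chosen policy's accuracy estimate toward the latest outcome;
    # attention shifts only when the shift pays off in prediction accuracy.
    accuracy[k] += 0.05 * (correct - accuracy[k])
print(accuracy.round(2))
```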

Kapatsinski, V., Bramlett, A. A., & Idemaru, K. (2024). What do you learn from a single cue? Dimensional reweighting and cue reassociation from experience with a newly unreliable phonetic cue. Cognition, 249, 105818.

What’s in it: Verifies experimentally the assumption of the Harmon et al. model that attention is reallocated across phonetic dimensions rather than individual cues, but also shows that dimensional reweighting (which occurs only when it would improve accuracy, as proposed in Harmon et al.) coexists with reassociation of cues with unexpected outcomes, as proposed by Rescorla & Wagner. Extends the Harmon et al. (2019) model to allow combining attention reallocation and dimensional reweighting mechanisms and learning from continuous acoustic signals.

Olejarczuk, P., Kapatsinski, V., & Baayen, R. H. (2018). Distributional learning is error-driven: The role of surprise in the acquisition of phonetic categories. Linguistics Vanguard, 4(S2).

What’s in it: Argues that surprising tokens have more effect on a phonetic category representation than unsurprising ones, leading to non-veridical mental representations of skewed distributions over phonetic space. Shows that this falls out from error-driven learning models like Rescorla-Wagner.
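
A toy illustration (not the paper’s model) of why error-driven updating can yield non-veridical representations of skewed distributions: each token pulls the category center in proportion to the prediction error, so surprising tokens in the long tail pull hardest, and the learned center ends up away from the most frequent value (the mode):

```python
import numpy as np

rng = np.random.default_rng(1)
# A right-skewed distribution over some phonetic dimension.
tokens = rng.gamma(shape=2.0, scale=1.0, size=5000)   # mode = 1.0, mean = 2.0

mu = tokens[0]                   # running category-center estimate
for x in tokens[1:]:
    mu += 0.05 * (x - mu)        # error-driven: big surprises, big updates

print(round(mu, 2))              # ends up near the mean, not the mode
```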

Smolek, A., & Kapatsinski, V. (2023). Syntagmatic paradigms: Learning correspondence from contiguity. Morphology, 33, 287-334. https://link.springer.com/article/10.1007/s11525-023-09411-w

What’s in it: Presents experimental data showing that temporal adjacency helps learn both the ways in which paradigmatically related words (like singulars and plurals) differ, and what they share. Rescorla-Wagner does a good job predicting the effect and suggests that it is due to the greater salience of form outcomes that represent relations between the product form and the source form. Since we know that morphologically related words do tend to co-occur in close temporal proximity (Xu & Croft, 1998), this may be what allows speakers to learn paradigmatic relations between words.

See also: Carvalho & Goldstone (2014), showing a similar effect (though apparently limited to differences) in visual category learning; and Copot & Bonami (2023), showing learning of paradigms.

Views and reviews on learning theory:

Kapatsinski, V. (2018). Changing minds changing tools: From learning theory to language acquisition to language change. Cambridge, MA: MIT Press. Table of Contents

Kapatsinski, V. (2023). Defragmenting learning. Cognitive Science, 47(6), e13301. Unformatted version

Kapatsinski, V. (2024). Associations, chunks, hierarchies, attention, and analogy: What do we need? Topics in Cognitive Psychology / l’Année Psychologique, 142, 223-228.

 

Language Production

Work in this vein aims to develop a connectionist theory of language production that tries to get as far as possible with activation, inhibition and associative learning.

Harmon, Z., & V. Kapatsinski. (2021). A theory of repetition and retrieval in language production. Psychological Review, 128(6), 1112-1144.

What’s in it: A theory of lexical access in production within a sentential context. The basic idea is that speakers access the next word to produce using both top-down semantic cues and preceding context, and that these sources of information cooperate in processing but compete in learning (see also Harmon & Kapatsinski, 2020 for the learning part). Explains why some languages have repetition disfluencies, where they occur, and where people tend to restart production from.

Kapatsinski, V. (2022) Morphology in a parallel, distributed, interactive architecture of language production. Frontiers in Artificial Intelligence, 5. (Part of “Cross-disciplinary Perspectives on Language Architecture“, ed. by M. M. Pinango, A. Smirnova, P. B. Schumacher & R. Jackendoff.)

What’s in it: Spells out a connectionist account of morphological production for both familiar and novel wordforms. Argues that production involves an initial phase of activation spreading from distributed semantics to form chunks, with form activation = semantic overlap with the intended message x form frequency, and a subsequent phase in which feedback from form to meaning suppresses activated forms that activate unintended meanings (the Negative Feedback Cycle). The NFC explains how speakers can avoid overextension when this is likely to have unintended consequences.
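
A highly simplified sketch of the two phases, with a made-up two-word lexicon (the numbers, semantic features, and exponential suppression are illustrative assumptions, not the paper’s implementation):

```python
import numpy as np

# Toy lexicon: rows are forms, columns are semantic features
# [canine, animate, feline]. Frequencies are invented.
forms = ["dog", "animal"]
freq  = np.array([50.0, 200.0])
sem   = np.array([[1, 1, 0],     # dog
                  [0, 1, 0]])    # animal
message = np.array([1, 1, 0])    # intended meaning: a canine animate thing

# Phase 1: form activation = semantic overlap with the message x frequency.
act = (sem @ message) * freq     # 'animal' wins here on raw frequency

# Phase 2 (Negative Feedback Cycle): each activated form feeds activation
# back to its meaning; forms whose fed-back meaning mismatches the intended
# message (extra or missing features) get suppressed.
mismatch = np.abs(sem - message).sum(axis=1)
act_nfc = act * np.exp(-mismatch)

print(dict(zip(forms, act)))      # before NFC: overextension to 'animal'
print(dict(zip(forms, act_nfc)))  # after NFC: 'dog' wins despite lower frequency
```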

See also Kapatsinski, V. (2024) Creativity through inhibition (of the first production that comes to mind). What’s in it: Develops the NFC theory, showing how it can account for various morphophonological innovations, and uses these examples to propose what it means for a speaker to be creative.

Harmon, Z., & Kapatsinski, V. (2017). Putting old tools to new uses: The role of form accessibility in semantic extension. Cognitive Psychology, 98, 22-44.

What’s in it: Shows that a word is extended to a new meaning or context because the new context/meaning shares features with contexts and meanings with which the word has co-occurred, and the word is activated by the shared semantic features more than any other form. If you make frequent and infrequent words equally activated, the preference to extend frequent forms to novel meanings disappears. (See also Kapatsinski & Harmon, 2017, for a demonstration that these results do not speak to learning theory, i.e., are expected under any view of learning.) This paper makes it clear that overextension is almost inevitable unless the NFC intervenes.

See also: Kapatsinski, V. (2010). What is it I am writing? Lexical frequency effects in spelling Russian prefixes: Uncertainty and competition in an apparently regular system. Corpus Linguistics and Linguistic Theory, 6(2), 157-215. Kapatsinski, V. (2010). Velar palatalization in Russian and artificial grammar: Constraints on models of morphophonology. Journal of Laboratory Phonology, 1(2), 361-393. What’s in them: Show that regular patterns nonetheless compete with more general patterns in production, i.e., production probability is a function of match to context/message x frequency. The better-matching constructions are preferred over more general ones because they are a better match, but the more general ones still often win the competition due to their greater frequency.

Kapatsinski, V. (2021). What are constructions, and what else is out there? An associationist perspective. Frontiers in Communication, 5, 134. (In Putnam, M., Carlson, M., Fábregas, A., & Wittenberg, E., eds. Defining Construction: Insights Into the Emergence and Generation of Linguistic Representations (pp. 75-89). Lausanne: Frontiers Media SA. doi: 10.3389/978-2-88971-935-8)

What’s in it: Argues that the experimental evidence so far is consistent with the idea that form-meaning mappings are bidirectional (contra Ramscar et al., 2010), but that form-meaning mappings are not all there is to morphology. In particular, argues (following Kapatsinski 2013, 2017) that we need copying of chunks from activated forms as well as paradigmatic mappings, and exemplifies how these processes work together in Russian bez- adjective formation (relying in part on data examined in Kapatsinski 2010). Shows that some but not all paradigmatic mappings may be dispensed with if negative form-meaning associations are allowed.

Kapatsinski, V. (2007). Frequency, neighborhood density, age-of-acquisition, lexicon size, and speed of processing: Towards a domain-general, single-mechanism account. In S. Buescher, K. Holley, E. Ashworth, C. Beckner, B. Jones, & C. Shank (eds.), Proceedings of the 6th Annual High Desert Linguistics Society Conference, 121-140. Albuquerque, NM: High Desert Linguistics Society.

What’s in it: Links together findings on priming, word recognition, and associative learning to argue that frequency, density, AoA, and lexicon-size effects can be captured by a simple model in which there are type and token nodes, all types are connected to each other, and links compete for a limited supply of spreading activation.

What about it endures: Competition for the limited supply of activation allowed the model to explain results that in other spreading activation models are due to mutual inhibition, and is probably the better account because it does not require learning a huge set of inhibitory connections for every new word. Norris & McQueen’s 2008 Bayesian account (the activation is in limited supply because it’s actually probability) is more constrained and arguably superior, but there are arguments for activation over probability. Specifically, Harmon (2019) points out that activation-based models predict the fact that probabilities are undermatched (more random than they should be) in a forced-choice task (e.g., Harmon & Kapatsinski, 2017; Olejarczuk, 2018; Smolek, 2019), because such a task makes alternatives approximately equally activated (see the sketch below). It is difficult to see how this would be accounted for in a Bayesian, probabilistic model.

The draining of activation from a type into tokens was the initial account of the inverse frequency effect in priming (taken up and tested for syntactic priming in Snider 2008 before it was reanalyzed as evidence for error-driven learning in Jaeger & Snider 2013). Error-driven learning is probably the better account.
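
Here is a minimal sketch of the limited-supply idea and of the undermatching point; the fixed activation budget and the constant task boost are my illustrative assumptions:

```python
import numpy as np

def spread(total, strengths):
    """Links from a node compete for a fixed supply of activation:
    each link receives a share proportional to its strength, so
    strengthening one link necessarily weakens its competitors."""
    s = np.asarray(strengths, dtype=float)
    return total * s / s.sum()

# Two competing forms whose links were trained on an 80/20 distribution.
act = spread(1.0, [0.8, 0.2])             # -> [0.8, 0.2]

# A forced-choice task displays both alternatives, boosting each by a
# constant; choice probabilities flatten toward 50/50 (undermatching)
# even though the learned link strengths still favor form 0.
task_act = act + 0.5
print(task_act / task_act.sum())          # ~[0.65, 0.35], not [0.8, 0.2]
```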

See also:

Kapatsinski, V. (2006). Towards a single-mechanism account of frequency effects. The LACUS Forum 32: Networks, 325-335. (A shorter version).

Kapatsinski, V. (2005). LAST: A single-mechanism account of type and token frequency effects and their relatives. Speech Research Laboratory Lab Meeting Talk, Bloomington, IN. (In presentation format)

 

Language Change

Work in this vein aims to explain recurrent pathways of language change. Why do languages change in the ways they do?

Kapatsinski, V. (2022) Morphology in a parallel, distributed, interactive architecture of language production. Frontiers in Artificial Intelligence, 5. (Part of “Cross-disciplinary Perspectives on Language Architecture“, ed. by M. M. Pinango, A. Smirnova, P. B. Schumacher & R. Jackendoff.)

What’s in it: Explains innovations seeding a number of rare changes like backformation and degrammaticalization as the result of the Negative Feedback Cycle (which is only active when the speaker has time to allow it to operate). See also Kapatsinski, V. Creativity through inhibition (of the first production that comes to mind) for pejoration, semantic narrowing, overabundance of hypocoristics, and homophony avoidance resulting in paradigm gaps.

Kapatsinski, V. (2021). Hierarchical inference in sound change: Words, sounds and frequency of use. Frontiers in Psychology, 12. (Part of “Rational Approaches in Language Science“, ed. by M. W. Crocker, G. Jaeger, G. Kuperberg, E. Teich, & R. Turnbull, a joint research topic by Frontiers in Psychology and Frontiers in Communication.) https://doi.org/10.3389/fpsyg.2021.652664

What’s in it: Shows that hierarchical inference predicts the finding that analogical changes start in rare words. Shows that lexicalization of a reductive sound change emerges if learners try to optimally allocate credit for a pronunciation across levels of the linguistic hierarchy and end up misattributing some of the effect of frequency or context to lexical identity. Shows that the same change applied to the same lexicon will sometimes run to completion, sometimes become lexicalized, and sometimes sputter out, showing that actuation is unpredictable in principle. Shows that frequency effects in articulatorily-motivated sound changes should become U-shaped in later stages of the change unless speakers know that novel words behave like rare words. See also Kapatsinski, V. Lexical frequency and diffusion. In A. Ledgeway, E. Aldridge, A. Breitbarth, K. É. Kiss, J. Salmons and A. Simonenko (Eds.), The Wiley Blackwell Companion to Diachronic Linguistics. Wiley.

Harmon, Z., & Kapatsinski, V. (2017). Putting old tools to new uses: The role of form accessibility in semantic extension. Cognitive Psychology, 98, 22-44.

What’s in it: Shows that frequent words are extended to new uses (e.g., in grammaticalization) because they are more accessible than rare words. Leveling accessibility differences between frequent and rare words removes the preference to extend frequent words to new meanings. Also shows that frequent words are not dissociated from their meanings (contra bleaching accounts of extension), but in fact associated with them more strongly than rare words. As a result, speakers extend frequent words to new uses despite being confident that they have not seen those words used that way. Kapatsinski & Harmon (2017) shows that these results are virtually inevitable under any sensible view of learning. See also Kapatsinski, V. (2019). Accessibility as a driver of change. In K. Lee-Legg & J. C. Wamsley (Eds.), 50th Anniversary LCIU Commemorative Collection, 57-59. Bloomington, IN: Linguistics Club, for a short summary, and Harmon, Z. (2019). Accessibility, language production, and language change. Ph.D. dissertation, University of Oregon, for other related effects (such as niche-seeking).

Smolek, A., & Kapatsinski, V. (2018). What happens to large changes? Saltation produces well-liked outputs that are hard to generate. Laboratory Phonology, 9(1), 10.

What’s in it: Explains why large stem changes (e.g., saltatory ones) lose productivity over time — changing a preactivated form in production is hard. This is not so true in judgment — so forms resulting from a large change are judged as being better than forms without the change. So, if these forms are generated, they are likely to outcompete no-change forms, allowing the change to survive in forms that are stored in memory.

Kapatsinski, V. (2017) Learning a subtractive morphological system: Statistics and representations. In Maria LaMendola and Jennifer Scott (Eds.), Proceedings of the Boston University Conference on Language Development, 41(1), 357-372. Cascadilla Press.

What’s in it: Explains why subtraction tends to evolve into truncation (because of product-oriented generalization).

Kapatsinski, V., & C. M. Vakareliyska. (2013). [N[N]] compounds in Russian: A growing family of constructions. Constructions & Frames, 5(1), 69-87.

What’s in it: Shows that compounding is expanding in Russian (and other Slavic languages) in a word-by-word fashion, and that productivity of compounding is influenced by both members of the compound.

Kapatsinski, V. (2010). Velar palatalization in Russian and artificial grammar: Constraints on models of morphophonology. Journal of Laboratory Phonology, 1(2), 361-393.

What’s in it: Explains how exceptionless rules can start gaining exceptions and lose productivity. Requires the exceptionless rule to be competing with a more general pattern, and for this competition to be resolved stochastically. Examines a case of this competition resulting in the emergence of exceptions to velar palatalization in Russian, and shows that the strength of the more general rule is the cause of the loss of productivity by manipulating it in an artificial language presented to English speakers.

 

Probabilistic morphophonology, productivity, sublexical constructions

Work in this vein aims to investigate the nature of the mental grammar, and how lexical patterns such as paradigmatic mappings or stress are generalized and applied to novel inputs / words.

Mujezinović, E., Kapatsinski, V., & van de Vijver, R. (2024). Learning to unlearn: The role of negative evidence in morphophonological learning. Cognitive Science, 48(5), e13450.

What’s in it: Presents experimental data showing that weakening a form-meaning association between a present cue and an absent outcome is accompanied by strengthening associations between absent cues and the same outcome.

Kapatsinski, V. (2022) Morphology in a parallel, distributed, interactive architecture of language production. Frontiers in Artificial Intelligence, 5. (Part of “Cross-disciplinary Perspectives on Language Architecture“, ed. by M. M. Pinango, A. Smirnova, P. B. Schumacher & R. Jackendoff.)

What’s in it: Spells out a connectionist account of morphological production. Argues that production involves an initial phase of activation spreading from distributed semantics to form chunks, and a subsequent phase in which feedback from form to meaning suppresses activated forms that activate unintended meanings (the Negative Feedback Cycle). The NFC explains how speakers can avoid overextension when this is likely to have unintended consequences.

Kapatsinski, V. (2024). Creativity through inhibition (of the first production that comes to mind).

What’s in it: Develops the NFC theory, showing how it can account for various morphophonological innovations, and uses these examples to propose what it means for a speaker to be creative.

Kapatsinski, V. (2021). What are constructions, and what else is out there? An associationist perspective. Frontiers in Communication, 5, 134. (In Putnam, M., Carlson, M., Fábregas, A., & Wittenberg, E., eds. Defining Construction: Insights Into the Emergence and Generation of Linguistic Representations (pp. 75-89). Lausanne: Frontiers Media SA. doi: 10.3389/978-2-88971-935-8)

What’s in it: Argues that the experimental evidence so far is consistent with the idea that form-meaning mappings are bidirectional (contra Ramscar et al., 2010), but that form-meaning mappings are not all there is to morphology. In particular, argues (following Kapatsinski 2013, 2017) that we need copying of chunks from activated forms as well as paradigmatic mappings, and exemplifies how these processes work together in Russian bez- adjective formation (relying in part on data examined in Kapatsinski 2010). Shows that some but not all paradigmatic mappings may be dispensed with if negative form-meaning associations are allowed.

Smolek, A., & Kapatsinski, V. (2023). Syntagmatic paradigms: Learning correspondence from contiguity. Morphology, 33, 287-334. https://link.springer.com/article/10.1007/s11525-023-09411-w

What’s in it: Presents experimental data showing that temporal adjacency helps learn both the ways in which paradigmatically related words (like singulars and plurals) differ, and what they share. Rescorla-Wagner modeling suggests that this effect is due to the greater salience of form outcomes that represent relations between the product form and the source form. Since we know that morphologically related words do tend to co-occur in close temporal proximity (Xu & Croft, 1998), this may be what allows speakers to learn paradigmatic relations between words.

See also: Carvalho & Goldstone (2014), showing a similar effect (though apparently limited to differences) in visual category learning; and Copot & Bonami (2023), showing learning of paradigms.

Smolek, A., & Kapatsinski, V. (2018). What happens to large changes? Saltation produces well-liked outputs that are hard to generate. Laboratory Phonology, 9(1), 10.

What’s in it: Presents experimental data supporting the claim that errors like bup –> buptSi and, more generally, the preference against stem changes result from motor perseveration in deriving novel forms of known words and the difficulty of associating articulatorily dissimilar units (Rescorla, 1973; Kapatsinski, 2011 ICPhS). The tendency to level stem changes is stronger for articulatorily ‘bigger’ changes (like p–>tS vs. k–>tS). Learners trained on p>tSa overgeneralize to t>tSa and k>tSa but not vice versa. So, those exposed to p>tSa, t>ta, k>ka learn to palatalize everything or to palatalize nothing. In contrast, those exposed to t>tSa, p>pa, k>ka or k>tSa, p>pa, t>ta learn to palatalize the right thing. Note that there is no reason to palatalize any consonant before [a], so this is not a bias in favor of the natural rule (change in context) but in favor of the smaller change. This asymmetry is present in both production (here is a new singular, make a plural) and judgment (is this the right plural for this singular?) but is significantly stronger in production than in judgment. We think judgment sometimes involves simulating production (“would I say that?”), so the effect is still there in judgment but is weaker than in real production, attesting to its articulatory nature. For the opposing position, see Steriade (2001) and White (2014). For related data, see Do & Albright (Submitted), Kerkhoff (2007), Krajewski, Theakston, Lieven & Tomasello (2011), and White (2014).

See also:

Stave, M., A. Smolek, & V. Kapatsinski. (2013). Inductive bias against stem changes as perseveration: Experimental evidence for an articulatory approach to output-output faithfulness. Proceedings of the 35th Annual Conference of the Cognitive Science Society, 3454-3459. Austin, TX: Cognitive Science Society. What’s in it: Learners exposed to a language in which /p/ alternates with /tS/ before -a but /t/ and /k/ do not (labial palatalization in an unnatural context) learn either that all stops become /tS/ or that nothing does. In contrast, learners exposed to a language in which either /t/ or /k/ becomes /tS/ but /p/ does not learn to palatalize only /t/ or /k/ without overgeneralizing to /p/. We argue that this is a production bias based on finding that it is attenuated in a judgment task compared to elicited production. See above for more details.

Olejarczuk, P., & V. Kapatsinski. (2018). The metrical parse is guided by gradient phonotactics. Phonology, 35(3), 367-405.

What’s in it: Shows that assigning stress to a novel English word involves speakers probabilistically parsing the word into syllables and then matching the probability of a stress pattern given the parse.

Kapatsinski, V. (2013). Conspiring to mean: Experimental and computational evidence for a usage-based harmonic approach to morphophonology. Language, 89(1), 110-148.

Main findings: If you expose human learners to a language with an alternation, like singular buk / plural butSi, singular mak / plural matSi, etc., they learn the alternation better if they also encounter a lot of pairs like singular blitS / plural blitSi. However, they learn the alternation worse if they encounter a lot of pairs like singular blip / plural blipi or singular blit / plural bliti. Learners also often produce blutSi from blut and bluptSi from blup.

Theory: Learners are acquiring sublexical constructions like ‘plural = …tSi’. These constructions gradually become more specific: learners start out with ‘plural = …i’, which is satisfied by producing bluki from bluk; then acquire ‘plural = …tSi’, which is satisfied by producing bluktSi from bluk; then acquire ‘plural = …VtSi’, which requires bluk to be changed into blutSi. The constructions are competing with a tendency to repeat the articulatory gestures comprising the known, singular form (and other forms that come to mind during production). So, the [k] of bluk wants to surface in the plural. A strong and specific construction is required to override [k]’s desire to be expressed. This is how you get bluptSi from blup, which satisfies both ‘plural = …tSi’ and the perseveratory pressure to keep the [p] of blup. Reusing as much of the known form as you can is argued to be functional (improving performance of children as well as connectionist models).
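
The competition can be sketched as a scoring function over candidate plurals: each satisfied construction contributes its strength, and perseveration contributes a bonus for singular material that survives. The weights and the overlap measure below are illustrative assumptions, not the paper’s model:

```python
VOWELS = set("aeiou")

def satisfies(candidate, schema):
    """Check a candidate plural against the three schemas discussed above."""
    if schema == "...i":
        return candidate.endswith("i")
    if schema == "...tSi":
        return candidate.endswith("tSi")
    if schema == "...VtSi":
        return (len(candidate) > 3 and candidate.endswith("tSi")
                and candidate[-4] in VOWELS)
    return False

def score(candidate, singular, weights, persev=1.0):
    """Construction strengths plus a perseveratory bonus for every segment
    of the known singular that survives in the candidate (crude measure)."""
    schema_score = sum(w for s, w in weights.items() if satisfies(candidate, s))
    kept = len(set(singular) & set(candidate))
    return schema_score + persev * kept

# Early in learning 'plural = ...VtSi' is weak, so bluktSi (which keeps the
# [k] of bluk) wins; once it is strong, blutSi wins despite losing the [k].
for w_vtsi in (0.5, 3.0):
    weights = {"...i": 1.0, "...tSi": 2.0, "...VtSi": w_vtsi}
    print({c: score(c, "bluk", weights)
           for c in ("bluki", "bluktSi", "blutSi")})
```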

See also:

Kapatsinski, V. (2017) Learning a subtractive morphological system: Statistics and representations. In Maria LaMendola and Jennifer Scott (Eds.), Proceedings of the Boston University Conference on Language Development, 41(1), 357-372. Cascadilla Press. — What’s in it: subtractive morphology being reanalyzed by learners as truncation through product-oriented generalization

Kapatsinski, V. (2017) Copying, the source of creativity. In A. Makarova, S. M. Dickey & D. Divjak (Eds.), Each venture a new beginning: Studies in honor of Laura A. Janda, 57-70. Bloomington, IN: Slavica. — What’s in it: spells out the copying part of the model in a connectionist framework

Kapatsinski, V. (2013). Morphological schema induction by means of conditional inference trees. In B. Cartoni, D. Bernhard, & D. Tribout, eds. TACMO Workshop. Theoretical and Computational Morphology: New Trends and Synergies, 11-14. Geneva. What’s in it? Discusses advantages and limitations of the conditional inference tree implementation of sublexical construction learning used in the main paper above.

Kapatsinski, V. (2012). What statistics do learners track? Rules, constraints or schemas in (artificial) grammar learning. In Gries, S. Th., & D. Divjak, eds. Frequency effects in language: Learning and processing, 53-82. Mouton de Gruyter. What’s in it? The effect of type of training (presenting singulars and plurals that share the same stem next to each other or not) on the acquired grammar. (The same main results above are obtained either way.) Also argues that in order to support learning the output of speech perception should not be simply the identity of the most likely phoneme or word but a probability distribution over possible phonemes or words.

Kapatsinski, V. (2010). Velar palatalization in Russian and artificial grammar: Constraints on models of morphophonology. Journal of Laboratory Phonology, 1(2), 361-393. What’s in it? As also shown in the 2013 Language paper, if you expose human learners to a language with an alternation, like singular buk / plural butSi, singular mak / plural matSi, etc., they learn the alternation worse if they encounter a lot of pairs like singular blip / plural blipi or singular blit / plural bliti. The diachronic prediction is the following: If a suffix (-i here) that triggers an alternation is usually used with stems that cannot undergo the alternation, it is predicted to be a bad trigger for the alternation. In other words, the alternation should be(come) unproductive. This is shown to have happened in Russian, where the alternation k–>tS is productive before -ok and -ek, diminutive suffixes that tend to attach to [k]-final stems, but unproductive before -i, a stem extension that usually attaches to non-velars, and before -ik, a diminutive that rarely attaches to velars. The Russian data also provide evidence against the traditional notion that the suffix is chosen before the decision to alternate is made. For a short version, see Kapatsinski, V. (2010). Rethinking rule reliability: Why an exceptionless rule can fail. Chicago Linguistic Society, 44(2), 277-291.

Some related work from other theoretical perspectives: Albright & Hayes (2003), Becker & Fainleib (2009), Gouskova & Becker (2013), Hayes & Wilson (2008), Labov (1969)

Kapatsinski, V. (2005). Characteristics of a rule-based default are dissociable: Evidence against the Dual Mechanism Model. In S. Franks, F. Y. Gladney, and M. Tasseva-Kurktchieva, eds. Formal Approaches to Slavic Linguistics 13: The South Carolina Meeting, 136-146. Ann Arbor, MI: Michigan Slavic Publications.

What’s in it? Elicited production evidence from Russian showing that there are morphological systems in which there is no default expression for a certain meaning. Furthermore, the pattern least sensitive to similarity to existing words may not be the one that is most productive. Thus, the two defining properties of a default (regular) pattern under Pinker & Prince’s Dual Mechanism Model are dissociable. This provides evidence against the model, even in its weak form, where regulars are sometimes stored, but less often than irregulars. There is also highly related work making the same point for Polish: Dabrowska (2001, 2004).

Views and reviews on productivity

Kapatsinski, V. (2023). Understanding the roles of type and token frequency in usage-based linguistics. In M. Diaz-Campos & S. Balasch (Eds.), The handbook of usage-based linguistics, 91-106. Wiley. Unformatted version — the roles of type and token frequency in productivity, and why one might have different expectations about them based on different views of learning and representation

Kapatsinski, V. (2018). On the intolerance of the Tolerance Principle. Linguistic Approaches to Bilingualism, 8(6), 738-742. — In-principle and empirical arguments against Charles Yang’s Tolerance Principle

Kapatsinski, V. (2018). Words versus rules (storage versus online production/processing) in morphology. In M. Aronoff (Ed.), Oxford Research Encyclopedia of Linguistics. Oxford University Press. — an opinionated review of the debate on storage vs. computation in morphological productivity, concluding that it’s a false dichotomy (a la Langacker)

Kapatsinski, V. (2018). Learning morphological constructions. In G. Booij (Ed.), The construction of words: Advances in construction morphology, 547-581. Springer. — a review of work on how productive morphological form-meaning mappings are learned

Kapatsinski, V. (2017) Copying, the source of creativity. In A. Makarova, S. M. Dickey & D. Divjak (Eds.), Each venture a new beginning: Studies in honor of Laura A. Janda, 57-70. Bloomington, IN: Slavica. — copying of chunks from activated forms into the production plan as the source of paradigm uniformity effects and stem preservation; novel form production as blending of activated chunks

 

Inductive bias

Work in this vein examines inductive biases involved in learning phonology.

Kapatsinski, V., P. Olejarczuk, & M. A. Redford. (2017). Perceptual learning of intonation contour categories in adults and 9 to 11-year-old children: Adults are more narrow-minded. Cognitive Science, 41(2), 383-415.

What’s in it: Extends work on perceptual category learning to categories of temporally extended patterns. Previous work on linguistic patterns looks at short patterns like vowels, consonants, tones and pitch accents. Previous work on non-linguistic category learning looks at simple visual patterns that can be perceived simultaneously. Examining temporally extended patterns is important to understand the roles of short-term/working memory, local/global processing biases and temporal integration in categorization. We show that intonation contour category representations in both adults and 9 to 11-year-olds are relatively abstract: novel and familiar exemplars are accepted into the category equally often if distance to the category center is controlled. This is inconsistent with exemplar models of categorization but is quite in line with decision-bound (Ashby & Gott 1988) and window-based (Keating 1990) models. We also show that adults are less accepting, rejecting high distortions of the category prototypes that children accept. These high distortions are categorized as not belonging to any of the observed contour categories, since they are generally not highly confusable with examples of the other categories. We discuss possible reasons for children’s greater tolerance for deviation from prior experience. We are currently extending this work to adults with different L1 experiences and children with autism.

See also:

For starting small and starting big, see Newport (1990), Elman (1993), Arnon & Ramscar (2012) and Tomasello (2003)

On perceptual learning, good places to start are Gibson & Gibson (1955), Goldstone (1998), Maye, Werker & Gerken (2002), Bertelsen, Vroomen & DeGelder (2003), Norris, McQueen & Cutler (2003), and Idemaru & Holt (2011)

On category learning and the prototype distortion paradigm, see work citing Posner & Keele (1968). See J. Smith & Minda (2000) on limitations of the literature.

On developmental differences in category learning, I found the following very useful: Aslin & L. Smith (1988), L. Smith (1989), Thompson (1994), Ward et al. (1990), & Wills et al. (2013)

On learning categories of temporally extended patterns, see Berger & Hatwell (1996) and Schwarzer (1997)

For perceptual learning in language, the most extended patterns examined are probably pitch accent and stress patterns, which require comparing two adjacent syllables: Shport (2011) and Reinisch & Weber (2012)

Johnston, L. H., & V. Kapatsinski. (2011). In the beginning there were the weird: A phonotactic novelty preference in adult word learning. Proceedings of the 17th International Congress of Phonetic Sciences, 1022-1025.

What’s in it: A cross-situational word learning study (using the paradigm developed by Yu & L. Smith 2007), showing that learners are more accurate if the words being learned are phonotactically illegal (see also Chang 2013 and Zellou et al. 2024). We have replicated this with words being mapped onto pictures of tools rather than alien creatures, so it’s unlikely that this effect is just a bias to select the most science-fictiony words. It could be, though, that the illegal clusters are cues to these words having been previously encountered during the experiment. As part of this study we also found that even our adult learners would accept one-feature deviations from the trained words (which did not make it into the paper due to space constraints), as shown previously for infants (e.g., Stager & Werker 1997, Swingley 2005). Since we find this for adults, the acceptance of minor deviations from unfamiliar words does not appear to be about infants’ speech perception abilities being immature. Rather, it is likely that representations of word forms (and construction forms more generally) become more specific with experience (Kapatsinski 2013; see also White et al. 2013).

Kapatsinski, V. (2011). Modularity in the channel: The link between separability of features and learnability of dependencies between them. Proceedings of the 17th International Congress of Phonetic Sciences, 1022-1025.

What’s in it: Moreton (2008) introduced a distinction between channel bias and inductive bias. Channel bias involves imperfect transmission from the speaker to the listener because of noise in the ‘channel’ (the speech production mechanism, air, hearing, the speech perception mechanism). Inductive bias of the language learner is localized in the prior over grammars used by the (hypothetically Bayesian) language acquisition device. Moreton argued that most biases affecting acquisition of language identified so far could be thought of as channel biases. (I concur.) In that paper, Moreton argued that one type of bias, which he terms the ‘modularity bias’, is an inductive bias, caused by the higher prior probability of simpler grammars. The modularity bias refers to the finding that dependencies involving a consonant feature and a vowel feature are harder to learn than those involving two vowel features or two consonant features. This paper presents evidence that the bias also manifests itself in perception (using the Garner interference paradigm of Garner & Felfoldy 1970). Following Warker, Dell, Whalen & Gereg (2008), I argue that the modularity bias is a channel bias too. I suggest that the reason for it is that consonants and vowels form distinct clusters of representations in the brain (recently demonstrated by Mesgarani, Cheung, Johnson & Chang 2014). Learning associations between arbitrary vowel and consonant features is then expected to involve modifying more connections than learning associations between vowel features or associations between consonant features. More generally, this discussion can be seen as an instance of the debate on the division of labor between Bayesian computational vs. mechanistic (connectionist and agent-based) models in cognitive science. (The latter appear best suited for simulating the sources of channel bias.) See also McClelland, Botvinick, Noelle, Plaut, Rogers, Seidenberg & L. Smith (2010) vs. Griffiths, Chater, Kemp, Perfors & Tenenbaum (2010).

Smolek, A., & Kapatsinski, V. (2018). What happens to large changes? Saltation produces well-liked outputs that are hard to generateLaboratory Phonology9(1), 10.

What’s in it: Presents experimental data supporting the claim that errors like bup –> buptSi and, more generally, the preference against stem changes result from motor perseveration in deriving novel forms of known words and the difficulty of associating articulatorily dissimilar units (Rescorla, 1973; Kapatsinski, 2011 ICPhS). The tendency to level stem changes is stronger for articulatorily ‘bigger’ changes (like p–>tS vs. k–>tS). Learners trained on p>tSa overgeneralize to t>tSa and k>tSa but not vice versa. So, those exposed to p>tSa, t>ta, k>ka learn to palatalize everything or to palatalize nothing. In contrast, those exposed to t>tSa, p>pa, k>ka or k>tSa, p>pa, t>ta learn to palatalize the right thing. Note that there is no reason to palatalize any consonant before [a], so this is not a bias in favor of the natural rule (change in context) but in favor of the smaller change. This asymmetry is present in both production (here is a new singular, make a plural) and judgment (is this the right plural for this singular?) but is significantly stronger in production than in judgment. We think judgment sometimes involves simulating production (“would I say that?”), so the effect is still there in judgment but is weaker than in real production, attesting to its articulatory nature. For the opposing position, see Steriade (2001) and White (2014). For related data, see Do & Albright (Submitted), Kerkhoff (2007), Krajewski, Theakston, Lieven & Tomasello (2011), and White (2014).

See also:

Stave, M., A. Smolek, & V. Kapatsinski. (2013). Inductive bias against stem changes as perseveration: Experimental evidence for an articulatory approach to output-output faithfulness. Proceedings of the 35th Annual Conference of the Cognitive Science Society, 3454-3459. Austin, TX: Cognitive Science Society. What’s in it: Learners exposed to a language in which /p/ alternates with /tS/ before -a but /t/ and /k/ do not (labial palatalization in an unnatural context) learn either that all stops become /tS/ or that nothing does. In contrast, learners exposed to a language in which either /t/ or /k/ becomes /tS/ but /p/ does not learn to palatalize only /t/ or /k/ without overgeneralizing to /p/. We argue that this is a production bias based on finding that it is attenuated in a judgment task compared to elicited production. See above for more details.

 

Statistical Methods

Work in this vein aims to improve statistical inference methods in the field by identifying common issues that arise in statistical analysis of linguistic data, and trying to evaluate possible solutions, or showcasing underutilized (at the time) techniques.

Houghton, Z. N., & Kapatsinski, V. (2024). Are your random effects normal? A simulation study of methods for estimating whether subjects or items come from more than one population by examining the distribution of random effects in mixed-effects logistic regression. Behavior Research Methods, 56, 5557-5587. https://link.springer.com/article/10.3758/s13428-023-02287-y

What’s in it: Shows that it is surprisingly hard to tell whether participants (or items) come from one or two populations using mixed-effects logistic regression, because a single multivariate-normal but highly variable population will often give rise to a non-normal and bi-/multimodal distribution of BLUPs in logistic regression. This happens because the discrete nature of the data and limited sample size mean that only some points in log-odds space are observable, and because the logistic transformation pushes observable values farther apart as one approaches the limits of observable space.
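
The mechanism is easy to demonstrate without fitting a mixed model at all. The toy simulation below (mine, using per-subject empirical logits rather than BLUPs) draws every subject from a single normal population and shows how discreteness plus the logistic transform yield a lumpy, multimodal-looking distribution:

```python
import numpy as np

rng = np.random.default_rng(0)
n_subj, n_trials = 60, 20

# Every subject comes from ONE normal population on the log-odds scale.
true_logodds = rng.normal(loc=1.0, scale=1.5, size=n_subj)
p = 1 / (1 + np.exp(-true_logodds))
successes = rng.binomial(n_trials, p)

# With 20 binary trials only 21 success counts are possible, and the
# logistic transform stretches the extreme counts far apart.
emp_logit = np.log((successes + 0.5) / (n_trials - successes + 0.5))

# The histogram is typically lumpy and can look bi-/multimodal even
# though the generating population is a single normal distribution.
counts, _ = np.histogram(emp_logit, bins=15)
print(counts)
```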

Barth, D., & V. Kapatsinski. (2017). A multimodel inference approach to categorical variant choice: construction, priming and frequency effects on the choice between full and contracted forms of am, are and is. Corpus Linguistics & Linguistic Theory. Please e-mail for offprints.

What’s in it: Applies multimodel inference (model averaging) to quantitative analysis of corpus data. Discusses the difference between E-Language and I-Language (Chomsky 1986) goals of corpus analysis. We argue that the realistic goal for corpus analysis is to discover a grammar (for us, a statistical model) that is good at predicting characteristics of future language samples from the speech community represented by the corpus (the E-Language goal). The I-Language goal of discovering the grammar that generated the corpus is likely not achievable: there is no single best grammar because there are many idiolects within a community. Furthermore, if the I-Language grammar is as complex as neurology would lead us to believe, its parameters are likely not discoverable from the corpus (see Hockett 1965 and Householder 1966 for similar ideas). Attempting to fit all of the many parameters of a complex grammar reduces the grammar’s predictiveness. So the true generating model, even if it existed, would likely be bad at predicting future samples from the same community. Fortunately, we show that we do not have to select the single best grammar to achieve the E-Language goal. Instead we infer a whole ensemble of plausible grammars and make predictions by weighting each grammar by its predictiveness. We note that, in contrast to selecting the single best model, this approach properly takes into account model selection uncertainty. Because we do not have to select the best, or true, model, we also don’t need as much data as in traditional model selection approaches, which makes this a promising approach for underdocumented languages. We also argue that inferring an ensemble of grammars that underlie the corpus is a better approach for approximating the I-Language goal: because language is redundant, the same behavioral output (and hence corpus data) can be achieved by many different internal grammars. The method is applied to data on contraction in English auxiliaries.
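
The core of the approach, following Burnham & Anderson (2002), is the Akaike weight, which turns each candidate model’s AIC into its share of the ensemble. A minimal sketch with invented numbers:

```python
import numpy as np

def akaike_weights(aics):
    """Weight each candidate model by exp(-delta_AIC / 2), normalized:
    the weight estimates how likely the model is to be the best
    predictor of a new sample from the same population."""
    delta = np.asarray(aics, dtype=float)
    delta -= delta.min()
    w = np.exp(-delta / 2)
    return w / w.sum()

# Hypothetical AICs for three candidate 'grammars' of contraction.
w = akaike_weights([1002.3, 1003.1, 1010.8])

# Model-averaged prediction: weight each model's predicted probability
# of the contracted variant instead of betting on a single best model.
preds = np.array([0.62, 0.58, 0.71])       # invented model predictions
print(w.round(3), float((w * preds).sum()))
```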

See also

Burnham & Anderson (2002), who developed multimodel inference in the information-theoretic framework employed here

Kuperman & Bresnan (2012), which introduced multimodel inference to the field

The MuMIn package in R (Barton 2013) to do multimodel inference.

Some related ideas are discussed in Divjak & Arppe (2013), Baayen, Janda, Nesset, Dickey & Makarova (2013), and Tagliamonte & Baayen (2012).

An application to assessing predictor importance is here: Kapatsinski, V. (2013). Towards a de-ranged study of variation: Estimating predictor importance with multimodel inference. In preparation.

Barth, D., & V. Kapatsinski. (2012). Evaluating logistic mixed-effects models of corpus data. Under review.

What’s in it: Discusses some issues in evaluating how well a mixed-effects model fits a dataset. Argues for cross-validating the model on a new dataset. Uses Monte Carlo simulations to show that the standard approach at the time (C Score on fit data, as in Baayen 2008) fails to discover the difference between a model with a true predictor and a model in which the values of that predictor have been randomly scrambled.
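
A toy version of the paper’s Monte Carlo logic, using ordinary (not mixed-effects) logistic regression via scikit-learn; the sample sizes and filler predictors are my assumptions:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n = 500
x = rng.normal(size=n)
y = rng.binomial(1, 1 / (1 + np.exp(-x)))      # y really depends on x
noise = rng.normal(size=(n, 20))               # filler predictors

for name, pred in [("true", x), ("scrambled", rng.permutation(x))]:
    X = np.column_stack([pred, noise])
    model = LogisticRegression(max_iter=1000).fit(X, y)
    c_fit = roc_auc_score(y, model.predict_proba(X)[:, 1])   # C on fit data
    c_cv = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                           cv=5, scoring="roc_auc").mean()   # held-out C
    # The fit-data C score stays flatteringly high even for the scrambled
    # predictor (overfitting to noise); cross-validation exposes the gap.
    print(name, round(c_fit, 2), round(c_cv, 2))
```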

See also:

For mixed-effects models in linguistics, see e.g. Baayen 2008, Bresnan, Cueni, Nikitina & Baayen (2007), Dixon (2008), and other papers in the same special issue of JML, Johnson (2009), and Barr, Levy, Scheepers & Tily (2013)

For comparisons between regression-based (as in this paper) and tree-based approaches, see Baayen, Janda, Nesset, Dickey & Makarova (2013), Tagliamonte & Baayen (2012), Kapatsinski (2013a) and, from an explanatory adequacy perspective, Kapatsinski (2013b)

Kapatsinski, V. (2014). What is grammar like? A usage-based constructionist perspective. Linguistic Issues in Language Technology.

What’s in it: Some general characteristics of what I believe a plausible statistical/computational model of grammar should have. (Non-parametric, sensitive to complex non-crossover interactions, not committed to a single model for every speaker, etc.) See above for more information. Lots of references inside.

Kapatsinski, V. (2014). Sound change and hierarchical inference. What is being inferred? Effects of words, phones and frequency. Under review.

What’s in it: A view of sound change as involving articulatory reduction (which happens every time a word is used) and hierarchical statistical inference (reduction is attributed to both words and sounds). A preliminary test of the idea with American English flapping is reported (a follow-up trying to make learners believe novel words to be slang, and thus prone to reduction, is ongoing).

Kapatsinski, V. (2010). What is it I am writing? Lexical frequency effects in spelling Russian prefixes: Uncertainty and competition in an apparently regular system. Corpus Linguistics and Linguistic Theory, 6(2), 157-215.

What’s in it: For the purposes of this section, an example of Monte Carlo simulations to assess the significance of a negative correlation between word frequency and error rate on a word (errors/word frequency). See below for the linguistic results.
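
The test can be run in a few lines: shuffle one variable many times and ask how often chance alone produces a correlation at least as negative as the observed one. A sketch (the variable names are hypothetical):

```python
import numpy as np

def monte_carlo_p(x, y, n_sim=10_000, seed=0):
    """One-tailed Monte Carlo p-value for a negative correlation:
    the proportion of shuffles at least as negative as the observation."""
    rng = np.random.default_rng(seed)
    observed = np.corrcoef(x, y)[0, 1]
    sims = np.array([np.corrcoef(x, rng.permutation(y))[0, 1]
                     for _ in range(n_sim)])
    return float((sims <= observed).mean())

# e.g., p = monte_carlo_p(np.log(word_freq), errors / word_freq)
```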

Kapatsinski, V. (2008). Principal components of sound systems: An exercise in multivariate statistical typology. Indiana University Linguistics Club Working Papers Online, 08-08.

What’s in it: An application of principal components analysis to uncover what information about language relatedness is available in phoneme inventories.
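
The analysis boils down to running PCA on a binary language-by-phoneme matrix; a minimal sketch with an invented matrix:

```python
import numpy as np
from sklearn.decomposition import PCA

# Invented inventory matrix: rows are languages, columns are phonemes,
# cell = 1 if the language's inventory contains that phoneme.
inventories = np.array([
    [1, 1, 1, 0, 0],
    [1, 1, 1, 1, 0],
    [1, 0, 1, 0, 1],
    [1, 1, 0, 1, 0],
], dtype=float)

pca = PCA(n_components=2)
scores = pca.fit_transform(inventories)
# Languages with similar inventories land near each other in PC space;
# component loadings show which phonemes drive each dimension.
print(scores.round(2))
print(pca.explained_variance_ratio_.round(2))
```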

Kapatsinski, V. (2006). Sound similarity relations in the mental lexicon: Modeling the lexicon as a complex network. Research on Spoken Language Processing Progress Report No.27, 133-152. Indiana University Speech Research Lab.

What’s in it: Assessing the large-scale structure of the phonological wordform lexicon under different definitions of what it means for two words to be neighbors of each other (similar enough to compete during word recognition). Shows that normalizing similarity by word length improves predictiveness of neighborhood density for word recognition times in a megastudy of English lexical decision and naming (the English Lexicon Project, Balota et al. 2007). More broadly speaking, argues that words become more similar if they share a lot, rather than only becoming less similar if they differ by much (existing measures of similarity take into account only how much two words differ, e.g. Luce & Pisoni 1998).
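
A sketch of the general idea of rewarding shared material, scaled by length. The measure below is difflib’s ratio (2 x shared / combined length); it is an illustration of length-normalized similarity, not the measure developed in the paper:

```python
from difflib import SequenceMatcher

def similarity(w1, w2):
    """2 * shared material / combined length: words that share a lot count
    as similar even when they also differ a lot, unlike pure edit distance."""
    return SequenceMatcher(None, w1, w2).ratio()

def neighbors(word, lexicon, threshold=0.65):
    """Words similar enough to the target to compete during recognition
    (the cutoff is arbitrary -- which is part of the concern noted below)."""
    return [w for w in lexicon
            if w != word and similarity(word, w) >= threshold]

print(neighbors("cat", ["cab", "scat", "catalog", "dog"]))
# -> ['cab', 'scat']: 'catalog' shares 'cat' but is penalized for its length
```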

See also:

For the English Lexicon Project, Balota et al. 2007

For other megastudies on the mental lexicon, see the work being done at Ghent by Brysbaert and colleagues: http://crr.ugent.be/programs-data/lexicon-projects and http://crr.ugent.be/programs-data/subtitle-frequencies and http://crr.ugent.be/archives/1602

For measures of phonological similarity, see the section below

For other examples of complex network analysis applied to the phonological wordform lexicon, see Altieri, Gruenenfelder & Pisoni (2010), Arbesman, Strogatz & Vitevitch (2010), Chan & Vitevitch (2010), Gruenenfelder & Pisoni (2006), Vitevitch (2008), Vitevitch, Chan & Goldstein (2014), and Vitevitch, Chan & Roodenrys (2012). My general concern regarding this literature (and the reason I have not pursued complex network analyses beyond the 2006 paper) is that ‘phonological neighbor’ is a very ill-defined notion. We are unlikely to ever find a task-independent measure of between-word phonological similarity that would work across perception, production, recall, etc. Task-specific measures are in their infancy. Furthermore, there would likely be no cutoff value of similarity such that words that are less similar than that value do not interact during word recognition, production or recall. It therefore appears problematic to model lexicon structure by choosing a cutoff on some arbitrary measure of similarity and assuming that words that are more similar than that value have links between them, and words that are less similar do not.

For applications of complex networks to the semantic lexicon, see Steyvers & Tenenbaum (2005) and the large body of work resulting from that paper. I find Gruenenfelder & Pisoni (2009) and Jones & Gruenenfelder (2011) particularly interesting in trying to address the question of what it means for two words to be semantically similar.

For a general review of the landscape of studies on network structures in language, see Choudhury & Mukherjee (2009)

Constituency, chunking, the units of language

Generally speaking, this line of work argues for there being local (node-like) representations for many frequent yet compositional linguistic units. As noted by Langacker (1987), this does not mean that there are not also smaller units within these. There are many routes from form to meaning or from meaning to form (Baayen, Dijkstra & Schreuder 1997). You may take a route going through one set of units on one occasion, and a route going through another set of units on another. Having a node allocated to a unit allows that unit to acquire associations that its parts do not have (Bybee 2006) and to compete with other units for selection or at least association strength (Kapatsinski 2007, Oppenheim, Dell & Schwartz 2010).

Kapatsinski, V., & J. Radicke. (2009). Frequency and the emergence of prefabs: Evidence from monitoring. In R. Corrigan, E. Moravcsik, H. Ouali, & K. Wheatley (eds.), Formulaic Language. Vol. II: Acquisition, loss, psychological reality, functional explanations, 499-520. Amsterdam: John Benjamins. (Typological Studies in Language 83).

What’s in it:

We argue that, as long as the whole and its parts are both lexical (linked to meaning), the whole competes with its parts for recognition. (See also Bybee & Brewer 1980, Hay 2001, Healy 1976, 1994, Sosa & MacFarlane 2002. For the opposing view, see McClelland & Rumelhart 1981; Rumelhart & McClelland 1982).

We use a monitoring task, in which participants monitor spoken sentences for occurrences of /ʌp/ (up), whether it is a word, a morpheme, or just a sequence of sounds inside a word. The subjects press a button as soon as they hear /ʌp/. In this, we follow Sosa & MacFarlane 2002, who monitored for of but only when it was a separate word.

We find that /ʌp/ (up) is easier to detect (in terms of reaction time) when it is a morpheme than when it is not, and when it is a syllabic constituent (rime) than when it is not. This supports the notion that unithood makes the unit easier to detect, as well as providing evidence for rimes as units in English. (See also Kapatsinski 2009, Lee & Goldrick 2008, Olejarczuk & Kapatsinski 2013 and Treiman & Danis 1988). The influence of syllable structure on monitoring was not observed for English by Cutler, Mehler, Norris & Segui (1986) and Bradley, Sanchez-Casas & Garcia-Albea (1993), which led to claims that syllables (or rimes) are not perceptual units in English (e.g. Cutler, McQueen, Norris & Somejuan 2001). We argue that these previous results are due to the fact that stimuli deemed to have a CV.CVC structure had a liquid intervocalic consonant in these previous studies, along with a lax first-syllable vowel and first-syllable stress. With such words, the intervocalic consonant is usually parsed into the first syllable (Derwing 1992), so the structure is actually CVC.VC. Detecting a CVC in CVC.VC is no harder than in CVC.CVC. In our stimuli, the intervocalic consonant /p/ is a stop, which is parsed into the second syllable (as in upon). This makes up hard to detect in such words. See also Morgan & Wheeldon (2003). Note that we do not claim that the rime (or the syllable) is the unit of perception. There is no single unit of perception in a multiroute model. We just claim that it is a unit you sometimes use in perception.

More interestingly, there was a robust effect of word frequency: /ʌp/ was harder to detect in high-frequency words. The effect was monotonic, observable throughout the frequency range: as word frequency increased, /ʌp/ became harder and harder to detect. We take this as evidence that words were competing for recognition with up, with frequent words being stronger competitors than rare words (Bybee & Brewer 1980, Hay 2001, Healy 1976, 1994). I suspect that this is because /ʌp/ is a meaningful unit in the lexicon, and that if we were to pick a meaningless phoneme we would not find this effect, but this is untested, as far as I know.
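
For readers who want to see the shape of such an analysis, here is a minimal sketch (not the paper's actual analysis) of testing for a monotonic word-frequency effect on monitoring latencies with a mixed-effects regression; the file and column names are entirely hypothetical:

```python
# Assumes a hypothetical data frame with columns rt (ms), word_freq
# (corpus counts), and subject (participant ID).

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

data = pd.read_csv("monitoring_rts.csv")          # hypothetical file
data["log_freq"] = np.log(data["word_freq"] + 1)  # log-transform counts

# Random intercepts for subjects; a positive log_freq coefficient would
# indicate that /ʌp/ gets harder to detect as carrier-word frequency grows.
model = smf.mixedlm("rt ~ log_freq", data, groups=data["subject"]).fit()
print(model.summary())
```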

We did not find this inhibitory effect of frequency for verb+up phrases, except, perhaps, at the very top of the frequency range. In fact, we found a facilitatory effect of phrase frequency throughout most of the range: up was easier to detect in moderately frequent bigrams like walk up than in rare ones like eke up. This is expected if up is predicted on the basis of the preceding context. The lack of an inhibitory frequency effect throughout most of the phrase frequency range suggests that phrases do not compete with their parts for recognition unless they are extremely frequent, and so may not be stored as units except in cases of very high frequency. The difference in unithood between words and phrases is also supported by the observation (Harmon & Kapatsinski 2014) that, following a disfluency, production never restarts from a word-internal position (I had a similar, uh, similar health plan vs. *I had a similar, uh, -milar health plan or even *It is darkest, uh, -est before sunrise). Production does sometimes restart from a phrase-internal position, even for very frequent phrases (This is a kind of, uh, of a health insurance plan. It came up, uh, up during the meeting).

The conclusion that phrases are not stored as units unless very frequent is disputed by Cappelle, Shtyrov & Pulvermüller (2010), based on the finding that the MMN ERP response to up is reduced in frequent verb+up phrases (which are not as frequent as our super-frequent phrases). Given the differences in methodology, it is not clear how to interpret this discrepancy. For example, we have a large continuous range of frequencies but do not include ungrammatical word combinations. Cappelle et al. test only two verbs (heat up and cool down) and contrast them with ungrammatical combinations (heat down and cool up). One interpretation of the difference is that heat up and cool down activate richer semantic networks (as Cappelle et al. suggest) than does ungrammatical word salad; then the same result would be obtained by contrasting grammatical and ungrammatical uses of, say, eke up. In this case, the results do not speak directly to the question of whether cool down and heat up are units. To address this question, frequent and infrequent interpretable particle verbs need to be contrasted, as in Tremblay et al. (2014), where Generalized Additive Mixed Models are used to look for non-linearities in the effect of phrase frequency on EEG signals. The results appear to be consistent with our position. An alternative interpretation is that there is an early effect of storage (detected by Cappelle et al.) that is then overridden by the effect of predictability (which they claim to be postlexical). Cappelle et al. suggest that the MMN at the phrase level is not sensitive to between-word probabilities. Monitoring latency certainly is. However, to me, it does not make much sense for the effect of predictability to be postlexical if it is to be helpful for everyday word recognition in context.
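
For illustration, here is a rough Python analogue of that GAMM logic (Tremblay et al. work in a GAMM framework; this sketch uses pygam, and the file and column names are hypothetical):

```python
# Fit a smooth, potentially non-linear function of phrase frequency and
# inspect its shape.

import numpy as np
import pandas as pd
from pygam import LinearGAM, s

data = pd.read_csv("eeg_amplitudes.csv")  # hypothetical file
X = np.log(data[["phrase_freq"]].to_numpy() + 1)
y = data["amplitude"].to_numpy()

# s(0): a spline over log phrase frequency. A smooth that is facilitatory
# over most of the range but reverses at the very top would mirror the
# monitoring results described above.
gam = LinearGAM(s(0)).fit(X, y)
gam.summary()
```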

We did find that up was harder to detect in the most frequent verb-particle combinations than in medium-frequency ones. We interpreted this result as indicating that the highest-frequency phrases like come up are stored as units and compete with their parts for recognition. We should caution, though, that there are very few such units (Zipf's Law strikes again). Thus, other causes of the differences, having to do with idiosyncrasies of semantics or phonology, cannot be ruled out with great certainty without replications. Our paper itself is a follow-up on Sosa & MacFarlane (2002), where of was found to be harder to detect in frequent phrases like kind of, but the phonological confounds are even greater there. Some encouraging results were, however, recently reported by Tremblay et al. (2014).

Harmon, Z., & V. Kapatsinski. (2014). Determinants of lengths of repetition disfluencies: Probabilistic syntactic constituency in speech production. Chicago Linguistic Society 50.

What's in it: Examines repetition disfluencies as a window on between-word cohesion and constituency. We argue that in producing sequences like I had a similar- + a similar health plan, speakers tend not to restart production from inside a cohesive unit. For example, speakers never restart from inside a word, as in *I had a similar + -milar health plan. Two influences on cohesion above the word level are syntactic constituency (Levelt 1983) and co-occurrence: speakers tend to restart from the nearest major constituent boundary, unless the sequence spanning that boundary is high in backward transitional probability. Note, however, that even the most cohesive word sequences are not as cohesive as the least cohesive words: you can restart from the beginning of lot or of in the very frequent a lot of, but not from the beginning of -ness in the infrequent word shallowness. This argues that prefabs are not quite 'big words' (as also argued in Kapatsinski & Radicke 2009).
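
Backward transitional probability is straightforward to compute from a corpus: BTP(w1 w2) = P(w1 | w2) = count(w1 w2) / count(w2). A minimal sketch with a toy corpus (the example sentence is made up):

```python
# High BTP across a boundary (as in "a lot of") predicts that speakers are
# less likely to restart from that boundary.

from collections import Counter

def backward_tp(tokens):
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    # P(w1 | w2): how predictable the first word is given the second
    return {(w1, w2): c / unigrams[w2] for (w1, w2), c in bigrams.items()}

tokens = "i had a lot of fun and a lot of time".split()
btp = backward_tp(tokens)
print(btp[("lot", "of")])  # count('lot of') / count('of') = 2/2 = 1.0
```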

See also:

Kapatsinski, V. (2005). Measuring the relationship of structure to use: Determinants of the extent of recycle in repetition repair. Berkeley Linguistics Society 30, 481-492.

Kapatsinski, V. (2010). Frequency of use leads to automaticity of production: Evidence from repair in conversation. Language and Speech, 53(1), 71-105.

What's in it: Examines replacement repairs, as in I used to listen to the newsp- + the radio in the morning. Documents that the likelihood of interrupting a to-be-replaced word (newspaper in this case) is related to the frequency of the word. Rare words are more likely to be interrupted than frequent words, even controlling for the fact that frequent words tend to be shorter. This provides evidence for the hypothesis that words are units of speech execution: the more frequent a word, the more cohesive it is, and the more automatized and 'ballistic' its production, making it harder to interrupt. Note that, unlike the categorical restriction on restarting from the middle of a word, the tendency not to stop the production of a word before it is complete is gradient. I would argue that this is because words consist of smaller units that can be monitored and suppressed (accounting for the results of Tilsen & Goldstein 2012), but the word nonetheless forms a whole (contra Tilsen & Goldstein 2012), so production always starts from some word boundary.
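
A schematic version (not the paper's actual analysis) of testing whether frequency predicts interruption likelihood over and above word length; the file and column names are hypothetical:

```python
# Hypothetical columns: interrupted (0/1), word_freq (corpus counts),
# and length_segments (word length in segments).

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

data = pd.read_csv("replacement_repairs.csv")  # hypothetical file
data["log_freq"] = np.log(data["word_freq"] + 1)

# A negative log_freq coefficient, with length in the model, would mean that
# frequent words are harder to interrupt even controlling for their length.
model = smf.logit("interrupted ~ log_freq + length_segments", data).fit()
print(model.summary())
```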

For experimental work documenting frequency effects on how difficult it is to stop production of a word, see Logan (1982).

For other work on repair, disfluencies, and cohesion, see Goldman-Eisler (1957), Clark & Wasow (1998), Levelt (1983), Maclay & Osgood (1959), Plug & Carter (2013), Schnadt (2009), Schneider (2014), and Tannenbaum et al. (1965).

Kapatsinski, V. (2021). Hierarchical inference in sound change: Words, sounds and frequency of use. Frontiers in Psychology, 12. (Part of “Rational Approaches in Language Science”, ed. by M. W. Crocker, G. Jaeger, G. Kuperberg, E. Teich, & R. Turnbull, a joint research topic by Frontiers in Psychology and Frontiers in Communication.) https://doi.org/10.3389/fpsyg.2021.652664

What's in it: For the purposes of this section, argues that both words and 'sounds' (sublexical structures, be they phonemes or gestures) are units of sound change. 'Blame' for the pronunciation of a sound can be split between the word, the sound, and contextual factors in a principled way using hierarchical inference.
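
One way to implement this kind of blame splitting, sketched here with crossed random effects that partially pool word- and sound-specific estimates (the data file and column names are hypothetical, and the paper's actual model may differ):

```python
# Crossed random effects shrink each word's and each sound's deviation
# toward the population mean in proportion to the evidence for it.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

data = pd.read_csv("vowel_durations.csv")  # hypothetical file

model = smf.mixedlm(
    "duration ~ speech_rate",         # a contextual factor as a fixed effect
    data,
    groups=np.ones(len(data)),        # one dummy group: word and sound are crossed
    re_formula="0",                   # no random intercept for the dummy group
    vc_formula={"word": "0 + C(word)",     # partially pooled word deviations
                "sound": "0 + C(sound)"},  # partially pooled sound deviations
).fit()
print(model.summary())
```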

Kapatsinski, V. (2009). Testing theories of linguistic constituency with configural learning: The case of the English syllable. Language, 85(2), 248-277.

What's in it: Linguistic constituents are usually represented as nodes in trees. However, dependency grammar represents them as connections between their parts. This paper argues that constituents should be able to acquire associations that their parts do not have if, and only if, constituents are nodes. This parallels an argument of Bybee's (2006): originally compositional words or phrases must be stored in order to become non-compositional, through association with 'special' phonology, semantics, or morphosyntax. This paper reports that at least some rimes (phonological constituents) in English (like the /ag/ in /gag/) can be associated with either prefixes or suffixes (Cag-num or num-Cag) without the parts of the rime becoming associated with the same prefix or suffix. In contrast, bodies, like the /ga/ in /gag/, cannot become associated with prefixes or suffixes without sharing those associations with their parts. These results suggest that constituency is more than mere dependency. A constituent has a node that can then be associated with other nodes. A non-constituent is just its parts, so it cannot have associations that its parts do not have.

See also: 

Kapatsinski, V. (2007). Implementing and testing theories of linguistic constituency I: English syllable structure. Research on Spoken Language Processing Progress Report No.28, 241-276. Indiana University Speech Research Lab. What’s in it: Implementations of the competing models of constituency as associative networks.

Kapatsinski, V., & D. B. Pisoni. (2008). The role of phonetic detail in associating phonological units. Poster presented at Laboratory Phonology XI, Wellington, New Zealand. What's in it: Due to coarticulation, the same vowel varies more in its realization across different instances of the same body than across different instances of the same rime. Test vowels might therefore be harder to identify in the body condition than in the rime condition. This poster shows that the rime/body distinction persists when the difficulty of recognizing test tokens of the trained vowels is taken into account.

Kapatsinski, V. (2008). Constituents can exhibit partial overlap: Experimental evidence for an exemplar approach to the mental lexicon. In R. L. Edwards, P. J. Midtlyng, C. L. Sprague, and K. G. Stensrud, eds. CLS 41: The Panels, 227-242. Chicago: Chicago Linguistic Society. What's in it: Argues, based on a wug test, that both rimes and bodies of Russian verb roots are associated with suffixes.

Olejarczuk, P., & V. Kapatsinski. (2018). The metrical parse is guided by gradient phonotactics. Phonology, 35(3), 367-405.

What's in it: Documents the relationship between stress and syllable structure in English: competing tendencies to stress the initial syllable vs. the heavy penultimate syllable, in proportion to the type frequencies of the competing patterns in the lexicon. Current work examines how these baseline rates of stressing initial light vs. penultimate heavy syllables can be shifted by additional exposure to the competing patterns.
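
Probability matching of this sort is easy to simulate: choose each stress pattern with probability proportional to its type frequency. A toy sketch with made-up counts:

```python
# Novel words receive initial vs. penultimate stress in proportion to the
# type frequencies of the competing patterns in the lexicon.

import random

type_freq = {"initial": 600, "penult_heavy": 400}  # hypothetical type counts

def choose_stress(freqs):
    patterns, weights = zip(*freqs.items())
    return random.choices(patterns, weights=weights)[0]

# Over many novel words, choices approximate the lexical proportions
# (~60% initial) rather than always going with the majority pattern.
sample = [choose_stress(type_freq) for _ in range(10_000)]
print(sample.count("initial") / len(sample))  # ≈ 0.6
```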

See also:

For related work on stress placement, see Guion et al. (2003), Domahs, Plag & Carroll (2014), and Ryan (2014).

For some recent work arguing for or against probability matching in extending linguistic patterns, see Becker, Ketrez & Nevins (2011), Ernestus & Baayen (2003), Hayes, Siptar, Zuraw & Londe (2009), and Kapatsinski (2010).

For the role of sonority in dealing with novel clusters, see Albright (2007), Berent, Steriade, Lennertz & Vaknin (2007), Daland et al. (2011), Hayes (2011), and Redford (2008).


Frequency, familiarity, repetition

These are additional studies on the effects of experience or repetition on behavior that didn’t fit elsewhere.

Vajrabhaya, P., & V. Kapatsinski. (2014). First time’s the charm: First-mention lengthening as an automated act.

What’s in it: Words are longer when mentioned for the first time within a story, even if that story has just been told to the same listener. We argue that this behavior is conventionalized as the way to tell a story.

See also:

Vajrabhaya, P., & E. Pederson. (In prep.), for evidence that this is NOT what happens with gestures: they continue to reduce with repetition.

Vajrabhaya, P., & V. Kapatsinski. (2011). There is more to the story: First-mention lengthening in Thai interactive discourse. Proceedings of the 17th International Congress of Phonetic Sciences, 2050-2053. What's in it: The same finding in Thai.

Kapatsinski, V., & R. Janda. (2011). It’s around here: Residential history and the meaning of ‘Midwest’. Proceedings of the 33rd Annual Conference of the Cognitive Science Society, 2983-2988.

What's in it: Where is the Midwest? It depends on where you are from. For Midwesterners, it's around where they've lived: the Midwest appears to be anchored in the familiar exemplars of Midwestern locations. For outsiders, it's a mix: some think it's the middle of the west, some go by an official definition. For more data on this, see http://fivethirtyeight.com/datalab/what-states-are-in-the-midwest/ and http://fivethirtyeight.com/datalab/more-data-analysts-went-looking-for-the-south-and-midwest-and-heres-what-they-found/

Kapatsinski, V. (2006). Having something common in common is not the same as sharing something special: Evidence from sound similarity judgments. Paper presented at the LSA Annual Meeting, Albuquerque, NM.

What’s in it: Argues that perceived similarity is increased when the shared part is something rare. Discusses implications for models of lexical representation.


Similarity

Work in this vein asks what makes two words similar enough for one to have an appreciable effect on the processing of the other.

Teruya, H., & Kapatsinski, V. (2019). Deciding to look: Revisiting the linking hypothesis for spoken word recognition in the visual world. Language, Cognition & Neuroscience, 34(7), 860-881.

Teruya, H., & V. Kapatsinski. (2012). Sharing the beginning vs. the end: Spoken word recognition in the visual world paradigm in Japanese. Paper presented at the Linguistic Society of America Annual Meeting, Portland, OR.

Kapatsinski, V. (2006). Having something common in common is not the same as sharing something special: Evidence from sound similarity judgments. Paper presented at the LSA Annual Meeting, Albuquerque, NM.

Kapatsinski, V. (2006). Sound similarity relations in the mental lexicon: Modeling the lexicon as a complex network. Research on Spoken Language Processing Progress Report No.27, 133-152. Indiana University Speech Research Lab.

Kapatsinski, V. (2004). Phonological similarity relations: Network organization of the lexicon and phonology, VIII Encuentro Internacional de Linguistica en el Noroeste, Hermosillo, Sonora, Mexico (Published in the proceedings of Encuentro).
