Special collection: Implications of Neural Networks and other Learning Models for Linguistic Theory

Managing Editor: Vsevolod Kapatsinski (University of Oregon)

Co-editor: Gašper Beguš (University of California, Berkeley)

This Linguistics Vanguard special collection is motivated by recent breakthroughs in the application of neural networks to language data. Linguistics Vanguard publishes short (3000-4000 word) articles on cutting-edge topics in linguistics and neighboring areas. Inclusion of multimodal, interactive content (including, but not limited to, audio and video, images, maps, software code, raw data, hyperlinks to external databases, and any other media enhancing the traditional written word) is particularly encouraged. Contributors to the special collection should follow the journal's general submission guidelines (https://www.degruyter.com/journal/key/lingvan/html#overview).

 

Overview of the special collection topic:

Neural network models of language have been around for several decades and became the de facto standard in psycholinguistics by the 1990s. There have also been several important attempts to incorporate neural network insights into linguistic theory (e.g., Bates & MacWhinney, 1989; Bybee, 1985; Bybee & McClelland, 2005; Heitmeier et al., 2021; Smolensky & Legendre, 2006). However, until recently, neural network models did not approximate the generative capacity of a human speaker or writer. This changed in the last few years, when large language models (e.g., the GPT family), which embody largely the same principles but are trained on vastly larger amounts of data, achieved a breakthrough: the language they generate is now often indistinguishable from language generated by a human. The accomplishments of these models have led both to calls for further integration between linguistic theory and neural networks (Beguš, 2020; Kapatsinski, 2023; Kirov & Cotterell, 2018; Pater, 2019; Piantadosi, 2023) and to criticism suggesting that the way they work is fundamentally unlike human language learning and processing (e.g., Bender et al., 2021; Chomsky et al., 2023).

The present special collection for Linguistics Vanguard aims to foster a productive discussion between linguists, cognitive scientists, neural network modelers, neuroscientists, and proponents of other approaches to learning theory (e.g., Bayesian probabilistic inference, instance-based lazy learning, reinforcement learning, active inference; Jamieson et al., 2022; Sajid et al., 2021; Tenenbaum et al., 2011). We call for contributions that take a computational modeling approach to the central question of linguistic theory: Why are languages the way they are? Reflections and position papers motivating the best ways to approach this question computationally are also welcome.

Contributions are encouraged to compare different models trained on the same data approximating human experience, and should explicitly address the ways in which the training data of the model(s) they discuss resemble and differ from that experience. Contributions can involve either hypothesis testing via minimally different versions of the same well-motivated model (e.g., Kapatsinski, 2023) or comparisons of state-of-the-art models from different intellectual traditions (e.g., Albright & Hayes, 2003; Sajid et al., 2021) on how well they answer the question above.

Research topics within this broad area include:

1) the learning mechanisms and biases needed for modeling humanlike processing from humanlike experience with

  • syntax / long-distance dependencies (e.g., Linzen et al., 2016; Beguš et al., 2023)
  • words (e.g., modeling “wug test performance”, Heitmeier et al., 2021; Kirov & Cotterell, 2018)
  • miniature artificial languages (e.g., Alamia et al., 2020; Giroux & Rey, 2009; Kapatsinski, 2023; Onnis et al., 2015)
  • speech perception and production (e.g., Beguš, 2020)

2) biases and mechanisms required for modeling

  • trajectories of language change through iterated learning and/or use (e.g., Beguš, 2020a; Kapatsinski, 2021), or
  • linguistic typology (e.g., Brochhagen & Boleda, 2022; Futrell et al., 2020).

This could involve

  • behavioral comparisons of learning biases in artificial neural networks (or other learning models) and humans (e.g., Ravfogel et al., 2019)
  • comparisons of processing in artificial neural networks and the brain (e.g., Beguš et al., 2023; Li et al., 2023; Schrimpf et al., 2021)
  • behavioral and neural comparisons between models of human language learning varying in neurobiological realism or representational assumptions (e.g., neural networks vs. instance-based models, Frank & Bod, 2011; Johns et al., 2020; or probabilistic models, Griffiths et al., 2010; Wilson & Li, 2021)

Contributors are asked to submit a one-page non-anonymous abstract (plus one page for figures and references) in .pdf format via the following link: https://oregon.qualtrics.com/jfe/form/SV_e8LaCg8EqKHzjQG. The abstract should have the title as the top line; author names, affiliations, and emails as the second line; and the body of the abstract as a separate paragraph (or three). Please contact the managing editor, Vsevolod (Volya) Kapatsinski (vkapatsi@uoregon.edu), with any questions.

Abstracts will be evaluated for relevance to the special collection topic and for overall quality. Contributors of selected abstracts will be invited to submit a full paper (3000-4000 words) that will undergo peer review.

Timeline:

  • abstracts due by July 1, 2024
  • notification of authors by August 1, 2024
  • full paper due by November 1, 2024
  • reviews to be completed by January 31, 2025
  • publication by March 2025

 

References:

Alamia, A., Gauducheau, V., Paisios, D., & VanRullen, R. (2020). Comparing feedforward and recurrent neural network architectures with human behavior in artificial grammar learning. Scientific Reports, 10, 22172.

Albright, A., & Hayes, B. (2003). Rules vs. analogy in English past tenses: A computational/experimental study. Cognition, 90(2), 119-161.

Beguš, G. (2020). Generative adversarial phonology: Modeling unsupervised phonetic and phonological learning with neural networks. Frontiers in Artificial Intelligence, 3, 44.

Beguš, G. (2020a). Deep sound change: Deep and iterative learning, convolutional neural networks, and language change. arXiv preprint arXiv:2011.05463.

Beguš, G., Lu, T., & Wang, Z. (2023). Basic syntax from speech: Spontaneous concatenation in unsupervised deep neural networks. arXiv preprint arXiv:2305.01626.

Beguš, G., Zhou, A., & Zhao, T. C. (2023). Encoding of speech in convolutional layers and the brain stem based on language experience. Scientific Reports, 13, 6480. https://doi.org/10.1038/s41598-023-33384-9

Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021, March). On the dangers of stochastic parrots: Can language models be too big? In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (pp. 610-623).

Brochhagen, T., & Boleda, G. (2022). When do languages use the same word for different meanings? The Goldilocks principle in colexification. Cognition, 226, 105179.

Chomsky, N., Roberts, I., & Watumull, J. (2023, March 8). Noam Chomsky: The false promise of ChatGPT. The New York Times.

Frank, S. L., & Bod, R. (2011). Insensitivity of the human sentence-processing system to hierarchical structure. Psychological Science, 22(6), 829-834.

Futrell, R., Gibson, E., & Levy, R. P. (2020). Lossy-context surprisal: An information-theoretic model of memory effects in sentence processing. Cognitive Science, 44(3), e12814.

Giroux, I., & Rey, A. (2009). Lexical and sublexical units in speech perception. Cognitive Science, 33(2), 260-272.

Griffiths, T. L., Chater, N., Kemp, C., Perfors, A., & Tenenbaum, J. B. (2010). Probabilistic models of cognition: Exploring representations and inductive biases. Trends in Cognitive Sciences, 14(8), 357-364.

Heitmeier, M., Chuang, Y. Y., & Baayen, R. H. (2021). Modeling morphology with linear discriminative learning: Considerations and design choices. Frontiers in Psychology, 12, 720713.

Jamieson, R. K., Johns, B. T., Vokey, J. R., & Jones, M. N. (2022). Instance theory as a domain-general framework for cognitive psychology. Nature Reviews Psychology, 1(3), 174-183.

Johns, B. T., Jamieson, R. K., Crump, M. J., Jones, M. N., & Mewhort, D. J. K. (2020). Production without rules: Using an instance memory model to exploit structure in natural language. Journal of Memory and Language, 115, 104165.

Kapatsinski, V. (2021). Hierarchical inference in sound change: Words, sounds, and frequency of use. Frontiers in Psychology, 12, 652664.

Kapatsinski, V. (2023). Defragmenting learning. Cognitive Science, 47(6), e13301.

Kirov, C., & Cotterell, R. (2018). Recurrent neural networks in linguistic theory: Revisiting Pinker and Prince (1988) and the past tense debate. Transactions of the Association for Computational Linguistics, 6, 651-665.

Li, Y., Anumanchipalli, G. K., Mohamed, A., Chen, P., Carney, L., Lu, J., Wu, J., & Chang, E. (2023). Dissecting neural computations of the human auditory pathway using deep neural networks for speech. Nature Neuroscience, 26(12), 2213-2225. https://doi.org/10.1038/s41593-023-01468-4

Linzen, T., Dupoux, E., & Goldberg, Y. (2016). Assessing the ability of LSTMs to learn syntax-sensitive dependencies. Transactions of the Association for Computational Linguistics, 4, 521-535.

Onnis, L., Destrebecqz, A., Christiansen, M. H., Chater, N., & Cleeremans, A. (2015). Implicit learning of non-adjacent dependencies: A graded, associative account. In P. Rebuschat (Ed.), Implicit and explicit learning of languages (pp. 213-246). John Benjamins.

Pater, J. (2019). Generative linguistics and neural networks at 60: Foundation, friction, and fusion. Language, 95(1), e41-e74.

Piantadosi, S. (2023). Modern language models refute Chomsky’s approach to language. LingBuzz preprint, lingbuzz/007180.

Ravfogel, S., Goldberg, Y., & Linzen, T. (2019). Studying the inductive biases of RNNs with synthetic variations of natural languages. arXiv preprint arXiv:1903.06400.

Sajid, N., Ball, P. J., Parr, T., & Friston, K. J. (2021). Active inference: Demystified and compared. Neural Computation, 33(3), 674-712.

Schrimpf, M., Blank, I. A., Tuckute, G., Kauf, C., Hosseini, E. A., Kanwisher, N., Tenenbaum, J. B., & Fedorenko, E. (2021). The neural architecture of language: Integrative modeling converges on predictive processing. Proceedings of the National Academy of Sciences, 118(45). https://doi.org/10.1073/pnas.2105646118

Tenenbaum, J. B., Kemp, C., Griffiths, T. L., & Goodman, N. D. (2011). How to grow a mind: Statistics, structure, and abstraction. Science, 331(6022), 1279-1285.

Wilson, C., & Li, J. S. (2021, August). Were we there already? Applying minimal generalization to the SIGMORPHON-UniMorph shared task on cognitively plausible morphological inflection. In Proceedings of the 18th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology (pp. 283-291).
