Corrections and additions to Kapatsinski, V. (2018). Changing Minds Changing Tools: From learning theory to language acquisition to language change. MIT Press.

Are forms cues to meanings?

p.14. The description of the Arnon & Ramscar (2012) proposal here assumes that learners are predicting meanings from forms, and that forms compete to predict the meaning. This is not actually claimed by Arnon & Ramscar, but I think it is necessary for their account to explain the finding that second language learners skip determiners. Ramscar et al. (2010) deny the existence of cue competition in predicting meanings from forms. Arnon & Ramscar argue that speakers learn to use the preceding determiner and the meaning to cue the upcoming noun, so determiners compete with semantic cues to upcoming wordforms. When the noun first occurs in isolation, the meaning blocks the determiner from becoming associated with the noun. However, it is not clear why failing to learn that determiners predict nouns would make a learner skip the determiner in production. If the determiner is still cued by something like the DEF feature, its lack of predictive power should not prevent it from being produced. So it seems necessary for the determiner and the noun to actually compete for semantics. Because Rescorla-Wagner has cue competition but no outcome competition, this suggests that form cues do compete to become associated with meaning.
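The blocking step in this argument can be illustrated with a minimal Rescorla-Wagner simulation in base R. This is only a sketch: the learning rate, the asymptote (lambda), the number of trials, and the names `rw_update`, `w_blocked`, and `w_control` are my assumptions for illustration, not values or code from Arnon & Ramscar.

```r
# Minimal Rescorla-Wagner update (illustrative sketch; not the ndl implementation).
# w: named vector of cue weights for a single outcome (here, the noun form)
# cues: names of the cues present on the trial
rw_update <- function(w, cues, outcome_present, rate = 0.1, lambda = 1) {
  target <- if (outcome_present) lambda else 0
  error <- target - sum(w[cues])       # prediction error shared by all present cues
  w[cues] <- w[cues] + rate * error    # each present cue receives the same update
  w
}

# Blocking: the noun is first learned in isolation, then with a determiner.
w_blocked <- c(meaning = 0, determiner = 0)
for (i in 1:200) w_blocked <- rw_update(w_blocked, "meaning", TRUE)
for (i in 1:200) w_blocked <- rw_update(w_blocked, c("meaning", "determiner"), TRUE)

# Control: the noun always occurs with the determiner.
w_control <- c(meaning = 0, determiner = 0)
for (i in 1:200) w_control <- rw_update(w_control, c("meaning", "determiner"), TRUE)

round(rbind(blocked = w_blocked, control = w_control), 3)
```

In the blocked condition, the meaning already predicts the noun by the time the determiner appears, so there is almost no prediction error left for the determiner to absorb. In the control condition, the two cues share the associative strength.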

Competition between forms to predict meanings also appears necessary to account for the full pattern of results in Ramscar, Dye & Klein (2013). In that paper, the authors also assume that meanings predict forms and not vice versa. However, the code provided makes rather odd predictions not described in the paper.

Running ndl on the cue-outcome structure for the crucial Training Condition 3, provided at the end of https://journals.sagepub.com/doi/suppl/10.1177/0956797612460691/suppl_file/DS_10.1177_0956797612460691.pdf, results in the following association weights (cues in rows, outcomes in columns). In training, ‘dax’ was paired with A and B on nine trials of type 1, ‘pid’ was paired with B and C on nine trials of type 2, and ‘wug’ was paired with A, B, and C on one trial of type 3. ‘exp’ was present on all trials.

	dax	pid	wug
1	0.46	-0.04	-0.25
2	-0.04	0.46	-0.25
3	-0.25	-0.25	0.5
a	0.21	-0.29	0.25
b	0.17	0.17	0
c	-0.29	0.21	0.25
exp	0.17	0.17	0

Now, as noted in the paper, ‘wug’ is indeed associated with objects A (‘dax’) and C (‘pid’) more than with B (not encountered before). But it is also the case that objects A and C are associated with ‘wug’ more than they are with the labels with which they co-occurred in training. These objects should therefore be more likely to be labeled ‘wug’ than ‘dax’ or ‘pid’, which is clearly not correct.
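This comparison can be read straight off the table. The sketch below simply re-enters the weights reported above (objects, trial types and ‘exp’ as cues; labels as outcomes) and asks which label receives the largest weight from object A and from object C. The variable names are mine.

```r
# Weights copied from the table above: rows are cues, columns are outcomes (labels).
weights <- matrix(
  c( 0.46, -0.04, -0.25,
    -0.04,  0.46, -0.25,
    -0.25, -0.25,  0.50,
     0.21, -0.29,  0.25,
     0.17,  0.17,  0.00,
    -0.29,  0.21,  0.25,
     0.17,  0.17,  0.00),
  ncol = 3, byrow = TRUE,
  dimnames = list(c("1", "2", "3", "a", "b", "c", "exp"),
                  c("dax", "pid", "wug"))
)

# For objects A and C, find the label with the largest association weight.
best_label <- apply(weights[c("a", "c"), ], 1,
                    function(row) names(which.max(row)))
best_label
```

For both objects the strongest label is ‘wug’ (0.25), ahead of the label the object was actually trained with (0.21).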

If instead we treat forms as cues, and meanings/objects as outcomes, it all falls into place:

	a	b	c
1	0.3	0.2	-0.2
2	-0.2	0.2	0.3
3	0.3	0.2	0.3
dax	0.3	0.2	-0.2
exp	0.4	0.6	0.4
pid	-0.2	0.2	0.3
wug	0.3	0.2	0.3

‘wug’ is A or C, ‘dax’ is A, and ‘pid’ is C.

So, mea culpa on assuming that forms would serve as cues to and compete for meanings in the Ramscar et al. model, but I think they really should!

Original cue-outcome structure:

# objects (plus trial type and 'exp') as cues, labels as outcomes
Cues<-c("a_b_1_exp", "b_c_2_exp", "a_b_c_3_exp")
Outcomes<-c("dax","pid","wug")
Frequency <- c(9,9,1)

Revised cue-outcome structure:

# labels (plus trial type and 'exp') as cues, objects as outcomes
Cues<-c("1_dax_exp","pid_2_exp","wug_3_exp")
Outcomes<-c("a_b", "b_c", "a_b_c")
Frequency <- c(9,9,1)

The rest of the code:

train<- data.frame(Cues, Outcomes, Frequency, stringsAsFactors=FALSE)

library("ndl")
round(estimateWeights(train),2)
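For readers without ndl installed, the forms-as-cues weights can be cross-checked in base R. The sketch below assumes that the equilibrium weights for this small design coincide with the minimum-norm least-squares solution, computed with a pseudoinverse because some cues are perfectly collinear (e.g., ‘dax’ and trial type 1). This is my assumption about a reasonable equivalent, not ndl's actual code, and the helper names `indicator` and `pinv` are hypothetical.

```r
# Forms (plus trial type and 'exp') as cues, objects as outcomes.
cues     <- list(c("1", "dax", "exp"), c("2", "pid", "exp"), c("3", "wug", "exp"))
outcomes <- list(c("a", "b"), c("b", "c"), c("a", "b", "c"))
freq     <- c(9, 9, 1)

# Turn a list of label sets into a trial-type x label 0/1 matrix.
indicator <- function(sets) {
  labels <- sort(unique(unlist(sets)))
  m <- t(sapply(sets, function(s) as.numeric(labels %in% s)))
  colnames(m) <- labels
  m
}

# Moore-Penrose pseudoinverse via the singular value decomposition.
pinv <- function(A, tol = 1e-10) {
  s <- svd(A)
  keep <- s$d > tol * max(s$d)
  s$v[, keep, drop = FALSE] %*% (t(s$u[, keep, drop = FALSE]) / s$d[keep])
}

X <- indicator(cues)      # trial types x cues
Y <- indicator(outcomes)  # trial types x outcomes

# Frequency-weighted normal equations, solved for the minimum-norm weights.
A <- t(X) %*% (freq * X)
B <- t(X) %*% (freq * Y)
W <- pinv(A) %*% B
dimnames(W) <- list(colnames(X), colnames(Y))

round(W, 2)
```

Under these assumptions the result agrees with the forms-as-cues table above: for example, ‘exp’ has weights of 0.4, 0.6 and 0.4 to a, b and c, and ‘dax’ has 0.3, 0.2 and -0.2.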

Distributional learning and prediction error

p.140. We discovered a bug in the experimental script for Harmon et al. (2017), which has been corrected in Harmon et al. (2019, Cognition). After correcting this bug and rerunning the study, we no longer find significant evidence for greater downweighting of VOT when the distribution along the VOT continuum is bimodal compared to when it is unimodal. Participants downweight VOT when variation in F0 is predictive, whether VOT is unimodal or bimodal. We are now running follow-ups with stronger manipulations of the distribution than shown on p.141.

Missing citations

p.3. Bloomfield (1933, p.384) actually pointed out the diachronic explanation for prince becoming homophonous with prints well before Browman & Goldstein.

pp.94-97. Mediation. I have recently become aware of the literature on ‘complex rules’ and ‘task sets’ in executive control (e.g., Mayr, 2002). These are also examples in which there is an additional cue (‘task set’) mediating between cue and outcome. Complex rules in this literature are essentially cue-outcome structures that require mediators.

p.193. Copying. Copying has recently been independently found to improve performance in deep recurrent network (sequence-to-sequence) models of paradigmatic morphology (‘reinflection’; Aharoni et al., 2016; Gu et al., 2016).

References:

Aharoni, R., Goldberg, Y., & Belinkov, Y. (2016). Improving sequence to sequence learning for morphological inflection generation: The BIU-MIT systems for the SIGMORPHON 2016 shared task for morphological reinflection. In Proceedings of the 14th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology (pp. 41-48).

Arnon, I., & Ramscar, M. (2012). Granularity and the acquisition of grammatical gender: How order-of-acquisition affects what gets learned. Cognition, 122(3), 292-305.

Bloomfield, L. (1933). Language. New York: Holt, Rinehart, and Winston.

Gu, J., Lu, Z., Li, H., & Li, V. O. K. (2016). Incorporating copying mechanism in sequence-to-sequence learning. arXiv:1603.06393.

Harmon, Z., Idemaru, K., & Kapatsinski, V. (2017). The power of a unimodal distribution in cue reweighting: Unimodality vs prediction error as signs of cue irrelevance. The Journal of the Acoustical Society of America, 141(5), 3520-3520.

Harmon, Z., Idemaru, K., & Kapatsinski, V. (2019). Learning mechanisms in cue reweighting. Cognition, 189, 76-88.

Mayr, U. (2002). Inhibition of action rules. Psychonomic Bulletin & Review, 9(1), 93-99.

Ramscar, M., Dye, M., & Klein, J. (2013). Children value informativity over logic in word learning. Psychological Science, 24(6), 1017-1023.

Ramscar, M., Yarlett, D., Dye, M., Denny, K., & Thorpe, K. (2010). The effects of feature‐label‐order and their implications for symbolic learning. Cognitive Science, 34(6), 909-957.

Usage-based Linguistics Laboratory
