Age-Dependent Statistical Learning Trajectories Reveal Differences in Information Weighting

2020

https://doi.org/10.1037/PAG0000567.SUPP

Abstract

Statistical learning (SL) is the ability to generate predictions based on probabilistic dependencies in the environment, an ability that is present throughout life. The effect of aging on SL is still unclear. Here, we explore statistical learning in healthy adults (40 younger and 40 older). The novel paradigm tracks learning trajectories and shows age-related differences in overall performance, yet similarities in learning rates. Bayesian models reveal further differences between younger and older adults in dealing with uncertainty in this probabilistic SL task. We test computational models of 3 different learning strategies: (a) Win-Stay, Lose-Shift, (b) Delta Rule Learning, (c) Information Weights to explore whether they capture age-related differences in performance and learning in the present task. A likely candidate mechanism emerges in the form of age-dependent differences in information weights, in which young adults more readily change their behavior, but also show disproportionally strong reactions toward erroneous predictions. With lower but more balanced information weights, older adults show slower behavioral adaptation but eventually arrive at more stable and accurate representations of the underlying transitional probability matrix.

Psychology and Aging © 2020 American Psychological Association 2020, Vol. 35, No. 8, 1090 –1104 ISSN: 0882-7974 http://dx.doi.org/10.1037/pag0000567 Age-Dependent Statistical Learning Trajectories Reveal Differences in Information Weighting Steffen A. Herff Shanshan Zhen and Rongjun Yu École Polytechnique Fédérale de Lausanne; Western Sydney National University of Singapore University; and Institute of High Performance Computing, Agency for Science, Technology and Research, Singapore Kat R. Agres This article is intended solely for the personal use of the individual user and is not to be disseminated broadly. National University of Singapore and Institute of High Performance Computing, Agency for Science, Technology and Research, Singapore This document is copyrighted by the American Psychological Association or one of its allied publishers. Statistical learning (SL) is the ability to generate predictions based on probabilistic dependencies in the environment, an ability that is present throughout life. The effect of aging on SL is still unclear. Here, we explore statistical learning in healthy adults (40 younger and 40 older). The novel paradigm tracks learning trajectories and shows age-related differences in overall performance, yet similarities in learning rates. Bayesian models reveal further differences between younger and older adults in dealing with uncertainty in this probabilistic SL task. We test computational models of 3 different learning strategies: (a) Win-Stay, Lose-Shift, (b) Delta Rule Learning, (c) Information Weights to explore whether they capture age-related differences in performance and learning in the present task. A likely candidate mechanism emerges in the form of age-dependent differences in information weights, in which young adults more readily change their behavior, but also show disproportionally strong reactions toward erroneous predictions. With lower but more balanced information weights, older adults show slower behavioral adaptation but eventually arrive at more stable and accurate representations of the underlying transitional probability matrix. Keywords: statistical learning, cognitive assessment, continuous paradigm, age-related differences, information weights Supplemental materials: http://dx.doi.org/10.1037/pag0000567.supp Statistical learning (SL) describes the ability to generate predic- approach makes intuitive sense, as SL is already present in infancy tions based on probabilistic dependencies in the environment. The (Roseberry, Richie, Hirsh-Pasek, Golinkoff, & Shipley, 2011; Saf- majority of SL research focuses on early childhood development fran, Aslin, & Newport, 1996). SL in older adults, however, has or young adults (see Krogh, Vlach, & Johnson, 2013; Daltrozzo & received far less scientific attention. Considering the worldwide Conway, 2014 and Saffran & Kirkham, 2018 for reviews). This increase in life expectancy and age of retirement (WHO, 2015, This article was published Online First August 13, 2020. gapore Ministry of Education (MOE2016-T2-1-015) awarded to Rongjun X Steffen A. Herff, Digital and Cognitive Musicology Lab, École Yu. Polytechnique Fédérale de Lausanne; Music Cognition and Action Group, Steffen A. Herff developed the paradigm and designed, coded, as The MARCS Institute for Brain, Behaviour and Development, Western well as prepared the experiment. Kat R. Agres helped develop the Sydney University; and Department of Social and Cognitive Computing, experimental design and paradigm. Data collection was performed or Institute of High Performance Computing, Agency for Science, Technol- supervised by Steffen A. Herff and Shanshan Zhen. Data were analyzed ogy and Research, Singapore. Shanshan Zhen, Department of Psychology, and interpreted by Steffen A. Herff. The manuscript was written by National University of Singapore. Rongjun Yu, Department of Psychology Steffen A. Herff with Shanshan Zhen, Rongjun Yu, and Kat R. Agres and NUS Graduate School for Integrative Sciences and Engineering, Na- providing comments. The project idea and collaboration were initiated tional University of Singapore. X Kat R. Agres, Yong Siew Toh Conser- by Kat R. Agres and Rongjun Yu provided lab space and equipment. All vatory of Music, National University of Singapore; and Department of authors approved the final version of this article. We have no known Social and Cognitive Computing, Institute of High Performance Comput- conflict of interest to disclose. We archived a preprint of the present ing, Agency for Science, Technology and Research, Singapore. work, which can be accessed at https://psyarxiv.com/kuy6p; Herff, We thank Lauren Fairley, Jon Prince, and Estefanía Cano for construc- Zhen, Yu, and Agres (2019). tive comments on a draft, and Arihant Singhai, Ren Jie Tay, Bo Yuan, and Correspondence concerning this article should be addressed to Steffen Jing Wen Chai for their support during data collection. We thank Feng Lei A. Herff, Digital and Cognitive Musicology Lab, École Polytechnique for advice on the choice of cognitive assessment tests and organizing Fédérale de Lausanne, INN. 115, 1015 Lausanne, Switzerland. E-mail: training on administering the tests. The study was supported by the Sin- [email protected] 1090 STATISTICAL LEARNING AND INFORMATION WEIGHTS 1091 2017), it is important to further our understanding of learning in quences. The result is a TP matrix whereby each circle has a given older adults. SL can be considered the outcome of a mechanism probability to be followed by another circle. Importantly, this that extracts probabilistic information. Despite the overwhelming probability is not 100%, based on the precise sequences used, as evidence for SL in humans, the fundamental mechanisms or learn- well as the interspliced random sequences. Reaction time measures ing strategies that allow humans to extract such probabilistic revealed reduced learning in older adults compared to younger information are not yet fully understood (Krogh et al., 2013; adults. Studies that provide support for better SL in younger adults Saffran & Kirkham, 2018). Here, we investigate age-related dif- on probabilistic SL tasks tend to utilize tasks where reaction time ferences in SL ability and mechanisms in older and younger adults. (RT) is the primary measure of performance (Curran, 1997; The main contribution of the present study is showing age-related Feeney, Howard, & Howard, 2002; D. V. Howard et al., 2004; differences in SL and identifying a candidate learning mechanism J. H. Howard & Howard, 1997). This observation is important of SL that can capture the age-related differences. because differences in RT do not necessarily reflect learning performance (Aizenstein et al., 2006). Indeed, RT effects may be the result of a strategic change across age in terms of a speed– SL in Older Adults This article is intended solely for the personal use of the individual user and is not to be disseminated broadly. accuracy trade-off (Forstmann et al., 2011; Salthouse, 1979). As a This document is copyrighted by the American Psychological Association or one of its allied publishers. There is a growing body of evidence suggesting that older adults result, paradigms that require participants to respond both quickly employ different strategies for learning, response selection, and and accurately have limited applicability for the present goal of decision-making, compared to younger adults (Hinault, Lemaire, investigating SL ability and its mechanisms, even if they do & Touron, 2017; Löckenhoff & Carstensen, 2007; Mata, von produce age-related differences (Curran, 1997). The present study Helversen, & Rieskamp, 2010; Nassar et al., 2016; Schirda, Val- also focusses on a probabilistic task, but it is concerned with entine, Aldao, & Prakash, 2016). For example, younger partici- age-related differences in predictive decision-making perfor- pants seem more proficient in combining multiple mnemonic mance, rather than the speed in which learned responses are made. strategies compared to older participants (Hinault et al., 2017). To testing accuracy rather than speed, Palmer, Hutson, and Furthermore, compared to younger adults, older participants seem Mattys (2018) presented participants with a continuous auditory to treat positive information preferentially compared to negative stream of an artificial language with no interruptions. Accordingly, information such as when recalling information about physicians the only way of extracting individual words was by tracking the or health plans (Löckenhoff & Carstensen, 2007). Older adults also underlying transitional probabilities between syllables. This is appear to utilize uncertainty of information to a lesser extent than because transitional probabilities within words are much higher younger participants, and younger adults show large behavioral than across word boundaries. After exposure to the continuous adjustments to relatively minor prediction errors (Nassar et al., stream, participants differentiated between words from the artifi- 2016). Taken together, this body of literature suggests that older cial language, nonwords (that did not exist in the artificial lan- adults utilize different strategies to extract information to make guage), and part-words (foil words generated by combining sylla- decisions or predictions compared to young adults. SL tasks lend bles across word boundaries). Although no age-related differences themselves to investigate these differences as they test behavioral occurred in differentiating words from nonwords, younger partic- outcomes that are the result of extracting statistical regularities ipants outperformed older participants when differentiating words from the environment. from part-words (Palmer et al., 2018; Palmer & Mattys, 2016). Prior research on age-related differences in SL has yielded This result suggests that older adults may utilize different strate- conflicting evidence depending on the precise paradigms (e.g., gies than younger adults to extract statistical information. How- deterministic vs. probabilistic) and measures (e.g., RTs vs. accu- ever, such differences may also be due to an age-related decline in racy) deployed. This combined with a general uncertainty about cognitive function (e.g., memory). To account for possible differ- the mechanisms behind SL leaves questions about age-related ences in cognitive function, we also collect cognitive assessment shifts in information extraction strategies largely unanswered. SL data from the participants in our experiment. of deterministic sequences (e.g., “B” always follows “A”) is re- To capture potential differences in learning strategies, we ana- markably similar across age. Cherry and Stadler (1995) presented lyze learning trajectories between groups of younger and older participants with four circles on a screen that flashed in a deter- adults. Rather than analyzing overall SL performance alone, we ministic sequence, and participants predicted the next lit circle via focus on individuals’ learning trajectories (slopes), as previous speeded button-press. Both younger and older adults performed at work suggests this measure provides valuable insight into individ- ceiling in terms of accuracy, and both age groups improved RTs uals’ cognitive capacities and the time course of learning novel with each sequence repetition. Although older adults showed over- information (Kaufman et al., 2010; Misyak, Christiansen, & Tom- all slower RTs, learning rates were comparable across the two blin, 2010; Siegelman, Bogaerts, Christiansen, & Frost, 2017). groups. Other studies also reported little evidence of age-related Learning trajectories are of particular interest for the present study differences in learning of deterministic sequences (Daltrozzo & because the time course of information integration may more Conway, 2014; Frensch & Miner, 1994; D. V. Howard & Howard, accurately characterize age-related differences than the absolute 1989, 1992; Salthouse, McGuthry, & Hambrick, 1999). However, performance. Analyzing learning trajectories can also be especially age-related differences emerge when sequences are probabilistic— informative when the underlying TP matrix contains transitions that is, governed by an underlying transitional probability (TP) where the most likely next event is by far the most probable one matrix (e.g., “B” is most likely to follow “A,” and “C” is less likely (high-certainty state), as well as transitions where the most likely to follow “A”). Curran (1997) also presented four circles to par- next event is less obvious (low-certainty state; Shafir, Reich, Tsur, ticipants; however, the sequences of flashing circles switched back Erev, & Lotem, 2008). This is because a TP matrix with various and forth between predetermined sequences and random se- different states of uncertainty allows for more precise observation 1092 HERFF, ZHEN, YU, AND AGRES of participants’ information integration, which in turn allows com- or high end of the probability spectrum (e.g., ⌬ ⫽ .2 with putational models to provide a far more detailed investigation of PPerceived ⫽ .3, PReal ⫽ .5 vs. ⌬ ⫽ .2 with PPerceived ⫽ .55, PReal ⫽ differences in participants’ learning strategies. .75)? Adjusting behavior based on the observed discrepancy be- tween a prediction and the observed reality is an intuitive and well-established learning mechanism (Rescorla & Wagner, 1972). Statistical Learning Mechanisms This learning mechanism can be described as ‘delta-rule’ learning, In the present work, we consider three learning mechanisms: (a) because response probabilities change by a proportion of the Win-Stay, Lose-Shift, (b) Delta Rule Learning based on probability prediction error (here, the difference between the predicted prob- spectrum, and (c) Information Weights. Investigating these three ability of a transition, and the real probability of a transition; learning mechanisms and exploring whether they can explain Greve, Cooper, Kaula, Anderson, & Henson, 2017). In a probabi- age-related differences in SL constitutes the main focus of the listic SL task, effective use of delta-rule learning relies on an present work. The choice of these three mechanisms was predom- estimate of one’s own perception of the transitional probabilities, inantly guided by their conceptual simplicity and presence in the as well as the true underlying probabilities. Prior literature sug- This article is intended solely for the personal use of the individual user and is not to be disseminated broadly. literature, as well as ease of implementation. gests that younger adults, compared to older adults, show rela- This document is copyrighted by the American Psychological Association or one of its allied publishers. tively large adjustments even to small prediction errors (Nassar et al., 2016). Furthermore, the present implementation of delta-rule Win-Stay, Lose-Shift learning also allows the mechanism to vary depending on whether The first learning mechanism we consider captures whether the true probability is likely or unlikely. This implementation participants—when forming a prediction—predominantly rely on decision is motivated by findings in the literature suggesting that the outcome from their last response when faced with the same SL may differ between age groups and as a function of task decision. Such Win-Stay, Lose-Shift strategies are commonly ob- complexity (Curran, 1997; Feeney et al., 2002; D. V. Howard et served in decision-making tasks (Nowak & Sigmund, 1993; Wor- al., 2004; J. H. Howard & Howard, 1997; Palmer & Mattys, 2016). thy, Hawthorne, & Otto, 2013) and previous research indicates Given that we cannot assume that learning a less likely transition age-related differences in decision-making in terms of Win-Stay, (e.g., PReal ⫽ .5) is equally difficult as learning a more likely Lose-Shift usage. For example, older participants tended to rely transition (PReal ⫽ .75), we need this additional mechanism to more strongly on a Win-Stay, Lose-Shift mechanism in a proba- capture potential age-related differences. bilistic inference task (Mata et al., 2010). Computational modeling Observing delta-rule learning in the present probabilistic SL of Win-Stay, Lose-Shift shows that, in theory, it is an effective task would provide evidence that participants are extracting the strategy for language learning (Matsen & Nowak, 2004). Partici- underlying statistical regularities. If delta-rule learning describes pants could theoretically utilize a Win-Stay, Lose-Shift strategy to participants’ behavior in the present task, we predict younger solve a SL task. However, for any SL paradigm, predominantly adults to adjust their behavior more rapidly than older adults, based relying on such a strategy would be potentially problematic. This on previous findings (Nassar et al., 2016). Furthermore, if task is because SL tasks are often designed with the assumption that difficulty differs as a function of transitional probability, we expect participants continuously sample information from the environ- to see stronger delta-rule learning in more probable transitions. If ment to extract statistical regularities, rather than only when they this is the case, we also hypothesize an interaction with age, are prompted to respond and only from the last time they provided whereby older participants’ delta rule learning decreases to a a response. Relying on a Win-Stay, Lose-Shift mechanism would greater extent compared to younger adults as transitional certainty suggest that participants deploy a simple response heuristic to increases. This is because the increased task complexity may achieve statistical learning without extracting the full underlying function as a greater obstacle to the older adults compared to the set of transitional probabilities. This is because Win-stay, Lose- younger adults. It is worth noting that estimating the delta required Shift does not rely on extracting statistical properties—instead it for delta-rule learning can be difficult, particularly in a probabi- relies exclusively on memory of the last relevant response. In the listic task. case that we observe the Win-Stay, Lose-Shift strategy in the present task, a stronger reliance on this strategy is hypothesized in Information Weights older participants (Mata et al., 2010). In the present work, we propose a learning mechanism that is more parsimonious than the Delta-Rule model, as it does not rely Delta-Rule Learning Based on Probability Spectrum on estimating delta, and yet would be effective in extracting The second mechanism we consider aims to capture whether statistical regularities and reveal age-related differences—Infor- participants’ responses are predominantly driven by two factors. mation Weights. The model simply assesses the weights (change in First, the distance between the currently perceived transitional response probabilities) that younger and older adults attach to probabilities to the new estimated probabilities after receiving new positive (e.g., “B” follows “A”) and negative (e.g., “B” does not information (e.g., feedback). Second, the absolute values of the follow “A”) observations. Effectively, this cognitive model sim- estimated real probability. In other words, it measures whether plifies the mechanism behind statistical learning to a continuous participants more strongly adjust their predictions when the per- sampling of information with a “positive” weight that reflects ceived probabilities are further away from the true transitional increasing the likelihood of making a particular choice when the probabilities (e.g., ⌬ ⫽ .5 with PPerceived ⫽ .25 and PReal ⫽ .75 vs. specific transition is observed in the sequence, and a “negative” ⌬ ⫽ .25 with PPerceived ⫽ .5, PReal ⫽ .75), and whether this weight that reflects decreasing the likelihood of making the par- adjustment differs depending on whether it occurs toward the low ticular choice when the specific transition is not observed. As a STATISTICAL LEARNING AND INFORMATION WEIGHTS 1093 result, with only two parameters (“positive” and “negative” The Present Paradigm weight) that differ between individuals or groups, the model could Based on Siegelman, Bogaerts, and Frost’s (2017) criticism of be able to explain differences in response behavior. This mecha- existing SL paradigms, a new auditory SL paradigm that focuses nism can be understood as a generalization of Thorndike’s law of on learning trajectories was developed (Herff, Nur, Lee, Lee, & effect (Thorndike, 1898). A model of this learning mechanism Agres, 2019; Herff & Prince, 2020). In this task, participants allows the comparison of participants in regard to how willing they listened to a continuous stream of four different sounds and were are to update their response probabilities. It also allows an assess- occasionally prompted to indicate the most likely next sound. The ment of whether participants rely more on positive or negative paradigm showed high test–retest reliability in older adults (r ⫽ information. This is an interesting perspective that the previous .84), and correlated well with measures of cognitive function (r ⫽ delta-rule model cannot capture, but may be an important consid- .56). Furthermore, the task satisfies the needs outlined in the eration in the context of age-related differences in SL. Prior previous section: It is probabilistic, measures accuracy, tracks research suggests general age-related differences in the processing learning trajectories, and the TP matrix can be adjusted to contain of positive (prediction was fulfilled) and negative (prediction was low- and high-certainty transitions. The auditory domain is also a This article is intended solely for the personal use of the individual user and is not to be disseminated broadly. not fulfilled) feedback, with older participants showing a tendency promising target to measure SL ability and link it to cognitive This document is copyrighted by the American Psychological Association or one of its allied publishers. to rely more on positive feedback compared to younger adults ability. This is because the auditory domain specializes in process- (Eppinger & Kray, 2011; Ferdinand & Kray, 2013). These studies ing stimuli that unfold in time (Pérez-González & Malmierca, provide explicit feedback, however, and do not take place within 2014) and relies heavily on extracting statistical information from the framework of SL. If participants’ response patterns can be the environment (Agres, Abdallah, & Pearce, 2018; Barascud, modeled through Information Weights and not through Win-Stay, Pearce, Griffiths, Friston, & Chait, 2016; Sohoglu & Chait, 2016). Lose-Shift, then this would suggest that participants are extracting However, similar to previous SL paradigms, many participants the underlying statistical regularities. We predict younger adults to performed at chance level, and a relatively small sample size was show higher information weights than older adults. Furthermore, used (n ⫽ 27; Herff, Nur, et al., 2019). The authors suggested based on the literature reviewed above, we predict older adults to deploying more trials and modifying the task to be multimodal. have a larger positive-to-negative weight ratio when compared to Consequently, we use Herff, Nur, et al. (2019)’s SL paradigm to younger adults. capture learning trajectories, incorporating more trials (150 instead of 50), a multimodal implementation (auditory-visual), and a new TP matrix that accommodates low- and high-certainty states. Fur- SL, Cognitive Function, and the Present Paradigm ther details of the paradigm are described in the method section. An additional consideration when looking at SL differences in older adults is the possibility that poorer performance may reflect Aim and Motivation an age-related decline in cognitive function (e.g., memory). A decline in cognitive function could lead to lower performance due In summary, the present study investigates age-dependent dif- to task-specific requirements (e.g., auditory memory) or because of ferences in SL. We utilize a continuous, multimodal, probabilistic a direct influence of cognitive function on SL. Much effort has paradigm to reveal SL trajectories in younger and older adults. The been made to investigate the relationship between SL and cogni- probabilistic TP matrix governing the task contains low- and high-certainty transitions to help us identify potential learning tive function (Feldman, Kerr, & Streissguth, 1995; Kaufman et al., strategies that capture SL and potential age-related differences. 2010; Siegelman, Bogaerts, & Frost, 2017; Siegelman, & Frost, Predominantly, we explore whether three mechanisms of learning 2015). However, despite SL’s crucial involvement across sensory can describe SL and potential age-related differences, specifically, modalities (Creel, Newport, & Aslin, 2004; Kirkham, Slemmer, & (a) Win-Stay, Lose-Shift, (b) Delta Rule Learning based on prob- Johnson, 2002; Moldwin, Schwartz, & Sussman, 2017) research ability spectrum, and (c) Information Weights. To account for the attempting to link SL to traditional cognitive assessments has potentially moderating effect of age-related differences in cogni- yielded limited evidence for a direct link between SL and cognitive tive function, we also explore whether traditional cognitive assess- function (e.g., r from ⫺.06 to.19 in Feldman et al., 1995; Kaufman ments correlate with SL performance in this task and can explain et al., 2010; Siegelman et al., 2017). In addition, previous attempts response differences between the age groups. to use SL as a measure of individual aptitude or to link it to various established measures of cognitive function have been plagued by a plethora of difficulties (Siegelman et al., 2017). These include low Method test–retest reliability (r ⫽ .44 in Kaufman et al., 2010), and low performance in the participants (21– 47% of participants at chance level, see Siegelman et al., 2017 for a review). Consequently, the General Procedure low correlations with measures of cognitive function could either After providing informed consent, participants took part in a be a product of the aforementioned methodological issues or cognitive assessment (⬃30min), followed by the SL paradigm indeed indicative that SL is mostly independent of other cognitive (⬃45min). The present data collection was part of a large EEG skills. To capture cognitive function as a possible covariate and project collaboration between the Agency for Science, Technology test its contribution to the question of whether SL is directly and Research (AⴱSTAR) and the National University of Singapore influenced by cognitive function, we administer a battery of cog- (NUS). Analysis of the collected EEG data will be reported else- nitive assessments, further described in the method section. where. 1094 HERFF, ZHEN, YU, AND AGRES Participants Figure 1) are considered high-certainty states, as the most likely next state is evident with a 75% transitional probability. The other Data from 40 younger adults were recorded from the student two states (“B,” “C,” blue in Figure 1) are low-certainty states, as population at the National University of Singapore (Mage ⫽ 21.4 the most likely next state is less evident with only a 50% transi- SDage ⫽ 2.7); 40 older adults (defined as 60 ⫹ years old) were tional probability. For example, the most likely state after A is B, recruited from the community (Mage ⫽ 66.7, SDage ⫽ 4.2). Par- with a probability of 75%. The most likely state after “B” is “D,” ticipants were required to have normal or corrected-to-normal with a probability of only 50%. The probability of repeating a state hearing, be literate in English, be able to provide informed consent, is zero, thus a response indicating repetition is considered a rule and be able to travel to the study site independently. Participation violation. Cumulative Rule Violations (CRV) as well as Cumula- was reimbursed with SGD 40. The study was IRB approved tive High Probability Pathway Choices (CHPC) are the two mea- (S-17–372). surements of SL performance used here. CRV refers to the number of rule violations (responses indicating a repetition; red arrows in Stimuli and Equipment Figure 1) accumulated up to a given trial. CHPC refers to the This article is intended solely for the personal use of the individual user and is not to be disseminated broadly. Statistical learning paradigm. The present study deployed a number of high-probability responses (responses correctly identi- This document is copyrighted by the American Psychological Association or one of its allied publishers. continuous SL paradigm designed to capture learning trajectories fying the most likely next state; green arrows in Figure 1) accu- (Herff, Nur, et al., 2019). A long series of states was generated mulated up to a given trial. Thus, good performance is indicated by whereby each state could be one of four options. The four states high CHPC and low CRV (Herff, Nur, et al., 2019). Note that were differentiated through sound (sine waves at 165 Hz (E3), 220 while likely correlated, the two measures assess different aspects Hz (A3), 294 Hz (D4), 392 Hz (G4), each 500ms in duration). of SL, as CHPC captures whether participants learn the most likely Participants heard this long series of four possible states, and the next event, and CRV assesses whether participants learn to exclude series paused every 7.5 to 11.5 s (15–23 tones), at which point the impossible outcomes. In other words, on a given trial, participants participants were prompted to indicate which tone they thought may avoid a rule violation but still not pick the high-probability would occur next. The number of tones between stopping points choice—there are two other low-probability choices (black arrows was variable to avoid potential expectancy effects of when the next in Figure 1). Similarly, participants may not make the correct high interruption would occur. After a response, the sequence would probability choice, but that does not necessarily mean that they continue. The sequence was instantiated in both the auditory and chose a rule violation (repetition). visual modality. Four horizontally aligned circles on the screen were associated with the four sounds (in order of lowest to highest Cognitive Assessment pitch, left to right). For each tone, a circle flashed as the respective sound was played. After each stop in the sequence, participants A battery of cognitive tests was administered to prevent any indicated their response by clicking on the circle that they thought confounding of group differences in general cognitive ability with would occur next (four alternative forced-choice). The response SL learning trajectories. Furthermore, prior research has shown window was not timed. Participants did not receive explicit feed- conflicting evidence as to whether SL is predicted by cognitive back, however, since the sequence continued after each response, ability (Feldman et al., 1995; Herff, Nur, et al., 2019; Kaufman et feedback was implicitly provided by the following state and al., 2010; Siegelman et al., 2017). Though not the main focus of whether or not it matched the participants’ prediction. In total, 150 this study, we hope that collecting cognitive ability data in addition responses (trials) per participant were collected. to the present SL paradigm may also contribute to the debate. The Transitional probability matrix. The TP matrix governing selection of cognitive tests was informed by consulting a clinician the four states can be seen in Figure 1. The overall probability of specialized in working with auditory learning tasks in an older each state is identical (25%). Two states (“A,” “D,” purple in adult population (see Feng et al., 2017; Tan et al., 2018 for studies utilizing the same battery of tests). The tests aim to provide an overview of cognitive ability that may be relevant to auditory learning tasks in general. Specifically, the battery of cognitive tests deployed here com- prises the Rey Auditory Verbal Learning Test (RAVLT; Rey, 1958), Digit Span task (backward and forward), Verbal Fluency task (see Randolph, Braun, Goldberg, & Chase, 1993), Symbol Digit Modality Test (Smith, 1982) in written (DSW) and verbal (DSV) form, and Color Trails Test (D’Elia, Satz, Uchiyama, & White, 1996). All assessors were formally trained and the tests were administered as described in the Neuropsychological Assess- Figure 1. Schematic representation of the TP matrix. The two main ments Training Manual for Assessors (Yu, 2018). A short sum- measures of SL performed used here are Cumulative Rule Violations mary of each test follows below. (CRV, accumulation of response associated with a red arrow) and Cumu- RAVLT. The test comprises multiple parts. In part one, par- lative High Probability Choices (CHPC, accumulation of responses asso- ciated with a green arrow). Because the most likely next state is clearer ticipants listen to a list of 15 words (List-A) and then attempts to (75%, purple) in state “A” and “D” compared to states “B” and “C” (50%, recall them. This procedure is repeated five times, and the number blue), states A and D are considered high-certainty states, and states B and of correct recalls is counted after each iteration. In the models, this C are considered low-certainty states. See the online article for the color is coded as RAVLT1 to RAVLT5. In the second part, the participant version of this figure. listens to a different 15-item word list (List-B), and the number of STATISTICAL LEARNING AND INFORMATION WEIGHTS 1095 correctly recalled items is coded as RAVLTB. Afterward, partici- Statistical Learning, Age, and Certainty pants are asked to recall the items from List-A again, and the A total of 12,000 responses were collected, evenly distributed number of correctly recalled items is coded as RAVLTRECA. After across the four states (A ⫽ 25.57%, B ⫽ 25.92%, C ⫽ 24.02%, a delay, filled with the Digit Span Test and Color Trail test (see D ⫽ 24.48%). We used a simulation-based approach to assess below), the RAVLT assesses delayed recall by requiring partici- chance and ideal performance (see Supplement S0). 95% CIs were pants to recall the items of List-A once more. The number of calculated around simulated guessing participants and simulated correctly recalled items is coded as RAVLTDelayedRacall. In the ideal Bayesian learners. The results are summarized in Table 1, third part of the RAVLT, participants listen to a list of 50 items, 15 and Figure 2 depicts overall learning trajectories. of which were in List-B, and the participants aim to identify words A generalized Bayesian mixed-effects model predicted the re- that have been presented before. The number of correctly recog- sponses that lie on the high-probability pathway. The model was nized words is coded in the models as RAVLTRecognition. This provided with a fixed effect for Trial (1–150, representing the RAVLT assesses verbal memory in terms of recognition as well as learning trajectory over the course of the experiment), Age recall. (younger adults vs. older adults), Certainty (low-certainty state vs. This article is intended solely for the personal use of the individual user and is not to be disseminated broadly. Digit span task. This task consists of two parts. In the first high-certainty state), as well as all interactions. The model was This document is copyrighted by the American Psychological Association or one of its allied publishers. part, participants are asked to listen to short sequences of numbers also provided with random effects for Participant and the precise and repeat them verbally. The task consists of two items for each Sequence presented. Further information about the models can be sequence length. If both sequences are not correctly repeated, the found in Supplement S1. We report coefficient estimates (␤), task stops, and the total number of correctly recalled strings is estimated error (EE) in the coefficients, as well as evidence (Odds) coded as DigitSpanFWD. Afterward, the same task is repeated ratios for the individual hypotheses (a given coefficient being with different numbers. This time, however, participants are re- larger or smaller than zero). For convenience, we denote effects quired to repeat the numbers backward. The number of correctly with “ⴱ” as those which can be considered ‘significant’ at an ␣ ⫽ recalled sequences is coded as DigitSpanBWD. The Digit Span .05 level. This corresponds to odds ratios ⬎ ⫽ 19 (odds 95/5 ⫽ 19; tasks assess working memory capacity. Milne & Herff, 2020). Trial (␤ ⫽ .14, EE␤ ⫽ .05, Odds(␤ ⬎ 0) ⫽ 579.65ⴱ) predicted Color trails test. The test consists of two parts. In part one, the probability of high-probability pathway responses, indicating participants connect numbered circles in ascending order on a sheet of that learning took place. Age (␤ ⫽ ⫺.31, EE␤ ⫽ .09, Odds(␤ ⬍ paper. In the second part, participants connect numbers and letters, by 0) ⬎ 9999ⴱ) also carried predictive value, with younger adults alternating between numbers (in ascending order) and letters (in overall being more likely to produce high-probability pathway alphabetic order). The test assesses visual attention and task switch- responses. The low-certainty states led to overall fewer high- ing capability. Time to completion is measured separately for the probability pathway responses (␤ ⫽ ⫺.61, EE ␤ ⫽ .07, Odds(␤ ⬍ two parts, and both are included in the models, referred to as 0) ⬎ 9999ⴱ), indicating that participants were able to discern the ColorTrail1 and ColorTrail2. differences between states in the TPs. The LowCertainty ⫻ Trial Verbal fluency task. This task requires participants to name interaction (␤ ⫽ ⫺.31, EE␤ ⫽ .07, Odds(␤ ⬍ 0) ⬎ 9999ⴱ) as many animals as possible in 60 s. The number of different predicted reduced high-probability pathway responses in low- animal names is coded as SemanticFluencyAnimals in the models. certainty states as the experiment progresses. This can be seen in The test assesses linguistic storage and retrieval. Figure 3 in the positive slope for the high-certainty states, and the Symbol digit modality test. In the first part, participants are negative slope for the low-certainty states. The Trial ⫻ Cer- provided with a visual key that links the numbers 1 to 9 to nine tainty ⫻ Age interaction (␤ ⫽ .15, EE␤ ⫽ .05, Odds(␤ ⬎ 0) ⫽ different visual symbols. Participants then have 90 s to transcribe 733.69ⴱ) showed that (as the experiment progresses) younger a list of symbols as their matching number. The number of cor- adults’ likelihood to produce high-probability pathway responses rectly linked symbols is coded as DigitSymbolWritten. In the decreases more strongly in the low-certainty states compared to second part, participants are provided with a new response sheet older adults. Figure 3 depicts this finding—the blue line (low- and repeat the task; however, this time they speak the number certainty state) has a steeper slope for younger adults (left panel) aloud, rather than writing it on the sheet. The number of correctly than older adults. Importantly, the Trial ⫻ Age interaction did not linked symbols in the second part is coded as DigitSymbolVerbal. carry predictive value (␤ ⫽ ⫺.03, EE␤ ⫽ .03, Odds(␤ ⬍ 0) ⫽ The tests assess association memory, divided attention, and visual scanning. Table 1 SL Performance Summary Results More than Less than Ideal chance chance performance The results are structured in three parts. First, we report overall Age group N in CHPC in CRV range SL performance in both age groups and how they differ between Younger Adults 40 36 38 15 low- and high-certainty states. Then, we attempt to model the Older Adults 40 32 32 7 results through three learning mechanisms: (a) Win-Stay, Lose- Shift, (b) Delta Rule Learning based on probability spectrum, and Note. SL ⫽ statistical learning. More than chance in Cumulative High Probability Pathway Choices (CHPC), and less than chance in Cumulative (c) Information Weights. Finally, we explore the relationship be- Rule Violations (CRV) indicate successful learning of the transitional tween the battery of cognitive assessments and SL performance. probability (TP) matrix. 1096 HERFF, ZHEN, YU, AND AGRES Younger Adults − CHPC Older Adults − CHPC 90 90 Chance CHPC CHPC 60 60 Chance−95%CI Ideal−95%CI 30 30 0 0 0 50 100 150 0 50 100 150 Trial Trial This article is intended solely for the personal use of the individual user and is not to be disseminated broadly. Younger Adults − CRV Older Adults − CRV This document is copyrighted by the American Psychological Association or one of its allied publishers. 50 50 40 40 30 30 CRV CRV 20 20 10 10 0 0 0 50 100 150 0 50 100 150 Trial Trial Figure 2. Overall performance in the SL task. The top row shows Cumulative High-Probability Choices (CHPC). The bottom row shows Cumulative Rule Violations (CRV). The left column shows data from younger adults, and the right column shows data from older adults. Each thin line represents one participant. The bold solid lines represent chance performance. Above chance in CHPC, and below chance in CRV, indicates good performance. The dotted line shows a theoretical ideal performer. The gray bands represent 95% CIs around chance and ideal performance. 3.88). This means that learning trajectories in high-certainty states prediction. We did not find evidence for the LastPredHPP ⫻ were comparable between the two age groups, as shown in Figure LastPredCorrect interaction, suggesting participants did not pre- 3 (the red lines, depicting high-certainty states, have similar slopes dominantly rely on the information of their last prediction across age groups). (␤ ⫽ ⫺.05, EE␤ ⫽ .15, Odds(␤ ⬍ 0) ⫽ 1.80). Low evidence for For CRVs, we combined the data from low- and high-certainty the LastPredHPP ⫻ LastPredCorrect ⫻ OlderAdult interaction states, as both have 0% TPs of repeating states. Age (␤ ⫽ .34, shows that this behavior did also not differ between age groups EE␤ ⫽ .14, Odds(␤ ⬎ 0) ⫽ 136.40ⴱ) predicted the probability of (␤ ⫽ - .25, EE␤ ⫽ .20, Odds(␤ ⬍ 0) ⫽ 8.88), and therefore does rule violations, with older adults (M ⫽ 0.0972, SD ⫽ 0.2963) on not explain the age-dependent behavior toward low-certainty states average showing more rule violations than younger adults (M ⫽ (see Supplement S3.1 for the full model). 0.0463, SD ⫽ 0.2102). Both Trial (␤ ⫽ ⫺.03, EE␤ ⫽ .03 Model 2: Delta Rule Learning Based on Probability Odds(␤ ⬎ 0) ⫽ 5.61) as well as the Trial ⫻ Age interaction (␤ ⫽ Spectrum. Both age groups deployed a learning mechanism .1, EE␤ ⫽ .04, Odds(␤ ⬍ 0) ⫽ 1.60) did not show an effect. This whereby they adjusted their behavior more strongly for larger is most likely because of the small number of rule violations (see errors as captured by strong evidence for the ActualMinusRespon- Supplement S1 for a summary and the risk ratios of the SL, age, seProbs coefficient (␤ ⫽ 2.20, EE␤ ⫽ .18, Odds(␤ ⬎ 0) ⬎ 9999ⴱ). and certainty models). Due to the overall smaller degree of vari- The ActualMinusResponseProbs ⫻ OlderAdult interaction term ability in the CRV data, the following models focus on CHPC. reveals that this adjustment was larger in the young adults than the older adults (␤ ⫽ ⫺.71, EE␤ ⫽ .24, Odds(␤ ⬎ 0) ⫽ 733.69ⴱ). Statistical Learning Mechanisms Evidence for the ActualMinusResponseProbs ⫻ StateSpecificRe- To further explore the cognitive basis of age-related differences ponseProbs interaction shows that both groups also adjusted their in SL, we tested three cognitive models. In particular, we hoped to behavior depending on where in the probability spectrum the reveal a mechanism that captures the age-related differences in SL incongruence between believed and real probability occurs, with of low- and high-certainty states (see Figure 3). stronger behavioral changes toward the higher end (␤ ⫽ .43, Model 1: Win-Stay, Lose-Shift Strategy. The first model EE␤ ⫽ .24, Odds(␤ ⬎ 0) ⫽ 25.47ⴱ). However, we found no assessed whether participants predominantly used the outcome evidence that this incongruency mechanism differs between the from their previous response to the same state when forming a age groups in the ActualMinusResponseProbs ⫻ StateSpecificRe- STATISTICAL LEARNING AND INFORMATION WEIGHTS 1097 Younger Adults Older Adults 0.8 Probability of High Probability Pathway Response 0.6 Certainty High Low This article is intended solely for the personal use of the individual user and is not to be disseminated broadly. 0.4 This document is copyrighted by the American Psychological Association or one of its allied publishers. 0.2 1 50 100 150 1 50 100 150 Trial Figure 3. Effects of age and certainty state on SL as measured by the probability of producing a response compatible with the high-probability pathway. Both age groups show clear learning trajectories. Younger adults show a higher intercept at the beginning of the experiment compared to older participants. Learning trajectories (slopes) are comparable between the two age groups on high-certainty states (red lines). Interestingly, both groups appear to underestimate the probability of the most likely response in the low-certainty states (blue lines). This is particularly pronounced in the younger adults, who, for low-certainty states, produced increasingly fewer responses over the course of the experiment that lie on the high-probability pathway. The bands indicate 95%CIs. See the online article for the color version of this figure. ponseProbs ⫻ OlderAdult interaction term (␤ ⫽ ⫺.34, EE␤ ⫽ .32, mation weights. The divergence across age group (DKL(PD- Odds(␤ ⬍ 0) ⫽ 5.84). As a result, this model does not explain the FOlderAdults || PDFYoungerAdults) ⫽ 3.1054) is substantially larger age-dependent differences in low-certainty responses shown in compared to the Kullback-Leibler divergence distribution Figure 3 either (see Supplement S3.2 for the full model). obtained from 10,000 random permutations of the Age group Model 3: Information weights. The third model is a parsi- vector (DKL-Mean(PDFGroupA || PDFGroupB) ⫽ .00012, monious explanation and simply assesses the weights that younger DKL-SD(PDFGroupA || PDFGroupB) ⫽ .00008). In summary, we and older adults attach to positive (e.g., “B” follows “A”) and found strong support that the younger and older adult cohorts negative (e.g., “B” does not follow “A”) observations. Because the operate on different information weights. This can also be seen in Bayesian models provide slope coefficients of behavioral change Figure 5. in both age groups at two different transitional probabilities for the high-probability pathway, we have two equations for each age group, each with two unknowns. As a result, we can use Gaussian elimination SL and Cognitive Ability (see Supplement S3.3) to obtain the weights of older adults (Positive- Figure 6 provides an overview of the magnitudes of the corre- WeightOlderAdult ⫽ .27, NegativeWeightOlderAdult ⫽ ⫺.37) and younger adults (PositiveWeightYoungerAdult ⫽ 45, Negative- lation values between SL as measured by CHPC and CRV by the WeightYoungerAdult ⫽ ⫺.79) attached to the continued sampling of pos- end of the experiment, and all cognitive assessments conducted. itive and negative observations in a simplified decision-making model. The dendrogram is the result of hierarchical clustering of these The resulting weights are seen in Figure 4. magnitudes. Supplement S2 contains the full correlation matrix. To obtain the distribution of weights in Figure 4, the information Figure 6 shows that SL and most cognitive assessments tend to weights for both groups were calculated after each iteration of the be clustered in two distinct groups of measurement. This, com- Bayesian Model. Since the model ran on 10,000 iterations, with bined with the overall low correlations (all r ⬍ .33, see Supple- 1000 warmups on four cores, Figure 4 uses the data of a total of ment S2), points toward SL being distinct to the construct targeted 36,000 posterior distributions. A Hotelling T2 test using 10,000 by most cognitive assessment tests. However, this does not ex- permutations shows a significant difference between the distribu- clude the possibility that there are individual cognitive assessments tion of information weights in older adults from that of younger that relate to SL. To address this as well as the small participants- adults (t2(2,71997) ⫽ 112447.7, p ⫽ ⬍ .0001). Further support to-predictors ratio, a stepwise regression (both-ways, ⌬BIC pen- was found by calculating Kullback-Leibler divergence on the alty term) was performed to reveal the best predictors for CHPC probability density functions of younger and older adults’ infor- and CRV. For CHPC, RAVLT1 was the only remaining predictor, 1098 HERFF, ZHEN, YU, AND AGRES −1.5 −1.0 Negative Weight Age Older Adults Younger Adults −0.5 This article is intended solely for the personal use of the individual user and is not to be disseminated broadly. This document is copyrighted by the American Psychological Association or one of its allied publishers. 0.0 0.00 0.25 0.50 0.75 1.00 Positive Weight Figure 4. Estimated weights distribution to positive and negative observations in both age groups. Positive weights indicate predicted change toward providing a given answer, after observing a transition suggesting this answer (the positive number indicates that the probability increases). Negative weights indicate predicted change away from providing a given answer, after observing a transition which suggests that this is not the answer (the negative number indicates that the probability decreases). Both groups show clear signs of learning by using both positive and negative observations. This is indicated by the nonzero weights on both axes for both groups, and by the fact that in both groups, positive weights all fall within the range of positive numbers (increase in probability to provide the response), and negative weights all fall within the range of negative numbers (decrease in probability to provide the response). Younger adults show larger sways in their predictions as shown by the larger weights on both axes compared to older adults. Although both younger and older adults weight negative observations more strongly than positive, this is substantially more pronounced in the younger adults group. See the online article for the color version of this figure. and for CRV, the DigitSymbolWritten test was the only surviving underlying statistical structure in the task. Both age groups showed predictor. similar learning trajectories of the most likely next event when the Consequently, we deployed linear Bayesian mixed effects mod- transition was likely (high certainty). When it came to dealing with els predicting CRV and CHPC scores. The models were provided less certain transitional probabilities, learning trajectories diverged with a fixed factor for Age, Trial, as well as the RAVLT1 and between age groups. To explain these findings, we tested three DigitSymbolWritten scores. All interaction terms were fully pa- cognitive models. We found that younger and older adults utilize rameterized, with the exception of RAVLT1 and DigitSymbolWrit- similar strategies, but younger adults are more willing to change ten interaction terms, as they are of no interest to the present their behavior by placing strong weight on negative observations design. We found that for both cognitive assessments, Trial ⫻ during their decision-making process. In addition, scores on some RAVLT1 (␤ ⫽ 1.01, EE␤ ⫽ .09, Odds(␤ ⬎ 0) ⫽ ⬎ 9999ⴱ) and traditional cognitive assessments were found to mediate perfor- Trial ⫻ DigitSymbolWritten (␤ ⫽ .74, EE␤ ⫽ .10, Odds(␤ ⬎ mance. This effect was stronger in older adults but did not explain 0) ⫽ ⬎ 9999ⴱ), larger scores predicted steeper statistical learning the age-related response pattern in the present task and may only trajectories. Furthermore, the Trial ⫻ RAVLT1 ⫻ OlderAdult (␤ ⫽ be indicative of task-specific demands. The main contribution of .78, EE␤ ⫽ .12, Odds(␤ ⬎ 0) ⫽ ⬎ 9999ⴱ) as well as Trial ⫻ this study is demonstrating age-related differences in statistical DSW ⫻ OlderAdult (␤ ⫽ .40, EE␤ ⫽ .15, Odds(␤ ⬎ 0) ⫽ 231.26ⴱ) learning which can be modeled through systematic shifts in infor- interaction terms showed that these effects are stronger in older mation weights when sampling information. adults compared to younger. This can also be seen in Figure 7 in the larger difference between the two colored lines in older adults SL Performance and Age compared to younger adults (see Supplement S2 for the full models). Many SL paradigms suffer from overall low performance (Sieg- elman et al., 2017). Following previous suggestions (Herff, Nur, et al., 2019), we deployed a large number of trials and multimodal Discussion stimuli and found clear signs of learning in the majority of partic- We investigated differences in SL trajectories between younger ipants in both age groups. Overall, more young adults learned the and older adults, and the extent to which both groups learned the most likely next event (CHPC) and approximated ideal perfor- STATISTICAL LEARNING AND INFORMATION WEIGHTS 1099 1.0 density 0.5 This article is intended solely for the personal use of the individual user and is not to be disseminated broadly. This document is copyrighted by the American Psychological Association or one of its allied publishers. 0.0 5 4 3 2 1 10 10 10 10 10 100 Figure 5. Kullback-Leibler divergence between the probability density functions of the information weights of younger and older adults. The dotted red line indicates the Kullback-Leibler divergence observed between the information weights of the younger and older adults in the present study. The distribution in black can be used to assess divergence values that could occur by chance. The large distance on a log-scale between the dotted red line and the chance distributions supports that younger and older adults deploy different information weights. The distribution was obtained by 10,000 iterations of calculating the divergence after shuffling the Age group vector. The x-axis is log scaled. See the online article for the color version of this figure. mance compared to older adults. This is in line with previous unlikely transitions. However, if transitional probabilities only studies that also showed an age-related decline in SL of probabi- impacted task difficulty, then the prior literature would suggest listic stimuli (Curran, 1997; Feeney et al., 2002; J. H. Howard & that younger adults should outperform older adults on low- Howard, 1997). Furthermore, more older adults failed to learn that certainty states (Curran, 1997; Feeney et al., 2002; D. V. Howard immediate state repetitions (CRV) were impossible. This is an et al., 2004; J. H. Howard & Howard, 1997; Palmer & Mattys, interesting observation that requires further exploration in the 2016). We did not observe this pattern in the present data. As a future, as presently CRV did not provide enough variability to be result, we conclude that transitional likelihood cannot be used as a effectively modeled and the analysis focused on CHPC instead. direct proxy to task difficulty. That is not to say that difficulty does Trial-wise analysis of CHPC revealed that younger adults show not vary with transitional likelihood, as the current design cannot more high-probability responses initially, but the learning trajec- exclude this possibility. However, the present results strongly tories over time are comparable between the groups. This could be indicate that transitional likelihood has other profound impacts on indicative of a more conservative strategy deployed by older adults learning, beyond a potential impact on task difficulty. initially, such as a stronger ‘prior’ inclination toward equiprobable Within the high-certainty states, learning trajectories between responses in the beginning. Participants’ behavior in light of likely the two age groups were not significantly different from one and unlikely transitions reveal further insight. another. However, when faced with less certain transitional prob- Certainty states. In any probabilistic scenario, it is important abilities, the response pattern in older adults stayed relatively to consider that the relationship between the probability of an constant and close to the actual underlying transitional probabili- outcome and an individual’s predictions or decisions may not be a ties throughout the experiment. Conversely, younger adults linear one. Indeed, examples of where observed probabilities and showed an initial strong tendency toward the most likely event, the resulting predictions or decisions have a distinctly nonlinear followed by a rapid decay in their likelihood of responding with relationship are well-documented (see Barberis, 2013 for a re- the next most likely state (see Figure 3). At first glance, this view). In the present paradigm, we were able to observe younger observation is somewhat startling. Why should younger adults drift and older adults’ behavior when dealing with likely or unlikely away from the true underlying probability, when they were per- transitions, as the present paradigm contains both low- and high- fectly capable of identifying it in the beginning? A key difference certainty states. Here, we consider that low and high certainty may between the likely and unlikely transitions is that for unlikely function as a proxy to task difficulty. This is because it is possible transitions, a correct prediction (e.g., “B” will follow “A”) may that probabilities of unlikely transitions are more difficult to ex- often not be realized, even though it may be the most likely tract, compared to likely transitions. Indeed, the present results transition (e.g., of all options “B” is the most likely one to follow suggest that learning of likely transitions is faster compared to “A”). If younger adults showed a strong adverse reaction (e.g., 1100 HERFF, ZHEN, YU, AND AGRES RAVLTRECA RAVLTDelayedRecall RAVLT4 RAVLT3 RAVLT5 RAVLTRecognition RAVLT1 RAVLT2 DigiSymbolVerbal DigiSymbolWritten This article is intended solely for the personal use of the individual user and is not to be disseminated broadly. ColourTrail2 This document is copyrighted by the American Psychological Association or one of its allied publishers. ColourTrail1 SemanticFluencyAnimals RAVLTB CHPC CRV DigitSpanBWD DigitSpanFWD RAVLTB DigitSpanFWD DigitSpanBWD CRV ColourTrail1 ColourTrail2 DigiSymbolVerbal RAVLT2 CHPC SemanticFluencyAnimals DigiSymbolWritten RAVLT1 RAVLTRecognition RAVLT5 RAVLT3 RAVLT4 RAVLTDelayedRecall RAVLTRECA Color Key 0.2 0.6 1 Value Figure 6. Hierarchical clustering of the magnitudes of the correlation coefficients of SL and all cognitive assessments. Even though both digit span tests are clustered the closest to Cumulative Rule Violations (CRV) and Cumulative High-Probability Choices (CHPC), a stepwise regression revealed that RAVLT1 and DigitSym- bolWritten carry the most predictive value for SL. The black lines highlight the cells related to CRV and CHPC. RAVLT ⫽ Rey Auditory Verbal Learning Test; BWD ⫽ backward; FWD ⫽ forward. See the online article for the color version of this figure. frustration) to negative observations (“B” did not follow “A,” implicit feedback could be performed reasonably well with a despite the perception that “B” is the most likely), then this would Win-Stay, Lose-Shift strategy (e.g., Matsen & Nowak, 2004). If a conceptually capture this pattern of results. We will return to this SL task can theoretically be achieved with strategies that do not point with a more formal explanation when discussing the results extract the underlying statistical dependencies, then additional of the Information Weights model. This is because the way par- steps need to be taken when interpreting the ability to perform a SL ticipants react to various degrees of certainty can be indicative of task as evidence that the learners extracted the underlying TP the underlying learning mechanisms used. Here, we tested three matrix. The fact that both groups did not utilize a Win-Stay, cognitive models to further explore the learning mechanisms un- Lose-Shift strategy lends strength to the present paradigm and the derlying SL, as well as age-related differences. results. Learning mechanisms. We found evidence that both age Delta-rule learning describes the results within both age groups groups draw information from the continuous sequence, rather well. By adjusting their behavior more strongly the further their than only from the last time they provided a response specific to own beliefs differ from the actual underlying probabilities, partic- the current state. That is, neither age group utilizes a Win-Stay, ipants were better able to perform the task. As predicted, younger Lose-Shift Strategy. This is an important observation, as reliance adults do this to a greater extent than older adults. The second on Win-Stay, Lose-Shift would have suggested that participants hypothesis about delta-rule learning was also confirmed: partici- were not extracting the full underlying TP matrix, but instead only pants were more willing to adjust their behavior at the higher end rely on a simple response heuristic to perform the SL task. Theo- of the probability spectrum. Based on present results, delta-rule retically, many probabilistic SL tasks that provide explicit or learning could be a crucial mechanism involved in statistical STATISTICAL LEARNING AND INFORMATION WEIGHTS 1101 Younger Adults Older Adults Younger Adults Older Adults 100 100 75 75 RAVLT1 DSW CHPC CHPC 50 High 50 High Low Low This article is intended solely for the personal use of the individual user and is not to be disseminated broadly. 25 25 This document is copyrighted by the American Psychological Association or one of its allied publishers. 0 0 1 50 100 150 1 50 100 150 1 50 100 150 1 50 100 150 Trial Trial Figure 7. Marginal effects plots of Age, SL, and cognitive assessment scores. Both age groups show higher predicted Cumulative High-Probability Choices (CHPC) values with high RAVLT1 (2.14) and high DSW (2.64, red lines) compared to low RAVLT1 (⫺2.35) and low DSW scores (⫺2.08, blue lines). The larger distances between the red and the blue lines in older adults compared to the younger adults visualizes the three-way interaction. The bands represent 95% CIs. RAVLT ⫽ Rey Auditory Verbal Learning Test; DSW ⫽ symbol digit modality test in written modality. See the online article for the color version of this figure. learning, and it could explain some of the age-related differences rapid discarding of impossible or unlikely—and therefore unreli- observed. However, the hypothesized interaction in delta-rule able— outcomes. However, it would also lead to a greater shift learning between transitional likelihood and age was not observed. away from the true underlying transitional probabilities. The de- As a result, age-related differences in delta-rule learning cannot crease over time of high-probability choices in low-certainty states explain the age-related differences observed in the low-certainty in the present study could be an example of this possibility. responses discussed in the previous section. The information Interestingly, the lower but more balanced weights in older adults, weights model, on the other hand, can. in the long run, would yield more accurate yet slower behavioral There is ample evidence that correct predictions are intimately changes. This fits the general observation that older adults weight tied with internally generated rewards (Fiser, Berkes, Orbán, & accuracy over speed (Forstmann et al., 2011; Salthouse, 1979). Lengyel, 2010), which increase the probability of the same pre- The information weights perspective also integrates well with diction in the future, similar to a Bayesian observer. However, the previous findings. Nassar et al. (2016) found large behavioral decrease in probability caused by a negative observation (“B” does adjustments to relatively minor predictions errors in younger not follow “A”) may not be identical to the increase in probability adults, but not older adults. This observation could be well- caused by a positive observation (“B” does follow “A”). With the described by younger adults placing larger weights on negative data collected here, we were able to calculate the weights that observations, as the present information weights model revealed. younger and older adults attach to positive and negative transi- A potential explanation for the age-related shift in information tional observations. We find that information weights offer a weights may be provided by socioemotional selectivity theory. parsimonious mechanistic description that captures the present Socioemotional selectivity theory posits that goal-directed be- results well. As hypothesized, younger adults attached larger havior is strongly influenced by an individual’s perspective on weights to both types of observations compared to older adults, time (Carstensen, 1992, 1995; Carstensen, Fung, & Charles, 2003). which could explain why younger adults initially show faster Specifically, when time is perceived as open-ended, expensive and behavioral changes. Most importantly, younger adults strongly potentially risky long-term goals are considered. However, when weight the information of negative observations over positive ones time is perceived as being limited, greater importance is put on the when it comes to formulating future predictions. Older adults also present, for example, by prioritizing emotional wellbeing and rely on negative information more than on positive, but to a stability. As age progresses, individuals tend to perceive time as substantially lesser extent than younger participants. As a result, passing faster, often paired with an increasing confrontation with our second hypothesis about information weights is supported. It is one’s own mortality. In the current study, the balanced information important to note that older participants here appear close to weights utilized by the older adults result in a slower change of equiweighting for positive and negative observations. behavior that is much less prone to dramatic shifts in behavior, and Overweighting negative observations, as younger participants would eventually arrive at a stable homeostatic state with a re- did, appears sensible from an evolutionary perspective, as it allows sponse distribution that mirrors the precise underlying transitional 1102 HERFF, ZHEN, YU, AND AGRES probabilities. As behavioral change and inaccurate decisions are requirements (e.g., auditory tracking) instead of a direct influence more expensive for individuals that perceive time as “running- of cognitive ability on SL (Feldman et al., 1995; Herff, Nur, et al., out,” the balanced approach to weighting incoming information 2019; Kaufman et al., 2010; Siegelman et al., 2017). This is in line would be the most rational choice for older individuals, rather than with previous literature that suggests that SL and general cognitive deploying a weighting that prioritizes quick yet imprecise adapta- function are largely independent (Feldman et al., 1995; Kaufman tion (as embraced by the younger adults). et al., 2010; Siegelman et al., 2017). Based on the present results, Establishing individuals’ information weights could be a useful it seems unlikely that differences in cognitive function are what tool for customizing and optimizing learning. Specifically, it drives the age-related differences in SL observed here. seems that as age progresses, positive observations (“B” follows “A”) become more important for learning than negative observa- Conclusion tions. This finding could be relevant for an aging workforce that is required to adapt and learn new skills (WHO, 2015, 2017). How- The paradigm deployed here tracked learning trajectories and ever, it is important to note that the present study cannot distin- revealed differences between younger and older adults in SL when This article is intended solely for the personal use of the individual user and is not to be disseminated broadly. guish “age” from “education” because the between-subjects nature it comes to dealing with uncertainty. A possible explanation was This document is copyrighted by the American Psychological Association or one of its allied publishers. of the design means that age-related differences may be a feature found in the form of age-dependent differences in information of aging, or a result of different upbringings between the two weighting, in which younger adults more readily adjust their generations. In both cases, the information weights perspective behavior, but also weight negative observations (e.g., “B” does not may be useful as awareness of information weights may help in follow “A”) more strongly than positive observations (e.g., “B” understanding the judgments made by oneself and others. It is does follow “A”) compared to older adults. The weights deployed apparent in the present results that different information weights by younger adults favor rapid behavioral adaptation, whereas the can lead to different decisions at different timepoints. As a result, weights used by older adults favor more precise behavioral adap- expressing decisions as a function of the information weights may tation over time. We hope that future research using this paradigm help in bridging opposing judgments. Furthermore, exploring an will provide precise estimates of individuals’ information weight- information weights perspective could also be useful in deepening ing of positive and negative predictive outcomes. our understanding of mental disorders that can be understood as information filters (e.g., depression, see Gaddy & Ingram, 2014). For example, depression could be characterized as elevated nega- References tive— or reduced positive—information weights. This question Agres, K., Abdallah, S., & Pearce, M. (2018). Information-theoretic prop- represents a promising area for future research. erties of auditory sequences dynamically influence expectation and SL and cognitive ability. To test whether results could also memory. Cognitive Science, 42, 43–76. http://dx.doi.org/10.1111/cogs be explained by differences in cognitive function, we collected a .12477 battery of cognitive assessments from both age groups. Across Aizenstein, H. J., Butters, M. A., Clark, K. A., Figurski, J. L., Andrew younger and older adults, we found evidence that higher cognitive Stenger, V., Nebes, R. D., . . . Carter, C. S. (2006). Prefrontal and striatal activation in elderly subjects during concurrent implicit and explicit assessment scores predict steeper learning trajectories. Impor- sequence learning. Neurobiology of Aging, 27, 741–751. http://dx.doi tantly, this effect was exacerbated in older adults. Specifically, .org/10.1016/j.neurobiolaging.2005.03.017 whereas older adults with high cognitive assessment scores show Barascud, N., Pearce, M. T., Griffiths, T. D., Friston, K. J., & Chait, M. similar SL performance compared to young adults with high (2016). Brain responses in humans reveal ideal observer-like sensitivity cognitive assessment scores, older adults with low cognitive as- to complex acoustic patterns. Proceedings of the National Academy of sessment scores show lower SL performance compared to younger Sciences of the United States of America, 113(5), E616 –E625. http://dx adults with matched scores. A possible explanation could be that .doi.org/10.1073/pnas.1508523113 low cognitive assessment scores in older adults may be indicative Barberis, N. C. (2013). Thirty years of prospect theory in economics: A of age-related cognitive decline that affects various functions in review and assessment. The Journal of Economic Perspectives, 27, the brain, whereas low scores in younger adults are less likely to 173–196. http://dx.doi.org/10.1257/jep.27.1.173 Barr, D. J., Levy, R., Scheepers, C., & Tily, H. J. (2013). Random effects be indicative of functional impairments. The multimodal paradigm structure for confirmatory hypothesis testing: Keep it maximal. Journal may have reduced the cognitive demand of the task, but it also of Memory and Language, 68, 255–278. http://dx.doi.org/10.1016/j.jml required visual-audio coordination, which in turn might pose a .2012.11.001 challenge, as perceptual processes decline with age. If this is the Bürkner, P. (2017). Brms: An r package for bayesian multilevel models case, then this could also be a contributing factor explaining why using stan. Journal of Statistical Software, 80, 1–28. http://dx.doi.org/ the cognitive assessment scores were stronger predictors of SL in 10.18637/jss.v080.i01 older adults. Bürkner, P. (2018). Advanced bayesian multilevel modeling with the r Of the large number of cognitive tests deployed, the two most package brms. arXiv, 10(1), 395– 411. http://dx.doi.org/10.32614/RJ- promising predictors of SL were the RAVLT 1 as well as the Digit 2018-017 Symbol (written) Modality test. This makes intuitive sense, as the Carstensen, L. L. (1992). Social and emotional patterns in adulthood: Support for socioemotional selectivity theory. Psychology and Aging, 7, Digit Symbol Modality test was designed to capture associative 331–338. http://dx.doi.org/10.1037/0882-7974.7.3.331 learning, and the RAVLT tests auditory memory. Clustering based Carstensen, L. L. (1995). Evidence for a life-span theory of socioemotional on the correlation magnitudes and overall low correlations (r ⫽ selectivity. Current Directions in Psychological Science, 4, 151–156. .33) further suggest that SL ability and traditional cognitive as- http://dx.doi.org/10.1111/1467-8721.ep11512261 sessments most likely target different underlying constructs and Carstensen, L. L., Fung, H. H., & Charles, S. T. (2003). Socioemotional any predictive information observed may be due to task-specific selectivity theory and the regulation of emotion in the second half of life. STATISTICAL LEARNING AND INFORMATION WEIGHTS 1103 Motivation and Emotion, 27, 103–123. http://dx.doi.org/10.1023/A: negative chord sequences. Poster presented at Brain. Cognition. Emo- 1024569803230 tion. Music., University of Kent Canterbury, Canterbury, England. Cherry, K. E., & Stadler, M. A. (1995). Implicit learning of a nonverbal http://dx.doi.org/10.17605/OSF.IO/EQ9JU sequence in younger and older adults. Psychology and Aging, 10, 379 – Herff, S. A., Zhen, S., Yu, R., & Agres, K. R. (2019). Age-dependent 394. http://dx.doi.org/10.1037/0882-7974.10.3.379 statistical learning trajectories reveal differences in information weight- Creel, S. C., Newport, E. L., & Aslin, R. N. (2004). Distant melodies: ing. Psyarxiv. http://dx.doi.org/10.31234/osf.io/kuy6p Statistical learning of nonadjacent dependencies in tone sequences. Hinault, T., Lemaire, P., & Touron, D. (2017). Strategy combination during Journal of Experimental Psychology: Learning, Memory, and Cogni- execution of memory strategies in young and older adults. Memory, 25, tion, 30, 1119 –1130. http://dx.doi.org/10.1037/0278-7393.30.5.1119 619 – 625. http://dx.doi.org/10.1080/09658211.2016.1200626 Curran, T. (1997). Effects of aging on implicit sequence learning: Account- Howard, D. V., & Howard, J. H., Jr. (1989). Age differences in learning ing for sequence structure and explicit knowledge. Psychological Re- serial patterns: Direct versus indirect measures. Psychology and Aging, search, 60(1–2), 24 – 41. http://dx.doi.org/10.1007/BF00419678 4, 357–364. http://dx.doi.org/10.1037/0882-7974.4.3.357 Daltrozzo, J., & Conway, C. M. (2014). Neurocognitive mechanisms of Howard, D. V., & Howard, J. H., Jr. (1992). Adult age differences in the statistical-sequential learning: What do event-related potentials tell us? rate of learning serial patterns: Evidence from direct and indirect tests. This article is intended solely for the personal use of the individual user and is not to be disseminated broadly. Frontiers in Human Neuroscience, 8, 437. http://dx.doi.org/10.3389/ Psychology and Aging, 7, 232–241. http://dx.doi.org/10.1037/0882-7974 This document is copyrighted by the American Psychological Association or one of its allied publishers. fnhum.2014.00437 .7.2.232 D’Elia, L., Satz, P., Uchiyama, C. L., & White, T. (1996). Color trails test: Howard, D. V., Howard, J. H., Jr., Japikse, K., DiYanni, C., Thompson, A., Ctt. Odessa, FL: Psychological Assessment Resources Odessa. & Somberg, R. (2004). Implicit sequence learning: Effects of level of Eppinger, B., & Kray, J. (2011). To choose or to avoid: Age differences in structure, adult age, and extended practice. Psychology and Aging, 19, learning from positive and negative feedback. Journal of Cognitive 79 –92. http://dx.doi.org/10.1037/0882-7974.19.1.79 Neuroscience, 23, 41–52. http://dx.doi.org/10.1162/jocn.2009.21364 Howard, J. H., Jr., & Howard, D. V. (1997). Age differences in implicit Feeney, J. J., Howard, J. H., Jr., & Howard, D. V. (2002). Implicit learning learning of higher order dependencies in serial patterns. Psychology and of higher order sequences in middle age. Psychology and Aging, 17, Aging, 12, 634 – 656. http://dx.doi.org/10.1037/0882-7974.12.4.634 351–355. http://dx.doi.org/10.1037/0882-7974.17.2.351 Kaufman, S. B., Deyoung, C. G., Gray, J. R., Jiménez, L., Brown, J., & Feldman, J., Kerr, B., & Streissguth, A. P. (1995). Correlational analyses Mackintosh, N. (2010). Implicit learning as an ability. Cognition, 116, of procedural and declarative learning performance. Intelligence, 20, 321–340. http://dx.doi.org/10.1016/j.cognition.2010.05.011 87–114. http://dx.doi.org/10.1016/0160-2896(95)90007-1 Kirkham, N. Z., Slemmer, J. A., & Johnson, S. P. (2002). Visual statistical Feng, L., Lim, W.-S., Chong, M.-S., Lee, T.-S., Gao, Q., Nyunt, M. S., . . . learning in infancy: Evidence for a domain general learning mechanism. Ng, T.-P. (2017). Depressive symptoms increase the risk of mild neu- Cognition, 83(2), B35–B42. http://dx.doi.org/10.1016/S0010- rocognitive disorders among elderly Chinese. The Journal of Nutrition, 0277(02)00004-5 Health & Aging, 21, 161–164. http://dx.doi.org/10.1007/s12603-016- Krogh, L., Vlach, H. A., & Johnson, S. P. (2013). Statistical learning across 0765-3 development: Flexible yet constrained. Frontiers in Psychology, 3, 598. Ferdinand, N. K., & Kray, J. (2013). Age-related changes in processing http://dx.doi.org/10.3389/fpsyg.2012.00598 positive and negative feedback: Is there a positivity effect for older Löckenhoff, C. E., & Carstensen, L. L. (2007). Aging, emotion, and adults? Biological Psychology, 94, 235–241. http://dx.doi.org/10.1016/j health-related decision strategies: Motivational manipulations can re- .biopsycho.2013.07.006 duce age differences. Psychology and Aging, 22, 134 –146. http://dx.doi Fiser, J., Berkes, P., Orbán, G., & Lengyel, M. (2010). Statistically optimal .org/10.1037/0882-7974.22.1.134 perception and learning: From behavior to neural representations. Trends Mata, R., von Helversen, B., & Rieskamp, J. (2010). Learning to choose: in Cognitive Sciences, 14, 119 –130. http://dx.doi.org/10.1016/j.tics .2010.01.003 Cognitive aging and strategy selection learning in decision making. Forstmann, B. U., Tittgemeyer, M., Wagenmakers, E. J., Derrfuss, J., Psychology and Aging, 25, 299 –309. http://dx.doi.org/10.1037/ Imperati, D., & Brown, S. (2011). The speed-accuracy tradeoff in the a0018923 elderly brain: A structural model-based approach. The Journal of, 31, Matsen, F. A., & Nowak, M. A. (2004). Win-stay, lose-shift in language 17242–17249. http://dx.doi.org/10.1523/JNEUROSCI.0309-11.2011 learning from peers. Proceedings of the National Academy of Sciences, Frensch, P. A., & Miner, C. S. (1994). Effects of presentation rate and USA of the United States of America, 101, 18053–18057. http://dx.doi individual differences in short-term memory capacity on an indirect .org/10.1073/pnas.0406608102 measure of serial learning. Memory & Cognition, 22, 95–110. http://dx Milne, A. J., & Herff, S. A. (2020). The perceptual relevance of balance, .doi.org/10.3758/BF03202765 evenness, and entropy in musical rhythms. Cognition, 203, 104233. Gaddy, M. A., & Ingram, R. E. (2014). A meta-analytic review of mood- http://dx.doi.org/10.1016/j.cognition.2020.104233 congruent implicit memory in depressed mood. Clinical Psychology Misyak, J. B., Christiansen, M. H., & Tomblin, J. B. (2010). On-line Review, 34, 402– 416. http://dx.doi.org/10.1016/j.cpr.2014.06.001 individual differences in statistical learning predict language processing. Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Frontiers in Psychology, 1, 31. http://dx.doi.org/10.3389/fpsyg.2010 Rubin, D. B. (2013). Bayesian data analysis. London, England: Chap- .00031 man and Hall/CRC. http://dx.doi.org/10.1201/b16018 Moldwin, T., Schwartz, O., & Sussman, E. S. (2017). Statistical learning of Greve, A., Cooper, E., Kaula, A., Anderson, M. C., & Henson, R. (2017). melodic patterns influences the brain’s response to wrong notes. Journal Does prediction error drive one-shot declarative learning? Journal of of Cognitive Neuroscience, 29, 2114 –2122. http://dx.doi.org/10.1162/ Memory and Language, 94, 149 –165. http://dx.doi.org/10.1016/j.jml jocn_a_01181 .2016.11.001 Nassar, M. R., Bruckner, R., Gold, J. I., Li, S. C., Heekeren, H. R., & Herff, S. A., Nur, A., Lee, J., Lee, T., & Agres, K. (2019, July). Statistical Eppinger, B. (2016). Age differences in learning emerge from an insuf- learning ability as a measure of cognitive function. Paper presented at ficient representation of uncertainty in older adults. Nature Communi- the 41st Annual Conference of the Cognitive Science Society, Montreal, cations, 7, 11609. http://dx.doi.org/10.1038/ncomms11609 Canada, 24 –27 July. http://dx.doi.org/10.31234/osf.io/u4ry6 Nowak, M., & Sigmund, K. (1993). A strategy of win-stay, lose-shift that Herff, S. A., & Prince, J. B. (2020, May). Learning, mood, and music: outperforms tit-for-tat in the Prisoner’s Dilemma game. Nature, 364, Depression, anxiety, and stress reflect processing biases in positive and 56 –58. http://dx.doi.org/10.1038/364056a0 1104 HERFF, ZHEN, YU, AND AGRES Palmer, S. D., Hutson, J., & Mattys, S. L. (2018). Statistical learning for Siegelman, N., Bogaerts, L., Christiansen, M. H., & Frost, R. (2017). speech segmentation: Age-related changes and underlying mechanisms. Towards a theory of individual differences in statistical learning. Phil- Psychology and Aging, 33, 1035–1044. http://dx.doi.org/10.1037/ osophical Transactions of the Royal Society B: Biological Sciences, 372, pag0000292 20160059. http://dx.doi.org/10.1098/rstb.2016.0059 Palmer, S. D., & Mattys, S. L. (2016). Speech segmentation by statistical Siegelman, N., Bogaerts, L., & Frost, R. (2017). Measuring individual learning is supported by domain-general processes within working mem- differences in statistical learning: Current pitfalls and possible solutions. ory. The Quarterly Journal of Experimental Psychology, 69, 2390 – Behavior Research Methods, 49, 418 – 432. http://dx.doi.org/10.3758/ 2401. http://dx.doi.org/10.1080/17470218.2015.1112825 s13428-016-0719-z Pérez-González, D., & Malmierca, M. S. (2014). Adaptation in the auditory Siegelman, N., & Frost, R. (2015). Statistical learning as an individual system: An overview. Frontiers in Integrative Neuroscience, 8, 19. ability: Theoretical perspectives and empirical evidence. Journal of http://dx.doi.org/10.3389/fnint.2014.00019 Memory and Language, 81, 105–120. http://dx.doi.org/10.1016/j.jml Randolph, C., Braun, A. R., Goldberg, T. E., & Chase, T. N. (1993). .2015.02.001 Semantic fluency in Alzheimer’s, Parkinson’s, and Huntington’s dis- Smith, A. (1982). Symbol digit modalities test. Los Angeles, CA: Western ease: Dissociation of storage and retrieval failures. Neuropsychology, 7, Psychological Services Los Angeles. This article is intended solely for the personal use of the individual user and is not to be disseminated broadly. 82– 88. http://dx.doi.org/10.1037/0894-4105.7.1.82 Sohoglu, E., & Chait, M. (2016). Detecting and representing predictable Rescorla, R., & Wagner, A. R. (1972). A theory of Pavlovian conditioning: This document is copyrighted by the American Psychological Association or one of its allied publishers. structure during auditory scene analysis. eLife, 5, e19113. http://dx.doi Variations in the effectiveness of reinforcement and nonreinforcement. .org/10.7554/eLife.19113 Classical conditioning II: Current research and theory, 2, 64 –99. Tan, J., Tsakok, F. H. M., Ow, E. K., Lanskey, B., Lim, K. S. D., Goh, Rey, A. (1958). L’examenclinique en psychologie [the psychological ex- L. G., . . . Feng, L. (2018). Study protocol for a randomized controlled amination]. Paris: Presses Universitaires de France. trial of choral singing intervention to prevent cognitive decline in at-risk Roseberry, S., Richie, R., Hirsh-Pasek, K., Golinkoff, R. M., & Shipley, older adults living in the community. Frontiers in Aging Neuroscience, T. F. (2011). Babies catch a break: 7- to 9-month-olds track statistical 10, 195. http://dx.doi.org/10.3389/fnagi.2018.00195 probabilities in continuous dynamic events. Psychological Science, 22, Thorndike, E. L. (1898). Animal intelligence: An experimental study of the 1422–1424. http://dx.doi.org/10.1177/0956797611422074 associative processes in animals. The Psychological Review: Mono- Saffran, J. R., Aslin, R. N., & Newport, E. L. (1996). Statistical learning by graph Supplements, 2(4), i–109. http://dx.doi.org/10.1037/10780-000 8-month-old infants. Science, 274, 1926 –1928. http://dx.doi.org/10 WHO. (2015). World report on ageing and health. Retrieved from https:// .1126/science.274.5294.1926 Saffran, J. R., & Kirkham, N. Z. (2018). Infant statistical learning. Annual www.who.int/ageing/events/world-report-2015-launch/en/ Review of Psychology, 69, 181–203. http://dx.doi.org/10.1146/annurev- WHO. (2017). Amendments to the staff regulations and staff rules. Re- psych-122216-011805 trieved from https://apps.who.int/gb/ebwha/pdf_files/EB141/B141_11- Salthouse, T. A. (1979). Adult age and the speed-accuracy trade-off. en.pdf Ergonomics, 22, 811– 821. http://dx.doi.org/10.1080/001401379 Worthy, D. A., Hawthorne, M. J., & Otto, A. R. (2013). Heterogeneity of 08924659 strategy use in the Iowa gambling task: A comparison of win-stay/lose- Salthouse, T. A., McGuthry, K. E., & Hambrick, D. Z. (1999). A frame- shift and reinforcement learning models. Psychonomic Bulletin & Re- work for analyzing and interpreting differential aging patterns: Appli- view, 20, 364 –371. http://dx.doi.org/10.3758/s13423-012-0324-9 cation to three measures of implicit learning. Aging, neuropsychology, Yu, C. H. (2018). Neuropsychological assessments training manual for and Cognition, 6, 1–18. http://dx.doi.org/10.1076/anec.6.1.1.789 assessors (T. Y. Qian & S. J. Ching, Eds.; Version 3.1, Approved by Schirda, B., Valentine, T. R., Aldao, A., & Prakash, R. S. (2016). Age- K. E. Heok, L. Feng Ed.). Singapore: Yong Loo Lin School of Medi- related differences in emotion regulation strategies: Examining the role cine’s Department of Psychological Medicine. of contextual factors. Developmental Psychology, 52, 1370 –1380. http:// dx.doi.org/10.1037/dev0000194 Shafir, S., Reich, T., Tsur, E., Erev, I., & Lotem, A. (2008). Perceptual Received January 24, 2020 accuracy and conflicting effects of certainty on risk-taking behaviour. Revision received June 29, 2020 Nature, 453, 917–920. http://dx.doi.org/10.1038/nature06841 Accepted July 5, 2020 䡲

References (68)

  1. Agres, K., Abdallah, S., & Pearce, M. (2018). Information-theoretic prop- erties of auditory sequences dynamically influence expectation and memory. Cognitive Science, 42, 43-76. http://dx.doi.org/10.1111/cogs .12477
  2. Aizenstein, H. J., Butters, M. A., Clark, K. A., Figurski, J. L., Andrew Stenger, V., Nebes, R. D., . . . Carter, C. S. (2006). Prefrontal and striatal activation in elderly subjects during concurrent implicit and explicit sequence learning. Neurobiology of Aging, 27, 741-751. http://dx.doi .org/10.1016/j.neurobiolaging.2005.03.017
  3. Barascud, N., Pearce, M. T., Griffiths, T. D., Friston, K. J., & Chait, M. (2016). Brain responses in humans reveal ideal observer-like sensitivity to complex acoustic patterns. Proceedings of the National Academy of Sciences of the United States of America, 113(5), E616 -E625. http://dx .doi.org/10.1073/pnas.1508523113
  4. Barberis, N. C. (2013). Thirty years of prospect theory in economics: A review and assessment. The Journal of Economic Perspectives, 27, 173-196. http://dx.doi.org/10.1257/jep.27.1.173
  5. Barr, D. J., Levy, R., Scheepers, C., & Tily, H. J. (2013). Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language, 68, 255-278. http://dx.doi.org/10.1016/j.jml .2012.11.001
  6. Bürkner, P. (2017). Brms: An r package for bayesian multilevel models using stan. Journal of Statistical Software, 80, 1-28. http://dx.doi.org/ 10.18637/jss.v080.i01
  7. Bürkner, P. (2018). Advanced bayesian multilevel modeling with the r package brms. arXiv, 10(1), 395-411. http://dx.doi.org/10.32614/RJ- 2018-017
  8. Carstensen, L. L. (1992). Social and emotional patterns in adulthood: Support for socioemotional selectivity theory. Psychology and Aging, 7, 331-338. http://dx.doi.org/10.1037/0882-7974.7.3.331
  9. Carstensen, L. L. (1995). Evidence for a life-span theory of socioemotional selectivity. Current Directions in Psychological Science, 4, 151-156. http://dx.doi.org/10.1111/1467-8721.ep11512261
  10. Carstensen, L. L., Fung, H. H., & Charles, S. T. (2003). Socioemotional selectivity theory and the regulation of emotion in the second half of life. This document is copyrighted by the American Psychological Association or one of its allied publishers. This article is intended solely for the personal use of the individual user and is not to be disseminated broadly. Motivation and Emotion, 27, 103-123. http://dx.doi.org/10.1023/A: 1024569803230
  11. Cherry, K. E., & Stadler, M. A. (1995). Implicit learning of a nonverbal sequence in younger and older adults. Psychology and Aging, 10, 379 - 394. http://dx.doi.org/10.1037/0882-7974.10.3.379
  12. Creel, S. C., Newport, E. L., & Aslin, R. N. (2004). Distant melodies: Statistical learning of nonadjacent dependencies in tone sequences. Journal of Experimental Psychology: Learning, Memory, and Cogni- tion, 30, 1119 -1130. http://dx.doi.org/10.1037/0278-7393.30.5.1119
  13. Curran, T. (1997). Effects of aging on implicit sequence learning: Account- ing for sequence structure and explicit knowledge. Psychological Re- search, 60(1-2), 24 -41. http://dx.doi.org/10.1007/BF00419678
  14. Daltrozzo, J., & Conway, C. M. (2014). Neurocognitive mechanisms of statistical-sequential learning: What do event-related potentials tell us? Frontiers in Human Neuroscience, 8, 437. http://dx.doi.org/10.3389/ fnhum.2014.00437
  15. D'Elia, L., Satz, P., Uchiyama, C. L., & White, T. (1996). Color trails test: Ctt. Odessa, FL: Psychological Assessment Resources Odessa.
  16. Eppinger, B., & Kray, J. (2011). To choose or to avoid: Age differences in learning from positive and negative feedback. Journal of Cognitive Neuroscience, 23, 41-52. http://dx.doi.org/10.1162/jocn.2009.21364
  17. Feeney, J. J., Howard, J. H., Jr., & Howard, D. V. (2002). Implicit learning of higher order sequences in middle age. Psychology and Aging, 17, 351-355. http://dx.doi.org/10.1037/0882-7974.17.2.351
  18. Feldman, J., Kerr, B., & Streissguth, A. P. (1995). Correlational analyses of procedural and declarative learning performance. Intelligence, 20, 87-114. http://dx.doi.org/10.1016/0160-2896(95)90007-1
  19. Feng, L., Lim, W.-S., Chong, M.-S., Lee, T.-S., Gao, Q., Nyunt, M. S., . . . Ng, T.-P. (2017). Depressive symptoms increase the risk of mild neu- rocognitive disorders among elderly Chinese. The Journal of Nutrition, Health & Aging, 21, 161-164. http://dx.doi.org/10.1007/s12603-016- 0765-3
  20. Ferdinand, N. K., & Kray, J. (2013). Age-related changes in processing positive and negative feedback: Is there a positivity effect for older adults? Biological Psychology, 94, 235-241. http://dx.doi.org/10.1016/j .biopsycho.2013.07.006
  21. Fiser, J., Berkes, P., Orbán, G., & Lengyel, M. (2010). Statistically optimal perception and learning: From behavior to neural representations. Trends in Cognitive Sciences, 14, 119 -130. http://dx.doi.org/10.1016/j.tics .2010.01.003
  22. Forstmann, B. U., Tittgemeyer, M., Wagenmakers, E. J., Derrfuss, J., Imperati, D., & Brown, S. (2011). The speed-accuracy tradeoff in the elderly brain: A structural model-based approach. The Journal of, 31, 17242-17249. http://dx.doi.org/10.1523/JNEUROSCI.0309-11.2011
  23. Frensch, P. A., & Miner, C. S. (1994). Effects of presentation rate and individual differences in short-term memory capacity on an indirect measure of serial learning. Memory & Cognition, 22, 95-110. http://dx .doi.org/10.3758/BF03202765
  24. Gaddy, M. A., & Ingram, R. E. (2014). A meta-analytic review of mood- congruent implicit memory in depressed mood. Clinical Psychology Review, 34, 402-416. http://dx.doi.org/10.1016/j.cpr.2014.06.001
  25. Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2013). Bayesian data analysis. London, England: Chap- man and Hall/CRC. http://dx.doi.org/10.1201/b16018
  26. Greve, A., Cooper, E., Kaula, A., Anderson, M. C., & Henson, R. (2017). Does prediction error drive one-shot declarative learning? Journal of Memory and Language, 94, 149 -165. http://dx.doi.org/10.1016/j.jml .2016.11.001
  27. Herff, S. A., Nur, A., Lee, J., Lee, T., & Agres, K. (2019, July). Statistical learning ability as a measure of cognitive function. Paper presented at the 41st Annual Conference of the Cognitive Science Society, Montreal, Canada, 24 -27 July. http://dx.doi.org/10.31234/osf.io/u4ry6
  28. Herff, S. A., & Prince, J. B. (2020, May). Learning, mood, and music: Depression, anxiety, and stress reflect processing biases in positive and negative chord sequences. Poster presented at Brain. Cognition. Emo- tion. Music., University of Kent Canterbury, Canterbury, England. http://dx.doi.org/10.17605/OSF.IO/EQ9JU
  29. Herff, S. A., Zhen, S., Yu, R., & Agres, K. R. (2019). Age-dependent statistical learning trajectories reveal differences in information weight- ing. Psyarxiv. http://dx.doi.org/10.31234/osf.io/kuy6p
  30. Hinault, T., Lemaire, P., & Touron, D. (2017). Strategy combination during execution of memory strategies in young and older adults. Memory, 25, 619 -625. http://dx.doi.org/10.1080/09658211.2016.1200626
  31. Howard, D. V., & Howard, J. H., Jr. (1989). Age differences in learning serial patterns: Direct versus indirect measures. Psychology and Aging, 4, 357-364. http://dx.doi.org/10.1037/0882-7974.4.3.357
  32. Howard, D. V., & Howard, J. H., Jr. (1992). Adult age differences in the rate of learning serial patterns: Evidence from direct and indirect tests. Psychology and Aging, 7, 232-241. http://dx.doi.org/10.1037/0882-7974 .7.2.232
  33. Howard, D. V., Howard, J. H., Jr., Japikse, K., DiYanni, C., Thompson, A., & Somberg, R. (2004). Implicit sequence learning: Effects of level of structure, adult age, and extended practice. Psychology and Aging, 19, 79 -92. http://dx.doi.org/10.1037/0882-7974.19.1.79
  34. Howard, J. H., Jr., & Howard, D. V. (1997). Age differences in implicit learning of higher order dependencies in serial patterns. Psychology and Aging, 12, 634 -656. http://dx.doi.org/10.1037/0882-7974.12.4.634
  35. Kaufman, S. B., Deyoung, C. G., Gray, J. R., Jiménez, L., Brown, J., & Mackintosh, N. (2010). Implicit learning as an ability. Cognition, 116, 321-340. http://dx.doi.org/10.1016/j.cognition.2010.05.011
  36. Kirkham, N. Z., Slemmer, J. A., & Johnson, S. P. (2002). Visual statistical learning in infancy: Evidence for a domain general learning mechanism. Cognition, 83(2), B35-B42. http://dx.doi.org/10.1016/S0010- 0277(02)00004-5
  37. Krogh, L., Vlach, H. A., & Johnson, S. P. (2013). Statistical learning across development: Flexible yet constrained. Frontiers in Psychology, 3, 598. http://dx.doi.org/10.3389/fpsyg.2012.00598
  38. Löckenhoff, C. E., & Carstensen, L. L. (2007). Aging, emotion, and health-related decision strategies: Motivational manipulations can re- duce age differences. Psychology and Aging, 22, 134 -146. http://dx.doi .org/10.1037/0882-7974.22.1.134
  39. Mata, R., von Helversen, B., & Rieskamp, J. (2010). Learning to choose: Cognitive aging and strategy selection learning in decision making. Psychology and Aging, 25, 299 -309. http://dx.doi.org/10.1037/ a0018923
  40. Matsen, F. A., & Nowak, M. A. (2004). Win-stay, lose-shift in language learning from peers. Proceedings of the National Academy of Sciences, USA of the United States of America, 101, 18053-18057. http://dx.doi .org/10.1073/pnas.0406608102
  41. Milne, A. J., & Herff, S. A. (2020). The perceptual relevance of balance, evenness, and entropy in musical rhythms. Cognition, 203, 104233. http://dx.doi.org/10.1016/j.cognition.2020.104233
  42. Misyak, J. B., Christiansen, M. H., & Tomblin, J. B. (2010). On-line individual differences in statistical learning predict language processing. Frontiers in Psychology, 1, 31. http://dx.doi.org/10.3389/fpsyg.2010 .00031
  43. Moldwin, T., Schwartz, O., & Sussman, E. S. (2017). Statistical learning of melodic patterns influences the brain's response to wrong notes. Journal of Cognitive Neuroscience, 29, 2114 -2122. http://dx.doi.org/10.1162/ jocn_a_01181
  44. Nassar, M. R., Bruckner, R., Gold, J. I., Li, S. C., Heekeren, H. R., & Eppinger, B. (2016). Age differences in learning emerge from an insuf- ficient representation of uncertainty in older adults. Nature Communi- cations, 7, 11609. http://dx.doi.org/10.1038/ncomms11609
  45. Nowak, M., & Sigmund, K. (1993). A strategy of win-stay, lose-shift that outperforms tit-for-tat in the Prisoner's Dilemma game. Nature, 364, 56 -58. http://dx.doi.org/10.1038/364056a0
  46. This document is copyrighted by the American Psychological Association or one of its allied publishers. This article is intended solely for the personal use of the individual user and is not to be disseminated broadly.
  47. Palmer, S. D., Hutson, J., & Mattys, S. L. (2018). Statistical learning for speech segmentation: Age-related changes and underlying mechanisms. Psychology and Aging, 33, 1035-1044. http://dx.doi.org/10.1037/ pag0000292
  48. Palmer, S. D., & Mattys, S. L. (2016). Speech segmentation by statistical learning is supported by domain-general processes within working mem- ory. The Quarterly Journal of Experimental Psychology, 69, 2390 - 2401. http://dx.doi.org/10.1080/17470218.2015.1112825
  49. Pérez-González, D., & Malmierca, M. S. (2014). Adaptation in the auditory system: An overview. Frontiers in Integrative Neuroscience, 8, 19. http://dx.doi.org/10.3389/fnint.2014.00019
  50. Randolph, C., Braun, A. R., Goldberg, T. E., & Chase, T. N. (1993). Semantic fluency in Alzheimer's, Parkinson's, and Huntington's dis- ease: Dissociation of storage and retrieval failures. Neuropsychology, 7, 82-88. http://dx.doi.org/10.1037/0894-4105.7.1.82
  51. Rescorla, R., & Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. Classical conditioning II: Current research and theory, 2, 64 -99.
  52. Rey, A. (1958). L'examenclinique en psychologie [the psychological ex- amination]. Paris: Presses Universitaires de France.
  53. Roseberry, S., Richie, R., Hirsh-Pasek, K., Golinkoff, R. M., & Shipley, T. F. (2011). Babies catch a break: 7-to 9-month-olds track statistical probabilities in continuous dynamic events. Psychological Science, 22, 1422-1424. http://dx.doi.org/10.1177/0956797611422074
  54. Saffran, J. R., Aslin, R. N., & Newport, E. L. (1996). Statistical learning by 8-month-old infants. Science, 274, 1926 -1928. http://dx.doi.org/10 .1126/science.274.5294.1926
  55. Saffran, J. R., & Kirkham, N. Z. (2018). Infant statistical learning. Annual Review of Psychology, 69, 181-203. http://dx.doi.org/10.1146/annurev- psych-122216-011805
  56. Salthouse, T. A. (1979). Adult age and the speed-accuracy trade-off. Ergonomics, 22, 811-821. http://dx.doi.org/10.1080/001401379
  57. Salthouse, T. A., McGuthry, K. E., & Hambrick, D. Z. (1999). A frame- work for analyzing and interpreting differential aging patterns: Appli- cation to three measures of implicit learning. Aging, neuropsychology, and Cognition, 6, 1-18. http://dx.doi.org/10.1076/anec.6.1.1.789
  58. Schirda, B., Valentine, T. R., Aldao, A., & Prakash, R. S. (2016). Age- related differences in emotion regulation strategies: Examining the role of contextual factors. Developmental Psychology, 52, 1370 -1380. http:// dx.doi.org/10.1037/dev0000194
  59. Shafir, S., Reich, T., Tsur, E., Erev, I., & Lotem, A. (2008). Perceptual accuracy and conflicting effects of certainty on risk-taking behaviour. Nature, 453, 917-920. http://dx.doi.org/10.1038/nature06841
  60. Siegelman, N., Bogaerts, L., Christiansen, M. H., & Frost, R. (2017). Towards a theory of individual differences in statistical learning. Phil- osophical Transactions of the Royal Society B: Biological Sciences, 372, 20160059. http://dx.doi.org/10.1098/rstb.2016.0059
  61. Siegelman, N., Bogaerts, L., & Frost, R. (2017). Measuring individual differences in statistical learning: Current pitfalls and possible solutions. Behavior Research Methods, 49, 418 -432. http://dx.doi.org/10.3758/ s13428-016-0719-z
  62. Siegelman, N., & Frost, R. (2015). Statistical learning as an individual ability: Theoretical perspectives and empirical evidence. Journal of Memory and Language, 81, 105-120. http://dx.doi.org/10.1016/j.jml .2015.02.001
  63. Smith, A. (1982). Symbol digit modalities test. Los Angeles, CA: Western Psychological Services Los Angeles.
  64. Sohoglu, E., & Chait, M. (2016). Detecting and representing predictable structure during auditory scene analysis. eLife, 5, e19113. http://dx.doi .org/10.7554/eLife.19113
  65. Tan, J., Tsakok, F. H. M., Ow, E. K., Lanskey, B., Lim, K. S. D., Goh, L. G., . . . Feng, L. (2018). Study protocol for a randomized controlled trial of choral singing intervention to prevent cognitive decline in at-risk older adults living in the community. Frontiers in Aging Neuroscience, 10, 195. http://dx.doi.org/10.3389/fnagi.2018.00195
  66. Thorndike, E. L. (1898). Animal intelligence: An experimental study of the associative processes in animals. The Psychological Review: Mono- graph Supplements, 2(4), i-109. http://dx.doi.org/10.1037/10780-000
  67. WHO. (2015). World report on ageing and health. Retrieved from https:// www.who.int/ageing/events/world-report-2015-launch/en/ WHO. (2017). Amendments to the staff regulations and staff rules. Re- trieved from https://apps.who.int/gb/ebwha/pdf_files/EB141/B141_11- en.pdf Worthy, D. A., Hawthorne, M. J., & Otto, A. R. (2013). Heterogeneity of strategy use in the Iowa gambling task: A comparison of win-stay/lose- shift and reinforcement learning models. Psychonomic Bulletin & Re- view, 20, 364 -371. http://dx.doi.org/10.3758/s13423-012-0324-9
  68. Yu, C. H. (2018). Neuropsychological assessments training manual for assessors (T. Y. Qian & S. J. Ching, Eds.; Version 3.1, Approved by K. E. Heok, L. Feng Ed.). Singapore: Yong Loo Lin School of Medi- cine's Department of Psychological Medicine.
National University of Singapore, Faculty Member

I'm an Assistant Professor at the Yong Siew Toh Conservatory of Music at the National University of Singapore (NUS), and a Research Scientist and founder of the Music Cognition group at the Institute of High Performance Computing, Social & Cognitive Computing Department, at the Agency for Science, Technology and Research (A*STAR). My research focuses on the cognitive science of music perception and cognition, using computational models to simulate cognition, computational creativity, music & medicine, and developing music technology (music medtech) for healthcare and well-being. In addition to my scientific research pursuits, I have a degree in cello performance, and love playing music! For more information about my work, please see my website at www.katagres.com

Papers
27
Followers
247
View all papers from Kathleen R. Agresarrow_forward