ORIGINAL RESEARCH
published: 11 February 2019
doi: 10.3389/fnhum.2019.00035

Proactive Information Sampling in Value-Based Decision-Making: Deciding When and Where to Saccade

Mingyu Song 1,2†, Xingyu Wang 1,3†, Hang Zhang 1,4,5* and Jian Li 1*

1 School of Psychological and Cognitive Sciences and Beijing Key Laboratory of Behavior and Mental Health, Peking University, Beijing, China
2 Princeton Neuroscience Institute, Princeton University, Princeton, NJ, United States
3 Department of Industrial Engineering and Management Sciences, Northwestern University, Evanston, IL, United States
4 PKU-IDG/McGovern Institute for Brain Research, Peking University, Beijing, China
5 Peking-Tsinghua Center for Life Sciences, Beijing, China

Evidence accumulation has been the core component in recent development of perceptual and value-based decision-making theories. Most studies have focused on the evaluation of evidence between alternative options. What remains largely unknown is the process that prepares evidence: how may the decision-maker sample different sources of information sequentially, if they can only sample one source at a time? Here we propose a theoretical framework in prescribing how different sources of information should be sampled to facilitate the decision process: beliefs for different noisy sources are updated in a Bayesian manner and participants can proactively allocate resource for sampling (i.e., saccades) among different sources to maximize the information gain in such process. We show that our framework can account for human participants' actual choice and saccade behavior in a two-alternative value-based decision-making task. Moreover, our framework makes novel predictions about the empirical eye movement patterns.

Keywords: decision-making, eye-tracking, information sampling, Bayesian inference, drift-diffusion model

Edited by: Xing Tian, New York University Shanghai, China

Reviewed by: Krishna P. Miyapuram, Indian Institute of Technology Gandhinagar, India; Qi Chen, South China Normal University, China

*Correspondence:
Hang Zhang
[email protected]
Jian Li
[email protected]
†These authors have contributed equally to this work

Received: 03 November 2018
Accepted: 22 January 2019
Published: 11 February 2019

Citation: Song M, Wang X, Zhang H and Li J (2019) Proactive Information Sampling in Value-Based Decision-Making: Deciding When and Where to Saccade. Front. Hum. Neurosci. 13:35. doi: 10.3389/fnhum.2019.00035

Frontiers in Human Neuroscience | www.frontiersin.org | February 2019 | Volume 13 | Article 35

INTRODUCTION

Value-based binary choice is a common and fundamental form of human decision making, from choosing between ham and turkey sandwiches for lunch to determining whether to have a family with a particular individual. During these decisions, the process of evaluating the options and comparing them is often complex: even in problems as simple as deciding which sandwich to take, people usually need to gaze at the different options sequentially, multiple times, before arriving at a decision.

Classic theories about evaluation often neglect gazing and fixation, merely focusing on how values of individual items are assigned (Kahneman and Tversky, 1979; Levy and Glimcher, 2012; Ruff and Fehr, 2014). Recent studies have started to pay attention to the important role of fixation in binary and multiple choice scenarios and have typically viewed fixation as an evidence accumulation process (Krajbich et al., 2010; Krajbich and Rangel, 2011; Cassey et al., 2013; Towal et al., 2013; Tavares et al., 2017). Recent primate neurophysiology and human neuroimaging research has placed this accumulation process at the core of perceptual decision making. It has been hypothesized that noisy evidence for each decision accumulates until a certain threshold is reached and the corresponding decision is made (Ratcliff, 1978; Shadlen et al., 1996; Platt and Glimcher, 1999; Gold and Shadlen, 2002; Bogacz et al., 2006; Summerfield and Tsetsos, 2012; McGinty et al., 2016). Such a computational approach has also been adopted to study the process of value-based decisions (Krajbich et al., 2010; Krajbich and Rangel, 2011; De Martino et al., 2012). In one such study (Krajbich et al., 2010), human participants were asked to choose between two snack items on a computer screen. Participants could look at both items freely before making the choice and their eye movement data were simultaneously recorded. In most trials, participants' fixation switched back and forth between the two items a few times before the final choice was made. By assuming that the fixated and non-fixated items were sampled asymmetrically and adopting an attentional drift-diffusion model (aDDM), Krajbich et al. successfully predicted participants' choices based on the observed eye tracking data. As in other previous studies, they concentrated on how evidence is integrated to reach the decision threshold given the fixation pattern shown by the participants, and aDDM is just one form of the stochastic accumulation models that also include the sequential probability ratio test (Gold and Shadlen, 2002, 2007), race and leaky competing accumulator models (Usher and McClelland, 2001), among others (Bogacz et al., 2006). In most previous studies, fixation data were taken as given and experimentally measured saccade data (via eye-tracking) were fed into the models to predict choice behavior (Krajbich et al., 2010; Krajbich and Rangel, 2011; but see Towal et al., 2013). Here we focus instead on the sampling assumption itself: What drives the switching of fixation between options in a two-alternative value-based choice task before the choice is made?

In the current study, we propose a Bayesian proactive sampling framework to account for both the choice behavior and saccade patterns in the same experiment run by Krajbich et al. (2010). We assume that instead of a single quantity, item attractiveness is internally represented as a probability distribution along the value dimension, and the fixation duration reflects the number of samples gleaned from such underlying distribution to form a belief distribution (Cassey et al., 2013). In this way, we formulate the evaluation process as Bayesian belief updating based on samples from different information sources rather than simple evidence accumulation (Cassey et al., 2013). More importantly, inspired by the Infomax algorithm (Butko and Movellan, 2010), we assume that participants proactively switch their fixation from one item to the other when the marginal information gain of continuing the current fixation becomes lower than that of switching. For instance, fixating on one item (and gathering information/samples from it) for too long might not be beneficial, since the participant would already be very confident about how attractive the fixated item is but still uncertain about its alternative, leaving them unable to choose between the two items. Thus, to make efficient decisions, participants need to balance between getting a more accurate estimation of the currently fixated item by continuously sampling and potentially gaining more information by switching fixation to the other item. Similar ideas of active sampling have also been proposed in the field of visual search and in perceptual decision tasks (Najemnik and Geisler, 2005; Cassey et al., 2013; Ahmad et al., 2014).

Similar to aDDM (Krajbich et al., 2010) and the value-plus-salience model (Towal et al., 2013), our model predicts participants' choice behaviors well: for example, the choice bias toward the last fixated item and the item fixated longer. Furthermore, our model predicts the distribution of fixation durations. It does so from a Bayesian perspective and can explain fixation patterns that previous stochastic accumulation models such as aDDM were agnostic about: for instance, the fixation duration is shorter in trials with greater absolute rating difference between items and longer for later fixations within a trial. Most importantly, our model views the saccade switching phenomena as an active process to maximize information gain in order to reach a decision more efficiently. Our approach thus provides a unified framework describing how different sources of information are sampled proactively to facilitate the decision process.

MATERIALS AND METHODS

Task

The experimental design and data collection were reported in detail in Krajbich et al. (2010). In brief, 39 Caltech students participated in the experiment and they were asked to refrain from eating for 3 h before the task. The experiment consisted of a rating phase and a choice phase.

In the rating phase, participants were asked to rate 70 different food items using an on-screen slider bar ("how much would you like to eat this at the end of the experiment?"), on a scale of −10 to 10. Any item receiving a rating lower than 0 would not show up in the following choice phase so that all choice items are motivationally relevant to the participant.

In each trial of the choice phase, participants were asked to choose from a pair of food items (selected from the 70 items they rated earlier) by pressing the left or right key on the keyboard (Figure 1A) while their eye movements were simultaneously recorded by the eye-tracker. The spatial locations of these snack items were randomized across trials. There was no time limit for response. In the end, participants were paid a $20 show-up fee in addition to the snack item they picked in a random trial of the choice phase. For details on the choice phase we refer the readers to the original paper (Krajbich et al., 2010).

Model

We propose a sampling-and-inference based model (Figure 1B) to predict both the choice and eye movement patterns leading to the decision. Instead of viewing gaze switches between options merely as an evidence accumulation process, we reason that this process is carried out to maximize the informational gain to differentiate between two estimated value distributions. In this section, we first briefly lay out the structure of the model, and then describe the assumptions and predictions in detail.

FIGURE 1 | Experiment design and diagram of the model. (A) Experiment design. In each choice trial, participants were presented with images of two food items and asked to make their choices. After the choice was made, a yellow box appeared around the chosen item for 1 s. See Figure 1 of Krajbich et al. (2010). (B) The upper panel illustrates the fixation sequence. Each fixation consists of at least one sample. The lower panel shows the four stages of the model: sampling, belief updating, decision and fixation switch. The yellow and brown curves correspond to the two items.

On each trial, we assume that the participant goes through a few gaze switch cycles, each of which contains information-collection and decision-making steps. The participant's goal is to make the correct choice (i.e., the item with greater attractiveness) as quickly as possible. Due to the limited span of attention, information collection is inevitably biased toward the fixated item and gaze switch is a natural means to maximize the information gain. More concretely, we hypothesize that during the information-collection cycle, the participant chronically (a) samples noisy
evidence from the two items, and (b) updates their internal beliefs about the values of the two items accordingly. During the decision-making step, the participant (c) judges whether the information collected is enough to warrant a decision, i.e., the decision variable surpassing a threshold; and if so, a decision is made. Otherwise, the participant (d) chooses which item to fixate on next contingent on the relative information gain between the two items.

(a) Sampling (With Bias)

At the beginning of each trial, the participant randomly decides which item to look at [with 74% probability of looking at the left item first and 26% of the right, based on the empirical fixation probability (Krajbich et al., 2010)]. At any specific moment, the two items are referred to as the fixated item (denoted by $f$) and the non-fixated item (denoted by $n$). We assume that the participant has no direct access to the true value of either item ($v_f$ and $v_n$) but can only obtain random samples from a Gaussian distribution centered around the true value ($t$ denotes the $t$-th sample):

$$x_{f,t} \sim N\left(v_f, \sigma_f^2\right) \tag{1}$$

$$x_{n,t} \sim N\left(\gamma v_n, \sigma_n^2\right) \tag{2}$$

For the fixated item (see Equation 1), the mean of the sampling distribution is set to be the participant's rating of that item in the rating phase, under the assumption that their rating upon contemplation for each item reflects an accurate and unbiased estimation of the true item value. The variance of the sampling distribution is denoted by $\sigma_f^2$ ($\sigma_f^2 = \sigma_0^2$, with $\sigma_0$ being a free parameter of the model). Similar to Krajbich et al. (2010), we assume that the non-fixated item is perceived with distortion. For simplicity, we scale the mean and variance of its sampling distribution by factors $\gamma$ ($0 \le \gamma \le 1$) and $\kappa$ ($\sigma_n^2 = \kappa\sigma_0^2$; $\kappa \ge 1$) respectively to reflect the discounted and noisier representation for the non-fixated item.

Sampling takes time and we simply assume the sampling time follows a uniform distribution between 50 and 150 ms, based on the empirical observations in object recognition (Kirchner and Thorpe, 2006) and visual working memory studies (Gegenfurtner and Sperling, 1993) that it takes about 100 ms for visual information to be extracted or transferred from iconic memory to visual working memory.

The samples $x_{f,t}$ and $x_{n,t}$ will then be used to update the participant's belief of the values of corresponding items.

(b) Updating

We formulate the belief updating procedure according to Bayes' rule. First, starting from the internal representation of item values, we assume that the participant has a broad prior over the values of two items at the beginning of each trial, centered around zero with a variance of $\sigma_{i,0}^2 = \sigma_0^2$, where $i = f$ or $n$ (fixated or non-fixated).

With the samples obtained from both items, the participant updates their beliefs about item values by combining the likelihoods of samples ($x_{f,t}$ and $x_{n,t}$) and prior beliefs to form the posterior beliefs according to Bayes' rule:

$$P\left(\hat{v}_i = v \mid x_{i,1:t}, \sigma_i^2\right) \propto P\left(x_{i,t} \mid \hat{v}_i = v, \sigma_i^2\right) P\left(\hat{v}_i = v \mid x_{i,1:t-1}, \sigma_i^2\right) \tag{3}$$

where $\hat{v}_i$ ($i = f$ or $n$) is the value estimate. Since both the prior and likelihood are assumed to be Gaussian, the participant's posterior beliefs are also Gaussian (Lee, 2012) (denoted by $N(\mu_{i,t}, \sigma_{i,t}^2)$), with their means updated according to

$$\mu_{i,t} = \frac{\sigma_{i,t-1}^2 x_{i,t} + \sigma_i^2 \mu_{i,t-1}}{\sigma_{i,t-1}^2 + \sigma_i^2} \tag{4}$$

after the $t$-th sample.

The variance of the posterior belief about the fixated item is

$$\sigma_{f,t}^2 = \frac{\sigma_{f,t-1}^2 \sigma_f^2}{\sigma_{f,t-1}^2 + \sigma_f^2} \tag{5}$$

For the non-fixated item, we assume a variance expansion effect (Bogacz et al., 2007; Bornstein et al., 2017). In particular, we hypothesize that the participant becomes more uncertain about the non-fixated item while they are fixating the other item so that the variance of the posterior belief about the non-fixated item is the same as Equation 5 except that an extra expanding factor $\lambda$ ($> 1$) is introduced:

$$\sigma_{n,t}^2 = \lambda \frac{\sigma_{n,t-1}^2 \sigma_n^2}{\sigma_{n,t-1}^2 + \sigma_n^2} \tag{6}$$

(c) Judging Whether Information Is Enough for a Decision

Similar to the aDDM model in previous research (Krajbich et al., 2010; Tavares et al., 2017), we use the relative decision value ($RDV = \hat{v}_f - \hat{v}_n$) as the decision variable. At the beginning of each trial, the RDV starts at 0 and with the belief updating after each sample, the participant continuously evaluates the probability of making a correct choice, according to their value estimates for the two items. That is, $P(\hat{v}_f - \hat{v}_n > 0)$ if $\hat{v}_{f,t} > \hat{v}_{n,t}$ and $P(\hat{v}_n - \hat{v}_f > 0)$ otherwise. If this probability exceeds a threshold $\theta_t$, the participant will pick the item with the higher estimated value, and the sampling-and-decision procedure terminates. Otherwise, the participant continues to collect more information until such a fair comparison is warranted. We assume that the threshold $\theta_t$ decreases after each belief update, in order to avoid arbitrarily long arbitration (Tajima et al., 2016). For simplicity and without loss of generality, we use a linear function: $\theta_t = 1 - \delta t$, where again $t$ denotes the number of samples or updates.

(d) Choosing Which Item to Fixate On

Here we assume that if the threshold for choice decision has not been reached, the participant decides whether to switch fixation in such a way as to separate the two value distributions most efficiently. Inspired by the optimal sampling theory in perceptual decision making that the sampling time allocated to different information sources should be proportional to their noise levels (Cassey et al., 2013), we assume the probability of switching to the non-fixated item is determined by a logistic function of the uncertainty (standard deviation) ratio between the posterior belief distributions:

$$P_{Switch,t} = \frac{1}{1 + e^{-\left(\omega \frac{\sigma_{n,t}}{\sigma_{f,t}} + \omega_0\right)}} \tag{7}$$

where $\omega$ ($> 0$) reflects the sensitivity to the uncertainty ratio, and $\omega_0$ reflects a bias on saccade ("repositioning") tendency. Note that the fixated item becomes non-fixated and vice versa (corresponding to a swap of subscripts $f$ and $n$ in the equations) once the saccade switch occurs.

This saccade policy arises from our assumption that prior to a final decision, the participant actively samples from the two items so that they can reach a decision efficiently. Intuitively, if the non-fixated item bears a much higher uncertainty relative to the currently fixated one, the participant should switch fixation to the non-fixated item to gain more information. The proactive sampling continues until the decision threshold has been reached and an explicit decision ensues.

Comparison With aDDM

As pointed out in the previous literature, Bayesian update with sampling from Gaussian distributions (assuming equivalent sampling variance for the fixated and non-fixated items and no expansion of variance for the non-fixated item) is essentially equivalent to the combination of evidence-accumulation and Wiener process in aDDM (Bitzer et al., 2014). The main difference between our work and previous studies, however, is that we proposed a fixation-switch policy based on active sampling theory, which predicts the patterns of both fixations and the final choice, whereas Krajbich et al. (2010) used the empirical fixation patterns to derive the choice pattern.

Simulation

Given the multiple dimensions of fixation data (the number of fixations, the fixation duration, and other fixation patterns) and choice behavior, it is difficult to devise a single metric to perform model fitting. Instead, we perform model simulation under a particular set of parameter values to demonstrate that a fully Bayesian approach can capture a variety of aspects of participants' data, especially fixation patterns which have been largely overlooked in previous research. The parameters used in the simulation are $\sigma_0 = 4$, $\delta = 0.005$, $\gamma = 0.1$, $\lambda = 1.1$, $\kappa = 2$, $\omega = 2.5$, and $\omega_0 = -6.5$ (see Table 1 for a summary description of model parameters). However, it is worth noting that we did examine our model over a large grid on the parameter space (Table 1). Our model simulation results did not strongly depend on the particular values of the parameters, and the behavior and fixation patterns in the Results section can be reproduced by a large proportion of parameters on the grid space.

TABLE 1 | Model parameters and their range tested in the simulation.

| Parameter | Description | Parameter range tested in simulation |
| --- | --- | --- |
| σ0 | The standard deviation of the sampling distributions. | [4 to 10] |
| δ | The decreasing step of decision threshold. | [0.0025 to 0.02] |
| γ | The discounting factor on the mean of the sampling distribution for the non-fixated item. | [0 to 0.9] |
| κ | The factor by which the sampling distribution of the non-fixated item is noisier than that of the fixated item. | [2 to 4] |
| λ | The expanding factor of belief uncertainty of the non-fixated item. | [1 to 1.25] |
| ω, ω0 | The slope (sensitivity to uncertainty ratio) and intercept (saccade cost) parameters in the softmax decision function. | ω: [1 to 4]; ω0: [−8 to −2] |

RESULTS

Choice Patterns and Fixation Biases

We first show that our model accounts for the core model predictions in the original paper (Krajbich et al., 2010). In the decision phase, participants' choices were consistent with how they rated the items in the rating phase: the higher they rated an item compared to its alternative, the more likely the item would be chosen (mixed effect logistic regression slope β = 0.60, p < 0.001; bars in Figure 2A). In our model, the value estimate for each item is obtained from sampling the underlying option value distribution, and this is reflected in the final choice (line in Figure 2A), consistent with previous literature that suggested choice predictions from aDDM can be incorporated in the Bayesian framework (Cassey et al., 2013). This is similar to the aDDM, where RDV was accumulated according to the (relative) difference between the values of two items. Participants were more likely to choose an item if they spent a longer time looking at it (β = 0.0017, p < 0.001; Figure 2B) and if they were looking at it right before the choice selection (β = 0.61, p < 0.001; Figure 2C). Both effects were predicted by aDDM via the assumption that the drift rate of the non-fixated item is discounted. Similarly, our model can explain both phenomena because it assumes value discounting for the non-fixated item (Equation 2): a longer fixation indicates a stronger discounting effect on the non-fixated item, resulting in a lower evaluation for the non-fixated item and thus a higher probability of choosing the fixated one. Similarly, everything else being equal, the item of last fixation enjoys more unbiased (undiscounted) evaluations before the choice, hence the higher chosen rate. It was also observed in the data that reaction time decreased as the absolute rating difference increased (β = −191.6, p < 0.001; Figure 2D). Evidence accumulation models such as aDDM explain this as it takes a longer time to reach a decision threshold when the drift rate is smaller. Similar to aDDM (and as we noted previously, there is a fundamental equivalence of the Bayesian approach and DDM; Bitzer et al., 2014), our model interprets such behavioral
pattern as fewer samples are needed to separate the two value distributions if the distance between them is larger.

FIGURE 2 | The model predictions of the behavioral patterns. (A) The probability of choosing the left item as a function of the rating difference between the two items (left-right). Bars represent the experimental data (error bars represent 1 s.e.m. across all participants); the black line represents the model simulation results; same for (B,D). (B) Probability that the left item is chosen as a function of its total fixation duration advantage over the right item. (C) Probability that the left item is chosen as a function of its rating advantage over the right item, conditioned on the last fixation. Yellow circles and the yellow line correspond to trials in which participants looked at the left item in the last fixation; blue circles and the blue line correspond to trials in which participants looked at the right item in the last fixation; red circles and the red line indicate the average of both. (D) Reaction time as a function of absolute rating difference.

Eye-Movement Patterns

Standard DDM approaches usually are agnostic about participants' eye fixation patterns (but see Towal et al., 2013). For example, aDDM (Krajbich et al., 2010) sidestepped the mechanism of saccade and instead used the empirical fixation duration distribution as an input to the model to predict choice behavior. Although standard DDMs predict the distribution of total reaction time to be an inverse Gaussian distribution (note that Krajbich et al., 2010 used the log-normal distribution to capture their empirical saccade fixation data, probably due to the time-invariant noise term in the aDDM), they remain agnostic about the distribution of individual fixation duration (but see Towal et al., 2013). In contrast, our model speaks directly to participants' saccade patterns as they are the intermediate products between visual option inputs and the final behavioral choices. Indeed, these data provide a test bed for our framework and future efforts that explicitly model the eye-movement patterns.

As shown in Figure 3A, the overall distribution of the middle fixation duration is right-skewed, which is qualitatively captured by our simulation results. One interesting finding in Krajbich et al. (2010) was that the fixation number increased as the choice became more difficult, that is, when the absolute value difference was smaller (β = −0.16, p < 0.001; Figure 3B). The aDDM model in Krajbich et al. (2010) sidestepped this by sampling fixation durations from separate empirical distributions conditioned on absolute rating difference. In contrast, our model provides an intuitive and natural explanation for this effect: as the task gets more difficult (ratings of two items are closer), more samples are needed to separate the two underlying distributions. However, as more samples are taken from the fixated item, estimated uncertainty of the non-fixated item increases due to the forgetting effect. In order to make the choice with sufficient confidence, participants need to switch fixations between two items and evaluate them alternately more often, resulting in more fixations in total. In brief, our model predicts that decision time and the number of fixations are intricately linked together. Indeed, our model simulation confirms this intuition: the simulation data predict the inverse relationship between the number of fixations and absolute value difference (line in Figure 3B).

FIGURE 3 | (A) The histogram of middle fixation duration and the model fit. Bars and line represent the empirical distribution and the simulated distribution, respectively. The last bin contains all fixations longer than 3,000 ms. (B) Average number of fixations per trial as a function of absolute rating difference. Bars represent the empirical data (error bars indicate 1 s.e.m. across participants); the line represents model simulation results.

Another finding that supports a proactive sampling model is the fact that middle fixation duration was not correlated with item value itself (β = −5.91, p = 0.15; Figure 4A), but negatively correlated with absolute rating difference (β = −32.3, p < 0.001; Figure 4B). Again, aDDM took this pattern as given and used fixation durations directly sampled from the empirical distribution conditioned on absolute rating difference (Krajbich et al., 2010). In contrast, in our model, fixation switch is determined by the comparison of the uncertainties of the value estimates of the two items, not the values themselves, so fixation duration does not vary as a function of individual item ratings (line in Figure 4A). When the rating difference between two items is large, the decision threshold is easy to surpass, even with a small number of samples. As a result, it is easier for a long fixation (consisting of many samples) to lead to a final choice and therefore become the final fixation. Thus, larger rating difference corresponds to shorter middle fixation duration on average (line in Figure 4B).

Another noteworthy pattern about middle fixation duration is that it increased steadily throughout a trial (β = 58.8, p = 0.0018; Figure 4C). Since 96.9% of trials terminated within six fixations, we focus on only the second to the fifth fixations (excluding the first and last fixations). Our model is constructed such that the fixation switching probability depends on the ratio of uncertainties of the two value estimation distributions. Toward the end of a trial, changes in the uncertainty ratio tend to decrease, rendering lower likelihood of fixation switch and thus longer fixations in the later part of the trial (line in Figure 4C). It was observed in Krajbich et al. (2010) that the first fixation of a trial was shorter than middle fixations [paired t test, t(38) = −9.33, p < 0.001; Figure 4D] and they set up two separate empirical distributions from which the model sampled first and middle fixations respectively. Our model predicts this pattern (line in Figure 4D) and sees it as a special case of the fact that fixation duration increases within a trial. The reversal pattern of the last fixation duration, however, is due to the "truncated" or premature middle fixations: the decision process terminates when the threshold is reached regardless of whether the current fixation would have continued otherwise, which makes this final fixation shorter than it could have been. Both aDDM and our model predict this phenomenon.

FIGURE 4 | Factors that influence fixation duration. (A) Middle fixation duration as a function of the item rating. (B) Middle fixation duration as a function of absolute rating difference between two items. (C) Middle fixation duration as a function of the index of fixation (trials with only one fixation are excluded from this analysis). (D) Fixation duration by type. Middle fixations indicate the fixations that are neither the first nor the last fixation in a trial. Bars represent the empirical data (error bars indicate 1 s.e.m. across participants); lines represent the model simulation results.

Relationship Between κ and λ

It might seem arbitrary to introduce both the noise ratio κ and variance expansion factor λ in our model. At first glance, both parameters can lead to the seemingly equivalent "expanding" effect on uncertainty over the non-fixated item. However, even though the noise ratio κ leads to a noisier sampling distribution of the non-fixated item compared to the fixated one, getting new samples still helps make the value estimate more accurate over time; in contrast, the variance expansion factor λ makes the value estimate more uncertain over time. A closer examination of the fixation data reveals the necessity of both parameters: the relationship between the probability of committing a choice and the duration of fixation was modulated by the number of fixations (Figure 5A). In a model without the variance expansion effect (λ = 1), the participant will be more likely to commit a choice when spending more time sampling from items. However, the probability of making an explicit choice decreased as the fixation duration increased when the fixation number was large (>2). The introduction of λ, due to its exponential form, creates the competition between the exponentially expanding (expansion) and hyperbolic updating (contraction) of non-fixated item variance. An interesting consequence of this antagonism is that the competition results depend on the fixation number because of the different forms of expansion and contraction functions. Indeed, our model simulation captures this dependence (Figure 5B), providing additional evidence that variance expansion, or the forgetting process, is necessary to explain the fixation data.

Additionally, when κ = 1, the uncertainty of both items will be the same throughout a trial, leading to a constant uncertainty ratio and hence stable switch probability. If that is the case, the middle fixation duration will be approximately the same throughout a trial. However, as shown in Figure 4C, the middle fixation duration increased as the trial proceeded, providing extra evidence for the necessity of the noise ratio parameter κ.

DISCUSSION

The evidence accumulation model has witnessed great success in the past decades in accounting for the choice and reaction time data in the field of perceptual decision making (Ratcliff, 1978; Bogacz et al., 2006; Bogacz, 2007; Gold and Shadlen, 2007).

[...] engender an explicit choice. Later studies using electrophysiology mapped the integration function to neural activities in brain areas such as lateral intraparietal cortex (LIP) and frontal eye field (FEF) (Platt and Glimcher, 1999; Ditterich et al., 2003; Gold and Shadlen, 2007). Recently, such a theoretical approach has been adopted to study value-based decision where the typical setup involves options displayed at different locations of the visual field and eye movement data were also recorded (Armel et al., 2008; Krajbich et al., 2010, 2012; Krajbich and Rangel, 2011; Cassey et al., 2013; Towal et al., 2013). In addition to speed and accuracy data, the newly acquired saccade information provides a novel avenue to understand the underlying decision-making mechanism. Indeed, it has been proposed that the choice preference can be driven by the fixation duration on a certain option due to the asymmetric evidence accumulation between fixated and non-fixated options, probably caused by attentional bias (Krajbich et al., 2010, 2012; Krajbich and Rangel, 2011; Towal et al., 2013; Tavares et al., 2017). Furthermore, the disruption of such fixation leads to biased moral and value decisions (Armel et al., 2008; Krajbich et al., 2010; Pärnamets et al., 2015). However, the eye-tracking data also pose a theoretical challenge: what drives the eye fixation in such tasks? Inspired by the optimal sampling theory, in this work, we presented a Bayesian generative model for eye-movement in a value-based binary choice task (Yuille and Bülthoff, 1996; Summerfield and Tsetsos, 2012; Bitzer et al., 2014). The model fits the participants' choices well, as well as the choice biases induced by fixation and the effect of decision difficulty. More importantly, it makes novel and testable predictions of the fixation duration distribution and fixation patterns as functions of option attractiveness ratings and the index of fixation, some of which were reported in Krajbich et al. (2010) and others are newly identified in the current work.

Eye movement has been reported to be causally linked with valuation and choice generation in value-based decision making (Armel et al., 2008; Krajbich et al., 2010; Krajbich and Rangel, 2011), but formal theories explaining why and how people make eye movements during such decisions are lacking. The fact that our model is capable of explaining reaction time, choice and eye fixation data indicates that people might not passively accumulate value or perceptual information as standard DDM suggests; instead, they actively switch their fixations to maximize information gain before committing to a choice decision. Similar concepts such as the Infomax algorithm have been introduced before in perceptual decision making and the research area of artificial intelligence (Butko and Movellan, 2010).

For simplicity, we omitted physiological details that might constrain the physical speed of information processing and eye repositioning cost. For example, it has been reported that the activity delay between retina and the FEF in awake monkey is 75 ± 10 ms (assuming a Gaussian distribution), FEF and saccade 30 ± 10 ms, and LIP and saccade 90 ± 10 ms (Wurtz and Goldberg, 1972; Schmolesky et al., 1998; Towal et al., 2013). Instead, we
In a assume a rather crude individual uniform sampling interval typical experimental setup where stimuli (e.g., randomly moving between 50 and 150 ms, and set repositioning cost as a free dots) are presented together, the model predicts that participants parameter in the logistic function (as behavioral costs in Ahmad appraise stimuli passively until the evidence accumulation of et al., 2014). Surprisingly, our model is able to capture various certain decision variable reaches a (predefined) threshold to aspects of participants’ data despite the simplified assumption Frontiers in Human Neuroscience | www.frontiersin.org 8 February 2019 | Volume 13 | Article 35 Song et al. Proactive Information Sampling in Value-Based Decision-Making FIGURE 5 | The proportion of fixations being the last of a trial, as the function of fixation duration and the index of fixation. (A) data. (B) model simulation. above, proving the robustness of such a quantitative approach. perceptual and economic decision-making tasks, has provided an Of note is the discrepancy between our model predictions and exciting testbed for candidate decision theories that emphasize fixation data in Figure 5, where the model overestimates the the interplay between eye-movement and choice selection. Our probability that a trial terminates over only one fixation (blue model is among the first to provide a unified framework curves). The first fixation is unique since our model assumes that to account for different levels of complexities in the fixation participants are able to sample from both options, irrespective pattern data and can be easily extended to multiple option of current fixation location, potentially due to rumination paradigms. and endogenous attention. However, during first fixation, it is impossible for participants to ruminate on an option they have DATA AVAILABILITY STATEMENT not yet observed. 
So, it is plausible that the cognitive mechanism of sampling from the non-fixated item can be inherently different The data analyzed in this study was obtained from Drs. Ian during the first fixation (before the participant has the chance Krajbich, Carrie Armel, and Antonio Rangel. Requests to access to look at the alternative item for the first time), compared these datasets should be directed to these authors. to later fixations. We decide to keep the model simple and concise such that it can be generalized to other decision AUTHOR CONTRIBUTIONS contexts. A few recent studies also examined the eye fixation pattern in MS, XW, HZ, and JL conceived the concept. MS and XW value-based decisions (Cassey et al., 2013; Towal et al., 2013). For performed the analysis. MS, XW, HZ, and JL wrote the example, Towal et al. (2013) suggested that the combination of manuscript. visual salience and value of different options drives the fixation switch, which further helps shape participant’s actual choice. FUNDING Item values are therefore used twice in predicting choice. This view is in contrast with earlier research that advocated the This work was supported by National Natural Science reverse causality between fixation duration and value difference Foundation of China grants: 31371019 (JL), 31871140 (JL), (Krajbich et al., 2010). Our model challenges this view and instead and 31571117 (HZ). proposes that eye fixation switch acts as an active information gathering process by comparing the levels of uncertainties ACKNOWLEDGMENTS between two estimated value distributions. The newly added dimension of fixation pattern data, in The authors would like to thank Drs. Ian Krajbich, Carrie Armel, addition to the traditional speed and accuracy information in and Antonio Rangel for sharing their dataset. REFERENCES Bitzer, S., Park, H., Blankenburg, F., and Kiebel, S. J. (2014). Perceptual decision making: drift-diffusion model is equivalent to a Bayesian model. Front. Hum. 
Ahmad, S., Huang, H., and Yu, A. J. (2014). Cost-sensitive Bayesian Neurosci. 8:102. doi: 10.3389/fnhum.2014.00102 control policy in human active sensing. Front. Hum. Neurosci. 8:955. Bogacz, R. (2007). Optimal decision-making theories: linking neurobiology doi: 10.3389/fnhum.2014.00955 with behaviour. Trends Cogn. Sci. 11, 118. doi: 10.1016/j.tics.2006. Armel, K., Beaumel, A., and Rangel, A. (2008). Biasing simple choices 12.006 by manipulating relative visual attention. Judg. Decis. Making 3, Bogacz, R., Brown, E., Moehlis, J., Holmes, P., and Cohen, J. D. (2006). 396–403. The physics of optimal decision making: a formal analysis of models of Frontiers in Human Neuroscience | www.frontiersin.org 9 February 2019 | Volume 13 | Article 35 Song et al. Proactive Information Sampling in Value-Based Decision-Making performance in two-alternative forced-choice tasks. Psychol. Rev. 113, 700–765. Najemnik, J., and Geisler, W. S. (2005). Optimal eye movement strategies in visual doi: 10.1037/0033-295X.113.4.700 search. Nature 434, 387–391. doi: 10.1038/nature03390 Bogacz, R., McClure, S. M., Li, J., Cohen, J. D., and Montague, P. R. (2007). Short- Pärnamets, P., Johansson, P., Hall, L., Balkenius, C., Spivey, M. J., and Richardson, term memory traces for action bias in human reinforcement learning. Brain D. C. (2015). Biasing moral decisions by exploiting the dynamics of eye gaze. Res. 1153, 111–121. doi: 10.1016/j.brainres.2007.03.057 Proc. Natl. Acad. Sci. U.S.A. 112, 4170–4175. doi: 10.1073/pnas.1415250112 Bornstein, A. M., Khaw, M. W., Shohamy, D., and Daw, N. D. (2017). Reminders Platt, M. L., and Glimcher, P. W. (1999). Neural correlates of decision variables in of past choices bias decisions for reward in humans. Nat. Commun. 8:15958. parietal cortex. Nature 400, 233–238. doi: 10.1038/22268 doi: 10.1038/ncomms15958 Ratcliff, R. (1978). A theory of memory retrieval. Psychol. Rev. 85, 59–108. Butko, N. J., and Movellan, J. R. (2010). Infomax control of eye movements. 
IEEE doi: 10.1037/0033-295X.85.2.59 Trans. Auton. Ment. Dev. 2, 91–107. doi: 10.1109/TAMD.2010.2051029 Ruff, C. C., and Fehr, E. (2014). The neurobiology of rewards and values in social Cassey, T. C., Evens, D. R., Bogacz, R., Marshall, J. A. R., and Ludwig, C. J. H. decision making. Nat. Rev. Neurosci. 15, 549–562. doi: 10.1038/nrn3776 (2013). Adaptive sampling of information in perceptual decision-making. PLoS Schmolesky, M. T., Wang, Y., Hanes, D. P., Thompson, K. G., Leutgeb, S., ONE 8:e78993. doi: 10.1371/journal.pone.0078993 Schall, J. D., et al. (1998). Signal timing across the macaque visual system. J. De Martino, B., Fleming, S. M., Garrett, N., and Dolan, R. J. (2012). Confidence in Neurophysiol. 79, 3272–3278. doi: 10.1152/jn.1998.79.6.3272 value-based choice. Nat. Neurosci. 16, 105–110. doi: 10.1038/nn.3279 Shadlen, M. N., Britten, K. H., Newsome, W. T., and Movshon, J. A. (1996). A Ditterich, J., Mazurek, M. E., and Shadlen, M. N. (2003). Microstimulation of computational analysis of the relationship between neuronal and behavioral visual cortex affects the speed of perceptual decisions. Nat. Neurosci. 6, 891–898. responses to visual motion. J. Neurosci. 16, 1486–1510. doi: 10.1038/nn1094 Summerfield, C., and Tsetsos, K. (2012). Building bridges between perceptual Gegenfurtner, K. R., and Sperling, G. (1993). Information transfer in iconic and economic decision-making: neural and computational mechanisms. Front. memory experiments. J. Exp. Psychol. Hum. Percept. Perform. 19, 845–866. Neurosci. 6:70. doi: 10.3389/fnins.2012.00070 doi: 10.1037/0096-1523.19.4.845 Tajima, S., Drugowitsch, J., and Pouget, A. (2016). Optimal policy for value-based Gold, J. I., and Shadlen, M. N. (2002). Banburismus and the brain: decoding decision-making. Nat. Commun. 7:12400. doi: 10.1038/ncomms12400 the relationship between sensory stimuli, decisions, and reward. Neuron 36, Tavares, G., Perona, P., and Rangel, A. (2017). The attentional drift diffusion 299–308. 
doi: 10.1016/S0896-6273(02)00971-6 model of simple perceptual decision-making. Front. Neurosci. 11:468. Gold, J. I., and Shadlen, M. N. (2007). The neural basis of decision making. doi: 10.3389/fnins.2017.00468 Annu. Rev. Neurosci. 30, 535–574. doi: 10.1146/annurev.neuro.29.051605. Towal, R. B., Mormann, M., and Koch, C. (2013). Simultaneous modeling of visual 113038 saliency and value computation improves predictions of economic choice. Proc. Kahneman, D., and Tversky, A. (1979). Prospect theory: an analysis of decisions Natl. Acad. Sci. U.S.A. 110, E3858–E3867. doi: 10.1073/pnas.1304429110 under risk. Econometrica 47, 263–291. Usher, M., and McClelland, J. L. (2001). The time course of perceptual choice. Kirchner, H., and Thorpe, S. J. (2006). Ultra-rapid object detection with saccadic Psychol. Rev. 108, 550–592. doi: 10.1037/0033-295X.108.3.550 eye movements: visual processing speed revisited. Vis. Res. 46, 1762–1776. Wurtz, R. H., and Goldberg, M. E. (1972). Activity of superior colliculus in doi: 10.1016/j.visres.2005.10.002 behaving monkey. 3. Cells discharging before eye movements. J. Neurophysiol. Krajbich, I., Armel, C., and Rangel, A. (2010). Visual fixations and the computation 35, 575–586. doi: 10.1152/jn.1972.35.4.575 and comparison of value in simple choice. Nat. Neurosci. 13, 1292–1298. Yuille, A. L., and Bülthoff, H. H. (1996). “Bayesian decision theory and doi: 10.1038/nn.2635 psychophysics,” in Perception as Bayesian Inference, eds D. C. Knill and Krajbich, I., Lu, D., Camerer, C., and Rangel, A. (2012). The attentional drift- W. Richards (New York, NY: Cambridge University Press), 123–162. diffusion model extends to simple purchasing decisions. Front. Psychol. 3:193. doi: 10.1017/CBO9780511984037.006 doi: 10.3389/fpsyg.2012.00193 Krajbich, I., and Rangel, A. (2011). 
Multialternative drift-diffusion model predicts Conflict of Interest Statement: The authors declare that the research was the relationship between visual fixations and choice in value-based decisions. conducted in the absence of any commercial or financial relationships that could Proc. Natl. Acad. Sci. U.S.A. 108, 13852–13857. doi: 10.1073/pnas.1101328108 be construed as a potential conflict of interest. Lee, P. M. (2012). Bayesian Statistics: An Introduction. 4th Edn. Chichester; West Sussex: Wiley. Copyright © 2019 Song, Wang, Zhang and Li. This is an open-access article Levy, D. J., and Glimcher, P. W. (2012). The root of all value: a neural distributed under the terms of the Creative Commons Attribution License (CC BY). common currency for choice. Curr. Opin. Neurobiol. 22, 1027–1038. The use, distribution or reproduction in other forums is permitted, provided the doi: 10.1016/j.conb.2012.06.001 original author(s) and the copyright owner(s) are credited and that the original McGinty, V. B., Rangel, A., and Newsome, W. T. (2016). Orbitofrontal cortex value publication in this journal is cited, in accordance with accepted academic practice. signals depend on fixation location during free viewing. Neuron 90, 1299–1311. No use, distribution or reproduction is permitted which does not comply with these doi: 10.1016/j.neuron.2016.04.045 terms. Frontiers in Human Neuroscience | www.frontiersin.org 10 February 2019 | Volume 13 | Article 35
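Supplementary illustration for the section "Relationship Between κ and λ". The contrast drawn there between hyperbolic contraction and exponential expansion of the non-fixated item variance can be made concrete with a minimal sketch. It is not the fitted model. It assumes a conjugate Gaussian update with known sampling noise, assumes that κ scales the sampling standard deviation of the non-fixated item, and assumes that λ multiplies the variance once per sample. The function names and default parameter values are hypothetical and chosen only for illustration.

```python
def bayes_contract(var, noise_var):
    """One conjugate Gaussian update: the precision grows by 1 / noise_var."""
    return 1.0 / (1.0 / var + 1.0 / noise_var)


def nonfixated_variance_trace(n_steps, prior_var=1.0, noise_var=1.0,
                              kappa=2.0, lam=1.05):
    """Variance of the non-fixated item's value estimate after each sample.

    Assumed per-step order (illustrative, not taken from the paper):
      1. expansion: multiply the variance by lam (forgetting);
      2. contraction: Bayesian update from one sample whose noise
         variance is (kappa ** 2) * noise_var.
    """
    var = prior_var
    trace = []
    for _ in range(n_steps):
        var = lam * var
        var = bayes_contract(var, (kappa ** 2) * noise_var)
        trace.append(var)
    return trace


def variance_floor(noise_var=1.0, kappa=2.0, lam=1.05):
    """Fixed point of the expand-then-contract step (zero when lam == 1)."""
    return (kappa ** 2) * noise_var * (lam - 1.0) / lam


if __name__ == "__main__":
    no_forgetting = nonfixated_variance_trace(200, lam=1.0)
    with_forgetting = nonfixated_variance_trace(200, lam=1.05)
    print("lam = 1.00, final variance:", no_forgetting[-1])
    print("lam = 1.05, final variance:", with_forgetting[-1])
    print("lam = 1.05, predicted floor:", variance_floor(lam=1.05))
```

Under these assumptions, with λ = 1 the variance keeps shrinking hyperbolically in the number of samples, so a larger κ only slows the contraction. With λ > 1 the variance instead settles at a positive floor, which is one way to see why the two parameters are not interchangeable.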