Simulating Conceptually-Guided Perceptual Learning

Alexander Gerganov ([email protected])
Department of Cognitive Science and Psychology, New Bulgarian University, 21 Montevideo St., 1618 Sofia, Bulgaria

Maurice Grinberg ([email protected])
Department of Cognitive Science and Psychology, New Bulgarian University, 21 Montevideo St., 1618 Sofia, Bulgaria

Paul C. Quinn ([email protected])
Department of Psychology, University of Delaware, Newark, DE 19716 USA

Robert L. Goldstone ([email protected])
Department of Psychological and Brain Sciences, Indiana University, 1101 East Tenth Street, Bloomington, IN 47405 USA

Abstract

Traditional models of perceptual learning usually assume that learning occurs through changes of weights to fixed primitive features or dimensions. A new model for perceptual learning is presented which relies on simple and physiologically plausible mechanisms. The model suggests how flexible features can be dynamically derived from input characteristics in the course of learning, and how diagnostic shape representations could be formed due to conceptual influences.

Keywords: perceptual learning, neural networks, categorization, concept learning.

Introduction

Perceptual learning refers to performance improvement in different sensory tasks as a result of practice, training, or simple exposure. In the domain of visual perception, these tasks range from simple detection and discrimination of geometric shapes to more complex tasks like face recognition and object categorization. One important question concerns the nature of the processes that lead to perceptual learning. Evidence has been provided for a wide range of changes, from input-based representation modifications to influences of expectation, attention, or task. Because of the highly complex and intertwined interactions of different processes, a deliberate blurring of the boundary between concepts and percepts has been proposed (Goldstone & Barsalou, 1998). There is a need for theories and models that account for conceptual influences on perceptual learning.

Computational modeling is often used to simulate perceptual learning processes (e.g., Mozer, Zemel, Behrmann, & Williams, 1992; Petrov, Dosher, & Lu, 2005; Poggio, Fahle, & Edelman, 1992). Modeling places important constraints on explanations about perceptual learning and pushes theoretical accounts to be more quantitative and concrete. Testable behavioral predictions are often derived from simulations. Models of perceptual learning, however, rarely try to account for performance in different tasks at the same time. They should be able to operate in the absence as well as in the presence of reward feedback. In addition, many of the models rely on a finite number of fixed representations (primitives) as the elementary building blocks for composing concepts. Such accounts fall short of capturing the richness of visual pattern learning phenomena. There is experimental evidence suggesting that perceivers not only learn to selectively weight existing dimensions, but also learn to isolate dimensions that were originally psychologically fused together (Goldstone & Steyvers, 2001), and reorganize visual inputs into new units (Behrmann, Zemel, & Mozer, 1998; Goldstone, 2000).

In the present article, a neural network model is described which relies on the physiologically plausible learning mechanisms of competitive and Hebbian learning. The model focuses on simulating results from task-dependent perceptual learning, which may involve both a higher-level conceptual influence and a lower-level perceptual reorganization. Studies with adults show that perceptual learning is influenced by the feedback presented to learners (Shiu & Pashler, 1992) but can also take place without feedback (Watanabe, Náñez, & Sasaki, 2001). Experimental data from infants also show that perceptual learning can occur without feedback (Quinn, Schyns, & Goldstone, 2006). Accordingly, both supervised and unsupervised learning should be incorporated into a full model of environmentally induced perceptual plasticity. The model for perceptual learning presented below is able to simulate both influences.

Several simulations are reported that correspond to empirical results from behavioral studies. Finally, conclusions are put forward about the way statistics from visual patterns can lead to the building of flexible primitive features, and how higher-level conceptual tasks can influence the formation of complex shape representations.
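Before the formal description, the two mechanisms the model combines, winner-take-all competitive learning and Hebbian learning with weight decay, can be illustrated in miniature. The sketch below is purely illustrative; the function names and constants are our own choices, not the implementation used in the simulations reported here.

```python
# Illustrative sketch of the two learning rules the model combines.
# All names and constants are our own choices, not the paper's code.

def competitive_update(weights, x, winner, L=0.1, M=0.001):
    """Move the winner's weights toward the input; losers move far more slowly."""
    for i, w in enumerate(weights):
        rate = L if i == winner else M
        weights[i] = [wj + rate * (xj - wj) for wj, xj in zip(w, x)]
    return weights

def hebbian_update(w, a_pre, a_post, alpha=0.05, D=0.009):
    """Strengthen a link when both units are active; decay it otherwise."""
    return w + alpha * a_pre * a_post - D

# Two units competing over a 4-pixel receptive field.
weights = [[0.5, 0.5, 0.5, 0.5], [0.2, 0.8, 0.2, 0.8]]
x = [1, 0, 1, 0]
# The winner is the unit whose weighted sum over the input is largest.
sums = [sum(wj * xj for wj, xj in zip(w, x)) for w in weights]
winner = sums.index(max(sums))
weights = competitive_update(weights, x, winner)  # winner drifts toward x
```

Over repeated presentations the winner's weight vector converges on the input it keeps winning, while the Hebbian rule links units that are active together; these are exactly the two ingredients formalized next.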
The Model

The model for perceptual learning consists of two main layers and an artificial input retina (Figure 1). The first layer is based on the competitive learning paradigm (Rumelhart & Zipser, 1985). However, units compete only for a small part of the input; that is, each unit has a receptive field and competes only with other units with the same receptive field. In the current implementation of the model there is no overlap between receptive fields. Competing units are organized in inhibitory clusters: two units with the same receptive field cannot be active at the same time, and only the winner for this receptive field is active. A competitive unit is connected with horizontal Hebbian weights to all units from the other inhibitory clusters. The horizontal Hebbian connections link the parts of an input pattern in terms of coactivation of the competitive units that are specialized to those parts.

The activation of a competitive unit is computed in two time-steps according to the following equations:

A^d_{i,k}(t) = \sum_{j=1}^{n} I^d_{j,k} W^d_{i,j}

A^d_{i,k}(t+1) = A^d_{i,k}(t) + \eta \sum_{p=1,\, p \neq d}^{c} \sum_{l=1}^{s} W^{d,p}_{i,l} A^p_{l,k}(t),

where A^d_{i,k}(t) is the activation of unit i from cluster d at moment t when input pattern k is presented, I^d_{j,k} is the activation of input pixel j from receptive field d for pattern k, W^d_{i,j} is the weight of the connection between unit i and pixel j, A^p_{l,k}(t) is the activation at moment t of competitive unit l from cluster p for pattern k, W^{d,p}_{i,l} is the weight of the horizontal connection between unit i from cluster d and unit l from cluster p, n is the number of pixels in receptive field d, s is the number of competitive units in cluster p, and c is the number of clusters. In the following simulations s is the same for all clusters; that is, the number of competitive units in the different clusters is constant. The parameter \eta is set to 0.1 and represents the smaller contribution of the horizontal connections compared to the bottom-up activation. The winner from each cluster is determined as the most active unit inside the cluster. The output units have sigmoid activation functions.

Learning for the connections between an input receptive field and the competitive units from the corresponding inhibitory cluster follows the classical formula:

\Delta W^d_{i,j} = \begin{cases} L\,(I^d_{j,k} - W^d_{i,j}) & \text{if unit } i \text{ wins on stimulus } k \\ M\,(I^d_{j,k} - W^d_{i,j}) & \text{if unit } i \text{ loses on stimulus } k, \end{cases}

where L is the learning rate for the winning unit (0.1 for all simulations) and M is the learning rate for the losing units (set to 0.001 for all simulations); I^d_{j,k} is the activation of retina pixel j from receptive field d when input k is presented, and W^d_{i,j} is the weight between pixel j from receptive field d and competitive unit i. The stimuli are presented as activation patterns on the retina, where each pixel is either 1 (active) or 0. Activation of competitive units is normalized so that the winning unit's activation is 1 and all the losing units from the cluster sharing the same receptive field are inhibited to zero activation.

The horizontal Hebbian weights learn according to the Hebbian rule:

\Delta W^{d,p}_{i,l} = \alpha A^d_i A^p_l - D,

where \alpha is the learning rate, A^d_i is the activation of unit i from cluster d, A^p_l is the activation of unit l from cluster p, and D is the decay rate of the weights. The competitive layer is fully connected to the output layer with Hebbian weights that learn according to the same rule as the horizontal connections, with the exception that they have different decay and learning rates. All Hebbian weights were set to zero at the beginning of a simulation. The network learns after each pattern.

Figure 1: The model for perceptual learning (an output layer above competitive clusters with 5x5 receptive fields over a 15x15 input retina). Only some of the connections are shown for visualization purposes. See the text for full details.

The competitive layer corresponds to lower-level cells with small receptive fields that cover only small parts of an input, while the output units correspond to more complex structures that are thought to participate in higher-level cognitive tasks.

Simulations and Results

Two types of simulations are possible with the described model. The first type corresponds to learning without feedback. In this operational mode, the output layer is activated at random since no teacher signal is available; in other words, this is unsupervised learning of the competitive layer, based only on the characteristics of the input space. When feedback is available, a particular pattern of activation appears on the output layer as a teacher signal. This signal represents the influence of higher-level conceptual processes on learning.

Unsupervised Learning

The unsupervised learning of the competitive layer alone was simulated with stimuli close to those used in Quinn and Schyns (2003) and Quinn et al. (2006). Using an unsupervised model to simulate empirical results from infants seems like a natural correspondence, given that infants in the first few months of life do not receive instruction on how to organize their visual experiences. A series of experiments were conducted to determine whether infants, like adults, can perceive visual patterns in terms of parts extracted through category learning rather than parts that would be derived from adherence to gestalt organizational principles.

When 3- to 4-month-olds were presented with visual patterns consisting of overlapping circle and polygon shapes (Figure 2A), the infants tended to interpret these forms in terms of a polygon and a circle, consistent with a good continuation principle. This was evidenced by infants being more surprised by (looking longer at) a subsequently presented pacman shape (Figure 2C) than a circle (Figure 2D). However, when a separate group of 3- to 4-month-olds was first presented with a series of patterns containing the three-quarter "pacman" shapes (Figure 2B), and then subsequently with the patterns shown in Figure 2A, the infants interpreted the ambiguous patterns in Figure 2A as containing a pacman instead of a circle, as evidenced by their greater looking times for the circle than the pacman. These experimental results strongly suggest that unsupervised learning is capable of overriding gestalt laws of organization such as good continuation if the prior learning history supports an alternative organization.

The model can provide a computational account for these empirical findings. The competitive layer is capable of extracting elements and statistical dependencies from the input structure even if no feedback is available. Thus the gestalt law of continuity was simulated with presentation of simple forms at different positions on the retina. Ten such patterns (three vertical lines, three horizontal lines, and four circles) were presented in random order for 2000 cycles. This pre-training phase simulated the infant's perceptual experience prior to arrival at the laboratory and conceivably corresponds with the experiences of young infants as they encounter visual patterns in the environment. We were interested in the ability of the model to acquire perceptual constraints from commonly occurring patterns instead of explicitly building in the good continuation principle.
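The pre-training stage just described can be sketched in a few lines of Python. This is an illustrative reimplementation under the stated parameters (a 15x15 retina, nine non-overlapping 5x5 receptive fields, 8 units per inhibitory cluster, L = 0.1, M = 0.001), not the authors' code; for brevity it omits the horizontal Hebbian connections and the output layer, and a single vertical-line stimulus stands in for the full set of ten patterns.

```python
# Sketch of unsupervised pre-training of the competitive layer.
# Illustrative reimplementation; simplified relative to the full model.
import random

RETINA, FIELD, UNITS = 15, 5, 8
L_RATE, M_RATE = 0.1, 0.001      # winner and loser learning rates

def fields(pattern):
    """Cut a flat 15x15 pattern into nine flat 5x5 receptive-field vectors."""
    out = []
    for fr in range(0, RETINA, FIELD):
        for fc in range(0, RETINA, FIELD):
            out.append([pattern[(fr + r) * RETINA + (fc + c)]
                        for r in range(FIELD) for c in range(FIELD)])
    return out

def make_clusters():
    """One inhibitory cluster of UNITS units per receptive field,
    with small random pixel-to-unit weights."""
    return [[[random.random() * 0.1 for _ in range(FIELD * FIELD)]
             for _ in range(UNITS)] for _ in range(9)]

def train_pattern(clusters, pattern):
    """Winner-take-all update inside each cluster for one input pattern."""
    winners = []
    for cluster, x in zip(clusters, fields(pattern)):
        acts = [sum(w * xi for w, xi in zip(unit, x)) for unit in cluster]
        win = acts.index(max(acts))
        winners.append(win)
        for i, unit in enumerate(cluster):
            rate = L_RATE if i == win else M_RATE
            cluster[i] = [w + rate * (xi - w) for w, xi in zip(unit, x)]
    return winners

random.seed(0)
clusters = make_clusters()
# A vertical line in one column stands in for the line stimuli.
line = [1 if c == 7 else 0 for r in range(RETINA) for c in range(RETINA)]
for _ in range(200):             # pre-training cycles
    train_pattern(clusters, line)
winners = train_pattern(clusters, line)
# The winning unit in each cluster crossed by the line now has weights
# close to the line segment inside its receptive field.
```

After such pre-training, each recurring local segment is approximated by one winner's weight vector, which is the sense in which the competitive layer extracts statistically common parts without any feedback.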
This could also be interpreted as the evolved representation of naturally occurring statistics in visual patterns (Olshausen & Field, 1996).

The input retina consisted of 225 pixels organized in a 15x15 square matrix. There were 9 non-overlapping square 5x5 receptive fields, with 8 units in an inhibitory cluster competing over each of the receptive fields, which makes for a total of 72 nodes in the competitive layer. The learning rate of the horizontal Hebbian weights was 0.05 and the decay rate was set to 0.009.

After the pre-training phase, some of the competitive units specialized for parts of lines, while others specialized for arcs of a circle. Then an ambiguous pattern (Figure 3A) was presented. This portion of network training and testing corresponded to the first familiarization test phase in the study with infants, when similar patterns, each consisting of an overlapping circle and a polygon, were presented, which led to the segmentation of the circle and the polygon by infants. The ambiguous pattern given to the model activated four "arc" and two "line" nodes from the competitive layer, thus forming a good, continuous circle and some parts of a polygon, which was consistent with the infants' behavior.

Figure 2: Stimuli from Quinn and colleagues (2006), panels A-D.

The activation pattern over the competitive layer is visualized in Figure 3B with the following algorithm: each pixel represents the weight between this pixel and the competitive unit multiplied by the competitive unit's activation. This visualization is intended to show that the competitive units were not activated accidentally but represented both the structure of the presented pattern and a learned continuity principle for a circle shape. The polygon shape triggered only the activation of two separate line segments, because the network had never been exposed to any polygon shape and thus did not have the chance to acquire any polygon representation during its pre-training. This result shows that the network does not simply imitate the presented pattern but is affected by its previous knowledge about perceptual grouping that has been stored in the horizontal connections.

The same network was fed for 200 cycles with two patterns containing pacman shapes (Figures 3C, 3D) and again was presented with the ambiguous pattern 3E. This corresponded to the two-part procedure in which the infants were first presented with pacman shapes and subsequently with circle shapes (2B followed by 2A). Once again the model behavior was very similar to what the experimental results suggested. This time the pacman shape was strongly active and some polygon segmentation appeared but was less active than the pacman (Figure 3F). The pacman shape actually was represented by three competitive units specialized for arcs and one specialized for an angle. The "arc" units were initially connected to the fourth arc unit which completed the active circle from Figure 3B; however, after the patterns containing the pacman shapes were repeatedly shown to the network, the angle unit became more active than the arc unit over the same receptive field, which led to the angle unit winning for this receptive field. This could be interpreted as a spontaneous formation of a virtual pacman shape detector that is constructed from smaller low-level representations of three arcs and one angle segment.

Figure 3: Unsupervised learning simulation, panels A-F.

Supervised Learning

Supervised learning is often used in studies of adult perceptual learning and can influence the course of learning. Previous experiments (Pevtzow & Goldstone, 1994) have suggested that observers develop perceptual detectors for stimulus elements that are diagnostic of task-critical categorization while they learn to categorize simple patterns. The same patterns, when they receive different categorizations, result in different psychological features being constructed. The nature of the detectors depends not only on the input patterns, as in the previous simulation, but on the categorization task as well. As an example, the ambiguous scene in Figure 3A was more likely to be segmented into a circle and polygon when the circle was previously relevant for categorization, and more likely to be segmented into a pacman when the pacman was relevant.

The experimental results from Pevtzow and Goldstone (1994) have been simulated with a model similar to the one presented here (Goldstone, 2000). The previous model, however, relied on built-in perceptual constraints and input patterns competing to be accommodated by a competitive unit. The present model adds plausible Hebbian learning to the competitive learning mechanism used in Goldstone (2000). The present model also uses more local competition for small parts of an input inside a receptive field instead of competition for the whole input. This leads to a somewhat different interpretation of a detector: in the present model a detector is composed of several smaller competitive units from different receptive fields that together form a coherent shape detector over the whole input retina.

In the following simulations the formation of such detectors was influenced not only by the input properties, as in the unsupervised learning, but also by a conceptual teacher signal that led to the formation of categorization-relevant detectors at the output layer of the network. A teacher signal was directly presented as a pattern of activation on the output layer during the supervised training. This was done for simplicity, since the influence of higher-level categorization or judgment structures can be simulated in different ways; one possible mechanism, used by Goldstone (2000), was top-down influence from a categorization layer to the detector layer.

A square 16x16 retina of 256 pixels was used; competitive units' receptive fields were square 8x8 non-overlapping matrices, which yielded a total of four receptive fields. Each inhibitory cluster consisted of 4 units competing with one another. The output layer had two units. The learning rate for the output Hebbian weights was set to 0.1 and the decay rate was 0.04. The horizontal Hebbian connections had the same learning and decay rates as in the previous simulation.

Figure 4: Inputs for the categorization task simulation, patterns A-D.

Four input patterns were presented to the network (Figure 4). First, feedback was given to the network that 4A and 4B belong to one category (1, 0) and 4C and 4D belong to another (0, 1). With this horizontal categorization rule, 50 cycles were run, with the four input patterns presented in a random order during each cycle. The mean squared error of the output units displayed a rapid decrease (Figure 5B). The network learned to distinguish 4A and 4B as members of one category from 4C and 4D belonging to another. That is, when 4A or 4B was presented, output unit 1 was active and unit 2 was not. On the contrary, when 4C or 4D was presented, output unit 2 was active and unit 1 was off. The two output units can be considered detectors for the two categories.

The learned weights of the connections between the competitive layer and each of the two output units are shown in Figure 5A. Only two of the competitive units had positive weights to output unit 1, and the other two had positive weights to output unit 2. Thus the output units had learned to ignore the responses of those lower-level nodes that were not relevant for categorization and combined together those parts which were relevant, forming diagnostic shape detectors (Figures 5C, 5D). The formation of the detectors was not influenced by the number of lower-level competitive units that participated in the shape representation: the result was the same with smaller 4x4 receptive fields, a change that led only to the same diagnostic shape detectors being composed of four instead of two competitive units. The competitive units participating in a detector's representation were specialized for small input patterns contained within their receptive fields. The global representation activated by the whole input pattern, however, was a continuous shape honoring the Gestalt principle of Good Continuation.

Figure 5: Panel A – weights between the competitive layer and the two output nodes. Panel B – mean square error for the output nodes. Panel C – the pixel-to-unit weights for the two competitive units with positive weights to output unit 1. Panel D – the pixel-to-unit weights for the two competitive units with positive weights to output unit 2.

In a second simulation, a vertical categorization rule was applied to a network with identical parameters. This time patterns 4A and 4C were from the same category (1, 0) while patterns 4B and 4D were from the other (0, 1). The results from the second simulation are compared to the outcomes of the first simulation in Figure 6. For visualization purposes the output layer weights are multiplied by the competitive layer weights, which represent the participation of each pixel in the diagnostic shape detectors that were formed at the output layer. The same patterns led to the formation of different detectors when the vertical categorization rule was applied. This result was very stable over simulations and replicated the type of results reported by Pevtzow and Goldstone (1994).

Figure 6: Detectors built according to a horizontal (AB, CD) and a vertical (AC, BD) categorization rule.

Inspection of all specialized competitive units showed that there was no difference in their representation after the vertical and horizontal rule simulations. This means that the general structure of the input space was captured every time by the competitive units. Correct categorization was due to the formation of a diagnostic shape detector at the output layer.

General Discussion

The model shows a reliable ability to replicate at least two empirical results with minimal changes in parameters. Both unsupervised and supervised learning are possible. A general conclusion from the simulation results is that there are automatic low-level changes that capture the structure of visual stimuli irrespective of the given task. However, when feedback is available, a more complex shape representation is constructed at a higher level to accommodate the task requirements.
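This division of labor, task-independent structure in the competitive layer and task-dependent detectors at the output, can be made concrete with a toy version of the supervised stage. The competitive-unit responses below are hypothetical stand-ins for the trained competitive layer; only the Hebbian parameters alpha = 0.1 and D = 0.04 are taken from the simulation above, and the weight floor is our own addition to keep the toy weights bounded.

```python
# Toy supervised stage: the same lower-level responses produce different
# output detectors under different category rules. Illustrative only.
ALPHA, DECAY = 0.1, 0.04     # output Hebbian learning and decay rates

# Hypothetical winning-unit responses for patterns A-D over four clusters.
patterns = {
    "A": [1, 1, 0, 0], "B": [1, 0, 1, 0],
    "C": [0, 1, 0, 1], "D": [0, 0, 1, 1],
}

def train(rule, cycles=50):
    """rule maps a pattern name to its teacher signal (out1, out2)."""
    w = [[0.0] * 4, [0.0] * 4]   # two output units, weights start at zero
    for _ in range(cycles):
        for name, x in patterns.items():
            teacher = rule[name]
            for o in range(2):
                for i in range(4):
                    # Hebbian rule with decay: dW = ALPHA * pre * post - DECAY
                    w[o][i] += ALPHA * x[i] * teacher[o] - DECAY
                    w[o][i] = max(w[o][i], -1.0)   # floor is our own addition
    return w

horizontal = {"A": (1, 0), "B": (1, 0), "C": (0, 1), "D": (0, 1)}
vertical   = {"A": (1, 0), "B": (0, 1), "C": (1, 0), "D": (0, 1)}

w_horizontal = train(horizontal)
w_vertical = train(vertical)
# Under the horizontal rule, output unit 1 ends with positive weight only
# to the cluster response shared by A and B; under the vertical rule, only
# to the response shared by A and C.
```

In this miniature, decay prunes the non-diagnostic connections while coactivation with the teacher signal strengthens the diagnostic ones, mirroring how the full model composes category-relevant detectors from unchanged lower-level units.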
Another interesting conclusion comes from the unsupervised behavior of the network. The simple and plausible mechanism of competitive learning, reinforced by the horizontal Hebbian connections, is able to extract perceptual categories that are statistically present in the input space. This strongly supports empirical findings that Gestalt principles of perceptual organization can at times be overruled by category learning. The model also suggests a way in which even certain Gestalt principles like continuity can be learned, rather than built in, as a consequence of experience with a learning environment that includes visual patterned stimulation (Quinn & Bhatt, 2005; Spelke, 1982).

The presented simulations have shown that it is computationally possible to account for both supervised and unsupervised perceptual learning without using built-in primitive features at the level that is eventually diagnostic for categorization. This was achieved by a fairly simple structure and by plausible mechanisms. The suggested model for perceptual learning is a first step toward a more global approach to learning that intends to bring together concepts and perception.

Acknowledgments

This research was funded by NIH Grants HD-42451 and HD-46526 (to the third author), and Department of Education, Institute of Education Sciences grant R305H050116 and National Science Foundation grant 0527920 (to the fourth author).

References

Behrmann, M., Zemel, R. S., & Mozer, M. C. (1998). Object-based attention and occlusion: Evidence from normal participants and a computational model. Journal of Experimental Psychology: Human Perception and Performance, 24, 1011-1036.
Goldstone, R. L. (2000). A neural network model of concept-influenced segmentation. Proceedings of the Twenty-second Annual Conference of the Cognitive Science Society (pp. 172-177). Hillsdale, NJ: Lawrence Erlbaum Associates.
Goldstone, R. L. (2000). Unitization during category learning. Journal of Experimental Psychology: Human Perception and Performance, 26, 86-112.
Goldstone, R. L., & Barsalou, L. (1998). Reuniting perception and conception. Cognition, 65, 231-262.
Goldstone, R. L., & Steyvers, M. (2001). The sensitization and differentiation of dimensions during category learning. Journal of Experimental Psychology: General, 130, 116-139.
Mozer, M. C., Zemel, R. S., Behrmann, M., & Williams, C. K. I. (1992). Learning to segment images using dynamic feature binding. Neural Computation, 4, 650-665.
Olshausen, B. A., & Field, D. J. (1996). Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381, 607-609.
Petrov, A., Dosher, B., & Lu, Z.-L. (2005). The dynamics of perceptual learning: An incremental reweighting model. Psychological Review, 112, 715-743.
Pevtzow, R., & Goldstone, R. L. (1994). Categorization and the parsing of objects. Proceedings of the Sixteenth Annual Conference of the Cognitive Science Society (pp. 717-722). Hillsdale, NJ: Lawrence Erlbaum Associates.
Poggio, T., Fahle, M., & Edelman, S. (1992). Fast perceptual learning in visual hyperacuity. Science, 256, 1018-1021.
Quinn, P. C., & Bhatt, R. S. (2005). Learning perceptual organization in infancy. Psychological Science, 16, 515-519.
Quinn, P. C., & Schyns, P. G. (2003). What goes up may come down: Perceptual process and knowledge access in the organization of complex visual patterns by young infants. Cognitive Science, 27, 923-935.
Quinn, P. C., Schyns, P. G., & Goldstone, R. L. (2006). The interplay between perceptual organization and categorization in the representation of complex visual patterns by young infants. Journal of Experimental Child Psychology, 95, 117-127.
Rumelhart, D. E., & Zipser, D. (1985). Feature discovery by competitive learning. Cognitive Science, 9, 75-112.
Shiu, L., & Pashler, H. (1992). Improvement in line orientation discrimination is retinally local but dependent on cognitive set. Perception & Psychophysics, 52, 582-588.
Spelke, E. S. (1982). Perceptual knowledge of objects in infancy. In J. Mehler, M. Garrett, & E. Walker (Eds.), Perspectives on mental representation (pp. 409-430). Hillsdale, NJ: Erlbaum.
Watanabe, T., Náñez, J., & Sasaki, Y. (2001). Perceptual learning without perception. Nature, 413, 844-848.