15-387/86-375/675 Computational Perception
Carnegie Mellon University
Fall 2025
Course Description
The perceptual capabilities of even the simplest biological organisms are far beyond what we can achieve with machines. Whether you look at sensitivity, robustness, adaptability, or generalizability, perception in biology just works: it works in complex, ever-changing environments, and it can make inferences from the most subtle sensory patterns. Is it the neural hardware? Does the brain use a fundamentally different algorithm? What can we learn from biological systems and human perception?
In this course, we will study the biological and psychological data of biological perceptual systems, mostly the visual system, in depth, and then apply computational thinking to investigate the principles and mechanisms underlying natural perception.
You will learn how to reason scientifically and computationally about problems and issues in perception, how to extract the essential computational properties of those abstract ideas, and finally how to convert these into explicit mathematical models and computational algorithms. The course is aimed at students in any discipline who have some computing background and are interested in perception, neuroscience, and computational vision. Programming assignments will use PyTorch and Colab. Prerequisites: first-year college calculus, basic knowledge of differential equations, linear algebra, basic probability theory and statistical inference, and Python programming experience.
Course Information
Instructors | Office Hours | Email (Phone)
Tai Sing Lee (Professor) | Friday 9:00 a.m., Zoom office hour | taislee@andrew.cmu.edu
Aida Mirebrahimi Tafreshi (TA) | Monday 7:00-8:00 p.m. on Zoom | amirebra@andrew.cmu.edu
Yue Li (TA) | Tuesday 8:00-9:00 p.m. on Zoom | yueli4@andrew.cmu.edu
All office hours and recitations will be held on Zoom using the course Zoom link, unless otherwise notified and arranged.
Class location and time: WEH 4708. Monday/Wednesday 12:30 p.m. - 1:50 p.m.
Class recitation and journal club: Class Zoom. Friday 12:30 p.m. - 1:50 p.m.
Website: Canvas. Lecture materials and information will be posted on Canvas.
Recommended Textbook
Handouts on Canvas.
Frisby and Stone, Seeing: The Computational Approach to Biological Vision. MIT Press, 2010 (recommended).
Classroom Etiquette
You may use a laptop or cell phone to take notes during class, but not for anything else.
Grading Scheme 15-387/86-375
Evaluation | Grade Points
Assignments | 60
Midterm | 10
Final Exam | 20
Class Participation | 10
Total points: 100
Grading scheme: A: > 88, B: > 75, C: > 65, D: > 50.
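As a quick illustration of how these cutoffs read, here is a minimal sketch that maps a point total to a letter grade. It assumes the cutoffs are strict (a total of exactly 88 is a B, not an A), which is how the ">" signs are written. The function name `letter_grade` is hypothetical, and because the syllabus does not say what happens at 50 points or below, the sketch returns `None` there rather than guessing.

```python
def letter_grade(points):
    """Map a 0-100 point total to a letter using strict '>' cutoffs.

    Assumption: cutoffs are strict, as written in the grading scheme.
    The outcome at or below 50 points is not specified, so return None.
    """
    cutoffs = [(88, "A"), (75, "B"), (65, "C"), (50, "D")]
    for cutoff, letter in cutoffs:
        if points > cutoff:
            return letter
    return None


# Example: 60 (assignments) + 8 (midterm) + 15 (final) + 10 (participation)
example_total = 60 + 8 + 15 + 10
example_letter = letter_grade(example_total)
```
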
Grading Scheme 86-675
Evaluation | Points
Assignments | 60
Midterm | 10
Final Exam | 20
Journal Club * | option
Term Project * | option
Class participation | 10
Total credit for 86-675: 100
* Journal Club requires at least 80 percent attendance and 1-2 presentations. A term project or term paper can be used to replace the journal club.
Grading scheme: A: > 88, B: > 75, C: > 65.
Homework
There will be 4-5 homework assignments involving PyTorch. The focus is on performing experiments and analysis on existing implementations of perceptual models rather than coding algorithms from scratch.
Each student has a grace period of 7 days (more precisely, 7 x 24 hours) for late homework. This grace period can be spent on one assignment or spread across several. Use it wisely; you cannot ask for more. Canvas submission time relative to the deadline will be used to track it.
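The grace-period bookkeeping can be sketched as follows. This is only an illustration under stated assumptions: lateness is counted to the exact minute with no rounding, early submissions do not earn time back, and the deadlines in the example are made-up values, not the actual course deadlines. The name `grace_remaining` is hypothetical.

```python
from datetime import datetime, timedelta

# 7 x 24 hours, shared across all assignments.
GRACE_BUDGET = timedelta(days=7)


def grace_remaining(submissions, budget=GRACE_BUDGET):
    """Return the unused grace period.

    submissions: iterable of (deadline, submitted_at) datetime pairs.
    Assumption: lateness is not rounded, and early submissions count as zero.
    A negative result means the budget has been exceeded.
    """
    used = timedelta(0)
    for deadline, submitted_at in submissions:
        used += max(submitted_at - deadline, timedelta(0))
    return budget - used


# Hypothetical deadlines: one assignment a day late, one submitted early.
example = [
    (datetime(2025, 9, 17, 23, 59), datetime(2025, 9, 18, 23, 59)),
    (datetime(2025, 10, 1, 23, 59), datetime(2025, 10, 1, 12, 0)),
]
left = grace_remaining(example)
```
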
You may collaborate with one partner on the first homework to help jumpstart your learning of PyTorch, but you are expected to do all subsequent homework by yourself. You are welcome to ask questions on Piazza. Students who help others by answering questions on Piazza can earn credit in the course, up to 3 points (gold medal 3 pts, silver medal 2 pts, bronze medal 1 pt).
AI Policy: You are free to use ChatGPT or a similar LLM to help with your homework, but you must include a section at the beginning of each assignment describing how it was used, in as much detail as is reasonable.
Term Project or Journal Club
The graduate version requires presenting in the journal club (1-2 presentations) and attending at least 80 percent of the journal club sessions. If you cannot attend the journal club because of a schedule conflict, you can substitute a term project or term paper. The project proposal is due one week after the midterm, but students are encouraged to discuss project ideas with the professor earlier in the semester.
Examinations and Class Participation
There will be a midterm (10 points) and a final exam (20 points) to test material covered in the lectures, assigned papers, and homework assignments.
We will dedicate some class time for discussion of the topics covered. This discussion requires some reading. Participation is required for the attendance points.
Syllabus
Date | Lecture Topic | Assignments
SENSORY CODING
M 8/25 | 1. Introduction
W 8/27 | 2. Perceptual Theories
F 8/29 | Journal Club Orientation
M 9/1 | Labor Day (no class)
W 9/3 | 3. Retina | Homework 1 out
F 9/5 | PyTorch Tutorial and HW1 Recitation
M 9/8 | 4. Computation
W 9/10 | 5. Pyramid
F 9/12 | Journal Club 1
M 9/15 | 6. Frequency
W 9/17 | 7. Intrinsic Images | HW1 due; Homework 2 out
F 9/19 | Recitation for HW2
PERCEPTUAL INFERENCE
M 9/22 | 8. Retinex
W 9/24 | 9. Networks
F 9/26 | Journal Club 2
M 9/29 | 10. Cortex | Mid-Course Evaluation
W 10/1 | 11. Grouping | HW2 in. Homework 3 out
F 10/3 | Recitation for HW 3
M 10/6 | 12. Texture
W 10/8 | Midterm
F 10/10 | Journal Club 3 | HW2 due. HW3 out
M 10/13 | Fall break
W 10/15 | Fall break
F 10/17 | Fall break
M 10/20 | 13. Metamers
W 10/22 | 14. Autoencoders | Midterm Grade due
F 10/24 | Journal Club 4
M 10/27 | 15. Surfaces
W 10/29 | 16. Contours | HW 3 in. HW4 out
F 10/31 | Recitation HW4
M 11/3 | 17. Shapes
W 11/5 | 18. Objects
F 11/7 | Journal Club 5
M 11/10 | 19. Scenes
W 11/12 | 20. Depth and Motion | HW 4 in, HW5 out
F 11/14 | Recitation for HW5
M 11/17 | 21. Synthesis
W 11/19 | 22. Attention
F 11/21 | Journal Club 6
M 11/24 | 23. Integration
W 11/26 | Thanksgiving | HW 5 in
F 11/28 | Thanksgiving
M 12/1 | 23 Review / Presentation
W 12/3 | 24. Final Exam | Term Paper in.
F 12/5 | Journal Club 7 | Last day of Class
Journal Club
Week 1 Neural Manifolds
Reading (relevant but optional)
Week 1 (Lectures 1 and 2) Observations, Theories and Computational Philosophy
Week 2,3 (Lectures 3, 4, 5, 6) Retina, Resolution, Pyramid and Computation
Visual perception starts with the eyes and the photoreceptors. However, there is already sophisticated computation in the retina. We will read some classic and modern papers on retinal processing, and cover basic background on frequency analysis and pyramid representation, as well as the current computational approach (via deep learning) to modeling retinal processing. We will do a problem set on retinal processing and explore its relationship to some visual illusions and to perception.
Lettvin, Maturana, McCulloch and Pitts (1959) What the frog's eye tells the frog's brain. Proceedings of the IRE, 1940-1959.
Gollisch and Meister (2010) Eye Smarter than Scientists Believed: Neural Computations in Circuits of the Retina. Neuron 65: 151-164.
Maheswaranathan, .... Ganguli and Baccus (2018) Deep learning models reveal internal structure and diverse computations in the retina under natural scenes. bioRxiv, June 8, 2018.
McIntosh, Maheswaranathan, Nayebi, Ganguli and Baccus (2016) Deep Learning Models of the Retinal Response to Natural Scenes. NIPS.
Burt and Adelson (1983) The Laplacian Pyramid as a Compact Image Code. IEEE Transactions on Communications, COM-31, no. 4, 532-540.
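To make the pyramid representation concrete, here is a minimal NumPy sketch of a Laplacian pyramid in the spirit of Burt and Adelson (1983). It is an illustration, not the course's homework code: the 5-tap binomial kernel, the reflected borders, and all function names are assumptions chosen for brevity. Each level stores the detail lost when the image is blurred and downsampled by two, and the last level holds the low-pass residual.

```python
import numpy as np

# 5-tap binomial low-pass kernel (an assumed, commonly used choice).
KERNEL = np.array([1, 4, 6, 4, 1], dtype=float) / 16.0


def blur(img):
    """Separable low-pass filter with reflected borders."""
    h, w = img.shape
    padded = np.pad(img, 2, mode="reflect")
    # Filter along the horizontal axis, then along the vertical axis.
    rows = sum(k * padded[:, i:i + w] for i, k in enumerate(KERNEL))
    return sum(k * rows[i:i + h, :] for i, k in enumerate(KERNEL))


def reduce_level(img):
    """Blur, then keep every second pixel in each direction."""
    return blur(img)[::2, ::2]


def expand_level(img, shape):
    """Upsample to `shape` by zero insertion followed by blurring."""
    up = np.zeros(shape, dtype=float)
    up[::2, ::2] = img
    return 4.0 * blur(up)  # factor 4 compensates for the inserted zeros


def laplacian_pyramid(img, levels):
    """Return `levels` band-pass images plus one low-pass residual."""
    pyramid, current = [], np.asarray(img, dtype=float)
    for _ in range(levels):
        coarse = reduce_level(current)
        pyramid.append(current - expand_level(coarse, current.shape))
        current = coarse
    pyramid.append(current)
    return pyramid


def reconstruct(pyramid):
    """Invert the pyramid by expanding and adding from coarse to fine."""
    current = pyramid[-1]
    for band in reversed(pyramid[:-1]):
        current = band + expand_level(current, band.shape)
    return current
```

Because each band is defined as the difference between a level and the expanded version of the next coarser level, `reconstruct(laplacian_pyramid(img, n))` recovers `img` up to floating-point rounding, regardless of the particular blur kernel.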
Week 4 (Lecture 7,8) Lightness perception and Intrinsic Images
Our perception of brightness (or lightness) and color is not determined by what is sensed by the retina, but is in fact an interpretation of the lightness and color properties of object surfaces in the world. We will explore the classic theory of retinex as well as the modern computational theory of intrinsic images for understanding lightness and color perception, culminating in a problem set on these issues.
Adelson, Ed (2000) Lightness Perception and Lightness Illusions. The New Cognitive Neuroscience, Gazzaniga ed. MIT Press.
Land, E. (1977) The retinex theory of color vision. Scientific American.
Horn, B. (1974) Determining lightness from an image. Computer Graphics and Image Processing.
Grosse, Johnson, Adelson and Freeman (2009) Ground truth dataset and baseline evaluations for intrinsic image algorithms. IEEE Trans Image Process. 19(11) 2825-37.
Tappen, Freeman and Adelson (2005) Recovering intrinsic images from a single image. IEEE PAMI 27(9): 1459-1472.
Michael Janner, Jiajun Wu, Tejas D. Kulkarni, Ilker Yildirim, Joshua B. Tenenbaum (2017) Self-Supervised Intrinsic Image Decomposition. NeurIPS.
Wei-Chiu Ma, Hang Chu, Bolei Zhou, Raquel Urtasun and Antonio Torralba (2018) Single Image Intrinsic Decomposition without a Single Intrinsic Image. ECCV.
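The idea shared by the retinex and intrinsic-image readings above can be illustrated with a small 1-D sketch. It assumes the standard multiplicative model, intensity = reflectance x illumination, with illumination varying slowly and reflectance changing in abrupt steps. In the log domain the product becomes a sum, so small gradients are attributed to illumination and discarded, and the remaining gradients are integrated back. The function name and the threshold value are hypothetical, and this is a simplified illustration rather than the exact algorithm of Land or Horn.

```python
import numpy as np


def estimate_reflectance_1d(intensity, threshold=0.05):
    """Estimate reflectance (up to an unknown scale) from a 1-D intensity profile.

    Assumptions: intensity = reflectance * illumination, illumination is
    smooth, and reflectance edges produce log-gradients above `threshold`.
    """
    log_i = np.log(intensity)
    grad = np.diff(log_i)
    # Treat small log-gradients as slow illumination change and remove them.
    grad[np.abs(grad) < threshold] = 0.0
    # Integrate the remaining (reflectance) gradients; first sample is the reference.
    log_r = np.concatenate([[0.0], np.cumsum(grad)])
    return np.exp(log_r)


# A dark patch next to a bright patch under a smooth illumination ramp.
reflectance = np.concatenate([np.full(50, 0.2), np.full(50, 0.8)])
illumination = np.linspace(1.0, 2.0, 100)
estimate = estimate_reflectance_1d(reflectance * illumination)
```

In this example the estimate is flat within each patch and the ratio between the two patches comes out close to the true reflectance ratio of 4, even though the raw intensities vary continuously across the ramp.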
Week 5 (Lecture 9,10). Surfaces, Shapes and Visual Cortex
Shading, as well as many other cues, allows us to infer 3D shapes. The intrinsic shading image is a consequence of illumination on 3D shape, suggesting that one might decompose shading into illumination and 3D shape. We ask how 3D shapes or objects are represented in the brain. Are they in the form of 2D images, 2.5D sketches, or 3D solids?
Ramanujan Srinath, A. Emonds, O. Wang, A. Lempel, E. Dunn-Weiss, C. E. Connor, K. Nielsen (2021) Early Emergence of Solid Shape Coding in Natural and Deep Network Vision. Current Biology 31, 51-65.
S. Y. Edelman, M. J. Tarr (1995) How are three-dimensional objects represented in the brain? Cerebral Cortex 5(3): 247-60.
Zhang, X, Zhang, Z, Zhang C, Tenenbaum J, Freeman W, Wu, J. Learning to Reconstruct Shapes from Unseen Classes. NeurIPS 2018.
Week 6, 7 (Lecture 11, 12, 13). Perceptual Learning and Inference
Week 8,9 (Lectures 14, 15, 16, 17) Perceptual Organization and Segmentation
In addition to inferring "visible" physical properties of the world, the brain also tries to organize sensory information into parsimonious descriptions in order to infer more abstract and global properties of the world, such as boundaries, surface properties, and summary statistics.
In this segment of the course, we will study the Gestalt school of thought and models for extracting these properties.
Bela Julesz (1981) Textons, the elements of texture perception and their interaction. Nature 290, 91-97.
Heeger and Bergen (1995) Pyramid-based texture analysis/synthesis. SIGGRAPH 1995.
Portilla and Simoncelli (2000) A parametric texture model based on joint statistics of complex wavelet coefficients. International Journal of Computer Vision 40(1), 49-71.
L. A. Gatys, A. S. Ecker, and M. Bethge (2015) Texture Synthesis Using Convolutional Neural Networks. NIPS 28.
L. A. Gatys, A. S. Ecker, and M. Bethge (2016) Image Style Transfer Using Convolutional Neural Networks. CVPR 2016.
Freeman J, Simoncelli EP. (2011) Metamers of the ventral stream. Nature Neuroscience.
Kovacs, I., Papathomas, T., Yang, M. and Feher, A. (1996). When the brain changes its mind: Interocular grouping during binocular rivalry. Proceedings of the National Academy of Sciences, 93(26), pp.15508-15511.
Lee, T.S. (1995). A Bayesian framework for understanding texture segmentation in the primary visual cortex. Vision Research 35, 2643-2657.
Tsao and Tsao (2022) A topological solution to object segmentation and tracking. PNAS 119(41).
Week 10. (Lecture 18, 19) Objects, Scenes and Inverse Graphics
Early computer vision approaches emphasized analysis by synthesis. This framework can be generalized to conceptualize the recurrent interaction in the hierarchical visual system and perception as inverse graphics. We will study these theories and their neural foundations, and see how these principles can be extended to self-supervised learning based on prediction principles.
Van Essen, Anderson and Felleman (1992) Information processing in primate visual systems: an integrated approach. Science 5043: 419-423.
Mumford, D (1992) On the computational architecture of the neocortex. Biological Cybern. 66: 241-251.
Torralba and Oliva (2003) Statistics of natural image categories. Network: Computation in Neural Systems 14: 391-412.
Rao and Ballard (1999) Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive field effects. Nature Neuroscience 2(1), 79-87.
Lee and Mumford (2003) Hierarchical Bayesian inference in the visual system. J. Optical Society of America 20(7), 1434-1448.
Lee, T.S. (2015) The Visual System's Internal Models of the World. Proceedings of the IEEE, Vol 103, issue 8, 1359-1378.
Lotter, Kreiman and Cox (2020) A neural network trained for prediction mimics diverse features of biological neurons and perception. Nature Machine Intelligence, vol 2, 210-219.
Ilker Yildirim, Mario Belledonne, Winrich Freiwald, Josh Tenenbaum (2020) Efficient inverse graphics in biological face processing. Sci. Adv. 2020; 6: eaax5979, 4 March 2020.
T. D. Kulkarni, W. F. Whitney, P. Kohli, J. Tenenbaum, Deep convolutional inverse graphics network, in Proceedings of the Advances in Neural Information Processing Systems (NIPS, 2015), pp. 2539–2547.
Kar K, Kubilius J, Schmidt K, Issa EB, DiCarlo JJ. Evidence that recurrent circuits are critical to the ventral stream's execution of core object recognition behavior. Nature Neuroscience. 2019. doi: 10.1038/s41593-019-0392-5.
Week 11 (Lecture 21, 22) Integration and Composition
Compositional theory argues that the brain, or the visual system, is organized as a recursive, nearly decomposable system that best models the compositional nature of parts, objects, and scenes in the natural world, along with their flexible combination and deformation. We will explore some classical and modern theories of composition, examining the linkage between modern deep neural networks and symbolic AI from this perspective, and the connection between language and vision.
E. Bienenstock, S. Geman, and D. Potter. Compositionality, MDL Priors, and Object Recognition. NIPS Advances in Neural Information Processing Systems 9. 1998.
S. Geman. Hierarchy in machine and natural vision. Proceedings of the 11th Scandinavian Conference on Image Analysis, 1999.
Yuille. Towards a Theory of Compositional Learning and Encoding of Objects. 1st IEEE Workshop in Information Theory in Computer Vision and Pattern Recognition. ICCV 2011.
S.C. Zhu and D. Mumford (2006) A Stochastic Grammar of Images. Foundations and Trends in Computer Graphics and Vision 2(4): 259-362.
Week 12 (Lecture 23, 24) Functional Streams Coordination and Attention
The visual system operates in multiple functional streams, with the ventral stream for object recognition, the dorsal stream for processing space, motion, and action, and the lateral stream involved in social interaction. Furthermore, other sensory systems, such as the auditory, tactile, and vestibular systems, also need to integrate with vision to create a coherent perception of the world and to coordinate with motor systems, decision making, and planning. We will explore theoretical frameworks for achieving this synthesis and its relationship with generative models, attention, and consciousness.
Grace Lindsay (2020) Attention in psychology, neuroscience and machine learning. Frontiers in Computational Neuroscience, April 2020.
Luo and Maunsell (2019) Attention can be subdivided into neurobiological components corresponding to distinct behavioral effects. PNAS 116(52) 26187-26194.
Eric Knudsen (2018) Fundamental components of attention. Annual Review of Neuroscience.
Olshausen, Anderson and Van Essen (1995) Multiscale dynamic routing circuit for forming size- and position-invariant object recognition. J. Neuroscience 2:45-62.
Sabour, S., Frosst, N. and G. Hinton (2017) Dynamic routing between capsules. NIPS.
Goodale, M.A. Lessons from human vision for robotic design. Springer Nature: Autonomous Intelligent Systems.
Ayzenberg, V and Behrmann M. (2022) The Dorsal visual pathway represents object-centered spatial relations for object recognition. J. Neuroscience 42(23) 4693-4710.
Rao, RPN (2024) Active predictive coding -- a sensory-motor theory of the neocortex. Nature Neuroscience 27: 1221-1235.
Additional Exploration: Art and Beauty
What is beauty? Is it just something in the eye of the beholder, or is it something universal? In this final week of the class, we hope to explore the concepts and computational theories of beauty and art, and how they might be related to the computational principles underlying perception that we have studied in the course.
Cavanagh, P. (2005) The artist as neuroscientist. Nature, 434, 301-307.
Perdreau, F. & Cavanagh, P. (2011). Do artists see their retinas? Frontiers in Human Neuroscience, 5:171
Bilge Sayim and Patrick Cavanagh (2011) What line drawings reveal about the visual brain.
L. A. Gatys, A. S. Ecker, and M. Bethge (2017) Texture and art with deep neural networks. Current Opinion in Neurobiology 46, 178-186.
Schmidhuber, Jurgen (1997) Low complexity art.
Schmidhuber, Jurgen (2008) Driven by Compression Progress: A simple principle explains essential aspects of subjective beauty, novelty, surprise, interestingness, attention, curiosity, creativity, art, science, music, and jokes!
Questions or comments: contact Tai Sing Lee
Last modified: August 2025, Tai Sing Lee