Everything you do involves interpreting the world around you. Your sensory apparatus gathers data that your brain combines with some innate capabilities, experience and memory, to provide you with a continuously updated hypothesis about your environment.
That hypothesis is perception.
Of all the senses, vision is preeminent. Most of us have lived with vision our entire lives and we have an intuitive understanding of how it works. A lot of those intuitions, however, may be wrong.
Vision is the mechanism that collects data about the physical environment by interpreting light.
Light is that very small section of the electromagnetic spectrum that is visible to human beings.
Visible light, a vanishingly small fraction of the full electromagnetic spectrum, ranges from about 400 to 700 nm, with the eye’s peak sensitivity at around 555 nm. Above and below that peak, the eye’s response to light falls off rapidly.
We sense light with our eyes, but we actually see with our brains.
First of all, the eye is not a camera. It doesn’t capture an image and send it to the brain. It collects light, performs some processing of its own, and sends neural data on to the visual cortex for further processing.
As an imaging system, the human eye is not very good, but as a data collection device, it is fairly spectacular. Consider dynamic range: the human visual system can manage approximately 20 stops[i] of dynamic range, while the eye alone can manage only about 6.5.
How can perception deliver the roughly 20 stops we routinely experience when the eye, the source of the data, handles only about 6.5? The eye scans the environment, adapting very quickly to light and dark regions, and after some processing in the retina and the visual cortex, the data is sent to the cerebral cortex for aggregation and synthesis. In other words, you can perceive more than your eye can “see.”
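To put those numbers in concrete terms, each stop is a doubling, so n stops correspond to a contrast ratio of 2 to the power n. Here is a quick sketch of that arithmetic in Python (the function name is ours, purely for illustration):

```python
# Each stop is a factor-of-two change in light, so n stops span a
# contrast ratio of 2**n between the brightest and darkest levels.
def stops_to_contrast_ratio(stops: float) -> float:
    return 2.0 ** stops

print(f"{stops_to_contrast_ratio(6.5):,.0f} : 1")   # eye alone: 91 : 1
print(f"{stops_to_contrast_ratio(20.0):,.0f} : 1")  # full visual system: 1,048,576 : 1
```

About four orders of magnitude separate the two figures, and that is the gap that scanning and rapid adaptation have to close.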
If your eye isn’t a camera, what is it and how does it work?
In this schematic of the human eye, light passes through the cornea and the pupil and is focused by the lens onto the retina, where it is detected by light-sensing photoreceptor cells.
There are two types of photoreceptors in the human retina: rods and cones.
Rods are responsible for vision at low light levels. They do not differentiate between wavelengths of light, which means they make no contribution to color perception; essentially, they see the world in black and white. They have low spatial acuity but are considerably more light-sensitive than cones.
Cones are active at higher light levels and are capable of spectral differentiation. Cones provide the data for the human visual system to synthesize color. They also have high spatial acuity.
There are three varieties of cone cells: short-wavelength sensitive cones (S), middle-wavelength sensitive cones (M) and long-wavelength sensitive cones (L). You may be tempted to associate short wavelength cones with the color blue, medium with green and long with red, but that is a mischaracterization. Your eye doesn’t “see” color. It collects data that the brain uses to create color.
About 64% of the cone cells in the human retina are sensitive to long wavelength light, 32% are sensitive to medium wavelength light and only about 4% are sensitive to short wavelengths.
Rods and cones are not evenly distributed across the retina. In fact, most of the cones are in the fovea. Here is a graphical look at the distribution of rods and cones along the retina.
This implies that you can only gather the data needed for color perception within a radius of about 15 degrees of arc from the center of your field of view, which seems to contradict our direct experience of perceiving color right out to the edges of our visual field.
In 1802, Thomas Young posited that the human visual system perceives color by using three specific types of sensors, each tuned to a different range of wavelengths of light. In the middle of the 19th century, Hermann von Helmholtz expanded on this theory and demonstrated that color reproduction could be achieved using three primary colors.[ii] Today, we call this tristimulus theory of color the Young-Helmholtz trichromatic theory.
Whenever you see a full range of color presented through a system that uses an admixture of red, green and blue elements or look at a color print produced with cyan, magenta and yellow pigments, you are looking at imaging based on the Young-Helmholtz tristimulus theory.
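The additive version of that principle fits in a few lines of code. This is a minimal sketch, assuming idealized 8-bit RGB primaries rather than any particular display:

```python
# Additive tristimulus mixing: a wide range of perceived colors built
# from only three primaries, per Young-Helmholtz.
def mix(*colors):
    # Sum each channel across the inputs, clipping to the 8-bit maximum.
    return tuple(min(255, sum(c[i] for c in colors)) for i in range(3))

RED, GREEN, BLUE = (255, 0, 0), (0, 255, 0), (0, 0, 255)

print(mix(RED, GREEN))        # (255, 255, 0)   -> perceived as yellow
print(mix(RED, GREEN, BLUE))  # (255, 255, 255) -> perceived as white
```

Note that no “yellow” light is ever emitted; the visual system synthesizes that sensation from the red and green stimulation alone.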
Other observers, however, noticed perceptual effects suggesting that the Young-Helmholtz theory wasn’t the end of the story. Consider, for instance, the question of cone sensitivities. You might expect each cone type to have its own distinct range of response, with some overlap to facilitate color-blending for the perception of intermediate hues.
In fact, that was the accepted theory until 1860, when James Clerk Maxwell described an instrument for producing and mixing monochromatic lights in defined proportions, and, with this instrument, made the first careful, quantitative measurements of the human perception of spectral phenomena.[iii]
The human visual system hasn’t undergone any significant upgrades between then and now. The representation of cone fundamentals below is based on more recent data.
Looking at the graph above, it should be obvious that nearly any spectral phenomenon that excites one type of cone excites the others as well. The exception is the far end of the visible spectrum, between about 665 and 700 nm, where only the long-wavelength cones are stimulated. For the most part, however, all visible light excites all three cone types. How, then, is that data interpreted as color?
That distribution of frequency responses among the cone cells doesn’t seem to be a very good basis for Young-Helmholtz tristimulus color theory — or at least color perception seems not to work quite the way Thomas Young and Hermann von Helmholtz thought it did.
Ewald Hering, a German physiologist, did a lot of research into vision and perception. Hering speculated that perhaps the human visual system processes light in a way that is a bit more involved than a simple mixing of inputs from the three photoreceptors. He suggested that the human visual system processes input stimuli in an antagonistic manner, with color values in opposition.[iv] This is called opponent color theory, and it is the basis for all color models and color spaces that attempt to model human perception.
The theory establishes three coordinates: one with red at one end and its opponent, green, at the other; one with blue opposed to yellow; and a brightness coordinate that runs from completely dark to fully white.
Light that strikes the retina is processed by photoreceptive cells (rods and cones) just as Young-Helmholtz tristimulus theory indicates, but that isn’t what you perceive. Some very significant additional processing takes place.
In the diagram above, (a) is retinal vision, simply rods and cones responding to light.
All of the photoreceptive cells, both rods and cones, attach to ganglion cells (b) for additional processing. In the diagram above, the black lines show how rods and cones attach to the brightness opponent channel, the red lines show the connections to the red-green axis, and the blue lines show the connections to the blue-yellow axis.
The nerve-fiber processing shows the neural signals sent to the visual cortex: brightness, red and green in opposition, and yellow and blue in opposition. Note that as you move from either extremity of a hue axis toward the origin, perception tends toward grey, and proceeding past the origin, you begin to perceive the opposite hue. If you move from, say, red along the axis toward green, you never see reddish green or greenish red; instead, you move from red to grey and then, as you continue, from grey toward green.
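A toy model makes this stage concrete. The transform below maps cone responses onto opponent channels; the weights are illustrative placeholders rather than measured physiological values, but the structure (sums and differences of L, M and S) follows the theory:

```python
# A toy opponent-process stage: cone responses in, opponent signals out.
# Sign conventions: positive red_green leans red, negative leans green;
# positive blue_yellow leans yellow, negative leans blue. The weights
# here are illustrative, not physiologically measured.
def opponent_channels(L: float, M: float, S: float):
    brightness  = L + M              # pooled luminance signal
    red_green   = L - M              # L and M in opposition
    blue_yellow = (L + M) / 2 - S    # yellow (L plus M) opposed to blue (S)
    return brightness, red_green, blue_yellow
```

In this scheme, reddish green is impossible by construction: a single red_green value cannot be positive and negative at once, which mirrors the perceptual fact just described.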
Imagine white light — that is to say, light of all wavelengths of approximately equal energy — striking the retina. L, M and S are all equally stimulated so there is an equilibrium among the opponents. As a result, we perceive that there is no color and the brightness channel determines the level of brightness we perceive. Let’s say for now that the amount of stimulation causes us to perceive white.
Now, let’s remove the short wavelengths from the white light — anything that would stimulate the S cones. The L and M cones are still equally stimulated, so they continue to contribute to the brightness of the perception, but with regard to color they are in stasis: no signal comes from the red-green axis. On the blue-yellow axis, however, with the short wavelengths removed, blue and yellow are no longer in equilibrium. Blue is gone, so those ganglia signal yellow, and you perceive yellow.
Therefore, it isn’t so much that mixing red and green makes yellow as that white light minus blue is perceived as yellow.
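Running that thought experiment through the toy opponent model above shows the same thing numerically:

```python
# White light: all three cone types equally stimulated; both hue
# channels sit at equilibrium, so only brightness is perceived.
print(opponent_channels(L=1.0, M=1.0, S=1.0))  # (2.0, 0.0, 0.0)

# The same light with the short wavelengths removed: the blue-yellow
# channel leaves equilibrium and signals yellow.
print(opponent_channels(L=1.0, M=1.0, S=0.0))  # (2.0, 0.0, 1.0)
```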
And that is just the beginning of the story of how color is synthesized from light into the visual component of perception.
Beyond that, there is more to vision than perception. One reason to study perception is to learn how to reproduce the experience of seeing objects, and perhaps of interacting with them.
The question becomes: what do you have to do to present a believable simulation of the real world to an observer? What kinds of phenomena must be presented so that the observer perceives them as real? Limiting ourselves to vision for the moment, if you could completely capture the entire effect of the transport of light in an environment and present that to an observer, it would be the visual equivalent of actually “seeing” the environment.
Of course, that task is enormously complex and probably impossible, but it is also probably unnecessary. The human visual system doesn’t process all of the complexity of the world around us, so to present a compelling simulation, we only need to reproduce a subset of what exists in the real world.
That means there is some threshold at which presenting enough data, in the right way, makes a synthetic object or environment look absolutely real. The question is: where is that threshold? The more we know about perception, the better we will be able to answer that question.
Companies like Light Field Lab in Silicon Valley are trying to find that threshold. With its holographic display technology, Light Field Lab is searching for the sweet spot where light presented to the human visual system creates a visual experience of synthetic objects and environments so close to reality that it is indistinguishable from it.
Footnotes:
[i] A stop, f-stop or t-stop is a way of characterizing the amount of light in an imaging system. An f-stop in particular is the ratio of the nominal focal length of the imaging system to the diameter of the clear aperture, or entrance pupil; it is the reciprocal of the relative aperture. Each stop represents a factor-of-two change in the amount of light falling on the imaging target.
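In symbols (standard photographic optics, summarized here for reference): with focal length $f$ and entrance-pupil diameter $D$,

$$N = \frac{f}{D}, \qquad \text{light gathered} \propto \frac{1}{N^2},$$

so stepping one full stop, say from f/2 to f/2.8, multiplies $N$ by $\sqrt{2}$ and halves the light.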
[ii] H. von Helmholtz, Handbook of Physiological Optics (1867).
[iii] J. C. Maxwell, Phil. Trans. R. Soc. 150, 57 (1860).
[iv] E. Hering, Principles of a New Theory of the Color Sense (1878).
June 16, 2020