How Human Vision Shapes TV Picture Quality

A television is a machine for fooling the human visual system.

That's not a metaphor, or at least not much of one. The entire architecture of video is built around the way human eyes and brains construct the experience of seeing. Color spaces, transfer functions, white points, compression systems, brightness standards, display primaries - none of them make sense in isolation. They make sense because they are designed for a particular receiver: us.

If we had four ordinary channels of color vision instead of three, video would look very different. If our eyes responded linearly to brightness, gamma and HDR transfer functions would be designed differently. If our brains did not constantly adapt to the color of illumination, "white" would not need to be pinned down so carefully. Television exists in the shape it does because human vision exists in the shape it does.

To understand why video standards are what they are, you have to start one step back from the standards themselves and look at the visual system they were built to serve. The strange parts of video - gamma curves, color spaces, SDR and HDR, color temperature, white point, tone mapping - all become much less arbitrary once you know what they are working with on the receiving end.

So this piece is about the receiving end: light, the eye, and the peculiar way the brain turns one into the other.

None of what follows is necessary to operate a television. People have been watching TV for decades without thinking about any of it. But once you've seen it, the rest of calibration lands differently. Every standard starts to feel less like an arbitrary technical rule and more like a practical response to the human visual system.

Light, before the eye gets involved

Light, in physical terms, is electromagnetic radiation. The portion of it that human eyes can detect - the visible spectrum - is usually described as running from roughly 380 nanometers at the violet end to roughly 700 nanometers at the red end. Those limits are not hard walls. Human sensitivity falls off gradually, and individual observers vary. But as a working range, 380 to 700 nm is close enough.

Outside that band, there is still electromagnetic radiation. Ultraviolet sits beyond the short-wavelength violet end. Infrared sits beyond the long-wavelength red end. Farther out are microwaves, radio waves, X-rays, and gamma rays. The world is full of radiation we do not experience as light because our eyes are not built to detect it. Visible light is a small slice of a much larger continuum.

Within that visible band, any source of light has a spectral power distribution - a description of how much power it contains at each wavelength. Sunlight has a broad distribution. A low-pressure sodium streetlamp is concentrated in a narrow yellow region. A tungsten incandescent bulb is broad but weighted heavily toward longer, warmer wavelengths. A white LED may be broad enough to look white, but its spectrum is usually uneven, often with a strong blue component and phosphor-generated energy spread across the rest of the visible range.
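As a rough illustration of what "weighted toward longer wavelengths" means, the sketch below uses Planck's law for an ideal blackbody radiator. Treating a tungsten filament as a 2800 K blackbody and daylight as a 6500 K blackbody are simplifying assumptions made here; real sources deviate from the ideal curve, and the function names are only illustrative.

```python
import math

# Physical constants (SI units).
H = 6.62607015e-34   # Planck constant, J*s
C = 2.99792458e8     # speed of light, m/s
K = 1.380649e-23     # Boltzmann constant, J/K

def planck(wavelength_nm, temp_k):
    """Spectral radiance of an ideal blackbody at one wavelength."""
    lam = wavelength_nm * 1e-9
    return (2 * H * C**2 / lam**5) / math.expm1(H * C / (lam * K * temp_k))

def red_to_blue_ratio(temp_k):
    """Power at 700 nm relative to power at 450 nm."""
    return planck(700, temp_k) / planck(450, temp_k)

tungsten = red_to_blue_ratio(2800)   # assumed filament temperature
daylight = red_to_blue_ratio(6500)   # assumed daylight-like temperature
print(f"2800 K: {tungsten:.2f}   6500 K: {daylight:.2f}")
```

Under these assumptions the 2800 K source carries several times more power at 700 nm than at 450 nm, while the 6500 K source carries somewhat less. That tilt is the kind of difference a spectral power distribution records.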

These spectra are physical facts. A spectroradiometer can measure them and draw a graph.

But so far there is no color.

Color, as we experience it, is not simply a property of light by itself. Light has wavelengths and power. Color is the sensation produced when that light enters an eye and triggers a chain of biological and neural events. Two beams of light with very different spectral distributions can produce the same color sensation in a human observer. Two observers looking at the same light can have slightly different color experiences.

This is not a philosophical side road. It is the central fact that video depends on. A video signal does not reproduce the full spectrum of the original light. It carries a code meant to produce, in a human visual system, a similar enough sensation.

Three kinds of cones

The reason this works is that normal human color vision begins with three kinds of cone cells in the retina.

They are called L, M, and S cones: long-, medium-, and short-wavelength cones. These names are better than the common shorthand of red, green, and blue cones, because the shorthand can be misleading. L cones are most sensitive in the longer-wavelength part of the visible range, but their peak response is not at pure red. M cones overlap heavily with L cones. S cones respond to shorter wavelengths in the violet-blue region. The three response curves overlap substantially.

That overlap matters. There is no ordinary visible wavelength that excites only one type of cone while leaving the others completely silent. Each cone type responds across a range of wavelengths, and each cone effectively reduces the incoming spectrum to a response level: how strongly that cone type was stimulated.

So when light enters your eye, the continuous spectrum of that light is collapsed into three broad cone responses. The brain then builds color from the relationships among those responses.

This three-channel architecture is one of the most consequential facts in the history of video engineering. It is why cameras capture color through three channels. It is why displays use red, green, and blue primaries. It is why video color standards are built around three primary colors. We use three not because the world has three colors, but because typical human color vision starts with three overlapping cone classes.

A display is not reproducing the original spectrum of the scene. It is trying to reproduce the cone responses that matter for human vision. That is a very different thing.

This is why two physically different spectra can look identical. A narrow yellow light and a mixture of red and green light can produce the same, or nearly the same, cone responses in a viewer. When two different spectra match in perceived color, they are called metamers. Television depends on metamerism. Your screen does not need to produce every wavelength present in the original scene. It only needs to produce a mixture of its own primaries that your visual system accepts as the same color.

That is the trick.
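Here is a minimal sketch of that trick in code. It rests on loud assumptions: the cone sensitivities are made-up Gaussian curves rather than measured data, and the three "display primaries" are hypothetical narrow bands at 620, 540, and 460 nm. The sketch collapses a flat, broadband spectrum into three cone responses, then solves for the mixture of the three primaries that produces the same three numbers.

```python
import math

WAVELENGTHS = range(400, 701, 5)  # nm, sampled every 5 nm

# Hypothetical Gaussian cone sensitivities: (peak nm, width nm). Not measured data.
CONES = {"L": (570, 50), "M": (545, 45), "S": (445, 25)}

def gaussian(x, center, width):
    return math.exp(-0.5 * ((x - center) / width) ** 2)

def cone_responses(spectrum):
    """Collapse a spectrum (one power value per wavelength) into [L, M, S]."""
    return [
        sum(p * gaussian(w, peak, width) for w, p in zip(WAVELENGTHS, spectrum))
        for peak, width in CONES.values()
    ]

def det3(m):
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def solve3(columns, target):
    """Cramer's rule: weights w such that sum(w[k] * columns[k]) equals target."""
    matrix = [[col[r] for col in columns] for r in range(3)]
    base = det3(matrix)
    weights = []
    for k in range(3):
        swapped = [row[:k] + [target[r]] + row[k + 1:] for r, row in enumerate(matrix)]
        weights.append(det3(swapped) / base)
    return weights

# Target: a flat, broadband "white" spectrum.
flat = [1.0 for _ in WAVELENGTHS]

# Hypothetical narrow-band primaries (red, green, blue).
primaries = [[gaussian(w, c, 10) for w in WAVELENGTHS] for c in (620, 540, 460)]

weights = solve3([cone_responses(p) for p in primaries], cone_responses(flat))
mixture = [sum(wt * p[i] for wt, p in zip(weights, primaries)) for i in range(len(flat))]

print("flat    ->", [round(v, 3) for v in cone_responses(flat)])
print("mixture ->", [round(v, 3) for v in cone_responses(mixture)])
```

The two spectra are physically nothing alike: one is flat, the other is three spikes with almost no power in between. In this toy model their cone responses agree, which is all a metameric match requires.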

It is also why color vision deficiencies take the forms they do. Many common forms involve the L or M cone system: a missing cone class, a shifted cone sensitivity, or a reduced distinction between cone responses. When that happens, color differences that are obvious to a typical trichromatic observer can collapse together or become much harder to distinguish. Red and green are not "invisible" in a simple sense; the problem is that the cone-response information needed to separate them has been changed.

Brightness is not what you think it is

Cones do most of the work in good light. In dim light, another class of photoreceptors becomes more important: rods. Rods are much more sensitive in low light, but they do not support color vision the way cones do. This is why very dim scenes tend to look nearly colorless. The visual system is still receiving light, but the color-carrying cone system is no longer doing the main work.

The human retina contains far more rods than cones, and they are distributed differently. Cones are concentrated most densely near the fovea, the central region used for sharp, detailed vision. Rods dominate away from the center, and the very middle of the fovea has essentially none. This is why faint stars can be easier to see when you look slightly to the side of them rather than directly at them: looking away moves the star's image onto a rod-rich part of the retina.

For television, rods usually play a smaller role than cones because a TV emits enough light to keep cone vision engaged in normal viewing conditions. But the rod system still matters indirectly because the eye is adaptive. Your visual system changes its sensitivity depending on the light level around you.

A TV that looks painfully bright when you walk into a dark room may look normal after your eyes adapt. The same TV that looks fine during the day may feel harsh at night. The picture has not changed; your visual system has. This is one reason mastering environments are controlled. Colorists do not judge images in random rooms under random light. They work in controlled viewing conditions so their eyes are adapted to a known environment.

Even within a given adaptation state, brightness perception is not linear. Double the physical light and the perceived brightness does not simply double. Human vision is much more sensitive to relative differences than to absolute ones, and it is especially sensitive to small differences in darker regions. A small step near black can be obvious. The same physical step near white may be barely noticeable.

This has enormous consequences for video.

If you stored brightness in a purely linear digital signal, you would spend too many code values describing differences the eye barely notices in bright areas, while leaving too few values for the shadow differences the eye notices easily. Shadows would band. Subtle dark gradients would fall apart. Highlights would receive precision the viewer could not use.
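A quick count makes the imbalance concrete. The sketch below assumes an 8-bit signal and compares linear coding against a simple power-law curve with an exponent of 2.4. That exponent is an illustrative assumption, not a quotation of any particular standard. It counts how many code values land in the darkest 1% of the luminance range.

```python
BITS = 8
MAX_CODE = 2**BITS - 1
SHADOW_LIMIT = 0.01   # darkest 1% of the relative luminance range
EXPONENT = 2.4        # assumed power-law exponent, for illustration only

def codes_in_shadows(decode):
    """Count code values whose decoded relative luminance is <= SHADOW_LIMIT."""
    return sum(
        1 for code in range(MAX_CODE + 1)
        if decode(code / MAX_CODE) <= SHADOW_LIMIT
    )

linear_count = codes_in_shadows(lambda v: v)             # signal equals light
power_count = codes_in_shadows(lambda v: v ** EXPONENT)  # signal raised to a power
print(linear_count, power_count)
```

With these assumptions, linear coding leaves 3 of its 256 code values for the darkest 1% of the range, while the power-law curve leaves 38. The nonlinear signal has more than ten times as many steps available where the eye is most likely to notice them.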

So video does not normally encode brightness linearly. It uses transfer functions: curves that relate scene light, signal values, and display light. In SDR, this family of behavior is usually discussed under the word gamma, though the actual standards are more specific than that word alone. In HDR, the wider brightness range requires different transfer functions. PQ, the Perceptual Quantizer, is designed to allocate code values according to perceptual visibility over a very large luminance range. HLG, Hybrid Log-Gamma, takes a different approach, designed partly so that a single broadcast signal also produces an acceptable picture on SDR displays.
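For a sense of how PQ spends its code values, here is a sketch of the curve using the constants published in SMPTE ST 2084 (also given in ITU-R BT.2100). The function names are mine, and mapping the signal onto a full-range 0 to 1023 code is a simplification; real video commonly uses a narrower code range.

```python
# Constants from SMPTE ST 2084.
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32
PEAK_NITS = 10000.0   # luminance represented by a signal of 1.0

def pq_encode(nits):
    """Absolute luminance in cd/m^2 -> PQ signal in [0, 1]."""
    y = max(nits, 0.0) / PEAK_NITS
    y_m1 = y ** M1
    return ((C1 + C2 * y_m1) / (1 + C3 * y_m1)) ** M2

def pq_decode(signal):
    """PQ signal in [0, 1] -> absolute luminance in cd/m^2."""
    e = signal ** (1 / M2)
    return PEAK_NITS * (max(e - C1, 0.0) / (C2 - C3 * e)) ** (1 / M1)

for nits in (0.1, 1, 10, 100, 1000, 10000):
    # Full-range 10-bit code, used here only to make the numbers easy to read.
    print(f"{nits:>7} cd/m^2 -> code {round(pq_encode(nits) * 1023)}")
```

The notable result is that 100 cd/m^2, which is 1% of the 10,000 cd/m^2 peak, lands at roughly the midpoint of the signal range. About half of the code values go to the darker part of the range, where small differences are most visible.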

The details matter, and later pieces can give them their own treatment. But the reason these curves exist is simple: the eye does not experience brightness linearly, and video has to spend its limited precision where human vision will notice.

The eye is not a camera

The last piece of the puzzle, and maybe the most important, is that the brain is doing a great deal of work before anything reaches conscious experience.

The image you experience is not a direct copy of the cone and rod responses on your retina. It is a processed, stabilized, interpreted reconstruction. Your visual system adapts to the light in the room, compares colors against their surroundings, emphasizes edges, discounts some changes, exaggerates others, fills in missing information, and builds a coherent world out of imperfect input.

The most important example for video is chromatic adaptation: the visual system's tendency to discount the color of the illumination and preserve the apparent color of objects.

A sheet of white paper looks white under daylight, under a tungsten bulb, under fluorescent light, and by candlelight. Physically, the light reaching your eye from the paper is different in each case. Under a warm lamp, the reflected light is much yellower than it is under daylight. But your brain does not simply report the raw cone responses. It estimates the illumination and partially discounts it. It tries to decide what color the object is, not merely what spectrum is arriving at the eye.
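One common engineering model of this behavior is von Kries-style scaling: each cone channel gets its own gain, chosen so the illuminant's white comes out neutral. The sketch below is that model in its simplest form, not a description of what the brain literally computes. The cone values are invented for illustration, and the `degree` parameter is a hypothetical knob for incomplete adaptation.

```python
def adapt(cone_response, illuminant_white, degree=1.0):
    """Scale each cone channel by a gain that normalizes the illuminant's white.

    degree=1.0 fully discounts the illuminant; degree=0.0 applies no adaptation.
    """
    adapted = []
    for value, white in zip(cone_response, illuminant_white):
        gain = degree * (1.0 / white) + (1.0 - degree)
        adapted.append(value * gain)
    return adapted

# Hypothetical (L, M, S) responses to white paper under a warm lamp,
# relative to the same paper under daylight at (1, 1, 1). Not measurements.
paper_under_tungsten = [1.25, 1.0, 0.45]

full = adapt(paper_under_tungsten, paper_under_tungsten, degree=1.0)
partial = adapt(paper_under_tungsten, paper_under_tungsten, degree=0.7)
print("full:   ", [round(v, 3) for v in full])
print("partial:", [round(v, 3) for v in partial])
```

With full adaptation the paper returns to a neutral (1, 1, 1). With partial adaptation it stays slightly warm, which fits the everyday experience that lamplight still looks a little yellow even though the paper reads as white.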

This is useful in the real world, but it creates problems for video.

It is the reason "white" needs a defined reference. There is no single spectrum that is objectively white in every context. There are reference whites: agreed points that define what white means within a system. For video, the usual reference is D65, a CIE standard illuminant intended to represent average daylight, with a correlated color temperature of about 6500 K. Mastering monitors are calibrated to that white point. Your TV is supposed to aim there too.
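The reference white is built into the arithmetic of the color standards. As a sketch, the matrix below is the commonly published conversion from linear Rec. 709 RGB to CIE XYZ, rounded to four decimals. Feeding it equal red, green, and blue should land on the D65 chromaticity, approximately x = 0.3127, y = 0.3290. The color temperature estimate uses McCamy's approximation, which is only a rough fit, and the function names are illustrative.

```python
# Linear Rec. 709 RGB -> CIE XYZ, rounded to four decimals.
RGB_TO_XYZ = (
    (0.4124, 0.3576, 0.1805),
    (0.2126, 0.7152, 0.0722),
    (0.0193, 0.1192, 0.9505),
)

def rgb_to_xy(rgb):
    """Linear RGB triple -> CIE 1931 (x, y) chromaticity."""
    big_x, big_y, big_z = (
        sum(m * c for m, c in zip(row, rgb)) for row in RGB_TO_XYZ
    )
    total = big_x + big_y + big_z
    return big_x / total, big_y / total

def mccamy_cct(x, y):
    """Approximate correlated color temperature in kelvin (McCamy's formula)."""
    n = (x - 0.3320) / (0.1858 - y)
    return 449 * n**3 + 3525 * n**2 + 6823.3 * n + 5520.33

white_x, white_y = rgb_to_xy((1.0, 1.0, 1.0))
cct = mccamy_cct(white_x, white_y)
print(f"x={white_x:.4f} y={white_y:.4f} CCT~{cct:.0f} K")
```

Equal signal in all three channels is defined to mean D65. If a display renders that signal with some other white, every color derived from the same primaries shifts along with it.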

When it does not, the whole picture shifts. If the TV's white is too blue, every color in the image inherits that bias. If the white is too green or too red, the image bends in that direction. Your brain may partially adapt, but adaptation is not a substitute for accuracy. The colorist graded the image against a known white. If your display uses a different white, you are asking the visual system to correct a problem that should not have been introduced.

This is also why the "Warm" color temperature mode on many TVs can look strange at first. That preset is often the one closest to D65, while the default modes people are used to tend to be too cool and blue. An accurate D65 white can initially look yellow by comparison. But after a few minutes of adaptation, it often stops looking yellow and starts looking neutral. The old blue-white setting then looks obviously cold.

There is more beyond adaptation. The brain enhances edges. It uses contrast to judge brightness. It interprets motion. It builds depth from flat images. It suppresses the blur and interruption caused by eye movements and blinks. Most of what feels like seeing is inference.

Cinema depends on this. A movie is not reality. It is a sequence of flat images, usually shown at 24 frames per second, arranged so the visual system builds motion, space, emotion, and continuity out of them. Video standards do not fully model all of that higher-level perception, but they exist in its shadow. They are practical engineering compromises designed around the receiver that will ultimately judge the picture.

Why this matters for your TV

Pull back.

We started with light as electromagnetic radiation. We moved through the three-cone system that makes color reproduction possible. We looked at the eye's nonlinear and adaptive response to brightness. We ended with the brain's habit of constructing a stable visual world from incomplete and context-dependent information.

Out of that, video inherits four basic facts.

First, color can be reproduced with three primaries because typical human color vision is trichromatic. Your TV has red, green, and blue subpixels not because the world is made of red, green, and blue, but because those primaries can be mixed to produce a large range of cone-response combinations. That reachable range is the display's gamut.
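On a CIE 1931 chromaticity diagram, a three-primary gamut is commonly drawn as a triangle with the primaries at its corners. The sketch below uses the Rec. 709 primary coordinates and checks whether a given chromaticity falls inside that triangle. Working in xy chromaticity ignores luminance, so this is a simplified picture of a gamut rather than a complete one, and the helper names are illustrative.

```python
# Rec. 709 primaries (R, G, B) as CIE 1931 xy chromaticities.
REC709 = ((0.640, 0.330), (0.300, 0.600), (0.150, 0.060))

def cross(o, a, b):
    """Z component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def in_gamut(point, primaries=REC709):
    """True if the xy point lies inside or on the primaries' triangle."""
    r, g, b = primaries
    signs = [cross(r, g, point), cross(g, b, point), cross(b, r, point)]
    return all(s >= 0 for s in signs) or all(s <= 0 for s in signs)

d65_white = (0.3127, 0.3290)
rec2020_green = (0.170, 0.797)   # the green primary of Rec. 2020

print("D65 white:    ", in_gamut(d65_white))
print("Rec.2020 green:", in_gamut(rec2020_green))
```

The white point sits inside the triangle, while the Rec. 2020 green primary falls outside it. A Rec. 709 display cannot reach that green with any non-negative mix of its own primaries.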

Second, brightness must be handled nonlinearly because human brightness perception is nonlinear. SDR gamma behavior and HDR transfer functions are not arbitrary complications. They are ways of mapping physical light, signal values, and display output so the available precision lands where vision needs it most.

Third, white needs a reference because vision adapts. D65 is not "the one true white" in nature. It is the agreed white point for the system. Without that agreement, every color becomes relative to whatever white the display happens to produce.

Fourth, viewing context matters because perception is contextual. A calibrated TV in a bright, colorful, uncontrolled room is not being seen under the same conditions as a mastering monitor. Ambient light, wall color, screen reflections, viewing distance, and adaptation all change the experience. Calibration starts with the display, but it does not end there.

Each of these threads will get its own treatment later. Color primaries lead to gamut. Brightness perception leads to gamma, PQ, and HLG. Adaptation leads to white point and color temperature. Context leads to the room itself.

But they all begin in the same place: the human visual system, with its three cone classes, its rod-supported low-light vision, its nonlinear brightness response, its automatic white balancing, and its relentless effort to make sense of incomplete information.

Television is not reproducing reality. It is producing, with great care, a stimulus tuned to the way human beings see.

Once you understand the eye, the standards stop looking arbitrary. They start looking inevitable.
