Article 25

How to Read a TV Calibration Report: Delta E, Grayscale, Gamma, CIE Charts, and HDR Measurements

Calibration reports ask the same picture-quality questions in numbers: grayscale, gamma, color error, CIE charts, HDR EOTF tracking, peak brightness, and what the remaining errors actually mean.

Reading a Measurement Report

A calibration report is what measured calibration leaves behind.

It may come from your own software after measuring your TV. It may appear in a review from a testing site. It may be handed to you by a professional calibrator after a home visit. The format varies, but the basic structure is the same: charts and tables comparing what the display produced against what the standard asked for.

At first, these reports look intimidating.

CIE diagrams. Grayscale tracking. Gamma plots. EOTF curves. ColorChecker charts. RGB balance lines. Delta E values. HDR dE ITP. Pre-calibration and post-calibration tables.

But the report is not a different subject from the calibration arc. It is the same subject in numbers.

Black level becomes low-end luminance behavior.

White balance becomes RGB tracking.

Gamma becomes a measured curve.

Color decoding becomes measured color error.

HDR setup becomes PQ or HLG tracking, tone mapping, peak brightness, and color volume.

Once you understand the few main charts, the whole report becomes readable. You may not be able to perform a full professional calibration just from reading one, but you can understand what the report says, what the TV is doing well, and where the remaining errors live.

The universal idea: error from target

A calibration report is built around one question:

How far is the display from the target?

The target is the standard. For SDR, that might mean D65 white, Rec. 709 color, BT.1886 or gamma 2.4, and a chosen SDR luminance target. For HDR, it may mean D65, the Rec. 2020 container, PQ or HLG EOTF behavior, HDR metadata handling, color volume, and tone mapping.

The display produces a measured result.

The software compares the measured result to the target.

The difference is the error.

Every chart in the report is just a different view of that same comparison.

Sometimes the error is in white balance.

Sometimes it is in gamma.

Sometimes it is in color saturation.

Sometimes it is in hue.

Sometimes it is in luminance.

Sometimes it is in HDR tone mapping.

The report's job is to show where the display is close, where it is off, and whether the error is large enough to matter.

Delta E

The most common error number is Delta E, written ΔE or dE.

Delta E is a color-difference metric. It compares two colors: the target color and the measured color. A smaller number means the display is closer to the target. A larger number means the error is more visible.

A dE of zero means no measured difference.

A dE below 1 is generally excellent. Most people will not see the difference even in careful comparison.

A dE between 1 and 3 is still good. The error may be measurable and may be visible to trained observers in direct comparison, but it is unlikely to matter in normal viewing.

A dE between 3 and 5 is potentially visible. Enthusiasts may notice. Ordinary viewers may not, depending on the color, scene, and context.

A dE above 5 is usually visibly off. The picture may still be watchable, but something is no longer close to reference.

A dE above 10 is plainly wrong.

These are rules of thumb, not laws of nature. Visibility depends on the color, brightness, surrounding image, viewer sensitivity, and the specific dE formula used. But the scale is useful because it gives a practical way to read reports.
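As a quick illustration, the rule-of-thumb bands above fit in a small lookup. This is a sketch, not a standard: the exact band edges (for example, whether 3.0 counts as "good" or "potentially visible") are an assumption, and the bands are only meant for SDR dE2000-style values, not dE ITP.

```python
def de_band(de):
    """Map a Delta E value to this article's rule-of-thumb bands.

    Assumption: each band includes its upper edge (3.0 -> "good").
    Intended for SDR dE2000-style values, not HDR dE ITP.
    """
    if de < 0:
        raise ValueError("Delta E cannot be negative")
    if de == 0:
        return "no measured difference"
    if de < 1:
        return "excellent"
    if de <= 3:
        return "good"
    if de <= 5:
        return "potentially visible"
    if de <= 10:
        return "visibly off"
    return "plainly wrong"


for value in (0.5, 2.5, 4.0, 7.0, 12.0):
    print(value, de_band(value))
```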

When a review says a TV has average grayscale dE of 1.2, that is very good, just above the excellent range.

When it says average color dE is 2.5, that is good.

When it says pre-calibration grayscale dE is 7.0, the white balance is visibly off.

When it says post-calibration dE is under 1, the calibration has landed extremely well.

The average matters.

The individual points matter more.

Average versus maximum error

Reports often show both average dE and individual dE values.

The average is useful because it summarizes the overall accuracy. It lets you compare two modes or two displays quickly. A lower average usually means the display is closer to the target overall.

But averages can hide problems.

A TV might have an average color dE of 2.0 but one red measurement at dE 6.0. That means most colors are fine, but red may still be visibly wrong.

A grayscale average of 1.5 may look excellent, but if the 10% and 20% patches are badly tinted while the rest is clean, dark scenes may still show a problem.

A report is best read in two passes.

First, look at the average.

Then, look for outliers.

The average tells you the general health of the calibration.

The outliers tell you what you might actually see.
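The two-pass reading can be sketched in a few lines. The function name and the 3.0 outlier threshold are illustrative assumptions, and the patch values are hypothetical numbers shaped like the red example above: an average of 2.0 hiding one patch at 6.0.

```python
from statistics import mean


def summarize_de(patches, outlier_threshold=3.0):
    """Two-pass read of a report: the average first, then the outliers.

    patches maps a patch name to its measured dE. The 3.0 threshold is an
    assumption borrowed from the rule-of-thumb bands, not a standard.
    """
    worst = max(patches, key=patches.get)
    outliers = sorted(
        (name for name, de in patches.items() if de > outlier_threshold),
        key=patches.get,
        reverse=True,
    )
    return {
        "average": mean(patches.values()),
        "max": patches[worst],
        "worst_patch": worst,
        "outliers": outliers,
    }


# Hypothetical values: the average looks fine, but red does not.
colors = {"red": 6.0, "green": 1.2, "blue": 1.4, "cyan": 1.0, "magenta": 1.5, "yellow": 0.9}
print(summarize_de(colors))
```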

Different Delta E formulas

Not all dE values are calculated the same way.

Common formulas include dE76, dE94, dE2000, and dE ITP.

For modern SDR calibration reports, dE2000 is one of the common choices. It is designed to better reflect human visual sensitivity than older, simpler formulas.
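To see how much the formula changes the number, here is a compact sketch of dE76 (straight-line distance in CIELAB) and CIEDE2000, following the published CIEDE2000 equations with the weighting factors kL, kC, and kH left at 1. Real calibration software may differ in small implementation details. The sample pair is the first entry of the widely used Sharma, Wu, and Dalal CIEDE2000 test data, two saturated blues at the same lightness.

```python
import math


def delta_e_76(lab1, lab2):
    """CIE76: plain Euclidean distance between two CIELAB colors."""
    return math.dist(lab1, lab2)


def delta_e_2000(lab1, lab2, kL=1.0, kC=1.0, kH=1.0):
    """CIEDE2000 difference between two CIELAB (L*, a*, b*) colors."""
    L1, a1, b1 = lab1
    L2, a2, b2 = lab2
    # Rescale a* so near-neutral colors are weighted more realistically.
    c_bar = (math.hypot(a1, b1) + math.hypot(a2, b2)) / 2
    g = 0.5 * (1 - math.sqrt(c_bar**7 / (c_bar**7 + 25**7)))
    a1p, a2p = (1 + g) * a1, (1 + g) * a2
    c1p, c2p = math.hypot(a1p, b1), math.hypot(a2p, b2)
    h1p = math.degrees(math.atan2(b1, a1p)) % 360
    h2p = math.degrees(math.atan2(b2, a2p)) % 360

    # Lightness, chroma, and hue differences.
    dLp = L2 - L1
    dCp = c2p - c1p
    if c1p * c2p == 0:
        dhp = 0.0
    else:
        dhp = h2p - h1p
        if dhp > 180:
            dhp -= 360
        elif dhp < -180:
            dhp += 360
    dHp = 2 * math.sqrt(c1p * c2p) * math.sin(math.radians(dhp) / 2)

    # Mean lightness, chroma, and hue used by the weighting functions.
    Lbp = (L1 + L2) / 2
    Cbp = (c1p + c2p) / 2
    if c1p * c2p == 0:
        hbp = h1p + h2p
    elif abs(h1p - h2p) <= 180:
        hbp = (h1p + h2p) / 2
    elif h1p + h2p < 360:
        hbp = (h1p + h2p + 360) / 2
    else:
        hbp = (h1p + h2p - 360) / 2

    t = (1 - 0.17 * math.cos(math.radians(hbp - 30))
         + 0.24 * math.cos(math.radians(2 * hbp))
         + 0.32 * math.cos(math.radians(3 * hbp + 6))
         - 0.20 * math.cos(math.radians(4 * hbp - 63)))
    d_theta = 30 * math.exp(-(((hbp - 275) / 25) ** 2))
    rc = 2 * math.sqrt(Cbp**7 / (Cbp**7 + 25**7))
    sl = 1 + 0.015 * (Lbp - 50) ** 2 / math.sqrt(20 + (Lbp - 50) ** 2)
    sc = 1 + 0.045 * Cbp
    sh = 1 + 0.015 * Cbp * t
    rt = -math.sin(math.radians(2 * d_theta)) * rc  # blue-region rotation term

    dl, dc, dh = dLp / (kL * sl), dCp / (kC * sc), dHp / (kH * sh)
    return math.sqrt(dl**2 + dc**2 + dh**2 + rt * dc * dh)


target = (50.0, 2.6772, -79.7751)
measured = (50.0, 0.0, -82.7485)
print(round(delta_e_76(target, measured), 2), round(delta_e_2000(target, measured), 2))
```

The same pair lands near 4.0 in dE76 but near 2.0 in dE2000. Neither number is wrong; they are different scales.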

For HDR and wide-color-gamut work, dE ITP may appear. It was designed for television images under HDR/WCG conditions and is specified in ITU-R BT.2124, using the ICtCp color representation defined in ITU-R BT.2100.

Do not assume every dE number from every report is directly interchangeable.

A dE2000 value in an SDR report and a dE ITP value in an HDR report are both error metrics, but they are not the same scale in everyday reporting practice. Read the report's legend. Check which formula is being used. Compare like with like.

This matters especially when reading HDR reports. Some sites use dE ITP with different thresholds from their SDR dE2000 thresholds. A number that would look alarming in one metric may be interpreted differently in another.

The simple rule:

Use dE to understand error.

Use the report's own threshold and formula to judge severity.

Do not mix dE2000 and dE ITP as if they are identical.

The CIE chromaticity chart

The most recognizable chart is the CIE chromaticity diagram.

It is the familiar horseshoe-shaped map of visible color, usually with a triangle inside it. That triangle represents the color gamut being measured: Rec. 709 for SDR, P3 inside a Rec. 2020 container for many HDR reports, or Rec. 2020 itself when checking coverage.

The report usually shows two sets of points.

One set is the target: where red, green, blue, cyan, magenta, yellow, and D65 white are supposed to land.

The other set is the display's measurement: where the TV actually put those colors.

A perfect display would place the measured points directly on the targets.

Real displays show offsets.

If measured red lands toward orange, red is shifted in hue.

If measured green lands short of the target, green may be undersaturated.

If measured blue lands outside or inside the target, blue saturation or hue may be off.

If white lands away from D65, the white point is wrong.

The CIE chart is useful because it gives a visual map of the display's color behavior.

But do not read it too literally.

The CIE 1931 xy diagram is not perceptually uniform. A small-looking distance in one part of the chart may be more visible than a larger-looking distance somewhere else. That is why the dE numbers matter. The chart shows direction. Delta E shows perceptual size.

Use the CIE chart to answer:

Which color is off?

Which direction is it off?

Is the display undersaturating or oversaturating?

Is white near D65?

Use dE to answer:

How much does the error matter?
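The CIE chart's "direction" question is simple arithmetic: a meter reads XYZ, and the chart plots x = X / (X + Y + Z) and y = Y / (X + Y + Z). The sketch below, with hypothetical function names, reports which way a measured white sits from D65. It deliberately returns only a direction, because equal xy distances are not equally visible.

```python
D65_XY = (0.3127, 0.3290)  # CIE 1931 xy chromaticity of D65 white


def xyz_to_xy(X, Y, Z):
    """Project a measured XYZ value onto the CIE 1931 xy chart."""
    total = X + Y + Z
    if total <= 0:
        raise ValueError("XYZ must be positive to have a chromaticity")
    return X / total, Y / total


def white_point_offset(X, Y, Z, target=D65_XY):
    """Return (dx, dy): the direction white sits from the target on the chart.

    Direction only. The xy diagram is not perceptually uniform, so judge
    the size of the error with a Delta E value, not with this offset.
    """
    x, y = xyz_to_xy(X, Y, Z)
    return x - target[0], y - target[1]


# A nominal D65 white (Y scaled to 100) should land almost exactly on target.
print(white_point_offset(95.047, 100.0, 108.883))
```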

Gamut coverage versus color accuracy

Do not confuse gamut coverage with color accuracy.

Gamut coverage tells you how much of a color space the display can physically reach.

Color accuracy tells you whether the display puts colors in the correct places.

A display can cover a wide gamut and still be inaccurate.

A display can be very accurate within Rec. 709 but not cover all of P3 or Rec. 2020.

These are different questions.

A TV that covers 99% of Rec. 709 and places its colors correctly will be excellent for SDR.

A TV that covers 95% of P3 but maps colors poorly may look less accurate in HDR despite having a wide panel gamut.

A TV that accepts Rec. 2020 signals does not necessarily cover Rec. 2020 fully. Most consumer displays do not. They map the Rec. 2020 container into the panel's real color volume.

A calibration report should help you separate capability from accuracy.

Capability asks, "Can the display reach this color?"

Accuracy asks, "When asked for this color, did the display produce it correctly?"

Calibration can improve accuracy.

It cannot create colors the hardware cannot physically produce.
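Coverage, unlike accuracy, is a geometry question: how much of the target triangle the display's triangle overlaps. A minimal sketch, assuming coverage is measured as area in the CIE 1931 xy diagram (many reviews use CIE 1976 u'v' instead, which gives different percentages), clips one triangle against the other with the Sutherland-Hodgman algorithm and compares areas. The primaries are the published Rec. 709, P3, and Rec. 2020 xy coordinates; the function names are illustrative.

```python
# CIE 1931 xy primaries (red, green, blue).
REC709 = [(0.640, 0.330), (0.300, 0.600), (0.150, 0.060)]
P3 = [(0.680, 0.320), (0.265, 0.690), (0.150, 0.060)]
REC2020 = [(0.708, 0.292), (0.170, 0.797), (0.131, 0.046)]


def signed_area(poly):
    """Shoelace formula; positive for counter-clockwise polygons."""
    return 0.5 * sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]))


def ccw(poly):
    """Return the polygon in counter-clockwise order."""
    return poly if signed_area(poly) > 0 else poly[::-1]


def side(a, b, p):
    """Positive when p is left of edge a->b, which is inside for a CCW polygon."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def clip(subject, clipper):
    """Sutherland-Hodgman: intersect subject with the convex polygon clipper."""
    output, clipper = ccw(list(subject)), ccw(list(clipper))
    for a, b in zip(clipper, clipper[1:] + clipper[:1]):
        points, output = output, []
        for prev, cur in zip(points[-1:] + points[:-1], points):
            d_prev, d_cur = side(a, b, prev), side(a, b, cur)
            if (d_prev >= 0) != (d_cur >= 0):  # edge crosses the clip line
                t = d_prev / (d_prev - d_cur)
                output.append((prev[0] + t * (cur[0] - prev[0]), prev[1] + t * (cur[1] - prev[1])))
            if d_cur >= 0:
                output.append(cur)
        if not output:
            break
    return output


def coverage(display, target):
    """Fraction of the target gamut's xy area that the display gamut reaches."""
    overlap = clip(target, display)
    overlap_area = abs(signed_area(overlap)) if len(overlap) >= 3 else 0.0
    return overlap_area / abs(signed_area(target))


print(f"P3 panel vs Rec. 709 target: {coverage(P3, REC709):.1%}")
print(f"Rec. 709 panel vs P3 target: {coverage(REC709, P3):.1%}")
print(f"P3 panel vs Rec. 2020 target: {coverage(P3, REC2020):.1%}")
```

Note what this does not measure: a panel can score 100% here and still place every color in the wrong spot. That is the accuracy question, answered by dE.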

The grayscale tracking chart

The grayscale chart shows whether the TV keeps white neutral from dark gray to peak white.

A neutral gray is made from red, green, and blue in the right balance. If all three channels are balanced, gray looks gray. If one channel is too high or low, gray becomes tinted.

A grayscale tracking chart usually shows red, green, and blue lines across different brightness levels.

The horizontal axis is the input signal level: 5%, 10%, 20%, 30%, and so on up to 100% white.

The vertical axis shows the relative balance of red, green, and blue.

On a well-calibrated display, the red, green, and blue lines stay close together across the whole range. They usually hover near the target line, often shown around 100%.

When the lines separate, the grayscale is tinted.

If blue is high and red is low, the image is too cool at that brightness.

If red is high and blue is low, it is too warm.

If green is high, the image has a green push.

If green is low relative to red and blue, the image may look magenta.

This chart matters because white balance affects everything. A display with poor grayscale tracking will tint every color built on that grayscale foundation.

A blue high end makes bright whites cool.

A red low end makes shadows warm.

A green midrange makes faces and neutral surfaces look sickly.

The dE bars or values beside the grayscale chart tell you how visible the imbalance is at each level.

The RGB lines tell you why it is happening.
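One way to see where the RGB lines come from: convert each measured gray's chromaticity into Rec. 709 RGB and express each channel as a percentage. The sketch below makes an explicit assumption about normalization (each gray scaled to luminance Y = 1, so D65 reads about 100/100/100); real calibration software may normalize differently. The matrix is the standard XYZ to linear Rec. 709 matrix for D65, and the tint labels follow the rules above with a hypothetical 2% tolerance.

```python
# Standard XYZ -> linear Rec. 709 (sRGB) matrix for a D65 white point.
XYZ_TO_REC709 = (
    (3.2404542, -1.5371385, -0.4985314),
    (-0.9692660, 1.8760108, 0.0415560),
    (0.0556434, -0.2040259, 1.0572252),
)


def rgb_balance(x, y):
    """Relative R, G, B (percent) for a measured gray at chromaticity (x, y).

    Assumption: the gray is normalized to Y = 1, so D65 lands near 100/100/100.
    """
    X, Y, Z = x / y, 1.0, (1 - x - y) / y
    return tuple(100 * (row[0] * X + row[1] * Y + row[2] * Z) for row in XYZ_TO_REC709)


def describe_tint(r, g, b, tolerance=2.0):
    """Translate RGB balance into the tint language used above."""
    notes = []
    if b - r > tolerance:
        notes.append("too cool (blue high, red low)")
    elif r - b > tolerance:
        notes.append("too warm (red high, blue low)")
    if g - 100 > tolerance:
        notes.append("green push")
    elif 100 - g > tolerance:
        notes.append("magenta cast (green low)")
    return notes


# Hypothetical readings: D65, a cool gray, and a green-shifted gray.
for xy in ((0.3127, 0.3290), (0.300, 0.315), (0.3127, 0.345)):
    print(xy, [round(v, 1) for v in rgb_balance(*xy)], describe_tint(*rgb_balance(*xy)))
```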

Reading grayscale errors

Look for patterns.

If the whole grayscale is too blue, the color-temperature preset is too cool or the white balance needs adjustment.

If only the highlights are blue, the high-end gain controls are off.

If only the shadows are red, the low-end bias or offset controls may be off.

If the middle is wrong but the ends are correct, a 2-point white balance adjustment may not be enough; the display may need multipoint correction.

If the lines wander up and down across the range, the display's grayscale tracking is inconsistent.

This is where 2-point, 10-point, 11-point, or 20-point white balance controls do their work.

A 2-point calibration adjusts the low and high ends.

A multipoint calibration adjusts several brightness steps across the range.
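The pattern-reading rules above can be written as a rough triage. This is a hypothetical helper, not a feature of any calibration package: the 30% and 80% region edges and the 3.0 dE threshold are assumptions chosen for illustration.

```python
def locate_grayscale_problem(step_de, threshold=3.0):
    """Point at the likely control group from per-step grayscale dE.

    step_de maps stimulus percent (10, 20, ... 100) to dE. The region edges
    (30% and 80%) and the 3.0 threshold are illustrative assumptions.
    """
    bad = sorted(level for level, de in step_de.items() if de > threshold)
    if not bad:
        return "no visible grayscale errors"
    if all(level <= 30 for level in bad):
        return "low end only: check the bias/offset (low-end) controls"
    if all(level >= 80 for level in bad):
        return "high end only: check the gain (high-end) controls"
    if all(30 < level < 80 for level in bad):
        return "midrange only: 2-point may not be enough; try multipoint"
    return "errors across the range: multipoint correction is likely needed"


print(locate_grayscale_problem({10: 4.0, 20: 3.5, 50: 0.8, 100: 0.5}))
print(locate_grayscale_problem({10: 0.5, 50: 4.0, 100: 0.6}))
```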

The grayscale chart is the feedback loop. Adjust, measure, check the lines, and repeat.

But do not overreact to tiny wiggles. No consumer display is mathematically perfect at every point. The question is whether the errors are visible and whether correcting them creates new problems.

The goal is not a beautiful graph at all costs.

The goal is a better picture.

Gamma tracking in SDR

The gamma chart shows how the display moves from black to white in SDR.

It is the measured version of the gamma article from the main arc.

A target line shows the intended curve: often 2.2, 2.4, or BT.1886. The measured line shows what the display actually produced.

If the measured gamma is above the target, that part of the image is too dark.

If it is below the target, that part is too bright.

If the line follows the target closely, the display is tracking well.
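The measured line on a gamma chart is usually "point gamma": for a patch at stimulus V with relative luminance Y, gamma = ln(Y) / ln(V). A higher value means a darker result at that step, which is why "above target" reads as too dark. A minimal sketch with hypothetical names; subtracting the black level is an option some tools offer, so it is a parameter here rather than a fixed behavior.

```python
import math


def point_gamma(stimulus, luminance, white, black=0.0):
    """Effective gamma at one grayscale step.

    stimulus: input level as a fraction (0.5 for a 50% patch), strictly
    between 0 and 1. luminance, white, black: measured cd/m^2.
    """
    if not 0 < stimulus < 1:
        raise ValueError("point gamma is undefined at 0% and 100%")
    relative = (luminance - black) / (white - black)
    return math.log(relative) / math.log(stimulus)


def gamma_errors(steps, white, target=2.4, black=0.0, tolerance=0.05):
    """Yield (stimulus, measured gamma, verdict) for (stimulus, luminance) pairs."""
    for stimulus, luminance in steps:
        g = point_gamma(stimulus, luminance, white, black)
        if g > target + tolerance:
            verdict = "too dark"
        elif g < target - tolerance:
            verdict = "too bright"
        else:
            verdict = "on target"
        yield stimulus, round(g, 3), verdict


# Hypothetical 50% readings on a 100-nit display measured against 2.4.
readings = [(0.5, 100 * 0.5**2.6), (0.5, 100 * 0.5**2.2), (0.5, 100 * 0.5**2.4)]
for row in gamma_errors(readings, white=100):
    print(row)
```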

A gamma chart tells you where tonal problems occur.

If gamma runs below target in the shadows, dark scenes may look lifted and gray.

If gamma runs above target in the shadows, near-black detail may crush.

If gamma runs above target in the midtones, faces and interiors may look heavy.

If gamma runs below target in the midtones, the picture may look washed out.

If the highlights deviate, bright areas may feel compressed or unnaturally open.

Gamma is often measured from the same grayscale patch sequence used for white balance. That is why grayscale and gamma charts often appear together. One chart shows color balance. The other shows brightness response.

Both matter.

BT.1886

BT.1886 is not just a label for "gamma 2.4," though it is closely related.

It is the reference electro-optical transfer function for flat-panel displays used in HDTV studio production. It uses an exponent of 2.4 and includes the display's measured black and white levels in the equation.

In reports, a BT.1886 target line may not look identical to a pure 2.4 power curve, especially on displays with a raised black level.

This is not an error.

BT.1886 is designed to account for real display endpoints.

On a display with very deep black, it can behave very close to gamma 2.4.

On a display with higher black level, it may adjust the low end differently.
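The BT.1886 equation is short enough to show directly: L = a · max(V + b, 0)^γ, with γ = 2.4, where a and b are derived from the display's white level Lw and black level Lb so that V = 0 gives Lb and V = 1 gives Lw. The sketch below compares it with a plain 2.4 power curve; the function names are illustrative.

```python
def bt1886(v, white, black, gamma=2.4):
    """ITU-R BT.1886 EOTF: normalized signal v (0..1) -> luminance in cd/m^2.

    white and black are the display's measured white and black levels.
    """
    lw = white ** (1 / gamma)
    lb = black ** (1 / gamma)
    a = (lw - lb) ** gamma  # user gain (legacy "contrast")
    b = lb / (lw - lb)      # user black level lift (legacy "brightness")
    return a * max(v + b, 0.0) ** gamma


def pure_power(v, white, gamma=2.4):
    """A plain power-law curve that ignores the display's black level."""
    return white * v ** gamma


# Deep black (0.0 nits) versus a raised black (0.1 nits) on a 100-nit display.
for v in (0.05, 0.1, 0.2, 0.5):
    print(v, round(bt1886(v, 100, 0.0), 3), round(bt1886(v, 100, 0.1), 3), round(pure_power(v, 100), 3))
```

With zero black the two curves are identical. With a raised black, BT.1886 starts at the display's real black and lifts the darkest steps, which is why its target line can differ from a pure 2.4 curve.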

When reading a report, make sure you know what gamma target was selected. A TV measured against 2.2 will not look the same as one measured against 2.4 or BT.1886. The report is only meaningful relative to the chosen target.

For dim-room SDR video, BT.1886 or 2.4 is common.

For brighter rooms, 2.2 may be a more practical target.

A report does not know your room unless the calibrator chose the target for it.

HDR EOTF tracking

HDR reports replace SDR gamma with EOTF tracking.

For PQ HDR, the target is the PQ curve. The report shows whether the display follows the luminance values the HDR signal asks for.

At low and mid brightness, a good HDR display may track PQ closely.

As the signal asks for brightness above what the TV can produce, the display must tone-map. The measured line then rolls away from the target. That rolloff is not automatically a failure. It is the TV fitting an HDR signal into real hardware limits.

The shape of the rolloff matters.

A graceful rolloff preserves highlight detail and avoids hard clipping.

An abrupt rolloff can make highlights flatten.

A display that tracks too bright may make HDR look punchy but less accurate.

A display that tracks too dim may make HDR look conservative or dull.

HDR EOTF charts tell you how the TV balances accuracy, brightness, and highlight preservation.
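Behind a PQ tracking chart is the SMPTE ST 2084 curve, which maps each signal level to an absolute luminance target. The sketch below implements that curve and its inverse, then compares hypothetical measurements against it to find roughly where the display starts rolling off. The 10% tolerance used to mark the tone-mapping region is an assumption, not a published threshold, and the measurements are invented for illustration.

```python
# SMPTE ST 2084 (PQ) constants.
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32


def pq_eotf(signal):
    """PQ signal (0..1) -> target luminance in cd/m^2 (nits)."""
    e = signal ** (1 / M2)
    return 10000 * (max(e - C1, 0.0) / (C2 - C3 * e)) ** (1 / M1)


def pq_inverse_eotf(nits):
    """Luminance in cd/m^2 -> PQ signal (0..1)."""
    y = (max(nits, 0.0) / 10000) ** M1
    return ((C1 + C2 * y) / (1 + C3 * y)) ** M2


def eotf_tracking(steps, tolerance=0.10):
    """Compare measured luminance with the PQ target at each step.

    steps: (pq_signal, measured_nits) pairs in ascending order, signal > 0.
    Returns per-step rows and the first signal where the display falls more
    than `tolerance` below target, a rough marker for where tone mapping
    takes over.
    """
    rows, rolloff_at = [], None
    for signal, measured in steps:
        target = pq_eotf(signal)
        ratio = measured / target
        rows.append((signal, round(target, 1), measured, round(ratio, 3)))
        if rolloff_at is None and ratio < 1 - tolerance:
            rolloff_at = signal
    return rows, rolloff_at


# Hypothetical readings from a display that tops out around 800 nits.
measured = [(0.5, 90.0), (0.6, 240.0), (0.7, 600.0), (0.8, 800.0), (0.9, 820.0)]
rows, rolloff_at = eotf_tracking(measured)
for row in rows:
    print(row)
print("rolloff begins near signal", rolloff_at)
```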

For HLG, the report may use a different target because HLG is a relative system, not an absolute one like PQ: its output scales with the display's peak brightness. Do not read an HLG chart as if it were a PQ chart.

The key distinction:

SDR gamma charts show a relative curve.

HDR EOTF charts show how the display handles a different brightness system, often including tone mapping.

Color point measurements

A report usually measures more than white.

It also measures colors.

The basic set is six points:

Red.

Green.

Blue.

Cyan.

Magenta.

Yellow.

These are the primaries and secondaries.

A more complete report measures saturation sweeps. Instead of measuring only fully saturated red, it may measure red at 25%, 50%, 75%, and 100% saturation. The same may be done for the other colors.

This matters because a display can hit the 100% red target but miss the 50% red target. Real content contains many partially saturated colors, not just gamut-corner extremes.

Some reports also use ColorChecker-style patches. These include natural colors such as skin tones, foliage, sky, neutral grays, and common memory colors. These are useful because they are closer to real content than only measuring the corners of the gamut.

When reading color measurements, scan for three things:

Average dE.

Maximum dE.

Where the outliers are.
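For saturation sweeps, "where the outliers are" means per color and per saturation step. A minimal sketch with hypothetical names and data: it reports each color's worst step and flags colors that pass at 100% saturation but fail lower in the sweep, the case described above. The 3.0 threshold is an assumption.

```python
from collections import defaultdict


def sweep_report(sweep, threshold=3.0):
    """Summarize a saturation sweep per color.

    sweep: (color, saturation_percent, dE) tuples. Flags colors that pass
    at 100% saturation but fail somewhere lower in the sweep.
    """
    by_color = defaultdict(dict)
    for color, saturation, de in sweep:
        by_color[color][saturation] = de
    report = {}
    for color, points in by_color.items():
        worst = max(points, key=points.get)
        corner = points.get(100)
        report[color] = {
            "worst_saturation": worst,
            "worst_de": points[worst],
            "fine_at_100_only": corner is not None and corner <= threshold < points[worst],
        }
    return report


# Hypothetical sweep: red hits its 100% target but misses at 50%.
sweep = [
    ("red", 25, 0.9), ("red", 50, 4.1), ("red", 75, 2.0), ("red", 100, 0.8),
    ("green", 25, 0.7), ("green", 50, 1.1), ("green", 75, 1.3), ("green", 100, 1.6),
]
print(sweep_report(sweep))
```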

If the average is low and the maximum is low, color accuracy is strong.

If the average is low but one color is bad, that color may still be visible in real content.

If skin-tone patches are high, faces may look wrong even if saturated primary colors measure well.

If cyan and green are off, skies and foliage may look strange.

If red is oversaturated, skin, lips, clothing, fire, and signage may look too hot.

The report tells you not just whether color is wrong, but which colors are wrong.

CMS and color point errors

The Color Management System exists to correct color point errors.

But the report should guide whether CMS adjustment is worth doing.

If all colors are slightly off in the same direction, the problem may be white balance, color temperature, color space, or gamut mapping rather than CMS.

If only one primary or secondary is off, CMS may help.

If saturation sweeps bend in strange ways, a simple CMS correction at 100% saturation may not fix the whole range.

If the panel cannot reach the target color, CMS cannot create that missing capability.

If CMS adjustment improves one patch but worsens several others, the correction may not be worth it.

This is why measured color calibration is iterative. You do not adjust red because red "looks wrong." You adjust red, measure red, measure neighboring colors, check sweeps, check luminance, and make sure the cure is not worse than the disease.
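The "cure versus disease" check can be made concrete by comparing per-patch dE before and after a CMS change. This is a hypothetical helper: the 0.1 dE noise tolerance and the simple "more patches worsened than improved" flag are illustrative assumptions, not a rule from any calibration package, and the patch values are invented.

```python
from statistics import mean


def compare_cms(before, after, tolerance=0.1):
    """Compare per-patch dE before and after a CMS adjustment.

    Changes smaller than `tolerance` are treated as measurement noise.
    """
    improved = [p for p in before if after[p] < before[p] - tolerance]
    worsened = [p for p in before if after[p] > before[p] + tolerance]
    return {
        "improved": improved,
        "worsened": worsened,
        "avg_before": round(mean(before.values()), 2),
        "avg_after": round(mean(after.values()), 2),
        "max_before": max(before.values()),
        "max_after": max(after.values()),
        "cure_worse_than_disease": len(worsened) > len(improved),
    }


# Hypothetical: fixing red pulled three neighboring patches off target.
before = {"red": 4.0, "yellow": 1.0, "magenta": 1.2, "skin": 1.5}
after = {"red": 1.0, "yellow": 2.5, "magenta": 2.2, "skin": 2.8}
print(compare_cms(before, after))
```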

A good report helps you decide whether the errors are small enough to leave alone.

Sometimes the best calibration move is restraint.

HDR color reports

HDR color reports add complications.

HDR color is not only about chromaticity. It is also about brightness. A display may hit a color at low luminance but fail to maintain saturation at high luminance. That is color volume.

An SDR CIE chart is a flat map.

HDR color performance is three-dimensional.

A display can have excellent P3 coverage at modest brightness and still lose saturation in very bright colors. Another display may have less perfect gamut coverage but better high-brightness color volume.

HDR reports may include:

HDR white balance dE ITP.

HDR color dE ITP.

PQ EOTF tracking.

Peak brightness windows.

Tone-mapping curves.

Color volume graphs.

P3 coverage.

Rec. 2020 coverage.

Saturation sweeps at different brightness levels.

These are more difficult to reduce to one number.

An HDR TV is not simply "accurate" or "inaccurate." It has limits: peak brightness, black level, local dimming, OLED ABL behavior, color volume, tone mapping, and metadata handling.

A good HDR report shows both accuracy and capability.

Delta E ITP

dE ITP appears in many HDR reports.

It is designed for HDR and wide-color-gamut television conditions. It is derived from ICtCp-based color representation and is intended to assess potential visibility of color differences in television images and signals.

The important practical warning is this:

Do not compare dE ITP numbers directly to SDR dE2000 numbers as if they are the same thing.

They are both error metrics, but they are used differently and often judged with different thresholds in display reviews.

If a report says HDR color dE ITP is good under a certain value, use that report's scale. Do not import the SDR dE2000 "under 3 is good" rule without checking the methodology.
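For readers who want to see what dE ITP actually computes, here is a sketch following ITU-R BT.2100 (ICtCp) and ITU-R BT.2124 (ΔE ITP = 720 · sqrt(ΔI² + ΔT² + ΔP²), with T = 0.5 · Ct and P = Cp). It assumes inputs are linear BT.2020 RGB in absolute cd/m², as the PQ-based ICtCp form expects. Function names are illustrative. Notice that the scale factor and the color space are entirely different from dE2000, which is why the two numbers do not share thresholds.

```python
import math

# SMPTE ST 2084 (PQ) constants, used by the PQ form of ICtCp.
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32


def pq_inverse_eotf(nits):
    """Luminance in cd/m^2 -> PQ signal (0..1)."""
    y = (max(nits, 0.0) / 10000) ** M1
    return ((C1 + C2 * y) / (1 + C3 * y)) ** M2


def ictcp(r, g, b):
    """Linear BT.2020 RGB in absolute cd/m^2 -> (I, Ct, Cp) per BT.2100."""
    l = (1688 * r + 2146 * g + 262 * b) / 4096
    m = (683 * r + 2951 * g + 462 * b) / 4096
    s = (99 * r + 309 * g + 3688 * b) / 4096
    lp, mp, sp = (pq_inverse_eotf(v) for v in (l, m, s))
    i = 0.5 * lp + 0.5 * mp
    ct = (6610 * lp - 13613 * mp + 7003 * sp) / 4096
    cp = (17933 * lp - 17390 * mp - 543 * sp) / 4096
    return i, ct, cp


def delta_e_itp(rgb_target, rgb_measured):
    """Delta E ITP per BT.2124: T = 0.5 * Ct, P = Cp, scaled by 720."""
    i1, ct1, cp1 = ictcp(*rgb_target)
    i2, ct2, cp2 = ictcp(*rgb_measured)
    return 720 * math.sqrt((i1 - i2) ** 2 + (0.5 * (ct1 - ct2)) ** 2 + (cp1 - cp2) ** 2)


# Hypothetical: a 100-nit gray target versus a slightly red-shifted reading.
print(round(delta_e_itp((100.0, 100.0, 100.0), (103.0, 100.0, 100.0)), 2))
```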

For HDR, methodology matters a lot.

Which meter was used?

Was the colorimeter profiled to a spectroradiometer?

Which HDR format was measured?

Which tone-mapping mode was active?

Was local dimming on?

Was dynamic tone mapping on or off?

What peak brightness did the TV target?

What patch sizes were used?

All of those can change the result.

Peak brightness charts

HDR reports often include peak brightness measurements at different window sizes.

A 2% window measures a small highlight.

A 10% window measures a common HDR highlight size.

A 25%, 50%, or 100% window measures larger bright areas.

This matters because displays behave differently depending on how much of the screen is bright.

OLEDs may be very bright on small highlights and much dimmer on full-screen white because of brightness limiting.

Mini-LED LCDs may be extremely bright in windows but show local-dimming behavior, blooming, or different sustained performance.

Projectors may have much lower peak brightness but large-image advantages.

A single peak-nits number is not enough.

A report with window sizes shows how the display handles real HDR scenes: small sparks, lamps, sun glints, bright clouds, snowy fields, hockey rinks, white menus, and full-screen bright images.

Read the 10% window for a rough sense of highlight power.

Read the 100% window for full-screen brightness.

Read real-scene measurements when available because they are often more representative than test windows alone.
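The point that one peak number is not enough shows up as soon as window measurements are laid side by side. A small sketch with a hypothetical helper and hypothetical OLED-like readings: it reports the brightest window, the 10% and 100% figures, and how much of peak brightness survives at full screen.

```python
def window_profile(window_nits):
    """Summarize peak-brightness measurements keyed by window size (% of screen)."""
    sizes = sorted(window_nits)
    peak_size = max(sizes, key=lambda s: window_nits[s])  # first size reaching the peak
    peak = window_nits[peak_size]
    full = window_nits.get(100)
    return {
        "peak_window": peak_size,
        "peak_nits": peak,
        "ten_percent_nits": window_nits.get(10),
        "full_screen_nits": full,
        "full_screen_share_of_peak": None if full is None else full / peak,
    }


# Hypothetical readings: bright in small windows, much dimmer full screen.
oled_like = {2: 1000.0, 10: 1000.0, 25: 620.0, 50: 420.0, 100: 250.0}
print(window_profile(oled_like))
```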

Color volume

Color volume is the combination of color and brightness.

A 2D gamut chart tells you whether a display can reach a color at some level.

Color volume asks whether it can reach that color at different brightness levels.

HDR makes color volume important because bright saturated colors are much harder to produce than dim saturated colors.

A display may show deep red accurately at modest brightness but fail to produce a very bright saturated red. It may preserve blue well but lose saturation in bright green. It may cover P3 well on paper but not sustain that color at HDR brightness.

Color volume charts can be shown as 3D shapes or as slices at different luminance levels. They are harder to read than a basic CIE chart, but the idea is simple:

Bigger usable volume means more combinations of brightness and saturation.

Better accuracy within that volume means those combinations land closer to target.

HDR picture quality depends on both.

A TV with high brightness but weak color volume can look bright but pale.

A TV with strong color saturation but low brightness can look rich but not very HDR-like.

The best HDR displays do both well.

Pre-calibration versus post-calibration

Many reports show pre-calibration and post-calibration results.

Pre-calibration means the display as it came from the factory in the chosen picture mode, usually the most accurate one.

Post-calibration means after adjustments were made.

Both matter.

Pre-calibration tells you how good the TV is out of the box. This matters for most users because most people will not hire a calibrator or buy measurement equipment.

Post-calibration tells you how good the TV can become with work.

A good TV has strong pre-calibration accuracy and can improve further.

A less accurate TV may improve dramatically after calibration, but still require work to get there.

A TV that remains inaccurate after calibration may have hardware, software, or control limitations.

When reading reviews, pay attention to the mode used for pre-calibration. A TV measured in Vivid mode will look terrible by design. A useful pre-calibration report should use the most accurate default mode: Filmmaker, Movie, Cinema, Custom, Professional, or equivalent.

Also remember panel variation.

A review measures one sample. Your unit may be slightly better or worse. That is why copying another unit's white-balance or CMS settings is risky.

A report teaches behavior.

It is not a universal settings recipe.

Recognizing a good SDR report

A strong SDR report usually looks like this:

White balance dE is low across the grayscale.

RGB tracking lines stay close together.

Color temperature is near D65.

Gamma follows the chosen target.

CIE points land near Rec. 709 targets.

Color dE is low for primaries and secondaries.

ColorChecker or saturation sweep errors are low.

No major outliers appear.

Post-calibration changes improve the report without introducing artifacts.

Average dE values under 3 are good.

Average values under 2 are very good.

Average values under 1 are excellent.

But again, do not read only the averages. Look at the graphs.

A TV with average dE 1.5 and one ugly near-black error may still show a visible shadow tint. A TV with average dE 2.2 but no major outliers may look more consistent.

Reports are maps, not trophies.

Recognizing a good HDR report

A strong HDR report usually looks like this:

HDR white balance is close to D65.

HDR color errors are low using the report's HDR metric.

PQ tracking follows the target well until the display reaches its tone-mapping region.

Tone mapping rolls off highlights smoothly rather than clipping abruptly.

Peak brightness is appropriate for the display class.

Black level is controlled.

Local dimming or OLED pixel control behaves well.

P3 coverage is strong.

Rec. 2020 coverage is reported honestly, not assumed.

Color volume is good for the display type.

The report specifies whether dynamic tone mapping, local dimming, Dolby Vision, HDR10, or HDR10+ behavior was measured.

HDR reports require more caution than SDR reports because HDR depends heavily on the TV's processing choices. Two tone-mapping modes on the same TV can produce very different charts. One may be more accurate. Another may look brighter in real content. A report should make clear which mode was used.

If it does not, be careful.

What a report cannot tell you

A calibration report is powerful, but it is not everything.

It cannot tell you whether you personally prefer a brighter daytime mode.

It cannot make a poor source look good.

It cannot overcome weak HDR hardware.

It cannot remove screen reflections.

It cannot turn a projector into an OLED.

It cannot make a display cover a gamut it physically cannot reach.

It cannot guarantee your unit matches the reviewed unit.

It cannot tell the whole story of local dimming artifacts, blooming, motion handling, processing stability, or viewing-angle behavior unless those were specifically tested.

It also cannot replace watching real content.

A beautiful report is meaningful, but the final verification is still the picture. Good measurements should correlate with a clean, stable, accurate image. If the report is excellent and real content still looks wrong, investigate the source chain, room, mode, processing, or measurement setup.

Numbers are not the goal.

They are evidence.

Common reading mistakes

The first mistake is treating average dE as the whole story.

Always check outliers.

The second mistake is comparing dE values across different formulas.

Do not compare SDR dE2000 and HDR dE ITP directly.

The third mistake is ignoring the target.

A gamma chart only makes sense if you know whether the target was 2.2, 2.4, or BT.1886.

The fourth mistake is confusing gamut coverage with color accuracy.

A wide-gamut display can still be inaccurate.

The fifth mistake is assuming post-calibration values from one sample apply to every unit.

They do not.

The sixth mistake is ignoring picture mode.

A report in Filmmaker Mode tells you one thing. A report in Vivid tells you another. A report in Game Mode tells you something else.

The seventh mistake is ignoring HDR tone-mapping settings.

Dynamic Tone Mapping on and off can produce different EOTF behavior. Dolby Vision Dark and Dolby Vision Bright can behave differently. Game HDR and movie HDR can behave differently.

The eighth mistake is assuming every visible problem is calibration.

Compression, poor mastering, room light, reflections, panel limits, and source settings all matter.

How to read a report in order

Use this order.

First, identify the mode and format.

Was it SDR, HDR10, Dolby Vision, or HLG?

Which picture mode?

Which gamma or EOTF target?

Which color space?

Which tone-mapping setting?

Second, check averages.

White balance dE.

Color dE.

Gamma error.

Color temperature.

HDR dE ITP if applicable.

Third, check outliers.

Which grayscale levels are worst?

Which colors are worst?

Are errors clustered or scattered?

Fourth, check the charts.

Are RGB lines tight?

Does gamma or EOTF follow the target?

Do CIE points land near reference targets?

Do saturation sweeps behave consistently?

Fifth, check capability.

Gamut coverage.

Peak brightness.

Full-screen brightness.

Color volume.

Black level.

Sixth, compare pre and post.

Did calibration improve the right things?

Did it leave any major errors?

Did it introduce tradeoffs?

Finally, translate the numbers back into viewing.

Will these errors show up in faces?

Shadows?

Whites?

Sports?

HDR highlights?

Animated color?

Near-black scenes?

That is the skill: moving from chart to picture.

Where this leaves us

A measurement report is not a wall of mysterious graphs.

It is a structured answer to one question:

How close is this display to the standard?

Delta E tells you how large the error is in roughly perceptual terms.

The CIE chart tells you where colors land.

The grayscale chart tells you whether white stays neutral from dark to bright.

The gamma chart tells you how SDR brightness rises from black to white.

The HDR EOTF chart tells you how the TV follows PQ or HLG and where tone mapping begins.

Color point tables tell you which colors are accurate and which are not.

HDR reports add brightness, tone mapping, dE ITP, and color volume.

Once you know what each piece is saying, the report becomes readable. You can tell whether a TV is accurate out of the box, whether calibration helped, whether a remaining error matters, and whether a limitation belongs to calibration or hardware.

That is the point of measurement.

Not to replace the picture.

To explain it.

The next measurement-sidebar piece covers the workflow behind the report: how the measurements are taken, how adjustments are made, why calibration happens in a specific order, and how the final graphs are produced.
