What JPEG throws away
A JPEG is often a tenth the size of the raw image and you can't see the difference. Where did ninety percent of the data go? We follow an 8×8 block through the discrete cosine transform and quantisation to find out: JPEG sorts a picture by spatial frequency, then quietly bins the fine detail your eyes barely register.
Save a photo as a JPEG and it's often a tenth the size of the raw original, sometimes less, and you'd struggle to spot the difference. That should bother you a little. Where did ninety percent of the information go? It can't be free. The answer is that JPEG is very good at finding the parts of an image your eyes don't actually use, and throwing exactly those away. It's not magic compression; it's a careful model of what human vision ignores. Let's follow a single tile of an image through the machine and watch the data disappear.
This is, underneath, the same frequency-domain idea as the Fourier series (re-describing a signal by which waves it's made of), applied to pictures. If that post was about sound and circles, this is the same trick in two dimensions, doing something eminently practical.
Stop storing pixels
JPEG's first move is to stop thinking in pixels. It chops the image into 8×8 blocks and asks a different question of each one: not "what are these 64 brightness values" but "how much of each spatial frequency is in this block". A flat patch of sky is almost all low frequency (it barely changes across the block); a patch with a sharp edge or fine texture has high-frequency content. The tool that re-expresses a block this way is the discrete cosine transform (DCT), a close cousin of the Fourier transform that uses cosines:
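$$F(u,v) = \frac{1}{4}\,C(u)\,C(v)\sum_{x=0}^{7}\sum_{y=0}^{7} f(x,y)\cos\frac{(2x+1)u\pi}{16}\cos\frac{(2y+1)v\pi}{16}$$

where $f(x,y)$ is the pixel at column $x$ and row $y$ of the block, $F(u,v)$ is the coefficient for horizontal frequency $u$ and vertical frequency $v$, and $C(k) = 1/\sqrt{2}$ for $k = 0$ and $C(k) = 1$ otherwise.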
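The formula translates almost term for term into code. What follows is a minimal sketch in plain Python, not how real encoders do it (they use fast factorised DCTs). It assumes a block is a list of eight rows of pixel values, so `block[y][x]` is $f(x,y)$, and it stores the result the same way, so `coeffs[v][u]` is $F(u,v)$. Baseline JPEG also subtracts 128 from each 8-bit sample before the transform; the sketch leaves that step out.

```python
import math


def C(k):
    """Normalisation factor: 1/sqrt(2) for the zero-frequency term, 1 otherwise."""
    return 1 / math.sqrt(2) if k == 0 else 1.0


def dct2(block):
    """Forward 8x8 DCT of block[y][x]; returns coeffs[v][u] = F(u, v)."""
    coeffs = [[0.0] * 8 for _ in range(8)]
    for v in range(8):          # vertical frequency (row of the output)
        for u in range(8):      # horizontal frequency (column of the output)
            total = 0.0
            for y in range(8):
                for x in range(8):
                    total += (
                        block[y][x]
                        * math.cos((2 * x + 1) * u * math.pi / 16)
                        * math.cos((2 * y + 1) * v * math.pi / 16)
                    )
            coeffs[v][u] = 0.25 * C(u) * C(v) * total
    return coeffs


# A perfectly flat block: all of its content lands in the top-left (DC) cell.
flat = [[100] * 8 for _ in range(8)]
coeffs = dct2(flat)
print(round(coeffs[0][0], 6))  # eight times the block's mean of 100
```

Every other coefficient of the flat block comes out as zero, up to floating-point noise, because a flat block contains no variation at any finer frequency.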
The output is another 8×8 grid, but now each cell is a coefficient: how much of one particular cosine pattern the block contains. The top-left cell ($u = v = 0$) is the "DC" term, proportional to the block's average brightness; move right and down and the patterns get finer, up to a fine checkerboard in the bottom-right corner. Here's a block, its coefficients, and the reconstruction, side by side. Drag the quality down and watch what happens.
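To make "finer as you go right and down" concrete, this small sketch prints the sign (+ or -) of a few of the cosine patterns across the block. The helper name `basis_signs` is hypothetical, not part of any library; it just evaluates the two cosine factors from the formula for one $(u,v)$.

```python
import math


def basis_signs(u, v):
    """Sign of the (u, v) cosine pattern at each pixel, as a grid indexed [y][x]."""
    grid = []
    for y in range(8):
        row = []
        for x in range(8):
            value = math.cos((2 * x + 1) * u * math.pi / 16) * math.cos(
                (2 * y + 1) * v * math.pi / 16
            )
            # With these arguments neither cosine is ever zero, so every
            # pixel is either + or -.
            row.append(1 if value > 0 else -1)
        grid.append(row)
    return grid


for u, v in [(0, 0), (1, 0), (7, 7)]:
    print(f"pattern (u={u}, v={v}):")
    for row in basis_signs(u, v):
        print(" ".join("+" if s > 0 else "-" for s in row))
```

Pattern (0, 0) is all +, (1, 0) is a single left-to-right swing from + to -, and (7, 7) is the checkerboard in the bottom-right corner.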
The detail clusters where you'll miss it
Here's the property that makes the whole thing work, and you can see it in the coefficient panel: for real images, almost all the magnitude piles up in the top-left, the low frequencies. The fine, high-frequency coefficients are mostly small. That's called energy compaction, and the DCT is famously good at it for natural images. So most of a block is described by a handful of numbers, and the rest is small change.
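A quick way to check energy compaction, as a sketch: transform a smooth block and a maximally busy one, and measure what share of the total squared magnitude lands in the top-left 4×4 corner. The 4×4 cut-off, the helper `low_freq_share`, and the two test blocks are arbitrary choices for illustration.

```python
import math


def C(k):
    """Normalisation factor: 1/sqrt(2) for the zero-frequency term, 1 otherwise."""
    return 1 / math.sqrt(2) if k == 0 else 1.0


def dct2(block):
    """Forward 8x8 DCT of block[y][x]; returns coeffs[v][u] = F(u, v)."""
    return [
        [
            0.25 * C(u) * C(v) * sum(
                block[y][x]
                * math.cos((2 * x + 1) * u * math.pi / 16)
                * math.cos((2 * y + 1) * v * math.pi / 16)
                for y in range(8)
                for x in range(8)
            )
            for u in range(8)
        ]
        for v in range(8)
    ]


def low_freq_share(block, k=4):
    """Fraction of the total squared coefficient magnitude in the top-left k x k corner."""
    coeffs = dct2(block)
    total = sum(c * c for row in coeffs for c in row)
    low = sum(coeffs[v][u] ** 2 for v in range(k) for u in range(k))
    return low / total


# A smooth diagonal ramp, like a patch of sky brightening toward one corner.
ramp = [[8 * x + 8 * y for x in range(8)] for y in range(8)]
# The busiest possible block: a one-pixel black/white checkerboard.
checker = [[255 if (x + y) % 2 else 0 for x in range(8)] for y in range(8)]

print(f"ramp:    {low_freq_share(ramp):.4f}")
print(f"checker: {low_freq_share(checker):.4f}")
```

The smooth ramp keeps essentially all of its energy in that corner; the checkerboard, the worst case, keeps only about half (its average brightness), with the rest parked in the finest patterns.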
The DCT sorts detail by how much you'll notice it
This is the quiet brilliance. The transform doesn't delete anything: it's perfectly reversible, and running the inverse DCT on the unrounded coefficients gives back the original block (up to arithmetic precision). What it does is reorganise the block so that the important, low-frequency structure and the negligible, high-frequency detail end up in separate, labelled boxes. Once they're sorted, you can keep the big boxes and discard the small ones. And it happens that human vision is far more sensitive to low spatial frequencies (broad shapes and gradients) than to high ones (fine texture), so "the small coefficients" and "the detail you won't miss" are largely the same set. The DCT lines the data up so the throwing-away is easy for the encoder and nearly free to your eye.
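The "doesn't delete anything" part can be checked numerically. With the $\frac{1}{4}C(u)C(v)$ normalisation the 8×8 DCT is orthonormal, so the sum of squared coefficients equals the sum of squared pixel values: energy moves between boxes but none is lost. A sketch, using an arbitrary made-up test block and a hypothetical `energy` helper:

```python
import math


def C(k):
    """Normalisation factor: 1/sqrt(2) for the zero-frequency term, 1 otherwise."""
    return 1 / math.sqrt(2) if k == 0 else 1.0


def dct2(block):
    """Forward 8x8 DCT of block[y][x]; returns coeffs[v][u] = F(u, v)."""
    return [
        [
            0.25 * C(u) * C(v) * sum(
                block[y][x]
                * math.cos((2 * x + 1) * u * math.pi / 16)
                * math.cos((2 * y + 1) * v * math.pi / 16)
                for y in range(8)
                for x in range(8)
            )
            for u in range(8)
        ]
        for v in range(8)
    ]


def energy(grid):
    """Sum of squares of every entry in an 8x8 grid."""
    return sum(value * value for row in grid for value in row)


# Arbitrary test pattern with plenty of variation.
block = [[(37 * x + 11 * y) % 256 for x in range(8)] for y in range(8)]

print(energy(block), energy(dct2(block)))  # equal up to floating-point rounding
```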
The actual throwing-away
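Sorting doesn't save space by itself; the discarding happens at the next step, quantisation. Each coefficient is divided by a number from a quantisation table and rounded to the nearest integer:

$$F_Q(u,v) = \operatorname{round}\!\left(\frac{F(u,v)}{Q(u,v)}\right)$$

where $Q(u,v)$ is the table entry for frequency $(u,v)$.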
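As a sketch, quantisation is one line per coefficient. The table below is the example luminance table given in Annex K of the JPEG standard (encoders are free to use others). `quantise` is a hypothetical helper name, and it assumes coefficients are stored as `coeffs[v][u]`, matching the table's row-by-row layout. Python's `round` sends exact halves to the nearest even number, a detail that rarely matters here.

```python
# Example luminance quantisation table (JPEG standard, Annex K).
# Row = vertical frequency v, column = horizontal frequency u:
# small divisors top-left, large ones bottom-right.
STD_LUMA = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]


def quantise(coeffs, table):
    """F_Q(u, v) = round(F(u, v) / Q(u, v)) for every coefficient."""
    return [[round(coeffs[v][u] / table[v][u]) for u in range(8)] for v in range(8)]


# Made-up coefficients: a strong DC term and one faint high-frequency ripple.
coeffs = [[0.0] * 8 for _ in range(8)]
coeffs[0][0] = 800.0   # big, low frequency: survives as 800 / 16 = 50
coeffs[7][7] = 40.0    # small, high frequency: 40 / 99 rounds to 0

q = quantise(coeffs, STD_LUMA)
print(q[0][0], q[7][7])
```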
The table uses small divisors for the low frequencies (keep them precise) and large divisors for the high ones (crush them). A high-frequency coefficient divided by a big number and rounded becomes zero, and a block full of zeros compresses to almost nothing in the final lossless step (run-length and Huffman coding). That's where the ninety percent goes: into runs of zeros that used to be faint detail.
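The "runs of zeros" idea can be sketched too. Baseline JPEG reads each block's coefficients in a zig-zag order from low to high frequency, so the zeros left by quantisation bunch up at the end. The encoder below is a deliberately simplified stand-in for the real entropy coder: it emits (zero-run, value) pairs and simply stops at the last non-zero value, whereas real JPEG writes an end-of-block symbol, codes the DC term separately, limits run lengths, and Huffman-codes everything. `zigzag_order` and `run_length` are hypothetical helper names.

```python
def zigzag_order(n=8):
    """(row, col) positions of an n x n block in JPEG's zig-zag scan order."""
    order = []
    for s in range(2 * n - 1):                  # s = row + col: one anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:                          # even diagonals run bottom-left to top-right
            rows = reversed(rows)
        order.extend((r, s - r) for r in rows)
    return order


def run_length(quantised):
    """Simplified coding: DC value plus (zeros_skipped, value) pairs for the AC terms."""
    scan = [quantised[r][c] for r, c in zigzag_order(len(quantised))]
    dc, pairs, run = scan[0], [], 0
    for value in scan[1:]:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    # Trailing zeros are simply dropped; real JPEG marks them with end-of-block.
    return dc, pairs


# A typical quantised block: a DC value, a few small low-frequency terms, zeros elsewhere.
block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[1][0], block[2][2] = 50, -3, 2, 1
print(run_length(block))
```

Sixty-four numbers shrink to a DC value and three pairs, which is the kind of input the real entropy coder turns into a handful of bits.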
The quant table is a model of your eyes
The example quantisation tables published with the JPEG standard aren't arbitrary: they were derived from experiments on human contrast sensitivity. They throw away high-frequency luminance aggressively, and high-frequency colour even more aggressively, because the eye is least sensitive there (the same reasoning is why JPEG files usually store colour at lower resolution than brightness). In common encoders such as libjpeg, the "quality" slider just scales the whole table up or down. So a JPEG isn't a generic compressor that happens to work on images; it's a compressor with a model of your visual system baked into a little 8×8 table of numbers.
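For the curious, here is a Python sketch of how the widely used Independent JPEG Group library (libjpeg) turns a 1 to 100 quality setting into a table. This convention belongs to that library, not to the JPEG standard, and other encoders may scale differently. `scale_table` is a hypothetical name for the sketch.

```python
# Example luminance table from Annex K of the JPEG standard (row = vertical frequency).
STD_LUMA = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]


def scale_table(base, quality):
    """libjpeg-style quality scaling: 50 keeps the base table, 1 is coarsest, 100 finest."""
    quality = min(max(quality, 1), 100)
    # Percentage scale factor: 5000/q below 50, falling linearly to 0 at q = 100.
    scale = 5000 // quality if quality < 50 else 200 - 2 * quality
    # Scale with rounding, then clamp to 1..255 (libjpeg's limit for baseline tables).
    return [[min(max((q * scale + 50) // 100, 1), 255) for q in row] for row in base]


for quality in (10, 50, 90):
    print(quality, scale_table(STD_LUMA, quality)[0])
```

Quality 50 leaves the example table unchanged; lower settings multiply it up (crushing more coefficients to zero), and higher ones shrink it toward all ones.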
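To get the picture back, you multiply the quantised coefficients $F_Q(u,v)$ by the table entries $Q(u,v)$ again (recovering approximations of the originals, the rounding now permanent) and run the DCT in reverse, with the same $C$ as before:

$$\hat{F}(u,v) = F_Q(u,v)\,Q(u,v), \qquad \hat{f}(x,y) = \frac{1}{4}\sum_{u=0}^{7}\sum_{v=0}^{7} C(u)\,C(v)\,\hat{F}(u,v)\cos\frac{(2x+1)u\pi}{16}\cos\frac{(2y+1)v\pi}{16}$$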
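The decoder side, as a minimal sketch: `dequantise` multiplies back by the table and `idct2` evaluates the inverse formula, both hypothetical helper names using the same `coeffs[v][u]` layout as the forward transform. Without quantisation the round trip returns the original block up to floating-point error; with it, each coefficient can be off by up to half its divisor, and that error is what you see in the picture. The flat table of 16s is an arbitrary stand-in for a real quantisation table.

```python
import math


def C(k):
    """Normalisation factor: 1/sqrt(2) for the zero-frequency term, 1 otherwise."""
    return 1 / math.sqrt(2) if k == 0 else 1.0


def _cos(i, freq):
    """Shared cosine factor cos((2i + 1) * freq * pi / 16)."""
    return math.cos((2 * i + 1) * freq * math.pi / 16)


def dct2(block):
    """Forward DCT: block[y][x] -> coeffs[v][u]."""
    return [[0.25 * C(u) * C(v) * sum(block[y][x] * _cos(x, u) * _cos(y, v)
                                      for y in range(8) for x in range(8))
             for u in range(8)] for v in range(8)]


def idct2(coeffs):
    """Inverse DCT: coeffs[v][u] -> pixels[y][x]."""
    return [[0.25 * sum(C(u) * C(v) * coeffs[v][u] * _cos(x, u) * _cos(y, v)
                        for v in range(8) for u in range(8))
             for x in range(8)] for y in range(8)]


def dequantise(quantised, table):
    """F_hat(u, v) = F_Q(u, v) * Q(u, v)."""
    return [[quantised[v][u] * table[v][u] for u in range(8)] for v in range(8)]


block = [[(37 * x + 11 * y) % 256 for x in range(8)] for y in range(8)]
coeffs = dct2(block)
exact = idct2(coeffs)                      # no quantisation: lossless round trip

flat16 = [[16] * 8 for _ in range(8)]      # illustrative table, not a real JPEG one
quantised = [[round(coeffs[v][u] / 16) for u in range(8)] for v in range(8)]
approx = idct2(dequantise(quantised, flat16))
print(round(approx[0][0], 1), block[0][0])
```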
You can push it too far
Drag the quality slider all the way down in the demo and the trick stops hiding. Two things appear. First, the 8×8 grid becomes visible. Because each block is quantised independently, neighbours stop agreeing at their edges and the tiling shows through. Second, sharp edges ring: an edge needs lots of high-frequency coefficients to stay crisp, and once you've binned those away, the inverse DCT reconstructs the edge as a wobble, the same overshoot the Fourier series shows at a discontinuity (the Gibbs phenomenon, in two dimensions). Those two artefacts, blocking and ringing, are the visible signature of JPEG running out of room.
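Ringing is easy to reproduce in a sketch. Take a block with one hard vertical edge (black on the left, white on the right), keep only the coefficients in the top-left 4×4 corner as if a harsh table had rounded everything else to zero, and transform back. The 4×4 cut-off is an illustrative stand-in for real quantisation, not what any particular quality setting does.

```python
import math


def C(k):
    """Normalisation factor: 1/sqrt(2) for the zero-frequency term, 1 otherwise."""
    return 1 / math.sqrt(2) if k == 0 else 1.0


def _cos(i, freq):
    """Shared cosine factor cos((2i + 1) * freq * pi / 16)."""
    return math.cos((2 * i + 1) * freq * math.pi / 16)


def dct2(block):
    """Forward DCT: block[y][x] -> coeffs[v][u]."""
    return [[0.25 * C(u) * C(v) * sum(block[y][x] * _cos(x, u) * _cos(y, v)
                                      for y in range(8) for x in range(8))
             for u in range(8)] for v in range(8)]


def idct2(coeffs):
    """Inverse DCT: coeffs[v][u] -> pixels[y][x]."""
    return [[0.25 * sum(C(u) * C(v) * coeffs[v][u] * _cos(x, u) * _cos(y, v)
                        for v in range(8) for u in range(8))
             for x in range(8)] for y in range(8)]


# A sharp edge: four black columns, then four white ones.
edge = [[0] * 4 + [255] * 4 for _ in range(8)]
coeffs = dct2(edge)

# Throw away every coefficient outside the low-frequency 4x4 corner.
kept = [[coeffs[v][u] if u < 4 and v < 4 else 0.0 for u in range(8)] for v in range(8)]
recon = idct2(kept)

print([round(p) for p in recon[0]])  # one row of the reconstructed block
```

The reconstructed row dips below 0 on the dark side and climbs above 255 on the bright side, the overshoot-and-wobble the text describes; a real decoder would then clamp the values to 0..255.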
Why heavily-shared JPEGs look crunchy
Every time an image is re-saved as JPEG it gets quantised again, and the damage accumulates. This is "generation loss", the reason a meme that's been screenshotted and re-shared a hundred times looks like it's been through a war. It's also why you should keep a lossless master (PNG, or the raw) of anything you'll edit repeatedly, and only export to JPEG at the end. Quantisation is a one-way door, and every save is another trip through it.
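Here is a toy illustration of how the rounding stacks up, assuming the second save uses a different quantiser step than the first (a different quality setting, say, or another program's table). It follows a single coefficient with the made-up value 17 through one save at step 12, and through a save at step 10 followed by one at step 12. `requantise` is a hypothetical helper; in practice, pixel rounding and edits between saves add further error on top.

```python
def requantise(value, step):
    """Quantise then dequantise one coefficient: what survives a single save."""
    return round(value / step) * step


original = 17.0
once = requantise(original, 12)                   # one save at step 12
twice = requantise(requantise(original, 10), 12)  # step 10, then re-saved at step 12

print(f"one save:  {once}, error {abs(once - original)}")
print(f"two saves: {twice}, error {abs(twice - original)}")
```

The second save never sees the true value 17, only the already-rounded 20, so its rounding error lands on top of the first one.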
Some food for thought. JPEG, MP3, and most of the media you consume are built on the same philosophy: find the limits of human perception and store only up to them. MP3 throws away sounds your ear masks; JPEG throws away detail your eye glosses over. There's something a bit unsettling and a bit wonderful about how much of our digital world is shaped precisely around the gaps in our own senses. We're not storing reality; we're storing the part of it we can tell apart.
Recap
JPEG stops storing pixels and starts storing spatial frequencies, using the DCT to sort each 8×8 block so the important low-frequency structure separates cleanly from the negligible high-frequency detail. Quantisation then rounds the high frequencies toward zero, guided by a table tuned to human vision, and the resulting runs of zeros compress away. Done gently it's invisible; pushed hard it shows its hand as blocking and ringing. Ninety percent of the file was detail you were never going to see.
Reading further
- Wallace (1991), The JPEG Still Picture Compression Standard: the original overview from the committee that designed it, and still the clearest end-to-end description.
- Ahmed, Natarajan & Rao (1974), Discrete Cosine Transform: the paper that introduced the discrete cosine transform itself, the mathematical engine under the whole format.
- Pennebaker & Mitchell (1993), JPEG: Still Image Data Compression Standard: the deep reference book if you want every stage, including the entropy coding the post skipped.
- Omar Shehata, Unraveling the JPEG (Parametric Press, issue 01): a gorgeous interactive walk through every stage, in the spirit of Christopher Olah's visual explainers, if you want to poke at a real image. parametric.press/issue-01/unraveling-the-jpeg