
The DCT is really neat, but the actual compression magic comes from a combination of side effects that occur after you apply it:

1. The DCT (type II) concentrates most of the block's energy into the low-frequency coefficients, which sit in the top-left corner of the block.

2. Quantization helps to zero out many of the higher frequency coefficients (toward the bottom-right corner). This is where your information loss occurs.

3. Clever zig-zag scanning of the quantized coefficients means that you wind up with long runs of zeroes.

4. Zig-zag scanned blocks are RLE coded. This is the first form of actual compression.

5. RLE coded blocks are sent through Huffman or arithmetic coding. This is the final form of actual compression (for intra-frame-only/JPEG considerations). Additional compression occurs in MPEG et al. with interframe techniques. (A rough sketch of steps 3 and 4 follows below.)
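
Roughly, steps 3 and 4 might look like this in Python (a toy sketch with made-up block values and helper names; real JPEG encodes (run, size) plus amplitude categories and a specific EOB symbol, not literal Python tuples):

    import numpy as np

    # Toy quantized 8x8 block: a few non-zero low-frequency coefficients in the
    # top-left, zeros toward the bottom-right (the typical result of steps 1-2).
    q = np.zeros((8, 8), dtype=int)
    q[0, 0], q[0, 1], q[1, 0], q[2, 0] = 52, -3, 6, 2

    # Step 3: zig-zag scan -- walk the anti-diagonals so low frequencies come first.
    def zigzag(block):
        n = block.shape[0]
        order = sorted(((i, j) for i in range(n) for j in range(n)),
                       key=lambda p: (p[0] + p[1],
                                      p[0] if (p[0] + p[1]) % 2 else p[1]))
        return [block[i, j] for i, j in order]

    # Step 4: run-length encode as (zeros skipped, nonzero value) pairs; the
    # trailing run of zeros is implied by an end-of-block marker.
    def rle(coeffs):
        out, run = [], 0
        for v in coeffs:
            if v == 0:
                run += 1
            else:
                out.append((run, int(v)))
                run = 0
        out.append('EOB')
        return out

    print(rle(zigzag(q)))   # [(0, 52), (0, -3), (0, 6), (0, 2), 'EOB']

Step 5 then entropy-codes these symbols with Huffman or arithmetic coding.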



The "actual compression magic" has been used before DCT in other codecs, but applied directly to pixels gave lousy results.

You can also look at '90s software video codecs, developed when the DCT was still too expensive for video. They had all kinds of approaches to quantization and entropy coding, and they were all a pixelated mess.

DCT is the key ingredient that enabled compression of photographic content.


What's so special about DCT for image compression?

The main idea of lossy image compression is throwing away fine detail, which means converting to the frequency domain and throwing away high-frequency coefficients. Conceptually an FFT would work fine for this, so the use of the DCT instead seems more like an optimization than a key component.


The DCT, just like the DFT, is a discrete version of the (analog) Fourier transform. (The FFT is just a clever, fast implementation of the DFT, just like when people say “DCT” they usually mean a similarly fast implementation of the DCT. Call it FCT if you wish.) Where they differ is in the assumed boundary conditions; the DFT works like Fourier-transforming a signal that is repeated and then loops forever, while the DCT is like Fourier-transforming a signal that is _reflected_ and then loops forever. (There is also a related transform called the DST, which assumes the signal is _inverted_ and reflected. It's occasionally useful in solving certain kinds of differential equations; only rarely so in compression.)

This matches much better with what happens when you cut out small pieces of a signal, so it introduces less spurious high-frequency content and thus gives better energy compaction. Say your little pixel block (let's make it 8x1 for simplicity) is a simple gradient, so it goes 1, 2, 3, 4, 5, 6, 7, 8. Which do you think has the least amount of high-frequency content you'd need to deal with: 123456788765432112345678… or 123456781234567812345678…?

(There's also a DCT version that goes more like 1234567876543212345678 etc., but that's a different story)
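
You can see the effect directly with scipy (a quick sketch; the FFT stands in for the DFT, and the 1..8 values are the gradient block from above):

    import numpy as np
    from scipy.fft import fft, dct

    x = np.arange(1, 9, dtype=float)           # the 8x1 gradient: 1..8

    F = np.abs(fft(x))                         # DFT magnitudes (implicit periodic repeat)
    C = np.abs(dct(x, type=2, norm='ortho'))   # DCT-II magnitudes (implicit mirrored repeat)

    print("DFT:", np.round(F, 2))
    print("DCT:", np.round(C, 2))
    # The DCT magnitudes fall off quickly after the first couple of coefficients,
    # so quantizing the high-frequency ones away costs much less than with the
    # DFT, whose artificial jump from 8 back to 1 spreads energy across all bins.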


A DST is optimal when the variance of the signal you are compressing grows linearly (as opposed to the DCT, which is optimal for uniform variance). That happens, for example, when you have a prediction on one side (e.g., intra prediction in image compression) and you are coding the prediction residual. The farther you get from the source of your prediction, the less accurate it is likely to be. Block-based motion compensation prediction residuals in video coding are also not uniform: the prediction error is higher at block edges than the block center, so a DST sometimes works better (e.g., when using a transform size smaller than the motion compensation block size).

So still useful for compression, just in more specialized circumstances.
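
A rough way to visualize this (scipy doesn't ship the DST-VII that video codecs actually use here, so DST-IV stands in; its lowest basis function has the same one-sided shape):

    import numpy as np
    from scipy.fft import idct, idst

    # Lowest-frequency synthesis basis vector of each transform (8 samples).
    e0 = np.zeros(8)
    e0[0] = 1.0

    dct_basis0 = idct(e0, type=2, norm='ortho')  # constant: treats every sample alike
    dst_basis0 = idst(e0, type=4, norm='ortho')  # grows from near zero at one edge to
                                                 # its peak at the other, matching a
                                                 # residual that is small next to the
                                                 # predictor and larger far from it
    print(np.round(dct_basis0, 3))
    print(np.round(dst_basis0, 3))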


Yes, I think “only rarely useful” covers it :-) I know there are some codecs that support it, e.g. VP9?


Probably every video coding standard published after c. 2012 has some version of it in its toolbox (including VP9).


Thanks!


In practice there is a difference: the FFT would have more edge artifacts at block boundaries, lowering visual quality, and the DCT has better energy compaction into the lower frequencies, meaning longer runs of zeros among the higher-frequency coefficients after quantization and hence better compression. Another plus is that the DCT only needs real numbers.


There's at least one specific optimization in H.264 (aka "MPEG-4 Part 10") to smooth out these artifacts at block boundaries, called "deblocking".

https://en.wikipedia.org/wiki/Deblocking_filter


And then there's the current maintainer of libjpeg, who switched the down- and up-scaling algorithms for chroma subsampling to use DCT scaling because that's mathematically more beautiful, which does introduce some additional artifacts at the block boundaries of each upsampled chroma block.


The DCT has now been replaced by the Hadamard transform, which can be implemented with additions/subtractions and doesn't have the drift problem of the DCT. The HT was considered before the DCT, but at the time the DCT was picked because of better perceptual quality. Later, during H.264 standardization, the HT replaced the DCT and is now used in all video codecs instead of the DCT.
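
For illustration, a length-8 Walsh-Hadamard transform can be written with nothing but add/subtract butterflies (plain unnormalized WHT, not the exact scaled variant any particular codec specifies):

    # In-place fast Walsh-Hadamard transform: only adds and subtracts, no multiplies.
    def fwht(values):
        x = list(values)
        n = len(x)                 # n must be a power of two
        h = 1
        while h < n:
            for i in range(0, n, 2 * h):
                for j in range(i, i + h):
                    a, b = x[j], x[j + h]
                    x[j], x[j + h] = a + b, a - b   # butterfly stage
            h *= 2
        return x

    print(fwht([1, 2, 3, 4, 5, 6, 7, 8]))
    # -> [36, -4, -8, 0, -16, 0, 0, 0]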


Thanks for the info. I've been looking into Hadamard matrices recently for wireless ECC, and the fact that they're used in compression algorithms was news to me.

What is interesting is that the techniques used in compression, communication, and signal processing in general involve orthogonality, and Nasir's master's and PhD research theses were in the area of orthogonal transforms for digital signal processing.

The Hadamard transform can provide orthogonality, but unlike the DCT and the DFT/FFT, which are limited to real and complex numbers respectively, it is very versatile and can be used over real, complex, quaternion, and even octonion number systems; the latter are probably better suited for higher-dimensional data and signal processing.

Hadamard orthogonal codes have also been used as ECC for reliable space communication in both the Mariner and Voyager missions, for example [1].

[1] On some applications of Hadamard matrices [PDF]:

https://documents.uow.edu.au/~jennie/WEB/WEB05-10/2005_12.pd...


Interestingly enough, JPEG XR used a form of the Hadamard transform, but JPEG XL (which is newer) uses DCT and Haar transforms.

[edit]

Combined with the information from sibling comments, it seems that the Hadamard transform was something used in standards developed in the '00s but not since.


The WHT is essentially a cheap approximation of the multi-dimensional DCT, so it approximates but doesn't actually replace the DCT in all scenarios. It seems that the DCT is a better fit for photographic content than the WHT but was more expensive until FP multiplication became much cheaper, so the WHT was briefly considered as an alternative.


probably because multiplication got really fast


JPEG XL supports lossless JPEG transcode, so it has to offer a superset of the features in JPEG.


Yes, but variable-blocksize DCT is definitely used in the lossy modes (not sure about lossless).


The squeeze transform used for extra progressive decoding is effectively a modified Haar transform.
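
For reference, a single plain Haar step looks like this (plain Haar only; JPEG XL's actual squeeze adds its own tweaks on top, and the sample values here are made up):

    import numpy as np

    x = np.array([10., 12., 20., 24., 30., 28., 16., 14.])

    avg = (x[0::2] + x[1::2]) / 2    # half-resolution layer: enough for a preview
    diff = (x[0::2] - x[1::2]) / 2   # detail layer: refines the preview losslessly

    # Exact reconstruction from the two layers (this is what makes it useful
    # for progressive decoding):
    rec = np.empty_like(x)
    rec[0::2] = avg + diff
    rec[1::2] = avg - diff
    assert np.allclose(rec, x)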


Yeah, JPEG uses the DCT, so JPEG XL needs it too to be a superset.


Yes, but if the DCT were purely vestigial, then:

1. It wouldn't have support for DCTs not in JPEG.

2. It wouldn't use the DCT in its lossy compression of photographic content if another transform were considered significantly better.

Perhaps one could argue that they didn't want to add extra transforms, but they do use a modified Haar transform for e.g. synthetic content and alpha channels.


Nope.

H.265/HEVC: https://en.m.wikipedia.org/wiki/High_Efficiency_Video_Coding

Also not true for H.266/VVC.


AV1 also uses DCT and DST, but not Hadamard.


Technically, AV1 also uses Hadamard in the lossless modes.


Thanks for the correction, I didn't know that.


Correct, it is an integer DCT. A lot of techniques were adopted from the integer transform of H.264. That's what I meant, not the floating-point DCT proposed in the '70s.


The big change is basically that we now typically specify exactly which integer approximation to the (real-valued, “ideal”) DCT to use; this means the decoder and encoder are much less likely to fall out of sync. As a bonus, this means we can use a slightly worse but much faster approximation without catastrophes happening, and possibly also make it exactly invertible.

https://fgiesen.wordpress.com/2013/11/04/bink-2-2-integer-dc... has a ton of technical information if you want to dive into it.
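
To give a flavour of what "specified exactly in integers" means, here's a sketch in the style of H.264's 4x4 core transform (the real standard folds a scaling matrix into quantization, omitted here; the helper name is made up):

    import numpy as np

    # Integer approximation of a 4x4 DCT, as used (up to scaling) by H.264.
    Cf = np.array([[1,  1,  1,  1],
                   [2,  1, -1, -2],
                   [1, -1, -1,  1],
                   [1, -2,  2, -1]], dtype=np.int64)

    def forward_4x4(block):
        # Rows then columns, all in exact integer arithmetic, so every
        # implementation gets bit-identical coefficients and nothing can drift.
        return Cf @ block @ Cf.T

    block = np.arange(16, dtype=np.int64).reshape(4, 4)   # toy residual block
    print(forward_4x4(block))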


That is true for advanced techniques, but for simple compression you can simply throw away the high-frequency coefficients. The simplicity of the DCT is what makes it so impressive.



