
The DCT is really neat, but the actual compression magic comes from a combination of side effects that occur after you apply it:

1. The DCT (type II) concentrates most of the block's energy into the low-frequency coefficients, which sit in the top-left corner of the block.

2. Quantization helps to zero out many of the higher frequency coefficients (toward the bottom-right corner). This is where your information loss occurs.

3. Clever zig-zag scanning of the quantized coefficients means that you wind up with long runs of zeroes.

4. Zig-zag scanned blocks are RLE coded. This is the first form of actual compression.

5. RLE coded blocks are sent through Huffman or arithmetic coding. This is the final form of actual compression (for intra-frame-only/JPEG considerations). Additional compression occurs in MPEG et al. with interframe techniques. (A rough sketch of steps 3 and 4 follows below.)
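
Roughly, steps 3 and 4 might look like this in Python (a toy sketch with made-up block values and helper names; real JPEG encodes (run, size) plus amplitude categories and a specific EOB symbol, not literal Python tuples):

    import numpy as np

    # Toy quantized 8x8 block: a few non-zero low-frequency coefficients in the
    # top-left, zeros toward the bottom-right (the typical result of steps 1-2).
    q = np.zeros((8, 8), dtype=int)
    q[0, 0], q[0, 1], q[1, 0], q[2, 0] = 52, -3, 6, 2

    # Step 3: zig-zag scan -- walk the anti-diagonals so low frequencies come first.
    def zigzag(block):
        n = block.shape[0]
        order = sorted(((i, j) for i in range(n) for j in range(n)),
                       key=lambda p: (p[0] + p[1],
                                      p[0] if (p[0] + p[1]) % 2 else p[1]))
        return [block[i, j] for i, j in order]

    # Step 4: run-length encode as (zeros skipped, nonzero value) pairs; the
    # trailing run of zeros is implied by an end-of-block marker.
    def rle(coeffs):
        out, run = [], 0
        for v in coeffs:
            if v == 0:
                run += 1
            else:
                out.append((run, int(v)))
                run = 0
        out.append('EOB')
        return out

    print(rle(zigzag(q)))   # [(0, 52), (0, -3), (0, 6), (0, 2), 'EOB']

Step 5 then entropy-codes these symbols with Huffman or arithmetic coding.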



The "actual compression magic" has been used before DCT in other codecs, but applied directly to pixels gave lousy results.

You can also look at '90s software video codecs, developed when the DCT was still too expensive for video. They had all kinds of approaches to quantization and entropy coding, and they were all a pixelated mess.

DCT is the key ingredient that enabled compression of photographic content.


What's so special about DCT for image compression?

The main idea of lossy image compression is throwing away fine detail, which means converting to the frequency domain and throwing away high-frequency coefficients. Conceptually an FFT would work fine for this, so the use of the DCT instead seems more like an optimization than a key component.


The DCT, just like the DFT, is a discrete version of the (analog) Fourier transform. (The FFT is just a clever, fast implementation of the DFT, just like when people say “DCT” they usually mean a similarly fast implementation of the DCT. Call it FCT if you wish.) Where they differ is in the assumed boundary conditions; the DFT works like Fourier-transforming a signal that is repeated and then loops forever, while the DCT is like Fourier-transforming a signal that is _reflected_ and then loops forever. (There is also a related transform called the DST, which assumes the signal is _inverted_ and reflected. It's occasionally useful in solving certain kinds of differential equations; only rarely so in compression.)

This matches much better with what happens when you cut out small pieces of a signal, so it introduces less spurious high-frequency content and thus gives better energy compaction. Say your little pixel block (let's make it 8x1 for simplicity) is a simple gradient, so it goes 1, 2, 3, 4, 5, 6, 7, 8. Which do you think has the least amount of high-frequency content you'd need to deal with: 123456788765432112345678… or 123456781234567812345678…?

(There's also a DCT version that goes more like 1234567876543212345678 etc., but that's a different story)
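
You can see the effect directly with scipy (a quick sketch; the FFT stands in for the DFT, and the 1..8 values are the gradient block from above):

    import numpy as np
    from scipy.fft import fft, dct

    x = np.arange(1, 9, dtype=float)           # the 8x1 gradient: 1..8

    F = np.abs(fft(x))                         # DFT magnitudes (implicit periodic repeat)
    C = np.abs(dct(x, type=2, norm='ortho'))   # DCT-II magnitudes (implicit mirrored repeat)

    print("DFT:", np.round(F, 2))
    print("DCT:", np.round(C, 2))
    # The DCT magnitudes fall off quickly after the first couple of coefficients,
    # so quantizing the high-frequency ones away costs much less than with the
    # DFT, whose artificial jump from 8 back to 1 spreads energy across all bins.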


A DST is optimal when the variance of the signal you are compressing grows linearly (as opposed to the DCT, which is optimal for uniform variance). That happens, for example, when you have a prediction on one side (e.g., intra prediction in image compression) and you are coding the prediction residual. The farther you get from the source of your prediction, the less accurate it is likely to be. Block-based motion compensation prediction residuals in video coding are also not uniform: the prediction error is higher at block edges than the block center, so a DST sometimes works better (e.g., when using a transform size smaller than the motion compensation block size).

So still useful for compression, just in more specialized circumstances.
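
A rough way to visualize this (scipy doesn't ship the DST-VII that video codecs actually use here, so DST-IV stands in; its lowest basis function has the same one-sided shape):

    import numpy as np
    from scipy.fft import idct, idst

    # Lowest-frequency synthesis basis vector of each transform (8 samples).
    e0 = np.zeros(8)
    e0[0] = 1.0

    dct_basis0 = idct(e0, type=2, norm='ortho')  # constant: treats every sample alike
    dst_basis0 = idst(e0, type=4, norm='ortho')  # grows from near zero at one edge to
                                                 # its peak at the other, matching a
                                                 # residual that is small next to the
                                                 # predictor and larger far from it
    print(np.round(dct_basis0, 3))
    print(np.round(dst_basis0, 3))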


Yes, I think “only rarely useful” covers it :-) I know there are some codecs that support it, e.g. VP9?


Probably every video coding standard published after c. 2012 has some version of it in its toolbox (including VP9).


Thanks!


In practice there is a difference: the FFT would have more edge artifacts at block boundaries, lowering visual quality, and the DCT has better energy compaction into the lower frequencies, meaning longer runs of zeros among the higher-frequency coefficients after quantization and hence better compression. Another plus is that the DCT only needs real numbers.


There's at least one specific optimization in H.264 (aka "MPEG-4 Part 10") to smooth out these artifacts at block boundaries, called "deblocking".

https://en.wikipedia.org/wiki/Deblocking_filter


And then there's the current maintainer of libjpeg, who switched the down- and up-scaling algorithms for chroma subsampling to use DCT scaling because that's mathematically more beautiful, which does introduce some additional artifacts at the block boundaries of each upsampled chroma block.


The DCT has now been replaced by the Hadamard transform, which can be implemented with additions/subtractions and doesn't have the drift problem of the DCT. The HT was considered before the DCT, but at the time the DCT was picked because of better perceptual quality. Later, during H.264 standardization, the HT replaced the DCT and is now used in all video codecs instead of the DCT.
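
For illustration, a length-8 Walsh-Hadamard transform can be written with nothing but add/subtract butterflies (plain unnormalized WHT, not the exact scaled variant any particular codec specifies):

    # In-place fast Walsh-Hadamard transform: only adds and subtracts, no multiplies.
    def fwht(values):
        x = list(values)
        n = len(x)                 # n must be a power of two
        h = 1
        while h < n:
            for i in range(0, n, 2 * h):
                for j in range(i, i + h):
                    a, b = x[j], x[j + h]
                    x[j], x[j + h] = a + b, a - b   # butterfly stage
            h *= 2
        return x

    print(fwht([1, 2, 3, 4, 5, 6, 7, 8]))
    # -> [36, -4, -8, 0, -16, 0, 0, 0]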


Thanks for the info. I've been looking into Hadamard matrices recently for wireless ECC, and the fact that they're used in compression algorithms was news to me.

What is interesting is that the techniques used in compression, communication, and signal processing in general involve orthogonality, and Nasir's master's and PhD research theses were in the area of orthogonal transforms for digital signal processing.

The Hadamard transform can provide orthogonality, but unlike the DCT and the DFT/FFT, which are limited to real and complex numbers respectively, it is very versatile and can be used over real, complex, quaternion, and even octonion number systems; the latter are probably better suited for higher-dimensional data and signal processing.

Hadamard orthogonal codes have also been used as ECC for reliable space communication in both the Mariner and Voyager missions, for example [1].

[1] On some applications of Hadamard matrices [PDF]:

https://documents.uow.edu.au/~jennie/WEB/WEB05-10/2005_12.pd...


Interestingly enough, JPEG XR used a form of the Hadamard transform, but JPEG XL (which is newer) uses DCT and Haar transforms.

[edit]

Combined with the information from sibling comments, it seems that the Hadamard transform was something used in standards developed in the '00s but not since.


The WHT is essentially a cheap approximation of the multi-dimensional DCT, so it approximates but doesn't actually replace the DCT in all scenarios. It seems that the DCT is a better fit for photographic content than the WHT but was more expensive until FP multiplication became much cheaper, so the WHT was briefly considered as an alternative.


probably because multiplication got really fast


JPEG XL supports lossless JPEG transcode, so it has to offer a superset of the features in JPEG.


Yes, but variable-blocksize DCT is definitely used in the lossy modes (not sure about lossless).


The squeeze transform used for extra progressive decoding is effectively a modified Haar transform.
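
For reference, a single plain Haar step looks like this (plain Haar only; JPEG XL's actual squeeze adds its own tweaks on top, and the sample values here are made up):

    import numpy as np

    x = np.array([10., 12., 20., 24., 30., 28., 16., 14.])

    avg = (x[0::2] + x[1::2]) / 2    # half-resolution layer: enough for a preview
    diff = (x[0::2] - x[1::2]) / 2   # detail layer: refines the preview losslessly

    # Exact reconstruction from the two layers (this is what makes it useful
    # for progressive decoding):
    rec = np.empty_like(x)
    rec[0::2] = avg + diff
    rec[1::2] = avg - diff
    assert np.allclose(rec, x)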


Yeah, JPEG uses the DCT, so JPEG XL needs it too to be a superset.


Yes, but if the DCT were purely vestigial, then:

1. It wouldn't have support for DCTs not in JPEG.

2. It wouldn't use the DCT in its lossy compression of photographic content if another transform were considered significantly better.

Perhaps one could argue that they didn't want to add extra transforms, but they do use a modified Haar transform for e.g. synthetic content and alpha channels.


Nope.

H.265/HEVC: https://en.m.wikipedia.org/wiki/High_Efficiency_Video_Coding

Also not true for H.266/VVC.


AV1 also uses DCT and DST, but not Hadamard.


Technically, AV1 also uses Hadamard in the lossless modes.


Thanks for the correction, I didn't know that.


Correct, it is an integer DCT. A lot of techniques were adopted from the integer transform of H.264. That's what I meant, not the floating-point DCT proposed in the '70s.


The big change is basically that we now typically specify exactly which integer approximation to the (real-valued, “ideal”) DCT to use; this means the decoder and encoder are much less likely to fall out of sync. As a bonus, this means we can use a slightly worse but much faster approximation without catastrophes happening, and possibly also make it exactly invertible.

https://fgiesen.wordpress.com/2013/11/04/bink-2-2-integer-dc... has a ton of technical information if you want to dive into it.
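
To give a flavour of what "specified exactly in integers" means, here's a sketch in the style of H.264's 4x4 core transform (the real standard folds a scaling matrix into quantization, omitted here; the helper name is made up):

    import numpy as np

    # Integer approximation of a 4x4 DCT, as used (up to scaling) by H.264.
    Cf = np.array([[1,  1,  1,  1],
                   [2,  1, -1, -2],
                   [1, -1, -1,  1],
                   [1, -2,  2, -1]], dtype=np.int64)

    def forward_4x4(block):
        # Rows then columns, all in exact integer arithmetic, so every
        # implementation gets bit-identical coefficients and nothing can drift.
        return Cf @ block @ Cf.T

    block = np.arange(16, dtype=np.int64).reshape(4, 4)   # toy residual block
    print(forward_4x4(block))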


That is true for advanced techniques, but for simple compression you can simply throw away the high-frequency coefficients. The simplicity of the DCT is what makes it so impressive.



