So the summary is "JPEG encoder written in assembly with NEON instructions saves images faster than Apple's encoder."
That's a cool feat, and a little damning for Accelerate.framework, although the way TechCrunch wrote it up I expected a new kind of fast cosine transform.
Don't forget that SnappyCam pumps both CPU cores when available.
The actual DCT algorithm created for and used in the app is different from the typical AAN (Arai, Agui, Nakajima) DCT algorithm that's used in JPEG codecs (at least all the ones I've seen).
It's all about doing as little work as possible to achieve the end result. That's why so much of it is implemented in asm, with carefully chosen NEON instructions for each step.
Think of it as a cross-layer optimization between algorithm and implementation... done by hand. :-)
Really interested in the nuts and bolts - are you optimizing specifically for one quality setting (in which case I'm guessing you could probably do the quantization as part of the dct and throw away some calculations)?
I played with a realtime JPEG compression implementation back in college on transputers (yes, I'm that old). Fun stuff; nice to see there are still places where going right down to the metal can make a real impact on a product...
While SnappyCam has been the most difficult, complex piece of software I've written since I started coding in my early teens, it's also been one of the most satisfying technically.
I'd love to disclose the many, many optimizations baked in, but as this is a commercial app I must keep much of it as a trade secret.
I will say, though, that a lot of precomputation was involved, both for the encoder and the decoder. I jumped at every chance to avoid computation, memory reads, etc., as much as possible. :-)