It's not Android bashing - he's managed to make the older, slower iPhone hardware perform better than the current high-performance kings (the S3 & S4) through smart software optimisation.
There's no reason he couldn't do the same on Android and see similar gains. It would just be a lot of work.
He only implemented his optimized software on one platform and somehow starts to compare it with different software on another platform. How is that relevant for that other platform's performance?
So the summary is "JPEG encoder written in assembly with NEON instructions saves images faster than Apple's encoder."
That's a cool feat and is a little damning for Accelerate.framework, although the way TechCrunch writes it I expected a new kind of fast cosine transform.
Don't forget that SnappyCam pumps both CPU cores when available.
The actual DCT algorithm created and used in the app is different to the typical AAN (Arai, Agui, Nakajima) DCT algorithm that's used in JPEG codecs, at least all the ones I've seen.
It's all about doing as little work as possible to achieve the end result. That's why there's so much asm implementation, with carefully chosen NEON instructions for each step.
Think of it as a cross-layer optimization between algorithm and implementation... done by hand. :-)
Really interested in the nuts and bolts - are you optimizing specifically for one quality setting (in which case I'm guessing you could probably do the quantization as part of the DCT and throw away some calculations)?
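For anyone wondering what "quantization as part of the DCT" could look like: one well-known trick is to leave the DCT's per-coefficient scale factors out of the transform and fold them into the quantization table, so that scaling and quantizing become a single multiply by a table computed once per quality setting. Below is a minimal Python sketch of that idea, assuming a fixed table (here the example luminance table from Annex K of the JPEG standard). The function names are made up for illustration, and nothing here is claimed to be what SnappyCam actually does.

```python
import math

N = 8

# Example luminance quantization table from Annex K of the JPEG standard.
Q = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]

def norm(k):
    # Orthonormal DCT-II scale factor for frequency index k.
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

# Cosine basis, computed once.
COS = [[math.cos((2 * n + 1) * k * math.pi / (2 * N)) for n in range(N)]
       for k in range(N)]

# Scale factors and quantizer folded into one multiplier per coefficient.
# This is computed once for a given quality setting (hypothetical name).
MULT = [[norm(u) * norm(v) / Q[u][v] for v in range(N)] for u in range(N)]

def raw_dct_1d(x):
    # DCT-II with the scale factors deliberately left out.
    return [sum(x[n] * COS[k][n] for n in range(N)) for k in range(N)]

def raw_dct_2d(block):
    rows = [raw_dct_1d(r) for r in block]
    out = [[0.0] * N for _ in range(N)]
    for v in range(N):
        col = raw_dct_1d([rows[i][v] for i in range(N)])
        for u in range(N):
            out[u][v] = col[u]
    return out

def quantize_folded(block):
    # One multiply per coefficient does both the scaling and the quantization.
    raw = raw_dct_2d(block)
    return [[round(raw[u][v] * MULT[u][v]) for v in range(N)] for u in range(N)]

def quantize_reference(block):
    # Textbook order for comparison: scale first, then divide by Q.
    raw = raw_dct_2d(block)
    return [[round(norm(u) * norm(v) * raw[u][v] / Q[u][v]) for v in range(N)]
            for u in range(N)]
```

Feeding a flat block (every sample equal) through `quantize_folded` leaves only the DC coefficient non-zero, which is a quick sanity check. A real encoder would do this in fixed point with a fast factored DCT rather than the O(N^2) loops used here for clarity.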
I played with a realtime JPEG compression implementation back in college on transputers (yes, I'm that old). Fun stuff, nice to see there are still places where going right down to the metal can make a real impact on a product...
While SnappyCam has been the most difficult and complex piece of software I've written since I started coding in my early teens, it's also been one of the most satisfying technically.
I'd love to disclose the many, many optimizations baked in, but as this is a commercial app I must keep much of it as a trade secret.
I will say though that a lot of precomputation was involved, both for the encoder and decoder. Jumped at the chance to avoid computation, memory reads, etc., as much as possible. :-)
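As a generic illustration of trading computation for precomputed tables (not a reconstruction of anything in SnappyCam, whose details are not public), here is a small Python sketch of table-driven RGB-to-luma conversion in 16-bit fixed point, similar in spirit to what open-source JPEG libraries such as libjpeg do. The names `R_TAB`, `G_TAB`, `B_TAB`, and `luma` are hypothetical.

```python
SCALE_BITS = 16
ONE_HALF = 1 << (SCALE_BITS - 1)

def fix(x):
    # Convert a float coefficient to 16-bit fixed point.
    return int(x * (1 << SCALE_BITS) + 0.5)

# Per-channel products are computed once, up front.
# The rounding offset is folded into the blue table so the
# per-pixel path needs no extra addition.
R_TAB = [fix(0.299) * i for i in range(256)]
G_TAB = [fix(0.587) * i for i in range(256)]
B_TAB = [fix(0.114) * i + ONE_HALF for i in range(256)]

def luma(r, g, b):
    # Per pixel: three table reads, two adds, one shift. No multiplies.
    return (R_TAB[r] + G_TAB[g] + B_TAB[b]) >> SCALE_BITS
```

The three fixed-point coefficients sum to exactly 65536, so pure white maps to 255 and pure black to 0. On hardware with fast SIMD multiplies the table reads may well cost more than the arithmetic they replace, so whether this particular trade pays off depends on the target.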
I find it amazing how you share your know-how so freely. This is the first app I ever saw that made me think of an iPhone as a potentially desirable thing... not enough to make me get one, but a big compliment to you. Never change (unless it's for the even more generous and clever of course :P)
Having shared a few beers with John Papandriopoulos (at an AusCTW workshop), I can vouch that he is capable of doing great things in signal processing. He's a smart guy [1].
G'day from across the ditch John and I'm glad to see things are going well! (from John D. in Sydney)
Turning another incarnation of CSIRO's wireless research into a product, and still dreaming of the Free Space Optics stuff. Happy to buy you a beer if you are passing through SYD! Keep well.
Will definitely look you up when I'm down next. Trying to drop by more consistently during the summer these days. I spent five weeks in Melbourne last Jan and loved every minute of warmth. :)