In terms of absolute compute performance per chip, perf per watt, and perf per die area, it looks like Dojo matches or surpasses the best GPUs of today: «Tesla claims a die with 354 Dojo cores can hit 362 BF16 TFLOPS at 2 GHz»
For comparison, the fastest single-chip GPU today is the AMD MI250X which has 220 "cores" (compute units) totaling 383 BF16 TFLOPS at 1.7 GHz, and that's a monster 560 watt chip.
The Dojo chip is likely under this 560 W TDP, so more efficient. Tesla delivers roughly the same compute, but spread across 61% more cores, which should make it far better suited to branchy code. Also, Tesla claims the die measures only 645 mm², versus 1540 mm² for AMD. So the wafer fabrication cost should be roughly half(!) of AMD's!
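Running the quoted numbers back (a quick sanity check; the per-core throughput is derived from the figures above, not from any official spec):

```python
# Implied BF16 FLOPs per cycle per core, from the quoted peak figures.
dojo = 362e12 / (354 * 2.0e9)      # ~511, i.e. most likely 512 per Dojo core
mi250x = 383e12 / (220 * 1.7e9)    # ~1024 per MI250X compute unit

# Die area ratio -- fab cost scales roughly with area (ignoring yield effects).
area_ratio = 645 / 1540            # ~0.42, hence "roughly half"

print(round(dojo), round(mi250x), round(area_ratio, 2))  # 511 1024 0.42
```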
If Tesla has truly managed to build that, I'm impressed.
Edit: I missed that Tesla claims "less than 600 watt" per chip. So we know it's comparable to or less than AMD's 560 watts.
Edit 2: 25 dies are packed on a single system-on-wafer. That's 15 kW on a disc 30 cm (12 in) in diameter. Sheesh! That must require an ungodly liquid cooling system!
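The power density follows from simple geometry (assuming the full ~600 W per die and the whole 30 cm disc as the cooling area):

```python
import math

tile_power = 25 * 600                    # 25 dies at up to 600 W -> 15 kW
wafer_area = math.pi * (30 / 2) ** 2     # 30 cm diameter disc, in cm^2
density = tile_power / wafer_area

print(round(density, 1))  # ~21.2 W/cm^2
```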
Edit 3: there is more info, including a rendering of the host interface card, at: https://www.servethehome.com/tesla-dojo-custom-ai-supercompu... and https://www.servethehome.com/tesla-dojo-ai-system-microarchi...

Edit 4: found a pic of the liquid cooling system - as expected ;) https://media.datacenterdynamics.com/media/images/training_t... (source: https://www.datacenterdynamics.com/en/news/tesla-details-doj...). They also say the first tile was "tested last week" as of August 20th. This confirms my suspicion that the system is barely (?) functional. See also "Venkataramanan appeared to even surprise Andrej Karpathy, Tesla’s head of AI, on stage by revealing for the first time that Dojo training tile ran one of his neural networks" from https://electrek.co/2021/08/20/tesla-dojo-supercomputer-worl...

A 2x efficiency improvement from a GPU to a specialised ASIC is not particularly impressive. How much would you gain just by removing the graphics-related hardware from a GPU (texture pipelines, vertex processing, etc.)? In addition, they lose existing programming models like OpenCL and the compiler progress of the last 10 years, and have to roll their own. The amount of software work needed to reach the same ease of use as GPUs must be enormous. Maybe they made it more CPU-like to make it easier to program?

FP16 AI workloads are very important to AMD and Nvidia (the AMD MI250X and Nvidia A100/H100 were really designed for this), and yet Tesla leapfrogged them with a more than 2x reduction in die area, plus more features (e.g. out-of-order execution). This is what's impressive. AMD, Nvidia, and even Intel should have been leading this, but they weren't. Seems to be a classic innovator's dilemma.

TOPS isn't indicative of actual performance: https://semiengineering.com/lies-damn-lies-and-tops-watt/ Also, this IMG chip is vaporware, and they can quote an arbitrarily high TOPS/W by trading voltage and frequency against die area, thereby increasing cost beyond reason. I'm not surprised IMG shares no other metric.
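The voltage/frequency-for-area trade behind gamed TOPS/W figures, mentioned in the reply above, can be sketched with a toy dynamic-power model (power ~ units·V²·f, throughput ~ units·f; all numbers are made up for illustration, not taken from any real chip):

```python
# Toy model: dynamic power ~ units * V^2 * f; throughput ~ units * f.
# All constants are arbitrary and for illustration only.

def tops(units, freq_ghz, ops_per_cycle=1000):
    """Peak throughput in TOPS."""
    return units * freq_ghz * ops_per_cycle / 1000

def watts(units, volts, freq_ghz, k=10.0):
    """Illustrative dynamic power (k is an arbitrary constant)."""
    return k * units * volts**2 * freq_ghz

# Baseline: 10 compute units at 2 GHz, 0.9 V.
base_t, base_w = tops(10, 2.0), watts(10, 0.9, 2.0)

# "Gamed" design: 4x the silicon at 0.5 GHz and 0.6 V.
# Same peak TOPS, much better TOPS/W -- but 4x the die area (and cost).
gamed_t, gamed_w = tops(40, 0.5), watts(40, 0.6, 0.5)

print(f"baseline: {base_t} TOPS at {base_w:.0f} W -> {base_t/base_w:.3f} TOPS/W")
print(f"gamed:    {gamed_t} TOPS at {gamed_w:.0f} W -> {gamed_t/gamed_w:.3f} TOPS/W")
```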