In terms of absolute compute performance per chip, perf per watt, and perf per die area, it looks like Dojo matches or surpasses the best GPUs of today: «Tesla claims a die with 354 Dojo cores can hit 362 BF16 TFLOPS at 2 GHz»
For comparison, the fastest single-chip GPU today is the AMD MI250X which has 220 "cores" (compute units) totaling 383 BF16 TFLOPS at 1.7 GHz, and that's a monster 560 watt chip.
The Dojo chip is likely under this 560 W TDP, so more efficient. Tesla delivers roughly the same compute, but spread across 61% more cores, which should make it far better suited to branchy code. Also, Tesla claims the die measures only 645 mm², versus 1540 mm² for AMD. So the wafer fabrication cost should be roughly half(!) of AMD's!
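Running the quoted numbers back (a quick sanity check; the per-core throughput is derived from the figures above, not from any official spec):

```python
# Implied BF16 FLOPs per cycle per core, from the quoted peak figures.
dojo = 362e12 / (354 * 2.0e9)      # ~511, i.e. most likely 512 per Dojo core
mi250x = 383e12 / (220 * 1.7e9)    # ~1024 per MI250X compute unit

# Die area ratio -- fab cost scales roughly with area (ignoring yield effects).
area_ratio = 645 / 1540            # ~0.42, hence "roughly half"

print(round(dojo), round(mi250x), round(area_ratio, 2))  # 511 1024 0.42
```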
If Tesla has truly managed to build that, I'm impressed.
Edit: I missed that Tesla claims "less than 600 watt" per chip. So we know it's comparable to or less than AMD's 560 watts.
Edit 2: 25 dies are packed on a single system-on-wafer. That's 15 kW on a disc 30 cm (12 in) in diameter. Sheesh! That must require an ungodly liquid cooling system!
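The power density follows from simple geometry (assuming the full ~600 W per die and the whole 30 cm disc as the cooling area):

```python
import math

tile_power = 25 * 600                    # 25 dies at up to 600 W -> 15 kW
wafer_area = math.pi * (30 / 2) ** 2     # 30 cm diameter disc, in cm^2
density = tile_power / wafer_area

print(round(density, 1))  # ~21.2 W/cm^2
```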
Edit 3: there is more info, including a rendering of the host interface card, at: https://www.servethehome.com/tesla-dojo-custom-ai-supercompu... and https://www.servethehome.com/tesla-dojo-ai-system-microarchi...

Edit 4: found a pic of the liquid cooling system - as expected ;) https://media.datacenterdynamics.com/media/images/training_t... (source: https://www.datacenterdynamics.com/en/news/tesla-details-doj...). They also say the first tile was "tested last week" as of August 20th. This confirms my suspicion that the system is barely (?) functional. See also "Venkataramanan appeared to even surprise Andrej Karpathy, Tesla’s head of AI, on stage by revealing for the first time that Dojo training tile ran one of his neural networks" from https://electrek.co/2021/08/20/tesla-dojo-supercomputer-worl...

A 2x efficiency improvement from a GPU to a specialised ASIC is not particularly impressive. How much would you gain just by removing the graphics-related hardware from a GPU (texture pipelines, vertex processing, etc.)? In addition, they lose existing programming models like OpenCL and the compiler progress of the last 10 years, and have to roll their own. The amount of software work needed to reach the same ease of use as GPUs must be enormous. Maybe they made it more CPU-like to make it easier to program?

FP16 AI workloads are very important to AMD and Nvidia (the AMD MI250X and Nvidia A100/H100 were really designed for this), and yet Tesla leapfrogged them with a more than 2x reduction in die area, plus more features (e.g. out-of-order execution). This is what's impressive. AMD, Nvidia, and even Intel should have been leading this, but they weren't. Seems to be a classic innovator's dilemma.

TOPS isn't indicative of actual performance: https://semiengineering.com/lies-damn-lies-and-tops-watt/ Also, this IMG chip is vaporware, and they can quote an arbitrarily high TOPS/W by trading voltage and frequency against die area, thereby increasing cost beyond reason. I'm not surprised IMG shares no other metric.
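The voltage/frequency-for-area trade behind gamed TOPS/W figures, mentioned in the reply above, can be sketched with a toy dynamic-power model (power ~ units·V²·f, throughput ~ units·f; all numbers are made up for illustration, not taken from any real chip):

```python
# Toy model: dynamic power ~ units * V^2 * f; throughput ~ units * f.
# All constants are arbitrary and for illustration only.

def tops(units, freq_ghz, ops_per_cycle=1000):
    """Peak throughput in TOPS."""
    return units * freq_ghz * ops_per_cycle / 1000

def watts(units, volts, freq_ghz, k=10.0):
    """Illustrative dynamic power (k is an arbitrary constant)."""
    return k * units * volts**2 * freq_ghz

# Baseline: 10 compute units at 2 GHz, 0.9 V.
base_t, base_w = tops(10, 2.0), watts(10, 0.9, 2.0)

# "Gamed" design: 4x the silicon at 0.5 GHz and 0.6 V.
# Same peak TOPS, much better TOPS/W -- but 4x the die area (and cost).
gamed_t, gamed_w = tops(40, 0.5), watts(40, 0.6, 0.5)

print(f"baseline: {base_t} TOPS at {base_w:.0f} W -> {base_t/base_w:.3f} TOPS/W")
print(f"gamed:    {gamed_t} TOPS at {gamed_w:.0f} W -> {gamed_t/gamed_w:.3f} TOPS/W")
```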