Why are they comparing it against e.g. Skylake, which is 6 years old?

rgbrenner · on Feb 7, 2024

It's worse than that. There's literally nothing similar about these systems. One is a system designed for the supercomputer MareNostrum 4 and the other for MareNostrum 5 (a completely different system)... So old CPU, but also different network cards, topology, memory (capacity and speed), storage system, operating system (SuSE from 2016 vs Ubuntu 22)... and so on. For example, they went from 10Gb Ethernet to 200Gb InfiniBand.

And then they took all of the performance improvements that each of these contribute... and attributed them to the Nvidia CPU.

stonogo · on Feb 8, 2024

This is a misrepresentation. Included in the analyses are single-node runs, which don't care about network cards etc. This is a platform comparison, not a CPU showdown; among the questions here is whether Grace-based nodes are feasible at all for production HPC. The answer is a tentative yes, although I still have concerns about cooling at this density in a general-use (i.e. highly fluctuating) workload.

But mostly, these numbers are for their users, who are aware the system contract has been awarded but want to know what to expect when their workloads hit the new system.

Incidentally, MareNostrum 4 has a 100 Gbit Omni-Path fabric. I'm sure they'd love to test against the latest Omni-Path, but Intel dumped the tech, so our choices these days are 200/400 Gbit Ethernet or similar-throughput InfiniBand.

snakeyjake · on Feb 7, 2024

The idealist in me says "that's what they have, and these organizations don't have the cash to buy new stuff for benchmarking".

The cynic in me says "that's the only way for ARM to even appear competitive in HPC".

Note: real HPC not "serve-ads-or-train-ad-serving-ai-models-with-the-highest-flop-per-watt" HPC.

rbanffy · on Feb 7, 2024

Bear in mind at least one of those notes the code wasn’t optimised for ARM while all the meaningful HPC code in existence has been painstakingly optimised for Intel for decades.

StillBored · on Feb 8, 2024

Right, the Arm stuff is probably in the "it runs" camp, largely because it's SVE, which is barely available, and the code written to use it has probably mostly been tuned for the A64FX, or maybe the Graviton V1s.

Both of which have considerably different memory and vector size/issue characteristics. So there are three different SVE variations now, and the previous two show significant uplift when given custom tuning (e.g., see gcc -mtune=neoverse-512tvb vs. the custom a64fx compiler benchmarks). Arm put a bunch of effort into creating an instruction set that is microarchitecture-agnostic, but it hasn't exactly worked the first couple of tries. Maybe that will be fixed with V2 and all SVE cores going forward.

rbanffy · on Feb 9, 2024

Indeed. Right now there is about 0% HPC code tuned to Grace and Grace Hopper.

I'd love it if Nvidia made reasonably priced Grace and Grace Hopper ATX boards (or an Nvidia Studio-style desktop, priced like a Mac mini) that developers could buy, so that we can do our best to optimize code for Grace for free in our spare time.

Same goes for AMD and their MI300 family, in case AMD is listening. There is less to be gained, as the x86 side is pretty well cared for ATM, but, still, I'd love to see such a beast.

PedroBatista · on Feb 7, 2024

Because they are comparing it with the system they have and a new system they might eventually have.

It's their own analysis for their own benefit.

wmf · on Feb 7, 2024

They should compare what they have vs Grace, what they have vs Genoa, and what they have vs Emerald Rapids.

yjftsjthsd-h · on Feb 8, 2024

How does that help? Surely they'd still need to compare against buying a new x86 system.

stonogo · on Feb 8, 2024

They already awarded the contract. The question users will be asking is "how much faster is the new computer compared to the one we've been using?" These are the answers.

drewg123 · on Feb 8, 2024

If you scroll down to the Stony Brook results, they compare it to more modern CPUs.

I've had access to one of these (interested in it for its massive amounts of IO bandwidth for the power budget), and it's stunningly fast. And yes, it runs FreeBSD.

ksec · on Feb 10, 2024

> I've had access to one of these (interested in it for its massive amounts of IO bandwidth for the power budget), and it's stunningly fast. And yes, it runs FreeBSD.

Are we going to get Serving Netflix Video Traffic at 1600Gb/s and Beyond anytime soon? :)