Forget LINPACK and friends. Jack Dongarra is going to need to switch to the new metric for supercomputers: kilograms of H100 GPUs (about 3,300, give or take a few grams, for this system).
> For use by startup investments of Nat Friedman and Daniel Gross
> Reach out if you want access
I'm confused by the last two bullet points. Is this website only meant to be used by these "startup investments" or can anyone fill out the linked form?
Can the creators explain in more detail: how is this different from (for example) the OpenAI cluster that MSFT built in Azure? Is it hosted with an existing cloud provider or in its own data center? Which data center? Who admins the system, and is there an SRE team in case it goes down during training? And can you attempt to run the same benchmarks that Top500 uses to determine your double-precision flops, and give that number in addition to your "10 exaflops" (which I believe is single precision)?
as an ex-supercomputer nerd (where the fastest system in the world finally reached over 1 exaflops of double precision), it seems awfully weird to call FP8 "flops". There's nothing truly wrong with it (since "flops" is a fairly poorly defined term), but it makes it clear that ML supercomputers are very different beasts from classic supercomputers. It also makes me wonder if/when the classic folks will try to make more codes work correctly at lower precision (for example, in molecular dynamics).
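For a sense of scale, here's a rough back-of-envelope sketch. The GPU count is my own guess at the cluster size, and the per-GPU peaks are NVIDIA's published H100 SXM datasheet numbers (FP8 tensor with sparsity, FP64 tensor), so treat it as illustrative rather than the creators' actual accounting:

```python
# Rough sketch: how "10 exaflops" depends entirely on which precision you count.
num_gpus = 2500                 # assumed cluster size, give or take
fp8_tensor_tflops = 3958        # H100 SXM FP8 tensor peak (with sparsity), per datasheet
fp64_tensor_tflops = 67         # H100 SXM FP64 tensor peak, per datasheet

fp8_total_eflops = num_gpus * fp8_tensor_tflops / 1e6    # convert TFLOPS -> EFLOPS
fp64_total_eflops = num_gpus * fp64_tensor_tflops / 1e6

print(f"FP8 peak:  ~{fp8_total_eflops:.1f} EFLOPS")   # ~9.9, i.e. the "10 exaflops" headline
print(f"FP64 peak: ~{fp64_total_eflops:.2f} EFLOPS")  # ~0.17, well under Frontier's ~1.1 EFLOPS Rmax
```

Same hardware, roughly a 60x gap between the FP8 marketing number and what a Top500-style FP64 run would even theoretically allow.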
@nat would you be interested in presenting about this at The AI Conference in SF?
Your take on hardware-enabled investment would be interesting.
We also have folks like Hugging Face's GPT Research Lead, Langchain and LlamaIndex founders, Cerebras's CEO and many more speaking. It's a builder-heavy audience.
AI Grant, back in 2018, offered $2,500 and got all sorts of skeptical folks doubting Nat's and Daniel's motives (they will steal IP! there's some gotcha here!): https://news.ycombinator.com/item?id=16760736
They offer 100x that at $250,000 per team now, plus this humongous GPU cluster. Way to start small and work your way up to this. Amazing execution.
April 2018 was a more nascent time for AI: BERT wasn't open sourced until November 2018 and GPT-2 wasn't open sourced until February 2019, both of which kicked off the AI boom.
Y'all could totally eat Meta's lunch and train an open LLM with all the innovations that have come since LLaMA's release. Other startups are trying, but they all seem bottlenecked by training time/resources.
This could be where the next Stable Diffusion 1.5 comes from.