Lining Up The “El Capitan” Supercomputer Against The AI Upstarts

The question is no longer whether the “El Capitan” supercomputer that has been in the process of being installed at Lawrence Livermore National Laboratory for the past week – with photographic evidence to prove it – will be the most powerful system in the world. The question is how long it will hold onto that title.

It could be for quite a long time, as it turns out. Because when it comes to the big AI supercomputers that AI startups are funding, to use an old adage that described IBM systems in the 1990s: “You can find better, but you can’t pay more.”

It does not look like any of the major HPC centers at the national labs around the globe are going to field a persistent machine – meaning not an ephemeral cloudy instance that is fired up just long enough to run the High Performance Linpack test in double precision floating point that is used to gauge the relative performance of machines and rank them on the Top500 list – that can beat El Capitan, which, depending on our mood and our math, we think could weigh in at around 2.3 exaflops peak FP64 performance. That is about 37 percent more FP64 oomph than the 1.68 exaflops “Frontier” supercomputer at Oak Ridge National Laboratory, which has been the most powerful machine on the Top500 list since June 2022.
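The ratio is easy enough to check – bearing in mind that the 2.3 exaflops figure is our estimate, not an official rating:

```python
# El Capitan (our 2.3 EF estimate) versus Frontier's official 1.68 EF FP64 peak
el_capitan_ef, frontier_ef = 2.3, 1.68
print(f"{el_capitan_ef / frontier_ef - 1:.0%} more FP64 oomph")  # ~37%
```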

Way back in 2018, after the CORAL-2 contracts were awarded, we expected Frontier to come in at 1.3 exaflops FP64 peak at $500 million using custom AMD CPUs and GPUs, and El Capitan to come in at 1.3 exaflops peak for $500 million using off-the-shelf AMD CPUs and GPUs. That was also when the revised “Aurora A21” machine was slated to come in at around 1 exaflops for an estimated $400 million. All three of these machines are being installed later than anyone hoped when the HPC labs started planning for exascale in earnest back in 2015. And in the case of Frontier and El Capitan, we think AMD offered much more bang for the buck and outbid IBM and Nvidia for the contracts, which would have naturally gone to them given that they had built the prior generation “Summit” and “Sierra” systems at Oak Ridge and Lawrence Livermore. But that is just conjecture, of course.

Here is the point of 2023 and beyond: Don’t count the hyperscalers and cloud builders and their AI startup customers out of the hunt. They are building very big machines, and perhaps ones that, like the one Nvidia and CoreWeave are working on for Inflection AI and the ones that Microsoft Azure is building for OpenAI, will surpass these big HPC machines when it comes to lower precision AI training work.

Let’s do some math to compare as we show off the El Capitan baby pictures that Lawrence Livermore has shared.

For our comparisons, let’s start with that as-yet-unnamed system being built for Inflection AI, which we talked about last week when the pictures of the El Capitan machine surfaced.

That Inflection AI machine looks like it is using 22,000 Nvidia H100 SXM5 GPU accelerators, and based on what little we know about H100 and InfiniBand Quantum 2 networking pricing, it would list for somewhere around $1.35 billion if the nodes are configured something like a DGX H100 node with 2 TB of memory, 3.45 TB of flash, and eight 400 Gb/sec ConnectX-7 network interfaces, plus an appropriate three-tier InfiniBand switch fabric. That system would be rated at 748 petaflops of peak FP64 performance, which is interesting for the HPC crowd, and would rank second on the current Top500 list, behind Frontier at 1.68 exaflops FP64 peak and ahead of the “Fugaku” system at RIKEN lab in Japan, which has 537.2 petaflops FP64 peak.
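Here is how that back-of-the-envelope math shakes out in a few lines of Python. The 34 teraflops FP64 vector rating per H100 SXM5 is Nvidia’s published spec, but the roughly $390,000 per eight-GPU node is our assumption, not a quoted price:

```python
# Back-of-the-envelope on the Inflection AI machine. The 34 TF FP64 vector
# rating is Nvidia's spec for the H100 SXM5; the ~$390,000 per eight-GPU
# DGX H100-class node is our assumption, not a quoted price.
gpus = 22_000
fp64_tf_per_gpu = 34
node_price = 390_000
net_fraction = 0.20          # InfiniBand fabric as a share of total cost

fp64_pf = gpus * fp64_tf_per_gpu / 1_000      # petaflops
nodes = gpus // 8                             # 2,750 nodes
total = nodes * node_price / (1 - net_fraction)

print(f"{fp64_pf:,.0f} PF FP64 peak")         # 748 PF
print(f"${total / 1e9:.2f} B list price")     # ~$1.34 B
```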

Discount this Inflection AI machine how you will, but we don’t think Nvidia or AMD are in any mood to give deep discounts on GPU compute engines when demand is far exceeding supply. And neither are their server OEM and ODM partners. And so these machines are very pricey indeed compared to the exascale HPC systems in the United States, and they are much less capable, too.

If you look at the FP16 half precision performance of the Inflection AI machine, it comes in at 21.8 exaflops, which sounds like a lot and which is plenty enough to drive some very large LLMs and DLRMs – that is, large language models and deep learning recommendation models.
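That FP16 figure falls out of the same GPU count, using the dense FP16 Tensor Core rating for the H100 SXM5, which is half of Nvidia’s with-sparsity number:

```python
# Dense FP16 Tensor Core throughput per H100 SXM5 is ~989.5 teraflops,
# half of Nvidia's 1,979 TF with-sparsity rating.
gpus = 22_000
fp16_tf_per_gpu = 989.5

print(f"{gpus * fp16_tf_per_gpu / 1e6:.1f} EF FP16 dense")  # ~21.8 EF
```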

No one knows what the FP16 matrix math performance of the “Antares” AMD Instinct MI300A CPU-GPU hybrid that powers El Capitan will be, but we took a stab at guessing it back in June when a few more tidbits of information were revealed about this compute engine. We think that Lawrence Livermore is not only getting two CPU tiles on a package (replacing two GPU tiles) alongside six GPU tiles, but is also getting an overclocked compute engine that can deliver more performance than an eight-tile, GPU-only MI300 compute engine. (And if Lawrence Livermore didn’t get something like this, it should have.) If we are right, then without sparsity math support turned on (which Inflection AI did not use when it talked about the performance of the machine it is building with CoreWeave and Nvidia), each MI300A is estimated to deliver 784 teraflops with a 2.32 GHz clock frequency (compared to what we expect to be around a 1.7 GHz clock frequency for the regular MI300 part).
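Here is a minimal sketch of the scaling logic behind that guess, assuming dense FP16 throughput scales linearly with GPU tile count and clock frequency – both assumptions of ours, not AMD specs:

```python
# Working backwards from our MI300A guess: if a six-GPU-tile MI300A at
# 2.32 GHz delivers 784 TF of dense FP16, linear tile-and-clock scaling
# implies a vanilla eight-tile, GPU-only MI300 at 1.7 GHz would deliver:
mi300a_fp16_tf = 784                 # our estimate, not an AMD spec
mi300a_tiles, mi300_tiles = 6, 8
mi300a_ghz, mi300_ghz = 2.32, 1.70   # both clock speeds are our guesses

per_tile_per_ghz = mi300a_fp16_tf / (mi300a_tiles * mi300a_ghz)
mi300_fp16_tf = per_tile_per_ghz * mi300_tiles * mi300_ghz
print(f"~{mi300_fp16_tf:.0f} TF FP16")  # ~766 TF, less than the MI300A's 784 TF
```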

We are hopeful that Hewlett Packard Enterprise can get eight MI300As per sled in the El Capitan system, and if that happens, the compute portion of El Capitan should weigh in at around 2,931 nodes, 46 cabinets, and eight rows. We will see.

What we wanted to make clear is that if our guesses on the MI300A are correct – we know how big that if is, people – then El Capitan should have around 23,500 MI300A GPUs and – wait for it – around 18.4 exaflops of FP16 matrix math peak performance. That is about 84 percent of the AI training oomph of the system being built with all that venture capital money by Inflection AI, for a lot less money and with a lot more FP64 oomph.
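The arithmetic behind those figures, carrying through our 784 teraflops per MI300A estimate:

```python
# El Capitan totals, assuming eight MI300As per sled and our 784 TF estimate.
nodes, apus_per_node = 2_931, 8
mi300a_fp16_tf = 784

apus = nodes * apus_per_node                 # 23,448, call it ~23,500
fp16_ef = apus * mi300a_fp16_tf / 1e6        # ~18.4 EF FP16 dense
print(f"{apus:,} APUs, {fp16_ef:.1f} EF FP16")
print(f"{fp16_ef / 21.8:.0%} of the Inflection AI machine")  # ~84%
```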

Now, let’s take a stab at the rumored 25,000 GPU cluster that Microsoft is building for OpenAI to train GPT-5. Historically, as Nidhi Chappell, general manager of Azure HPC and AI at Microsoft, explained to us back in March, Azure uses PCI-Express versions of Nvidia accelerators to build its HPC and AI clusters, and it uses InfiniBand networking to link them together. We assume this rumored cluster uses Nvidia H100 PCI-Express cards, and at $20,000 a pop, that is $500 million right there. With a pair of Intel “Sapphire Rapids” Xeon SP host processors, 2 TB of main memory, and a reasonable amount of local storage, add another $150,000 per node, which works out to another $469 million across the 3,125 nodes needed to house those 25,000 GPUs. InfiniBand networking would add, if Nvidia’s 20 percent rule is a gauge, another $242 million. That is $1.21 billion. Discount the server nodes if you feel like it, but that is $387,455 per node and it ain’t gonna budge that much. Not with so much demand for AI systems.
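Here is that price build-up in one place; the per-GPU and per-node prices are our assumptions, and rounding lands a whisker off the per-node figure above:

```python
# Price build-up for the rumored 25,000 GPU Microsoft/OpenAI cluster.
gpus, gpu_price = 25_000, 20_000     # H100 PCI-Express cards, assumed price
gpus_per_node = 8
node_rest_price = 150_000            # two Sapphire Rapids CPUs, 2 TB memory, storage
net_fraction = 0.20                  # Nvidia's 20 percent networking rule

nodes = gpus // gpus_per_node                              # 3,125 nodes
hardware = gpus * gpu_price + nodes * node_rest_price      # $968.75 M
total = hardware / (1 - net_fraction)                      # ~$1.21 B
print(f"${total / 1e9:.2f} B total, ${total / nodes:,.0f} per node")  # ~$387,500
```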

As we say in New York City: Fuhgeddaboudit.

If you do the math on this Microsoft/OpenAI cluster, it weighs in at 19.2 exaflops FP16 matrix math peak with sparsity off. The PCI-Express versions of the H100 have fewer streaming multiprocessors – 114 versus 132 on the SXM5 version – and they clock slower, too. That works out to about 10 percent cheaper for about 12 percent less performance.
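Side by side, the comparison looks like this; note that the 19.2 exaflops total implies about 768 teraflops per H100 PCI-Express card, a bit above Nvidia’s published dense rating of roughly 756 teraflops, so treat it as approximate:

```python
# The two rumored AI machines, side by side on cost and dense FP16 peak.
openai_ef, inflection_ef = 19.2, 21.8      # exaflops FP16, sparsity off
openai_cost, inflection_cost = 1.21, 1.35  # $ billions, list

print(f"{1 - openai_cost / inflection_cost:.1%} cheaper")       # ~10.4%
print(f"{1 - openai_ef / inflection_ef:.1%} less performance")  # ~11.9%
```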

These prices are crazy compared to what the US national labs are getting – or at least have been able to get over the years. The reason the HPC centers of the world chase novel architectures is that they can pitch themselves as research and development for a product that can eventually be commercialized. But the hyperscalers and cloud builders can do this same math, and they can build their own compute engines, as Amazon Web Services, Google, Baidu, and Facebook are all doing to varying degrees. Even with a 50 percent discount, these Inflection AI and OpenAI machines are still a lot more expensive per unit of compute than what the US national labs are paying.
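To put numbers on that claim, here is millions of dollars per peak exaflops, pitting El Capitan’s roughly $500 million contract price against a hypothetical half-off Inflection AI machine and using our performance estimates from above:

```python
# Millions of dollars per peak exaflops. El Capitan's 2.3 EF FP64 and
# 18.4 EF FP16 figures are our estimates, not official ratings.
def m_per_ef(cost_billions, exaflops):
    return round(cost_billions * 1_000 / exaflops)

print("El Capitan   :", m_per_ef(0.500, 2.3), "M/EF FP64,",
      m_per_ef(0.500, 18.4), "M/EF FP16")      # $217 M/EF and $27 M/EF
print("Inflection AI:", m_per_ef(0.675, 0.748), "M/EF FP64,",
      m_per_ef(0.675, 21.8), "M/EF FP16")      # even at 50 percent off list
```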

El Capitan will take up the same footprint that the retired “ASCI Purple” and “Sequoia” supercomputers from days gone by, built by IBM for Lawrence Livermore, used successively – about 6,800 square feet. El Capitan is expected to need somewhere between 30 megawatts and 35 megawatts of power and cooling at peak, and it will run side-by-side with the next exascale-class machine that Lawrence Livermore expects to install around 2029, which is why the datacenter power and cooling capacity at the lab has been doubled to accommodate these two machines running concurrently.

By comparison, the ASCI Purple machine built by IBM and installed in 2005 at Lawrence Livermore was rated at 100 teraflops peak performance at FP64 precision and burned about 5 megawatts; it cost an estimated $128 million. El Capitan may have 23,000X more performance at somewhere between 6X and 7X the power draw and at 3.9X the cost. That may not be as good as the exponential progress that supercomputing centers had expected for many decades, but it is still a remarkable feat and attests to the benefit of Moore’s Law and a whole lot of packaging, networking, and power and cooling cleverness.
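Those ratios are easy to verify, taking the midpoint of that 30 megawatt to 35 megawatt power estimate:

```python
# ASCI Purple (2005) versus El Capitan, midpoint of the 30-35 MW estimate.
purple_tf, purple_mw, purple_cost_m = 100, 5, 128
elcap_tf, elcap_mw, elcap_cost_m = 2_300_000, 32.5, 500

print(f"{elcap_tf / purple_tf:,.0f}x the performance")    # 23,000x
print(f"{elcap_mw / purple_mw:.1f}x the power")           # 6.5x
print(f"{elcap_cost_m / purple_cost_m:.1f}x the cost")    # 3.9x
```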

We can’t wait to see the real numbers for El Capitan and for Aurora A21 at Argonne National Laboratory. And if, as we suspect, Intel wrote off $300 million of the $500 million contract with Argonne, then there is not going to be a cheaper AI and HPC machine in the world. Yes, Argonne paid in time and will pay in electricity to use this machine, but as we pointed out two weeks ago when the Aurora machine was finally being fully installed, what matters now is getting the machine built and doing actual HPC and AI.
