Note: unofficial AI-cluster estimates first, then official top500.org
rankings.
Top Tier (Largest Single-Site or Dedicated Clusters)
xAI Colossus (Memphis, TN) — ~500k–555k+ NVIDIA
GPUs (H100/H200/GB200 mix), scaling toward 1M. ~300 MW to 2 GW potential.
Unique AI uses: Rapid training of successive
Grok models (frontier LLMs with real-time knowledge, reasoning, and
multimodal capabilities). Emphasizes speed of deployment and massive scale
for AGI pursuit.
markets.financialcontent.com
Google Columbus Cluster (New Albany, OH) —
Hundreds of thousands of TPUs (multiple generations), >500 MW AI portion
(part of >1 GW total).
Unique AI uses: Training and inference for
Gemini models; multi-data-center distributed training; powers Google Search,
YouTube recommendations, and cloud AI services.
terakraft.no
Google Omaha Cluster (NE) — Similar scale to
Columbus, hundreds of thousands of TPUs, >500 MW AI.
Unique AI uses: Large-scale TPU-based training;
supports global AI workloads with fiber-linked distributed architecture.
terakraft.no
Meta Columbus Site (OH) — ~100k–1M+ GPUs (mix,
including high-density), >500 MW, uses “tents” for rapid deployment.
Unique AI uses: Training Llama models; powers
recommendation systems, content moderation, and Meta’s social/AR/VR AI
features. Focus on open-source releases.
terakraft.no
Amazon Project Rainier (New Carlisle, IN) —
~500k Trainium2 chips, ~420 MW (scaling higher).
Unique AI uses: Training/inference for
Anthropic’s Claude models (primary partner); cost-efficient custom silicon
for hyperscale workloads.
terakraft.no
terakraft.no
xAI Colossus 2 / expansions (Memphis) — >110k
GB200s (part of overall Colossus growth).
Unique AI uses: Same as main
Colossus—accelerated Grok iterations with emphasis on raw scale and quick
build times.
terakraft.no
Strong Contenders (Large Dedicated or
Campus-Scale)
Microsoft Azure Fairwater Campus (Mount
Pleasant, WI) — >150k GB200s, >350 MW (scaling big).
Unique AI uses: Training OpenAI models (GPT
series); Azure AI services, enterprise copilots, and multimodal research.
terakraft.no
Microsoft Azure Atlanta Site — Similar to
Fairwater (>150k GB200s, >350 MW).
Unique AI uses: Supports OpenAI partnership and
broad Azure AI cloud workloads.
terakraft.no
Amazon Mississippi AI Data Center (Canton) —
Hundreds of thousands of Trainium2 chips, >300 MW (to 1 GW+).
Unique AI uses: Custom silicon training for AWS
customers and internal models; energy sector and enterprise AI.
terakraft.no
OpenAI/Microsoft Stargate (Abilene, TX / other
sites) — ~100k+ Blackwell, rapidly expanding (part of multi-GW plans).
Unique AI uses: Next-gen GPT/frontier model
training; closed-loop liquid cooling for high-density AI.
terakraft.no
Oracle OCI Supercluster — ~65k H200s (and
growing).
Unique AI uses: Cloud AI services; supports
enterprise and research workloads, including partnerships.
visualcapitalist.com
Meta other large clusters (e.g., various US
sites) — Part of ~1M GPU total deployment.
Unique AI uses: Llama ecosystem, advertising AI,
and metaverse/embodied AI.
bisresearch.com
Microsoft total Azure clusters (distributed,
hundreds of thousands of GPUs).
Unique AI uses: Broad enterprise AI, OpenAI
integration, and inference-heavy workloads.
bisresearch.com
Google total TPU fleets (distributed campuses).
Unique AI uses: Efficient inference + training;
powers Gemini, Search, and scientific AI.
etcjournal.com
Amazon total Trainium/Inferentia (multi-site).
Unique AI uses: Cost-optimized training for
partners like Anthropic; cloud AI offerings.
bisresearch.com
Tesla Cortex / Dojo (various sites) — ~50k+ GPUs
+ custom Dojo chips.
Unique AI uses: Full self-driving (FSD)
training, robotics, and video understanding for autonomous vehicles.
visualcapitalist.com
CoreWeave clusters — ~42k H200s (and growing).
Unique AI uses: Cloud GPU provider for AI
startups and researchers; flexible rental for training.
visualcapitalist.com
Lambda Labs — ~32k H100/H200.
Unique AI uses: On-demand AI training for
developers and smaller labs.
visualcapitalist.com
Anthropic on AWS (Project Rainier + others) —
Significant Trainium + GPU access (multi-hundred MW commitments).
Unique AI uses: Claude model family—focus on
safety, constitutional AI, and enterprise reliability.
terakraft.no
Key Trends
NVIDIA dominance in GPU clusters (Colossus,
Microsoft, Meta) vs. custom silicon (Google TPUs, Amazon Trainium) for
efficiency/cost.
terakraft.no
Many are shifting toward inference and agentic
AI alongside training.
Power is the new bottleneck (hundreds of MW to
GW-scale), driving innovations in cooling, energy sourcing, and rapid
deployment (e.g., tents, retrofitted factories).
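The power math behind that bottleneck is easy to sketch. A rough
estimate, assuming ~1.2 kW all-in per GPU including cooling and
networking overhead (an assumed figure for illustration, not a vendor
specification):

```python
# Rough cluster power estimate: GPUs x watts-per-GPU -> megawatts.
# The 1200 W/GPU default is an assumed all-in number (chip + cooling +
# networking overhead), not a published spec.

def cluster_power_mw(num_gpus: int, watts_per_gpu: float = 1200.0) -> float:
    """Return estimated facility power in megawatts."""
    return num_gpus * watts_per_gpu / 1_000_000

# A 500k-GPU site like Colossus lands in the hundreds of megawatts:
print(cluster_power_mw(500_000))    # 600.0 (MW)
# Scaling to 1M GPUs pushes into the gigawatt range:
print(cluster_power_mw(1_000_000))  # 1200.0 (MW)
```

The same arithmetic explains why the unofficial cluster entries above
pair chip counts in the hundreds of thousands with power figures in the
hundreds of megawatts.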
Smaller Official TOP500 Supercomputers
Fugaku (RIKEN, Japan)
Architecture: Fujitsu A64FX Arm-based processors
(no GPUs in main ranking).
Performance: ~442 Petaflops.
Unique AI uses: Traditional HPC strengths in
disaster prevention, drug discovery, and materials; supports Arm-based AI
workflows and large-scale simulations.
top500.org
Alps (Swiss National Supercomputing Centre,
Switzerland)
Architecture: HPE Cray EX254n with NVIDIA GH200
Grace Hopper Superchips.
Performance: ~435 Petaflops.
Unique AI uses: Scientific AI, climate modeling,
and research in physics/chemistry, with Grace Hopper CPU+GPU efficiency
for mixed workloads.
top500.org
LUMI (EuroHPC/CSC, Finland)
Architecture: HPE Cray EX with AMD Instinct
MI250X.
Performance: ~380 Petaflops.
Unique AI uses: Broad European research — AI for
materials, life sciences, and climate. Part of EuroHPC's push for accessible
large-scale AI.
top500.org
Leonardo (EuroHPC/CINECA, Italy)
Architecture: BullSequana with NVIDIA A100 GPUs.
Performance: ~241 Petaflops.
Unique AI uses: Industrial and scientific AI,
simulations, and data-intensive workloads across European academia and
industry.
top500.org
Key Trends for AI
US dominance in raw power (top 3), focused on
national labs for science + security.
top500.org
NVIDIA-heavy systems (e.g., JUPITER Booster,
Alps, Eagle) often shine in practical AI training/inference due to CUDA
ecosystem and lower-precision performance.
Many of these support hybrid AI+HPC workflows:
using AI to accelerate simulations, surrogate models, or generative design.
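The "AI surrogate" idea in that last bullet can be sketched in a few
lines: fit a cheap model to a handful of expensive simulation runs, then
query the surrogate instead of the simulator. A minimal stdlib-only
sketch with a quadratic least-squares fit (the "simulation" here is a
stand-in function, not code from any of these systems):

```python
import math

# Stand-in for an expensive simulation (hypothetical, for illustration).
def expensive_simulation(x: float) -> float:
    return math.sin(x) * x * x

def fit_quadratic(xs, ys):
    """Least-squares fit of y ~ a + b*x + c*x^2 via normal equations."""
    # Power sums for the 3x3 normal-equation system.
    Sx = [sum(x**k for x in xs) for k in range(5)]
    Sy = [sum(y * x**k for x, y in zip(xs, ys)) for k in range(3)]
    # Augmented matrix [A | b], solved by Gauss-Jordan elimination.
    A = [[Sx[i + j] for j in range(3)] + [Sy[i]] for i in range(3)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(3):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][3] / A[i][i] for i in range(3)]

# Train the surrogate on a few expensive runs...
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [expensive_simulation(x) for x in xs]
a, b, c = fit_quadratic(xs, ys)

# ...then query it cheaply wherever needed.
def surrogate(x: float) -> float:
    return a + b * x + c * x * x

print(round(surrogate(1.2), 3))  # close to expensive_simulation(1.2)
```

Real surrogate workflows swap the quadratic for a neural network and the
toy function for a physics code, but the train-once/query-cheaply
structure is the same.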
This aligns well with Nvidia booth discussions —
systems like JUPITER Booster and Alps showcase Grace Hopper/Blackwell-era
platforms in real-world exascale AI. At the expo, you could ask how their
platforms (DGX, Jetson, etc.) power or scale similar workloads. Let me know
if you want details on any specific system!
Platforms
DGX Platform (their flagship AI supercomputer
line): rack-scale DGX systems (Blackwell-based) and newer desktop
options.
DGX Spark: A compact personal AI supercomputer
(Grace Blackwell Superchip) that can run models up to ~200B parameters.
DGX Station for desktop-scale work, and full DGX
SuperPOD for data-center-scale training.
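The "~200B parameters" figure is essentially a memory calculation. A
back-of-the-envelope sketch, assuming 4-bit quantized weights, a 128 GB
unified-memory budget, and ~20% overhead for activations and KV cache
(all assumptions for illustration; check Nvidia's published specs for
the real limits):

```python
# Back-of-the-envelope model-size check for a fixed memory budget.
# Assumptions (not official specs): quantized weights, ~20% of memory
# reserved for KV cache and activations.

def max_params_billions(memory_gb: float, bits_per_weight: int,
                        overhead_fraction: float = 0.2) -> float:
    """Largest model (in billions of parameters) that fits in memory."""
    usable_bytes = memory_gb * 1e9 * (1 - overhead_fraction)
    bytes_per_weight = bits_per_weight / 8
    return usable_bytes / bytes_per_weight / 1e9

# 4-bit weights in ~128 GB leave room for a ~200B-parameter model:
print(round(max_params_billions(128, 4)))   # 205
# At FP16 the same budget only fits ~51B parameters:
print(round(max_params_billions(128, 16)))  # 51
```

This is why low-bit quantization is what makes "200B parameters on a
desktop box" plausible at all.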
Jetson + Isaac Platform (edge/robotics/physical AI):
Compact, power-efficient modules (e.g., Jetson
Thor with Blackwell) for deploying AI on robots, drones, autonomous machines,
and industrial edge. Nvidia's big push into "physical AI" — robots
that perceive, reason, and act in the real world.
nvidianews.nvidia.com
Omniverse:
The platform for building 3D worlds, digital
twins, and collaborative simulation using OpenUSD and RTX tech. Critical for
developing and testing physical AI (robots, factories, autonomous systems)
before real-world deployment. Often described as an "operating system
for the metaverse/industrial digital twins."
nvidia.com
Blackwell Architecture: Nvidia's current data-center GPU generation, the
basis of the GB200 systems and DGX line above.
Next up: Vera Rubin platform (announced
recently)—extreme codesign across multiple new chips (Rubin GPU, Vera CPU,
new networking like NVLink 6, BlueField-4 DPU, etc.) for even better
inference economics and massive-scale AI factories.
investor.nvidia.com
CUDA: The foundational parallel computing platform and ecosystem, with a
huge library of optimized tools (TensorRT, NeMo, NIM microservices for
easy deployment, etc.).
Full-stack networking & infrastructure:
NVLink (high-speed GPU interconnect), BlueField DPUs (smart NICs for data
center offload), MGX modular architecture. This lets them build efficient
"AI factories."
Software layer: NeMo (for LLMs), Triton, Run:ai,
and agentic AI tools.
DRIVE Platform (autonomous vehicles): Nvidia's end-to-end stack for
developing and deploying autonomous driving systems.
Official Supercomputers (continued)
The top officially ranked (TOP500) systems are exascale (over 1
exaflop/s) and heavily used for AI workloads alongside traditional HPC
simulations.
Top Supercomputers (selected from the November 2025 TOP500)
El Capitan (Lawrence Livermore National Lab,
USA)
Architecture: HPE Cray EX255a with AMD EPYC 4th
Gen + Instinct MI300A accelerators.
Performance: ~1.809 Exaflops (Rmax).
Unique AI uses: Nuclear stockpile stewardship,
advanced materials science, and large-scale AI for national security
applications. Strong in mixed-precision AI and scientific simulation.
Frontier (Oak Ridge National Lab, USA)
Architecture: HPE Cray EX235a with AMD EPYC +
Instinct MI250X.
Performance: ~1.35 Exaflops.
Unique AI uses: Pioneering AI-driven science, including climate
modeling, drug discovery, and fusion energy research. It excels at
coupling traditional simulations with AI surrogates for faster insights.
Aurora (Argonne National Lab, USA)
Architecture: HPE Cray EX with Intel Xeon CPU Max
and Data Center GPU Max accelerators.
Performance: ~1.012 Exaflops.
Unique AI uses: Leads in many AI-specific
benchmarks (e.g., HPL-MxP). Used for AI-accelerated discovery in battery
materials, drug design, protein folding, cosmology, and fusion. Strong
emphasis on integrating AI with simulation and data analysis.
JUPITER Booster (Jülich Supercomputing Centre,
Germany – EuroHPC)
Architecture: BullSequana XH3000 with NVIDIA
GH200 Grace Hopper Superchips.
Performance: ~1.0 Exaflops (first European
exascale system).
Unique AI uses: Training large language/multimodal
models for European languages, climate science, digital twins (e.g., human
organs), quantum computing validation, and industrial AI. Highly
energy-efficient and renewable-powered.
fz-juelich.de
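To put one exaflop in perspective, a rough training-time estimate.
Assuming a hypothetical 10^24-FLOP training run (roughly frontier-LLM
scale) and 40% sustained utilization, both assumed numbers rather than
figures from Jülich:

```python
# How long a large training run takes on a machine of given peak speed.
# The 1e24 total-FLOP budget and 40% utilization are illustrative
# assumptions, not published figures for any real model or system.

def training_days(total_flops: float, peak_flops_per_s: float,
                  utilization: float = 0.4) -> float:
    """Wall-clock days to complete a fixed-FLOP training run."""
    seconds = total_flops / (peak_flops_per_s * utilization)
    return seconds / 86_400

# A 1e24-FLOP run on a 1 exaflop/s (1e18 FLOP/s) system at 40% util:
print(round(training_days(1e24, 1e18), 1))  # 28.9 (days)
```

Roughly a month per frontier-scale run is why the clusters in the first
half of this post chase multiples of an exaflop.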
Eagle (Microsoft Azure, USA)
Architecture: NDv5 with NVIDIA H100 GPUs.
Performance: ~561 Petaflops.
Unique AI uses: Cloud-based AI model training
and commercial/research workloads. Supports large-scale generative AI and
hyperscale AI infrastructure.
HPC6 (Eni S.p.A., Italy)
Architecture: HPE Cray EX with AMD Instinct
MI250X.
Performance: ~478 Petaflops.
Unique AI uses: Energy sector applications —
seismic imaging, reservoir simulation, and AI for oil/gas exploration
and optimization.