Portland Linux/Unix Group General Monthly Meeting: Inside AI Supercomputers, with Jesse Lopez
Summary:
Inside AI Supercomputers: From GPUs to Multi-DC Clusters
Large language models and other frontier AI systems are trained on clusters with thousands to over a hundred thousand GPUs. But what does that infrastructure actually look like? This talk walks through the anatomy of an AI supercomputer from the ground up: individual GPUs, multi-GPU nodes, racks, and full clusters. We’ll cover the three pillars of compute, storage, and networking, then look at how training and inference workloads place very different demands on hardware. Finally, we’ll explore how Linux runs the show at every layer, from the OS on each node, to InfiniBand fabric management, to job scheduling with Slurm, Kubernetes, and Ray.
No AI/HPC background required – just curiosity about what it takes to build and run the machines behind the models.
Bio:
Jesse Lopez is an AI/ML and Technical Program Manager in the Azure HPC/AI organization, where he helps deploy large-scale AI infrastructure and works with customers to put it to use. A former scientist, he has a background in high-performance computing and AI/ML, and has been a Linux user since the nineteen hundreds.
Tags: plug, linux, HPC
Imported from: http://calagator.org/events/1250482513
