Microsoft builds a supercomputer to run OpenAI’s AI research


Agencies

Thursday, March 16, 2023 01:00 AM

Microsoft has built a supercomputer for the artificial intelligence (AI) research startup OpenAI. To train large sets of AI models, the machine strings together thousands of Nvidia A100 graphics chips, and it underpins both ChatGPT and the Bing AI chatbot.

The Windows maker invested $1 billion in OpenAI in 2019 and agreed to build a “large, cutting-edge supercomputer”.

Why did Microsoft build a supercomputer?

The goal of the supercomputer was to provide enough computing power to train and retrain an ever-larger set of AI models on vast amounts of data for long periods of time, said Nidhi Chappell, Microsoft's product lead for Azure High-Performance Computing and Artificial Intelligence.

“So there was definitely a strong drive to train bigger models over a longer period of time, which means you not only need to have the biggest infrastructure, you need to be able to run them reliably for a long period of time.”
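For a sense of what training bigger models "reliably for a long period of time" involves in practice, here is a minimal, hedged sketch of a long-running multi-GPU training loop with periodic checkpointing, written with PyTorch's DistributedDataParallel. The model, batch data, and checkpoint path are illustrative placeholders, not Microsoft's or OpenAI's actual code.

```python
# Minimal sketch: distributed training with periodic checkpointing.
# Assumes launch via `torchrun --nproc_per_node=<gpus> train.py`, which
# sets RANK / LOCAL_RANK / WORLD_SIZE for each worker process.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")  # NCCL drives GPU-to-GPU traffic
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Placeholder model and optimizer; a real run would use a large
    # transformer and sharded data loading.
    model = DDP(torch.nn.Linear(4096, 4096).cuda(), device_ids=[local_rank])
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for step in range(10_000):
        x = torch.randn(32, 4096, device="cuda")  # stand-in for real batches
        loss = model(x).square().mean()
        opt.zero_grad()
        loss.backward()  # gradients are averaged across every GPU
        opt.step()

        # Checkpoint periodically so a hardware fault costs minutes, not weeks.
        if step % 1_000 == 0 and dist.get_rank() == 0:
            torch.save(model.module.state_dict(), f"ckpt_{step}.pt")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

The same script scales from one machine to many by changing only the launch configuration, and the periodic checkpoint is what keeps a weeks-long run restartable when individual machines fail, which is exactly the reliability problem Chappell describes.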

At its Build developer conference in 2020, Microsoft announced that it had built a supercomputer, developed in collaboration with and exclusively for OpenAI, hosted in Azure and designed specifically to train AI models.

"Co-designing supercomputers with Azure has been crucial for scaling our demanding AI training needs, making our research and alignment work on systems like ChatGPT possible," said OpenAI president and co-founder Greg Brockman.

What is the architecture of Microsoft's supercomputer?

Microsoft amassed tens of thousands of Nvidia A100 graphics chips to train the models and changed how servers are positioned on racks to prevent power outages.

"We built a system architecture that could operate and be reliable at a very large scale, and that's what made ChatGPT possible. That's one model that came out of it.

"There will be many, many more," Chappell was quoted as saying by Bloomberg.

As for cost, Scott Guthrie, the Microsoft executive vice president who oversees cloud and artificial intelligence, said the cost of the project was "probably greater" than several hundred million dollars.

To process and train its models, OpenAI needs more than a supercomputer alone; it also needs a powerful cloud setup.

Microsoft is already making Azure's AI capabilities even more powerful with new virtual machines that use Nvidia's H100 and A100 Tensor Core GPUs along with Nvidia Quantum-2 InfiniBand networking.
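As a small illustration, the snippet below shows one way a tenant could verify, from inside such a virtual machine, which Tensor Core GPUs are attached and that the NCCL backend, which PyTorch uses to communicate over fabrics like InfiniBand, is available. It assumes a PyTorch installation with CUDA support; the VM provisioning itself happens on the Azure side and is not shown.

```python
# Sanity check from inside a GPU virtual machine: list the attached
# Tensor Core GPUs and confirm the NCCL communication backend is built in.
import torch
import torch.distributed as dist

for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    # A100s typically report about 40 or 80 GiB of memory; H100s about 80 GiB.
    print(f"GPU {i}: {props.name}, {props.total_memory / 2**30:.0f} GiB")

print("NCCL available:", dist.is_nccl_available())
```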

This will let OpenAI and the other companies that rely on Azure, many of which are now deploying AI chatbots, train larger and more complex AI models.

"We saw that we would need to build special-purpose clusters focused on enabling large training workloads, and OpenAI was one of the early proof points for that.

"We worked closely with them to learn what key things they were looking for as they built out their training environments and what key things they needed," added Eric Boyd, Microsoft's vice president for Azure AI.
