Microsoft’s plans for a next-generation in-house AI chip have been delayed by at least six months, pushing mass production into next year, in a blow to the company’s efforts to bring more of its AI technology in-house and compete with the likes of Google and Amazon, The Information reported.
When the chip, code-named Braga, goes into production, it is expected to fall well short of the performance of Nvidia’s Blackwell chip, released in late 2024, the report said, citing three unnamed people involved in the project.
Microsoft had reportedly planned to deploy Braga chips in its data centres this year, reducing its reliance on expensive Nvidia processors.
Design changes
The delay was caused by unanticipated design changes, staffing constraints and high staff turnover, according to the people cited in the report.
Microsoft has been working on an in-house AI chip since 2019, and in 2023 announced an initial effort, the Maia 100 AI accelerator, with plans to deploy it in its data centres in 2024.
To date that chip has mostly been used for internal testing and does not power any of Microsoft’s AI services, largely because it was conceived before the rise of generative AI and was designed mainly for image processing.
Behind the scenes, the company has been designing three next-generation chips, called Braga, Braga-R and Clea, for deployment in 2025, 2026 and 2027 respectively, the report said.
All three chips are reportedly intended for inference, the stage at which already-trained models are run to serve generative AI requests, rather than for training.
Some of the design changes incorporated new features requested by OpenAI, in which Microsoft has invested heavily; the changes made the chip unstable in simulations and set the project back by months, The Information’s sources said.
Staff turnover
Microsoft kept the original deadline despite the delay, putting so much pressure on the team that one-fifth of its members departed, according to the report.
Google, Amazon and other large companies deploying AI services have designed their own AI accelerators to boost performance, cut costs and reduce their dependence on Nvidia.
OpenAI recently began renting Google’s in-house Tensor Processing Unit (TPU) chips via Google Cloud for inference tasks, according to a separate report from The Information last week.