NVIDIA's upcoming GB200 server rack, which includes the company's next-generation Blackwell chips, will be cooled primarily by liquid circulating through pipes in the hardware rather than by air. An NVIDIA spokesperson said the company is also working with suppliers on other cooling technologies, including immersing entire computer drawers in non-conductive liquids that absorb and dissipate heat.
As engineers try to curb the enormous power consumption of such equipment, cooling has suddenly become a hot business. Goldman Sachs research shows that data centers, the large computing facilities that process AI calculations, are expected to account for 8% of total electricity demand in the United States by 2030, up from 3% today.
With tech companies racing to deploy AI in fields such as content creation and autonomous driving, NVIDIA's GB200 series is expected to be in high demand.
Earlier this month, NVIDIA's stock price fell as investors reacted to a potential delay in the launch of Blackwell products. Although the company said it would ramp up production in the second half of this year, Supermicro CEO Charles Liang said the schedule has been "a little delayed." Liang predicted that products will come off the production line in large numbers in the first quarter of next year. Supermicro builds server racks that use NVIDIA chips.
Data centers, which can house tens of thousands of servers, are often noisy and cold. Ren Shaolei, an associate professor of electrical and computer engineering at the University of California, Riverside, said that in data centers relying on older equipment such as fans and air conditioning, cooling accounts for 40% of electricity consumption. More advanced technologies can cut that share to 10% or lower.
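To put those shares in perspective, here is a rough back-of-the-envelope sketch. It assumes, purely for illustration, that the non-cooling load (servers, networking and so on) stays the same while cooling's share of total electricity falls from 40% to 10%. The function names are hypothetical and the result is an estimate under that assumption, not a figure from the article.

```python
def total_power(non_cooling_kw: float, cooling_share: float) -> float:
    """Total facility power when cooling takes `cooling_share` of the total.

    If cooling is a fraction s of the total, the non-cooling load is the
    remaining (1 - s), so total = non_cooling / (1 - s).
    """
    if not 0 <= cooling_share < 1:
        raise ValueError("cooling_share must be in [0, 1)")
    return non_cooling_kw / (1 - cooling_share)


def saving_fraction(old_share: float, new_share: float) -> float:
    """Fractional drop in total power when cooling's share falls.

    Assumption: the non-cooling load is held fixed, so it cancels out.
    """
    old_total = total_power(1.0, old_share)
    new_total = total_power(1.0, new_share)
    return 1 - new_total / old_total


if __name__ == "__main__":
    # Cooling share falls from 40% to 10% of total electricity use.
    print(f"{saving_fraction(0.40, 0.10):.1%}")  # prints 33.3%
```

Under that assumption, total electricity use falls by about a third, even though cooling's share drops by 30 percentage points.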
Liquid cooling has become a common configuration for high-end gaming computers, but on a larger scale, liquid cooling has traditionally only been used to solve the biggest problems, such as in nuclear power plants. The upfront cost of circulating liquids in precision electronic equipment may be many times the cost of installing air conditioning and fans. Moreover, some parts are in short supply.
Leakage in the cooling system is the biggest risk.
"Any drop of water on a server, such as the multi-million dollar GB200, could cause catastrophic damage," said Oliver Lien, General Manager of Forcecon Technology, a company that collaborates with semiconductor manufacturers to develop cooling systems.
A recent Morgan Stanley report shows that more than 95% of data centers currently use air cooling because of its mature design and high reliability.
Liang said that Supermicro will use liquid cooling in about 30% of the racks it delivers next year. He said that in June and July, the company delivered more than 1,000 liquid-cooled AI racks, accounting for more than 15% of new data center deployments worldwide.
NVIDIA manufactures its own servers and also supplies chips to other server manufacturers, who manufacture equipment for tech giants developing AI applications. Decisions on cooling technology are often made jointly by these companies.
Manufacturing Challenges
According to sources involved in the relevant plans, the contract manufacturer Foxconn, headquartered in Taiwan, China, plays a leading role in the production of NVIDIA's GB200 series in Taiwan and Mexico.
The sensitivity of the cooling issue was highlighted at the end of July when social media posts claimed that there was a leak in the cooling system of the GB200, followed by a more than 5% drop in the stock prices of Foxconn and two cooling component suppliers.
Sources familiar with the production situation said that this is a normal problem that occurs during production readiness testing, and suppliers can resolve it. They said that the cooling system issue is unlikely to have a significant impact on the delivery schedule of the GB200. The stock prices of Foxconn and the relevant suppliers quickly recovered thereafter. NVIDIA declined to comment, and Foxconn did not respond to requests for comment.
Supermicro said that its liquid cooling system reduces the power consumption of data centers by 30% to 40%. NVIDIA said that liquid-cooled data centers can accommodate twice the computing hardware in the same space because air-cooled chips require more space on servers.
Lien of Forcecon said that with air cooling alone, such high-performance computers would require the server room to be kept below 50 degrees Fahrenheit (about 10 degrees Celsius). Beyond the high power consumption, dust stirred up by the fans hinders performance, and the round-the-clock hum annoys neighbors.
"For high-end AI applications of companies like NVIDIA, AMD, or Google, liquid cooling is absolutely inevitable," Lien said. The liquid-cooled parts of a machine run very quietly, without the high-speed "whoosh" of fans, and raise almost no dust.
"Put your hand on the machine, and you will feel a slight vibration. This gentle operation tells you that they are working hard," Lien said.
Morgan Stanley estimates that the liquid cooling system for a high-end NVIDIA GB200 rack costs more than $80,000, about 15 to 20 times the cost of existing air-cooled systems for racks equipped with NVIDIA's H100 chips. It expects the market for such systems to more than double to $4.8 billion by 2027.
Growing Pains
In these systems, pumps circulate coolant through microchannels in a cold plate that sits on top of the chip and carry the coolant away as it heats up.
One clear impediment to the industry's development is a shortage of specialized parts. AMD said it had to delay shipments of about $800 million worth of products due to a shortage of parts, mainly related to liquid cooling.
One part in short supply is the universal quick-disconnect fitting, which prevents leaks when a section of piping is disconnected, executives said. Such parts are made mainly by European and American companies, but more than half of the global cooling system business is in the hands of Taiwanese companies, said Gong Yuzhun, head of Intel's liquid cooling project and president of the Taiwan Thermal Management Association.
Taiwanese companies are benefiting from their experience cooling gaming PCs, just as NVIDIA made many chips for the gaming industry before moving into AI.
Many in the industry believe the next step could be full immersion in heat-absorbing liquids, though the technology faces some skepticism because the liquids and custom tanks are expensive and harder to maintain.
Taiwanese companies including Cooler Master, a longtime NVIDIA partner whose high-end computer cooling hardware is popular with video game enthusiasts, are developing immersion cooling technology for potential future NVIDIA products, according to people familiar with the matter.
Last year, Nvidia CEO Jensen Huang stopped at a trade show to watch Taiwan's Gigabyte Technology demonstrate its immersion cooling equipment. "Good job," Huang said at the show. "This is the future."