
Microsoft Cracks AI’s Thermal Code to Boost Hyperscale Efficiency


Key points

According to sources, Microsoft has announced a breakthrough in cooling technology for AI chips, using microfluidics to channel liquid directly inside silicon chips. The development is significant because it could dramatically reshape how data centers manage the rising heat from artificial intelligence workloads. The technique etches tiny channels directly into the back of the silicon die, creating grooves that let cooling liquid flow across the chip itself and remove heat far more efficiently.

As reported, Microsoft validated the design by cooling a server running simulated Teams meetings. The team also used AI to map each chip's unique heat signature and direct the coolant with greater precision. In lab-scale tests, Microsoft found that microfluidics can remove heat up to three times better than cold plates, depending on the workload and configuration. Microfluidics also cut the maximum temperature rise of the silicon inside a GPU by 65%.
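To make those two figures concrete, here is a minimal back-of-envelope sketch in Python. Only the 65% reduction and the up-to-3x ratio come from the reporting; the baseline cold-plate temperature rise and coolant inlet temperature are illustrative assumptions.

```python
# Back-of-envelope check of the two reported figures: microfluidics removes
# heat up to 3x better than a cold plate and cuts the silicon's maximum
# temperature rise by 65%. The baseline numbers below are assumptions.

COOLANT_INLET_C = 30.0    # assumed coolant inlet temperature
COLD_PLATE_RISE_C = 40.0  # assumed max temperature rise over coolant (cold plate)

# Reported: microfluidics reduces the maximum temperature rise by 65%.
microfluidic_rise_c = COLD_PLATE_RISE_C * (1 - 0.65)

# A 65% smaller rise at the same power implies roughly 1 / (1 - 0.65) = ~2.9x
# lower thermal resistance, consistent with the "up to 3x" heat-removal claim.
heat_removal_ratio = COLD_PLATE_RISE_C / microfluidic_rise_c

print(f"Cold plate hotspot:   {COOLANT_INLET_C + COLD_PLATE_RISE_C:.1f} C")
print(f"Microfluidic hotspot: {COOLANT_INLET_C + microfluidic_rise_c:.1f} C")
print(f"Implied heat-removal ratio: {heat_removal_ratio:.1f}x")
```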

The rise of AI workloads and high-performance computing has placed unprecedented strain on data center infrastructure. Thermal dissipation has emerged as one of the toughest bottlenecks, with traditional methods such as airflow and cold plates increasingly unable to keep pace with new generations of silicon. Sanchit Vir Gogia, CEO and chief analyst at Greyhound Research, noted that modern accelerators are throwing out thermal loads that air systems simply cannot contain, and even advanced water loops are straining.

Cooling is also a significant expense for data centers, typically consuming 45%-47% of the power budget. Danish Faruqui, CEO at Fab Economics, noted that without advances in cooling efficiency, that share could climb to 65%-70%. With the thermal budget per GPU at least doubling every year, solving thermal bottlenecks has become imperative for hyperscalers and neocloud providers.
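If the per-GPU thermal budget really doubles every year, the compounding is stark. A minimal sketch, where the 1.8 kW starting TDP and three-year horizon are illustrative assumptions and only the doubling rate comes from the article:

```python
# Compounding of a per-GPU thermal budget that "at least doubles every year".
# The 1.8 kW starting point is an illustrative assumption.

tdp_kw = 1.8
for year in range(1, 4):
    tdp_kw *= 2
    print(f"Year +{year}: ~{tdp_kw:.1f} kW per GPU")
# One doubling from 1.8 kW already reaches the 3.6 kW figure cited for
# Rubin Ultra later in the article.
```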

Microfluidics-based direct-to-silicon cooling could be the solution, limiting cooling expenses to less than 20% of the data center power budget. Getting there, however, would require significant technology development and optimization around microfluidic structure size and placement, as well as analysis of non-laminar flow in microchannels. As Brady Wang, associate director at Counterpoint Research, noted, the escalating thermal load from new generations of AI silicon means that relying on today’s solutions could impose a "hard ceiling on progress" within as little as five years.
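A quick comparison shows what those percentages mean for usable compute capacity. The 100 MW facility size below is a hypothetical assumption; the cooling shares are the ones quoted above:

```python
# Power left for compute in a hypothetical 100 MW facility under the three
# cooling shares quoted in the article. The facility size is an assumption.

FACILITY_MW = 100.0
scenarios = {
    "today (cold plates, ~45-47%)": 0.46,
    "projected without better cooling (~65-70%)": 0.675,
    "microfluidics target (<20%)": 0.20,
}
for label, cooling_share in scenarios.items():
    compute_mw = FACILITY_MW * (1 - cooling_share)
    print(f"{label:45s} -> {compute_mw:.1f} MW for compute")
```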

The challenge of scaling microfluidics is universal: hyperscalers such as AWS, Google, Meta, and Oracle are all grappling with extreme chip heat as AI hardware power density soars. While microfluidics is not a new idea, making it work at scale has proven difficult for the entire industry. Manish Rawat, analyst at TechInsights, noted that fabricating micron-scale channels increases process complexity and may raise yield loss due to wafer fragility. If achieved, however, microfluidic cooling could be the sole enabler of the Rubin Ultra GPU's thermal budget of 3.6 kW per GPU.

Microsoft has teamed up with Swiss startup Corintis, using AI to help optimize a bio-inspired channel design that cools chips’ hot spots more efficiently. Successful development of the technology could have a significant impact on the data center industry, enabling the widespread adoption of AI and high-performance computing. With Microsoft and other industry leaders working to advance microfluidics, the future of data center cooling looks promising.
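For a sense of the thermal scale behind that 3.6 kW figure, a rough heat-flux estimate helps. The die area below is an illustrative assumption, as the article gives only the TDP:

```python
# Rough average heat flux for a 3.6 kW GPU. The die area is an assumed,
# reticle-class figure; only the 3.6 kW TDP comes from the article.

TDP_W = 3600.0
DIE_AREA_CM2 = 8.0  # assumption: ~8 cm^2 of active silicon
flux_w_per_cm2 = TDP_W / DIE_AREA_CM2
print(f"Average heat flux: ~{flux_w_per_cm2:.0f} W/cm^2")
# ~450 W/cm^2 is far beyond what air cooling can handle and roughly an
# order of magnitude above a typical stovetop burner (~10 W/cm^2).
```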

Read the rest: Source Link

