The explosive growth of artificial intelligence (AI) has prompted the datacenter giants, including Google, Meta, Amazon, and Microsoft (whose datacenters run most AI software), to start building super-sized hyperscale datacenters that require gigawatts of power instead of megawatts. Built on existing semiconductor technology, these giant facilities are challenging the aging U.S. electrical grid to meet their energy needs, according to analysts.
For instance, Goldman Sachs estimates that a single query to ChatGPT (a generative pre-trained transformer chatbot) uses 10 times as much datacenter electrical energy as a traditional AI function like speech recognition, hence the rationale for more powerful hyperscale datacenters.
Today, traditional AI runs behind the scenes. For instance, natural language recognition (as when you speak to your computer) is an AI function that requires millions (for individual words) to billions (for complete sentences) of synapse-like connections between virtual neurons in a “learning” neural network. These spoken-word learning functions are run in the background, during datacenter lulls. After learning to recognize every word in the dictionary, the neural network can be compressed into a much smaller, faster runtime “inference engine” that responds to users in real time.
The new AI functions, collectively called generative AI (GenAI), use much larger learning neural networks with trillions of connections. Rather than accommodating just the spoken words in the dictionary, like today’s speech-recognition AIs, they learn entire libraries of books (as large language models, or LLMs) or vast sets of visual scenes (as vision transformers, or ViTs). At runtime, however, these transformers cannot be compressed into the same small, fast inference engines used for word recognition. The reason is that they do not return simple words in response to your input; instead, they compare your query against the trillions of learned examples encoded in their gigantic neural networks and transform it, word by word, into a response that can range in size from a complete paragraph to a whole white paper, or even an entire book on the subject of your query.
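The distinction can be sketched in a few lines of code (a minimal illustration, not any vendor's actual implementation; all function names here are hypothetical placeholders). A recognizer maps one input to one label in a single forward pass, which is easy to shrink into a compact inference engine, while a generative transformer must consult its full set of learned connections once for every token it emits.

```python
# Illustrative sketch only: why generative inference stays expensive at runtime.
# "classify" and "predict_next" stand in for real models; toy lambdas at the
# bottom let the example run end to end.

def recognize_word(audio_features, classify):
    # One forward pass in, one word out; easily distilled into a small engine.
    return classify(audio_features)

def generate_response(prompt, predict_next, end_token="<eos>", max_tokens=1000):
    # One forward pass per generated token; the full model is consulted at
    # every step, so the response grows word by word.
    tokens = list(prompt)
    for _ in range(max_tokens):
        token = predict_next(tokens)
        if token == end_token:
            break
        tokens.append(token)
    return tokens

# Toy stand-ins so the sketch runs.
print(recognize_word([0.1, 0.9], classify=lambda features: "hello"))
print(generate_response(["Explain", "GenAI"],
                        predict_next=lambda t: "<eos>" if len(t) > 6 else "word"))
```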
By the end of the decade, even more computational power will be needed as GenAI applications progress to routinely returning entire works of art or, say, video documentaries in response to ViT queries such as “paint landscapes in the style of Vincent van Gogh,” according to Jim McGregor, founder and principal analyst at Tirias Research.
“Once we get to mass adoption of visual-content creation with GenAI, the demand is going to be huge—we’ll need to increase datacenter performance/power exponentially,” said McGregor.
To support datacenters powerful enough to handle existing chat-caliber GenAI, Tirias’ latest report predicts U.S. datacenter energy consumption will grow from over 1.4 terawatt-hours (TWh) today to 67 TWh by 2028. Goldman Sachs estimates that adding traditional AI to GenAI roughly doubles that growth over the same period, resulting in AI consuming about 19% of overall datacenter energy, or about 4% of total U.S. grid generation.
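As a rough sanity check (a back-of-envelope sketch, not the analysts' models), the cited figures can be combined as follows; the total U.S. generation figure of roughly 4,200 TWh per year is an assumption of this sketch, not a number from either report.

```python
# Back-of-envelope check of the cited projections (assumptions noted inline).

genai_2028_twh = 67                      # Tirias: GenAI datacenter demand by 2028
all_ai_2028_twh = 2 * genai_2028_twh     # Goldman: all AI is roughly double that
ai_share_of_datacenters = 0.19           # Goldman: AI share of datacenter energy

implied_datacenter_total = all_ai_2028_twh / ai_share_of_datacenters
us_generation_twh = 4200                 # assumed annual U.S. generation (not from the article)

print(f"All-AI demand, 2028:      {all_ai_2028_twh} TWh")
print(f"Implied datacenter total: {implied_datacenter_total:.0f} TWh")
print(f"Share of U.S. generation: {all_ai_2028_twh / us_generation_twh:.1%}")
```

Run with these inputs, the sketch lands at roughly 134 TWh of AI demand, about 700 TWh of total datacenter consumption, and a bit over 3% of the assumed U.S. generation, in the neighborhood of the "about 4%" figure above.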
The way this strong growth in energy demand will be met, according to the Goldman Sachs report AI, Data Centers and the Coming US Power Demand Surge, is by shifting the grid’s generation mix from coal-fired plants toward “60% [natural] gas and 40% renewable sources [mainly solar and wind].” In addition, Bloomberg points out that the move to gas and renewables will include delaying the retirement of some coal-fired plants nearest the newest hyperscale datacenters.
There is also a trend toward preventing grid overload by dedicating nuclear generators, called small modular reactors (SMRs), to individual hyperscale datacenters, said Rian Bahran, Assistant Director of the White House Office of Science and Technology Policy, in his keynote at Data Center World 2024. Bahran said nuclear power should be added to the list of “clean” and “sustainable” energy sources able to meet hyperscale datacenter energy demands. In fact, Amazon has already purchased from Talen Energy a nearly 1-gigawatt nuclear-powered datacenter campus in Salem, PA, powered by the adjacent 2.5-gigawatt Susquehanna nuclear plant owned by Talen. Bahran also revealed that as many as two dozen SMRs, each capable of generating about 75 megawatts of electricity, are currently being constructed on two datacenter campuses in Ohio and Pennsylvania.
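For a sense of scale, the figures Bahran cited work out as follows (a minimal sketch; the 90% capacity factor, typical for nuclear plants, is an assumption and not from the article).

```python
# Rough capacity and annual-energy math for the SMRs described above.

smr_count = 24            # "as many as two dozen" under construction
smr_capacity_mw = 75      # per-reactor output cited by Bahran
capacity_factor = 0.90    # assumed (typical for nuclear); not from the article
hours_per_year = 8760

total_capacity_gw = smr_count * smr_capacity_mw / 1000
annual_energy_twh = smr_count * smr_capacity_mw * hours_per_year * capacity_factor / 1e6

print(f"Combined SMR capacity: {total_capacity_gw:.1f} GW")   # ~1.8 GW
print(f"Annual energy output:  {annual_energy_twh:.1f} TWh")  # ~14 TWh
```

Two dozen such reactors would together approach 2 gigawatts of capacity, roughly twice the Amazon campus described above, and supply on the order of 14 TWh per year of dedicated, off-grid power.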
At the same time, Microsoft is attempting to one-up fission reactors like SMRs by investing in nuclear-waste-free fusion reactors (partnering with Helion).
“No single silver bullet will solve this increasing need for more electrical energy sources, but it’s not as bad as some make it out to be, at least for the next generation beyond current technology datacenters,” said McGregor. “The way I see it, it’s like Moore’s Law [regarding the periodic doubling of transistor density]; we kept predicting its end, but every time we thought there was a roadblock, we found an innovation that got past it.”
Today’s hyperscale datacenters use current semiconductor technologies and architectures, but in the long term, innovation will stave off the unbridled increase in GenAI power consumption the same way innovation kept Moore’s Law moving forward, according to McGregor. That means finding new ways to increase performance while lowering power: with a new generation of hybrid stacks of CPUs, GPUs, and memory chips in the same package; with water-cooled server racks instead of air-cooled ones; with all-optical data connections, even chip-to-chip, instead of today’s mix of copper and fiber; and with larger water-cooled wafer-scale chips carrying trillions of transistors.
“The level of innovation in power reduction is phenomenal. This level of innovation rivals the start of the semiconductor industry and in many ways is even faster-growing. If technology stood still, then we would run out of available energy by the end of the decade,” according to McGregor. Yet according to Tirias’ GenAI predictions, the use of low-power hybrid CPU/GPU-based AI accelerators at datacenters will grow from 362,000 units today to 17.6 million in 2028.
“Take, for instance, Cerebras Systems’ AI chip that takes up an entire wafer with four trillion transistors,” said McGregor. Cerebras’ next-generation water-cooled wafer-scale chip draws 50X less power for its four trillion transistors than today’s datacenter servers built from separate CPU and GPU chips. The wafer-scale, made-for-AI chip is currently being proven out in collaborations with researchers at Sandia National Laboratories, Lawrence Livermore National Laboratory, Los Alamos National Laboratory, and the National Nuclear Security Administration. It also will be integrated into future Dell servers for large-scale AI deployment.
Already available, and powering four of the top five positions on the 2024 Green500 supercomputer list, is Nvidia’s latest hybrid CPU/GPU-based AI accelerator, which can replace multiple traditional servers for AI workloads at a fraction of their current energy consumption. For instance, Nvidia user Pierre Spatz, head of quantitative research at Murex (Paris), reports in a blog that Nvidia’s latest AI accelerator, the Grace Hopper Superchip, is “not only the fastest processor [available today], but is also far more power-efficient—making green IT a reality.” According to Spatz, the Grace Hopper Superchip boosts the performance of Murex’s financial-prediction software by 7X while simultaneously offering a 4X reduction in energy consumption.
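How those two Murex numbers compound depends on what the “4X reduction” refers to, which the blog excerpt does not specify; a minimal sketch of both readings:

```python
# Interpreting the reported 7X speedup and 4X energy reduction (a sketch; the
# article does not say whether "energy" means power draw or energy per job).

speedup = 7.0
reduction = 4.0

# If 4X refers to power draw while running 7X faster, work per joule compounds:
perf_per_watt_gain = speedup * reduction     # ~28X
# If 4X refers to total energy for the same job, that is the efficiency gain itself:
energy_per_job_gain = reduction              # 4X

print(perf_per_watt_gain, energy_per_job_gain)
```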
Innovation Solving Crises
Nvidia is not the only hybrid CPU/GPU chip maker delivering faster AI execution at lower power. For instance, AMD took the top spot in the 2022 Green500 supercomputer ranking (and four of the top 10 slots in the 2024 ranking). AMD’s latest secret sauce for faster performance with lower energy consumption in its next-generation chips is hybrid stacking of multiple CPU, GPU, and I/O-to-optical-fabric chips in the same package.
Cerebras has attached a water-cooled metal cold plate to the top of its silicon to draw heat away more efficiently than the cooled air used in today’s datacenters.
Other chip makers also are accelerating their next-generation datacenter processors with power-saving hybrid multi-chip stacks. In addition, Intel, Samsung, and Taiwan Semiconductor Manufacturing Company (TSMC) are demonstrating 3D stacked transistors for their next-generation processors that substantially increase performance while saving power.
Semiconductor architects are also beginning to rethink the entire datacenter as a single system, much like a hybrid system-on-a-chip, investing in sustainable, more energy-efficient architectures that, for instance, switch the server racks from air to water cooling. “The rear door heat exchanger, for instance, is based on water cooling that can reduce the energy consumption in the servers at high-density datacenters,” according to Laura DiDio, president and principal analyst of Information Technology Intelligence Consulting (ITIC).
Future datacenters also will make use of quick-switching strategies among multiple power sources, including solar, wind, natural gas, geothermal, grid, and nuclear reactors, said McGregor.
According to Jim Handy, general director of Objective Analysis, the popularity of AI has created an energy crisis, but not an unsolvable one.
“What is interesting to me in all these crises arguments, is that they happen over and over every time a new technology starts becoming widespread—the crisis predictors are just extrapolating from the current technologies, which doesn’t account for innovative solutions,” said Handy. “For instance, in the 1990s, the Internet began growing so fast that we had predictions that half the electrical energy of the world was going to be consumed by it. What happened? Innovation was able to keep up with demand. The same massive crisis argument happened again when bitcoin took off, but that too fizzled, and now we are hearing the same crisis arguments regarding the growth of AI.”
R. Colin Johnson is a Kyoto Prize Fellow who has worked as a technology journalist for two decades.