Organizations around the world, including the scientific community, are ramping up initiatives to take advantage of advances in AI. But what will the future of science look like through the lens of AI? To paraphrase the great Yogi Berra, the future of AI-powered scientific innovation isn’t what it used to be.
It’s hard to believe that the current wave of generative AI is still less than three years old. Soon after OpenAI set the world’s imagination on fire with the launch of ChatGPT in late 2022, people set about changing how we work, how we play, and how we learn. Enterprises moved quickly to stake claims to the $4 trillion that McKinsey says AI will generate annually.
Meanwhile, scientists also ramped up their own AI initiatives. If a large language model (LLM) could “learn” enough from legal textbooks to pass the bar exam, maybe an LLM could “learn” from the scientific record to be able to answer non-trivial scientific questions. And more importantly, if the LLM was built big enough and trained with enough scientific data, could it then begin to answer some of the bigger scientific questions that have so far eluded scientific reasoning?
This was one of the initial hopes of the Trillion Parameter Consortium, a global initiative of scientists from federal laboratories, research institutes, academia, and industry that was spearheaded by the Department of Energy’s Argonne National Laboratory. When ANL announced the launch of TPC in November 2023, the group’s stated goals revolved around four themes: developing AI models for science; curating data; optimizing AI libraries for exascale platforms; and developing evaluation platforms.
Some members of TPC initially hoped that, as AI models scaled up, they could begin to crack some of the bigger scientific questions. This hope was founded on observations and empirical data, as numerous examples of the day showed LLMs displaying emergent capabilities around text translation, code completion, and math.

“Scaling up language models has been shown to predictably improve performance and sample efficiency on a wide range of downstream tasks,” researchers from Google Brain, DeepMind, Stanford University, and University of North Carolina wrote in the August 2022 paper “Emergent Abilities of Large Language Models.” The authors described the emergent capabilities of LLMs as “an unpredictable phenomenon.”
And just like that, the great big LLM gold rush was on. OpenAI released GPT-4 to much fanfare in March 2023, just four months after releasing ChatGPT. While OpenAI hasn’t confirmed the size of GPT-4, it is thought to have about 1.8 trillion parameters. That is roughly 10x bigger than GPT-3, which is thought to have about 175 billion parameters. Many in the AI community believed the pace of scaling would continue at that rate, with an 18-to-20-trillion-parameter model arriving perhaps by 2025. Beyond that, who knew? Was a 100-trillion- or even 1-quadrillion-parameter model in the cards?
The possibilities were enticing. If an LLM with 1.8 trillion parameters could do cognitive work at a PhD level, what would an LLM with 20 trillion parameters be capable of? What sort of unpredicted emergent properties would we see as the number of parameters increased? Nobody knew the answer, of course, but there was a glimmer of hope that these supersized models could do something truly remarkable in science, such as providing a path to a unified theory of physics.
However, those high hopes appear to be on the backburner now, for several reasons, not the least of which is the so-called scaling wall. It turns out that one cannot just scale up an LLM and expect to see a comparable increase in returns. There are numerous factors involved here, including a finite supply of suitable training data, as well as architectural bottlenecks in how quickly GPUs can be supplied with data. Training a 20-trillion parameter LLM would also be prohibitively expensive, with perhaps fewer than five organizations on Earth capable of it. Hallucinations have also proved to be a difficult problem to overcome.
The idea that we were within a few years of building a super-genius AI system capable of solving the world’s toughest scientific problems appears to have run its course. It was a worthy goal to pursue when LLMs were surprising us with unpredicted emergent capabilities for a few months in 2022 and 2023. Some very prominent technology leaders declared that the era of artificial general intelligence (AGI) was close, but it turns out that progress mostly moves in a non-linear fashion.
While AGI appears unlikely in the near term, that doesn’t diminish the potential of AI for scientific discovery; it just means there’s more work to do!
Setting a course for doing that work was the big focus at TPC’s recent annual conference, TPC25. Even if massive AI models won’t give us magic answer machines in the near future, there is still enormous potential to leverage new AI tech to bolster the pursuit of scientific inquiry.
Much of the excitement revolves around the new class of AI models dubbed reasoning models. Just as enterprises are looking to adopt reasoning models to develop autonomous or semi-autonomous AI agents, scientists are looking to reasoning models to automate their day-to-day tasks, such as creating hypotheses, setting up experiments, running experiments, and analyzing the results.
To guide the scientific community toward its AI future, TPC leaders set five major initiatives. Charles Catlett, the executive director of TPC, shared these goals during the last day of the TPC25 event in July. They are:
- Data and Shared Compute Infrastructure: This infrastructure will serve not only as a resource for storing and organizing the scientific data used to train and improve AI models, but also as shared compute resources (including servers, GPUs, storage, and tools) for training foundation models and domain-specific models.
- Open Frontier Model: TPC has committed to developing a very large, frontier-scale AI model that is open for the community to use, in contrast to the closed models from OpenAI, Google, and others. It will be trained on the shared scientific data described above and use the combined computing power of the TPC community.
- Open Frontier AI System: TPC will build a top-tier AI system designed specifically for scientific work. It will start with state-of-the-art proprietary models, which will be used for testing, but eventually will incorporate a host of AI reasoning models along with the open frontier model described above.
- Software Infrastructure/Framework: TPC wants to build the software and tools needed to run the massive AI models and systems described above. This will include the various middleware and operating systems necessary for AI research, and it will be used for real-world research, where it will be integrated with experiments, labs, equipment, sensors, and instruments.
- Driving Challenging Applications: In addition to building software and hardware, TPC has identified a handful of high-impact scientific challenges in the areas of drug discovery, climate modeling, energy research, and materials science that it wants to tackle using its AI systems.
While AI as it currently exists isn’t the silver bullet for scientific discovery that some hoped it would be, it still possesses tremendous potential to accelerate the pace of innovation. AI won’t match or surpass human ingenuity any time soon for the truly big problems, but that doesn’t mean it won’t help science. Scientists who leverage existing AI capabilities will gain an advantage by accelerating the execution of existing scientific tasks and workflows. If that, in turn, leads to a big scientific breakthrough, it still will be a breakthrough powered by human ingenuity, but with a boost from AI.