A taxonomy of AI for sustainability sciences

"AI for Biodiversity" by Nidia Dias & Google DeepMind via betterimagesofai.org. (CC-BY 4.0).
Introduction
Artificial intelligence (AI) has made significant contributions across scientific disciplines, with growing applications to sustainability challenges. This section introduces a taxonomy of AI tools to establish the technical context for the application areas examined in this report. The methods are presented in a manner consistent with data-driven research more broadly, with particular emphasis on current applications, emerging trends, and prospective opportunities in sustainability.
The interdisciplinary nature of AI in sustainability often leads to subcomponents fitting into multiple categories, highlighting the inherent fluidity and interconnectedness of this domain. Therefore, the taxonomy presented in this chapter is structured to reflect both AI tools and the tasks they address in an intertwined manner. Rather than treating methods and applications as separate categories, our presentation is intended to provide a clear and intuitive account of key techniques and their use in sustainability. This perspective aims to provide a clearer understanding of how AI is being effectively applied in diverse sustainability challenges.
It is widely accepted that the very definition of AI is a matter of open discussion. Different disciplines and application domains consider AI as a collection of methods with widely different levels of automation, data-processing abilities, or even perceived cognition.1,2 In this taxonomy, we adopt a general but pragmatic definition of AI as the field encompassing machine learning paradigms and methods (both classic and modern), symbolic approaches, and hybrid simulation techniques. In this way, we focus on methodologies that fuse, possibly to differing extents, domain expertise with techniques for knowledge discovery to address relevant challenges involving (usually large) datasets and/or simulations.
This chapter is organized as follows. We will first detail the different categories of learning settings within machine learning (ML). We will then walk through various well-established AI methods, starting with classical ML, which relies heavily on classical statistical algorithms and is characterized by methods developed before the deep learning (DL) era. This is followed by deep learning, a methodology relying on deep neural networks. Within deep learning, we will cover focused DL, domain foundational DL, agentic LLMs, generative AI, and deep reinforcement learning. Next, we cover symbolic AI, which relies on explicitly representing knowledge through symbols, rules, logic, and ontologies. Following this is hybrid simulation AI, which is presented as an extension of any of the previous AI methods, combined with process-based simulation models.
Machine learning
AI systems are regarded as capable of high-level data processing, sometimes comparable—or in some respects superior—to that of humans. To build an AI system, a human expert could include an exhaustive list of rules so that the system knows how to operate in every possible situation. Since this is often unfeasible due to the prohibitively large number of required rules, many of which are unknown, the predominant contemporary paradigm for endowing a system with AI is ML. Formally, ML is a field in the intersection of computer science and mathematics that allows machines (or software) to learn to solve problems without being explicitly programmed to do so.3 This distinction is exemplified by the sharp contrast in methodology between the Deep Blue system4—an AI chess player that was explicitly programmed with an exhaustive list of rules and famously beat the world champion—and DeepMind’s breakthrough AlphaGo Zero5—part of the modern, more effective, and scalable paradigm of ML where the system learns and develops its own “strategies” from data and experience.
Within ML, different types of learning settings cater for a wide range of data-driven challenges. These are mainly:
- Supervised learning: where the relationship between a dependent (output) and one or more independent (input) variables is discovered based on available data. For instance, predicting the precipitation levels at a given city over the next year (regression), or determining if one of the generators in a wind farm is faulty (classification).
- Unsupervised learning: where relationships and patterns among samples in a dataset are discovered. For instance, grouping households by energy consumption patterns (clustering), or identifying the main deforestation trends from satellite climate observations (dimensionality reduction).
- Reinforcement learning: where an agent learns to make sequential decisions through trial and error to maximize long-term rewards, guided by a specific objective in a particular environment. For instance, optimizing the operation of a smart grid to balance renewable energy supply with fluctuating demand.
- Generative AI: where models learn to create new data resembling the properties of real data. For instance, the generation of synthetic climate scenarios using historical data to evaluate the impact and effectiveness of new policies.
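As a minimal illustration of the first setting above, the following sketch fits a linear trend to synthetic monthly precipitation data and uses it to predict an unseen month; all values are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic supervised data: month index (input) vs. precipitation in mm (output).
months = np.arange(24, dtype=float)
precip = 80.0 + 1.5 * months + rng.normal(0.0, 5.0, size=months.shape)

# "Learning" here means fitting the line y = a * x + b by least squares.
a, b = np.polyfit(months, precip, deg=1)

# Prediction for a future, unseen month.
forecast = a * 30 + b
print(f"slope = {a:.2f} mm/month, forecast for month 30: {forecast:.1f} mm")
```

The same data could instead be treated in the unsupervised setting, for example by clustering months with similar precipitation, which requires no output variable at all.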
In general, most real-world applications fall into one of these learning settings, or a combination of them, and identifying the appropriate setting is a necessary step toward successfully deploying an ML pipeline. Next, we present a subset of well-established ML models, which can be used (almost) interchangeably across the learning settings described above.
Classical ML refers to algorithms developed before the deep learning (DL) era (see next), which mainly rely on a limited number of so-called “hand-crafted features”—that is, transformations of the data following expert knowledge to aid the solution of a learning problem. The features are used as inputs to statistical models such as support vector machines (SVMs6), decision trees and random forests,7 k-nearest neighbors,8 and linear or logistic regression. These methods are computationally inexpensive and interpretable. They are particularly well-suited for small to medium-sized datasets and when expert domain knowledge is available. As such, classical ML methods have been extensively employed in the natural sciences over the last few decades, and their use will continue, including as components in more complex AI pipelines where appropriate.9
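Classical methods of this kind are often only a few lines of code. The sketch below implements a k-nearest-neighbors classifier (with k = 1) over hand-crafted features; the feature values and labels are hypothetical, chosen purely for illustration:

```python
import math

def nearest_neighbor(train, query):
    """Return the label of the training point closest to `query`.

    `train` is a list of (feature_vector, label) pairs.
    """
    def dist(p, q):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
    _, label = min(train, key=lambda pair: dist(pair[0], query))
    return label

# Hypothetical hand-crafted features: (water temperature in °C, chlorophyll index).
train = [((28.0, 0.9), "bloom"), ((27.5, 0.8), "bloom"),
         ((18.0, 0.2), "no bloom"), ((19.5, 0.3), "no bloom")]

print(nearest_neighbor(train, (27.0, 0.85)))  # nearest training points are "bloom"
```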
One significant application where established classical ML techniques are employed to address sustainability challenges is environmental forecasting, exemplified by the National Oceanic and Atmospheric Administration’s (NOAA) Harmful Algal Bloom (HAB) Operational Forecast System.10 This system relies on classical ML methods, and is trained on historical data for select features, including water temperature, nutrient levels, salinity, wind conditions, and satellite-derived chlorophyll concentrations. These models then predict the probability of HAB events in specific coastal regions, such as the Gulf of Mexico, bordered by the US, Mexico, and Cuba, or Lake Erie, located on the international boundary between Canada and the US.11 Such forecasts are critical for public health advisories, fisheries management, and protecting coastal economies, directly contributing to environmental and human well-being.12
Deep learning
Deep learning (DL) is a methodology that leverages a general class of mathematical models referred to as artificial neural networks (NNs) to represent relationships among data. NNs are a collection of interconnected layers comprising elementary processing units referred to as neurons. The NN architecture is a simplified representation of the physical structure of biological neural networks found in animals’ brains, where, by processing data through successive layers, the NN extracts information relevant to the task at hand. The name deep learning reflects the incorporation of an increasingly large number of layers in the NN architecture, where, by going deeper through the stack of layers, the data processing becomes more complex.
Deep NNs are capable of identifying intricate patterns that are beyond the reach of classical ML; however, this is only possible with sufficiently large training datasets and computational resources. Where these are available, NNs can process massive and unstructured inputs, such as images, audio, video, text, and even graphs (such as those representing user interactions in a social network). As such, DL has dramatically improved the state of the art in speech recognition, object detection, and specialized domains such as drug discovery and genomics.13 Particular DL architectures widely used in practice are convolutional neural networks (CNNs),13 recurrent neural networks,14 autoencoders,15 and transformers.12
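The layer-by-layer processing described above can be sketched in a few lines. The following toy forward pass runs a mini-batch through a two-layer network with random, untrained weights, purely to show the structure of the computation:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    """Elementwise nonlinearity applied by each neuron."""
    return np.maximum(0.0, x)

# A toy network: 4 input features -> 8 hidden neurons -> 2 outputs.
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 2)), np.zeros(2)

def forward(x):
    h = relu(x @ W1 + b1)   # first layer extracts intermediate features
    return h @ W2 + b2      # second layer maps those features to outputs

x = rng.normal(size=(3, 4))   # a mini-batch of 3 samples
print(forward(x).shape)       # (3, 2)
```

Training consists of adjusting W1, b1, W2, and b2 to reduce a loss on data; deeper networks simply stack more such layers.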
The most direct, and currently the most common, way to apply the power of DL in sustainability science is to define a specific task and then train a DL model for that task using just the data that the practitioner judges are needed. We refer to this approach as focused DL. For instance, NASA’s NeMO-Net project employs this focused DL approach, training a CNN to analyze high-resolution satellite and airborne fluid-lensing imagery of coral reef environments. The model was specifically designed and trained to delineate and classify coral reef extent, composition, and health status.16 This work directly supports coral reef conservation and broader marine biodiversity sustainability.
In contrast to the narrow scope considered by focused DL, domain foundational DL addresses a more general learning setting within an application domain, where several particular (or narrow) tasks can be cast as special cases. That is, one model supports many tasks. The success of this approach builds on the increasing volume of multimodal datasets and computational resources. Foundational models are first pre-trained on massive general-task datasets. Subsequently, one of a variety of methods can be used to fine-tune the foundational model for a specific downstream task. The benefit of this approach is that, compared to focused DL, the end user can achieve the same performance with much less data (or alternatively, can achieve greater performance for the same amount of data). In addition, the user will generally need much less computing power and may not require specialist DL skills.17
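The fine-tuning idea can be caricatured as follows: a pre-trained model supplies a fixed feature extractor, and only a small task-specific "head" is trained on the downstream data. Everything below is synthetic, and `pretrained_features` is a stand-in for a real foundational model:

```python
import numpy as np

rng = np.random.default_rng(1)

def pretrained_features(x):
    """Stand-in for a frozen, pre-trained feature extractor."""
    return np.column_stack([x, np.sin(x), x ** 2])

# A small downstream dataset (synthetic): targets depend on the frozen features.
x = rng.uniform(-2, 2, size=50)
y = 3.0 * np.sin(x) + 0.5 * x + rng.normal(0, 0.05, size=50)

# "Fine-tuning" here is just fitting a linear head on the frozen features,
# which needs far less data and compute than training a model from scratch.
Z = pretrained_features(x)
head, *_ = np.linalg.lstsq(Z, y, rcond=None)

y_hat = pretrained_features(x) @ head
print(f"mean absolute error: {np.mean(np.abs(y_hat - y)):.3f}")
```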
The domain foundational approach is exemplified by Aurora, a model pre-trained on over one million hours of global geophysical data, including forecasts, analysis, reanalysis, and climate simulations.18 This task-agnostic pre-training allows Aurora to learn a general-purpose representation of the geophysical aspects of Earth System dynamics. Adapting Aurora for downstream tasks provided state-of-the-art performance in air quality prediction, ocean wave modeling, tropical cyclone tracking, and high-resolution weather simulation, with greatly reduced computational requirements compared to traditional methods.18 We are also witnessing the increasing emergence of foundational general vision models19 and specialist models tailored for remote sensing.20
Natural (or human) language processing has been central to AI since its early years, when statistical language models were first developed to process and generate text in restricted domains. Currently, large language models (LLMs) are the de facto resource for natural language processing. These models began as domain foundation models for human language, enabling downstream tasks such as generation, summarization, and translation. Later, they were adapted to allow for repeated rounds of conversation and to control tools that execute instructions generated by the same LLM; these are known as agents or agentic LLMs. For instance, agentic LLMs can now search the web, produce and compile code, or assess tabular data to, for example, forecast the dynamics of the Earth System.18
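The agentic pattern amounts to a loop in which the model emits a tool call and ordinary code executes it. The sketch below uses a hard-coded stand-in for the LLM (`mock_llm`) and toy tools; a real system would substitute an actual model API at that point:

```python
# Toy tools the "agent" is allowed to call.
def search_web(query):
    return f"(pretend search results for: {query})"

def run_code(expr):
    # Evaluate a simple arithmetic expression with no builtins available.
    return str(eval(expr, {"__builtins__": {}}, {}))

TOOLS = {"search_web": search_web, "run_code": run_code}

def mock_llm(history):
    """Stand-in for a real LLM: emits one tool call, then a final answer."""
    if not any("TOOL RESULT" in h for h in history):
        return {"tool": "run_code", "arg": "6 * 7"}
    return {"answer": f"The computation gave {history[-1].split(': ')[-1]}."}

def agent_loop(task, max_steps=5):
    history = [f"TASK: {task}"]
    for _ in range(max_steps):
        step = mock_llm(history)
        if "answer" in step:
            return step["answer"]
        # Execute the requested tool and feed the result back to the model.
        result = TOOLS[step["tool"]](step["arg"])
        history.append(f"TOOL RESULT: {result}")

print(agent_loop("multiply 6 by 7"))  # prints "The computation gave 42."
```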
LLMs, therefore, have a wide variety of practical uses in sustainability by improving complex workflows involving data handling, processing, and visualization. An additional, but less discussed, potential use of LLMs is as core engines of analysis, prediction, and forecasting, where they could supplement, or in some cases replace, the classical ML or DL models used currently. An example of this approach is the project PandemicLLM, which implements fine-tuned LLaMA-2 models to reformulate real-time pandemic forecasting (demonstrated with COVID-19) as a text reasoning problem.21 Diverse data streams—including textual public health policies, textual and sequential genomic surveillance information, textualized spatial data, and epidemiological time series (which are partly textualized and partly encoded by an RNN)—are integrated into structured textual prompts via an “AI–human cooperative design.”21 The composite information is then processed to predict hospitalization trends, outperforming previous models in the tests reported in the publication.21
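The "textualization" step described above can be sketched as plain string assembly: heterogeneous inputs are rendered into one structured prompt for the model. The field names and values below are invented for illustration and are not those used by PandemicLLM:

```python
def build_prompt(region, policy_text, weekly_cases, variant_share):
    """Render heterogeneous data streams as one structured textual prompt."""
    cases = ", ".join(str(c) for c in weekly_cases)
    return (
        f"Region: {region}\n"
        f"Current policy: {policy_text}\n"
        f"Weekly reported cases (oldest first): {cases}\n"
        f"Dominant variant share: {variant_share:.0%}\n"
        "Task: classify next week's hospitalization trend as "
        "'increasing', 'stable', or 'decreasing'."
    )

prompt = build_prompt(
    region="Region A",
    policy_text="indoor mask advisory in effect",
    weekly_cases=[120, 150, 210, 260],
    variant_share=0.63,
)
print(prompt)
```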
Generative models are AI systems capable of producing new data, such as text, images, or audio, and represent a paradigm shift in both AI model architectures and the tasks they perform. In ML, generative AI is technically defined as models that learn to approximate, either explicitly or implicitly, the underlying probability distribution of a dataset to generate new, synthetic data samples that resemble the training data. For this report’s focus, however, a more practical distinction is helpful. The distinguishing feature of generative AI is its perceived ability to create, rather than just analyze, complex information. Non-generative (also known as discriminative) AI methods receive inputs that, especially in DL, can be large and information-rich, then produce outputs that are smaller and simpler than the inputs. For example, an image can be the input, and a label the output. By contrast, generative models produce outputs that are large and complex, and reproduce the characteristics of the complex data on which they have been trained. These models can generate outputs that are as complex as their inputs (e.g., video frame in, video frame out), or even more complex than their inputs (e.g., text label in, video out). At the time of writing, the overwhelming majority of generative AI models under discussion in the scientific community are powered by DL, which is why we include them within the deep learning section of this chapter.
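In the distribution-learning sense defined above, even fitting a Gaussian to data and sampling from it is a (very minimal) generative model. The sketch below "learns" the mean and spread of synthetic temperature anomalies and then generates new, similar samples:

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic "real" data: temperature anomalies in °C.
data = rng.normal(loc=0.8, scale=0.3, size=1000)

# Training: estimate the parameters of the underlying distribution.
mu, sigma = data.mean(), data.std()

# Generation: draw new, synthetic samples resembling the training data.
synthetic = rng.normal(loc=mu, scale=sigma, size=1000)
print(f"real mean {data.mean():.2f}, synthetic mean {synthetic.mean():.2f}")
```

Modern deep generative models (e.g., diffusion models and LLMs) replace the Gaussian with vastly more expressive distributions over images, text, or audio, but the learn-then-sample structure is the same.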
Interestingly, to work with multimodal data, many current generative AI models incorporate LLMs alongside state-of-the-art models for image generation, such as diffusion models,22 with all of these having deep NNs as building blocks. These multimodal generative systems have reached, and even exceeded, human-level performance in tasks such as coding, writing, and illustration, making them an attractive tool for professionals across domains. A catalyst for the adoption of these models by the wider, non-specialist community is their free or inexpensive availability (e.g., ChatGPT, Gemini, Claude, and Mistral).23 Potential barriers to the adoption of generative AI include their tendency to sometimes “hallucinate” (that is, provide realistic, believable, but incorrect outputs) and to sometimes produce biased outputs (reflecting biases in their training data). Thus, at the time of writing, practitioners need to put in place careful quality control systems to ensure the appropriate use of generative AI systems.
As outlined above, reinforcement learning (RL) is an ML technique where an agent learns to make sequential decisions through trial and error to maximize long-term rewards guided by a specific objective in a particular environment. Deep reinforcement learning (DRL) refers to the use of deep NNs within an RL framework. For example, traditional RL often involves the development of policy models (which generate a suggested action in a given circumstance) and/or value functions (which predict the rewards that would result from alternative actions). Both policies and value functions can be replaced with DL models, creating RL systems that can maximize rewards in more complex environments. RL and DRL address dynamic optimization problems, characterized by the need to make sequential decisions where the resulting feedback guides each choice. This makes RL/DRL a powerful tool for addressing sustainability challenges, particularly in applications focused largely on energy and transportation, as noted in the review by Zuccotto et al. (2024).24 Importantly, there are many classical optimization methods that are separate from RL and which are better suited to static decision problems, where a one-time decision ought to be made given a fixed set of information. These methods, which include, for example, linear programming, share some properties with classical ML and some properties with symbolic AI (see below).25
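A minimal tabular Q-learning sketch (no deep networks, and simplified to a single state, i.e., a bandit) conveys the trial-and-error loop: the agent repeatedly picks an action, observes a reward, and nudges its value estimates. The actions and rewards below are invented, loosely evoking a grid-balancing choice:

```python
import random

random.seed(0)

ACTIONS = ["charge_storage", "sell_to_grid"]

def reward(action):
    """Hypothetical stochastic rewards: storing energy pays off more on average."""
    if action == "charge_storage":
        return random.gauss(1.0, 0.1)
    return random.gauss(0.2, 0.1)

Q = {a: 0.0 for a in ACTIONS}   # value estimate per action
alpha, epsilon = 0.1, 0.2       # learning rate, exploration rate

for _ in range(500):
    # Epsilon-greedy: usually exploit the best-known action, sometimes explore.
    if random.random() < epsilon:
        a = random.choice(ACTIONS)
    else:
        a = max(Q, key=Q.get)
    # Move the value estimate toward the observed reward.
    Q[a] += alpha * (reward(a) - Q[a])

print(max(Q, key=Q.get))  # the agent learns to prefer "charge_storage"
```

In DRL, the table Q is replaced by a deep NN, allowing the same loop to scale to environments with enormous state spaces.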
Among other emerging applications of DRL, Google DeepMind’s application of deep RL to reduce energy consumption in their data centers represents a notable real-world success.26 By training an RL agent on historical sensor data (temperatures, power, pump speeds, etc.), the system learned to dynamically adjust cooling system operations (e.g., fan speeds, chiller settings) to minimize energy use while maintaining safe operating temperatures. This led to significant energy savings, directly contributing to environmental sustainability by reducing the carbon footprint of large-scale computing infrastructure.
Symbolic AI/Knowledge-Based Systems
Distinct from ML, symbolic AI represents knowledge explicitly through symbols, rules, logic, and ontologies.1 Classical examples include expert systems, semantic networks, and logic programming. Symbolic AI systems can still interface with large datasets, but these datasets must first be structured and mapped to the system’s formal ontology or knowledge representation. The systems then apply explicit, human-defined rules and logical principles to the structured data (by contrast, ML models can infer implicit patterns and statistical relationships directly from raw data). Fundamental to the history of AI, symbolic methods remain highly relevant, particularly for tasks requiring transparency, reproducibility, explainability, and the direct encoding and integration of established domain expertise or complex scientific models, as seen in advanced decision support systems.
The OECD QSAR Toolbox, for example, aids in assessing chemical hazards, aiming to reduce animal testing.27 The toolbox embeds expert knowledge and regulatory guidelines as structured rules and decision workflows. This symbolic AI approach guides users through complex chemical data integration and hazard prediction, translating scientific and regulatory information into actionable assessments. It thus supports sustainable chemical management worldwide by promoting safer chemical design and protecting human and environmental health.
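A toy sketch of such a rule-based workflow, with explicit human-defined rules applied to structured data, is shown below; all property names and thresholds are invented and are not taken from the QSAR Toolbox:

```python
# Structured knowledge: each rule is an explicit, human-readable condition.
RULES = [
    ("high acute toxicity", lambda s: s["ld50_mg_per_kg"] < 50),
    ("persistent in water", lambda s: s["half_life_days"] > 60),
    ("bioaccumulative", lambda s: s["log_kow"] > 4.5),
]

def assess(substance):
    """Apply every rule to the structured record; return triggered hazard flags."""
    return [name for name, rule in RULES if rule(substance)]

# A substance mapped to the system's expected structure (values hypothetical).
substance = {"ld50_mg_per_kg": 30, "half_life_days": 90, "log_kow": 3.2}
print(assess(substance))  # ['high acute toxicity', 'persistent in water']
```

Unlike an ML model, every conclusion here can be traced back to a named, inspectable rule, which is precisely the transparency property noted above.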
Hybrid Simulation AI
Simulation models are built on mechanistic understanding of dynamic systems. For example, civil engineers routinely use simulations of bridges and other structures, based on Newtonian physics, to predict the response of a structure to forces such as wind, iterating on the design until appropriate levels of safety are reached. This explicit, mechanistic approach contrasts with classical ML and DL, both of which learn empirical patterns from data. A key advantage of simulation models is that, in principle, they allow for prediction and inference in previously unseen scenarios (e.g., unprecedented CO₂ levels and climate extremes). In contrast, the predictive ability of classical ML and DL is not guaranteed when used with new data that are statistically different from the training set (known as the out-of-distribution setting).
Process-based simulation (or mechanistic) methods for Earth System components include atmospheric physics models used to drive traditional weather forecasting, which are used alongside ocean physics models within general circulation models (GCMs) to predict climate change.28 Other examples include models of hydrology, ecology (e.g., global vegetation29), atmospheric chemistry,18 and a variety of economic and socioeconomic models.30
However, the predictive ability of simulation models depends on a correct understanding and encoding of the mechanistic rules. Many systems in sustainability science are characterized by large numbers of interacting processes, some of which are not understood sufficiently to enable accurate mechanistic modeling. For example, agricultural yields emerge from an interaction among plants, soils, hydrology, weather, and indeed people. Even systems that are at first glance more purely physical, such as atmospheric and oceanic dynamics, are subject to a “long tail” of additional factors that are hard to identify and specify a priori. It can therefore be valuable to combine simulation modeling with ML, in methods known as, for example, physics-informed ML, knowledge-guided ML, scientific ML, surrogate modeling (emulation), and ML-based parameter estimation, to name a few.9
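One common hybridization, residual (or bias-correction) learning, can be sketched in a few lines: a mechanistic model produces a first-guess prediction, and an ML component learns the systematic error left over. The "simulator" and observations below are synthetic stand-ins:

```python
import numpy as np

rng = np.random.default_rng(3)

def simulator(x):
    """Stand-in mechanistic model: roughly right, but missing a process."""
    return 2.0 * x

# Synthetic observations: the true system has an extra effect the simulator lacks.
x = rng.uniform(0, 5, size=100)
observed = 2.0 * x + 1.5 * np.sin(x) + rng.normal(0, 0.05, size=100)

# ML step: fit a simple model to the simulator's residual error.
residual = observed - simulator(x)
coef, *_ = np.linalg.lstsq(np.column_stack([np.sin(x)]), residual, rcond=None)

def hybrid(x):
    """Mechanistic first guess plus the learned correction."""
    return simulator(x) + coef[0] * np.sin(x)

sim_err = np.mean(np.abs(simulator(x) - observed))
hyb_err = np.mean(np.abs(hybrid(x) - observed))
print(f"simulator error: {sim_err:.2f}, hybrid error: {hyb_err:.2f}")
```

The hybrid retains the simulator's mechanistic backbone while the learned term absorbs the poorly understood "long tail" of processes.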
To date, most hybrid simulation-AI methods have employed classical ML methods, but we are seeing the emergence of methods based on DL. GraphCast, for instance, trained a graph neural network on historical weather reanalysis data (which themselves are outputs of physics-informed models) to forecast global weather.31 The related project NeuralGCM adopted a more tightly integrated hybrid approach: DL was used to replace or augment specific computationally intensive parameterizations (e.g., for cloud processes) within the structure of an otherwise traditional GCM.32 In both cases, the hybrid physics-informed-AI approach allowed for improved accuracy at a reduced computational cost (see also the reference to Aurora earlier in this chapter).
Conclusion
We hope that the above taxonomy and examples will be helpful to those with an interest in the current, and potential future, landscape of AI as applied to the sustainability sciences. Before concluding, we add the following caveats.
First, in several places we have contrasted previous innovations with more recent ones. For example, we contrasted classical ML with the more recent development of DL, and we contrasted the original use of LLMs (focusing narrowly on human language) with the more recent development of multimodal agentic LLMs. The caveat here is that we do not mean to imply that the more recent developments will replace the previous ones. There will continue to be problems where a more traditional or classical approach is the best choice, and where moving to a more sophisticated approach is methodological overkill. It is therefore very reasonable to expect that all of the paradigms and methods described above will continue to be used.
Second, to illustrate our taxonomy, we sought representative, focused use cases that illustrated the particular method under discussion. For example, we used Aurora to illustrate the concept of domain foundational DL. The caveat here is that many, potentially most, solutions in sustainability science will involve combinations of the methods we listed above. For example, one might use DL to extract data from raw sensor observations, which are then passed into a hybrid simulation-AI modeling stage, producing a model that leverages RL to find an optimal decision … and all of this might be coded up with the help of an agentic LLM!
Third, AI is a rapidly evolving field. We hope that our taxonomy will remain relevant, but it is likely to become relatively less and less comprehensive through time, as new methods and paradigms appear, which themselves can be hybridized with existing methods. For example, we outlined hybrid simulation AI as an overall approach, but there are many different ways that simulations and AI can be hybridized, and it may be that some of these will come to constitute new paradigms that could sit alongside other recent paradigms, such as deep generative AI.
Fourth, we chose to focus on the fundamentals and uses of different methods, rather than their relative environmental footprint. The caveat here is that some recent methods have, at the time of writing, significantly higher environmental costs than others. For example, DL methods currently tend to require more compute (and hence energy) than analogous classical ML methods, whereas some uses of domain foundational models, which are based on DL, require very little compute at the point of use compared to focused DL. We therefore encourage all potential users to assess the marginal environmental costs of any AI application in sustainability, adopting solutions only where the likely sustainability benefits outweigh the likely environmental footprint.
Fifth and finally, we stress that responsible uses of AI seek to empower and augment, rather than replace, human expertise, including but not limited to scientific expertise. Human creativity and expertise, and human values, are still needed to select the relevant problems to address, select the appropriate models and datasets, evaluate model predictions rigorously, and develop those predictions into policy. Carried out correctly, AI has great potential to not only increase the productivity of sustainability scientists, but also to lead to scientific conclusions and predictions that would not be possible without AI—or without humans.
Bibliography
- Russell, S. & Norvig, P. Artificial Intelligence: A Modern Approach. (Pearson Education, Harlow, United Kingdom, 2021).
- HM Government. Artificial Intelligence Playbook for the UK Government (HTML). https://www.gov.uk/government/publications/ai-playbook-for-the-uk-government/artificial-intelligence-playbook-for-the-uk-government-html (2025).
- Samuel, A. L. Some Studies in Machine Learning Using the Game of Checkers. IBM J. Res. Dev. 3, 210–229 (1959).
- Campbell, M., Hoane, A. J. & Hsu, F. Deep Blue. Artif. Intell. 134, 57–83 (2002).
- Silver, D. et al. Mastering the game of Go without human knowledge. Nature 550, 354–359 (2017).
- Cortes, C. & Vapnik, V. Support-vector networks. Mach. Learn. 20, 273–297 (1995).
- Breiman, L. Random Forests. Mach. Learn. 45, 5–32 (2001).
- Cover, T. & Hart, P. Nearest neighbor pattern classification. IEEE Trans. Inf. Theory 13, 21–27 (1967).
- Reichstein, M. et al. Deep learning and process understanding for data-driven Earth system science. Nature 566, 195–204 (2019).
- NCCOS. HAB Forecasts. NCCOS - National Centers for Coastal Ocean Science - HAB Forecasts https://coastalscience.noaa.gov/science-areas/habs/hab-forecasts/ (2025).
- NCCOS. Lake Erie. NCCOS - National Centers for Coastal Ocean Science - Lake Erie Harmful Algal Bloom Forecast https://coastalscience.noaa.gov/science-areas/habs/hab-forecasts/lake-erie/ (2025).
- Vaswani, A. et al. Attention is All you Need. in Advances in Neural Information Processing Systems vol. 30 (Curran Associates, Inc., 2017).
- LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).
- Sherstinsky, A. Fundamentals of Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) Network. Phys. Nonlinear Phenom. 404, 132306 (2020).
- Hinton, G. E. & Salakhutdinov, R. R. Reducing the Dimensionality of Data with Neural Networks. Science 313, 504–507 (2006).
- Akbari Asanjan, A. et al. Learning Instrument Invariant Characteristics for Generating High-resolution Global Coral Reef Maps. in Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining 2617–2624 (Association for Computing Machinery, New York, NY, USA, 2020). doi:10.1145/3394486.3403312.
- Bommasani, R. et al. On the Opportunities and Risks of Foundation Models. Preprint at https://doi.org/10.48550/arXiv.2108.07258 (2022).
- Bodnar, C. et al. A foundation model for the Earth system. Nature 641, 1180–1187 (2025).
- Awais, M. et al. Foundation Models Defining a New Era in Vision: A Survey and Outlook. IEEE Trans. Pattern Anal. Mach. Intell. 47, 2245–2264 (2025).
- Xiao, A. et al. Foundation Models for Remote Sensing and Earth Observation: A survey. IEEE Geosci. Remote Sens. Mag. 2–29 (2025) doi:10.1109/MGRS.2025.3576766.
- Du, H. et al. Advancing real-time infectious disease forecasting using large language models. Nat. Comput. Sci. 5, 467–480 (2025).
- Ho, J., Jain, A. & Abbeel, P. Denoising Diffusion Probabilistic Models. in Advances in Neural Information Processing Systems 33 6840–6851 (Curran Associates, Inc., Vancouver, 2020).
- Lopez-Gomez, I. et al. Dynamical-generative downscaling of climate model ensembles. Proc. Natl. Acad. Sci. 122, e2420288122 (2025).
- Zuccotto, M., Castellini, A., Torre, D. L., Mola, L. & Farinelli, A. Reinforcement learning applications in environmental sustainability: a review. Artif. Intell. Rev. 57, 88 (2024).
- Hillier, F. S. & Lieberman, G. J. Introduction to Operations Research. (McGraw-Hill Education, Dubuque, 2021).
- Luo, J. et al. Controlling Commercial Cooling Systems Using Reinforcement Learning. Preprint at https://doi.org/10.48550/arXiv.2211.07357 (2022).
- Aljallal, M. A., Chaudhry, Q. & Price, N. R. Assessment of performance of the profilers provided in the OECD QSAR toolbox for category formation of chemicals. Sci. Rep. 14, 18330 (2024).
- Eyring, V. et al. Chapter 3: Human Influence on the Climate System. in Climate Change 2021 – The Physical Science Basis: Working Group I Contribution to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change 423–551 (Cambridge University Press, 2023). doi:10.1017/9781009157896.
- Sitch, S. et al. Evaluation of the terrestrial carbon cycle, future plant geography and climate-carbon cycle feedbacks using five Dynamic Global Vegetation Models (DGVMs). Glob. Change Biol. 14, 2015–2039 (2008).
- Riahi, K. et al. Mitigation Pathways Compatible with Long-term Goals. in Climate Change 2022 – Mitigation of Climate Change. Contribution of Working Group III to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change 295–408 (Cambridge University Press, Cambridge, UK and New York, NY, USA, 2022). doi:10.1017/9781009157926.005.
- Lam, R. et al. Learning skillful medium-range global weather forecasting. Science (2023). doi:10.1126/science.adi2336.
- Kochkov, D. et al. Neural general circulation models for weather and climate. Nature 632, 1060–1066 (2024).
About the authors
Claudia van der Salm is Strategy Partner, Technology & Responsibility, at Google DeepMind.
Drew Purves is sustainability and biodiversity co-lead at Google DeepMind.
Felipe Tobar is an associate professor in Machine Learning at Imperial College London.
Erik Zhivkoplias is a PhD candidate at Stockholm Resilience Centre.
