The more artificial intelligence finds its way into our everyday lives, the more important the question of its sustainability becomes. The debate usually centers on the power consumption and emissions of the data centers in which applications such as Midjourney or ChatGPT are developed and operated. Complex models and applications that are used millions of times a day require a particularly large amount of energy, which is why the overall climate footprint of AI has so far been regarded as mixed at best.
Little attention has been paid to the water footprint so far
In addition to electricity, the data centers used for training and operation also need another resource that is no less valuable: water, which is used to cool the systems. In the debate about sustainable AI, this aspect has so far played a subordinate role. Water consumption should not be underestimated, especially in those regions where water is already an increasingly scarce resource.
This is the conclusion of researchers at the University of California at Riverside in a recent preprint paper. “The training of GPT-3 in one of Microsoft’s modern high-performance data centers in the USA could have used around 700,000 liters of water,” it says, which corresponds to the amount needed to produce 320 Tesla vehicles. And every conversation with ChatGPT, which comprises an average of 20 to 50 questions, consumes a good half a liter of drinking water, the researchers estimate. Since companies like OpenAI, Microsoft and Google do not publish official figures on the electricity and water consumption of their AI applications, practically all studies on this topic rely on estimates derived from key technical data.
Water consumption of Google’s LaMDA
For their study, the researchers make a fundamental distinction between water withdrawal, i.e. the physical removal of water from a river, for example, and water consumption. Consumption can be either direct, when data centers are cooled and water is lost through evaporation, or indirect, when the water needed to generate the electricity is included.
To estimate the direct and indirect water consumption of AI, the researchers developed a model that takes into account the locations of four Google data centers in the United States, local weather data, information about the local energy mix and the average efficiency of cooling towers. With this model, they estimate how much water the 58-day training of Google’s AI language model LaMDA could have consumed. The result: Depending on the month in which training started and the data center used, LaMDA could have consumed between 800,000 and 2.8 million liters of water.
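The distinction between direct and indirect consumption can be illustrated with a minimal sketch. It is not the researchers' model: the split into an on-site water factor, a facility overhead factor and a grid water factor is a simplifying assumption, and every name and number below (`Site`, `onsite_l_per_kwh`, `grid_l_per_kwh`, the 100,000 kWh training run, the two example sites) is hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """Hypothetical description of a data center location."""
    name: str
    onsite_l_per_kwh: float  # assumed: liters evaporated by cooling per kWh of server energy
    overhead_factor: float   # assumed: total facility energy divided by server energy
    grid_l_per_kwh: float    # assumed: liters consumed per kWh of electricity generated


def water_footprint_liters(server_energy_kwh: float, site: Site) -> dict:
    """Split the estimated water consumption into a direct and an indirect part."""
    # Direct: water lost through evaporation while cooling the servers on site.
    direct = server_energy_kwh * site.onsite_l_per_kwh
    # Indirect: water consumed to generate the electricity the whole facility draws.
    indirect = server_energy_kwh * site.overhead_factor * site.grid_l_per_kwh
    return {"direct": direct, "indirect": indirect, "total": direct + indirect}


if __name__ == "__main__":
    # Purely illustrative values, not taken from the study.
    training_energy_kwh = 100_000.0
    sites = [
        Site("cool-site", onsite_l_per_kwh=0.2, overhead_factor=1.1, grid_l_per_kwh=2.0),
        Site("hot-site", onsite_l_per_kwh=1.0, overhead_factor=1.2, grid_l_per_kwh=3.0),
    ]
    for site in sites:
        result = water_footprint_liters(training_energy_kwh, site)
        print(f"{site.name}: {result['total']:.0f} liters in total")
```

Even with identical energy use, the two hypothetical sites end up with different totals, which mirrors the article's point that location and local conditions shape the estimate.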
The team is aware of the weaknesses of its model: “Our estimated water footprint of LaMDA serves only as a rough guide for the research community,” the paper states. A more precise calculation would only be possible with data from the developers themselves. Nevertheless, some important conclusions can be drawn from the study. It is advisable to train AI models in the winter months or at night, when it is colder and less water is needed for cooling. In addition, data centers should not be built in places that are already hot and short of water.
“The ‘when’ and ‘where’ can significantly affect the water consumption of large AI models,” says the analysis. Last but not least, the authors point to the need to increase the transparency of the water footprint of AI models: water consumption must be prioritized “as part of the joint effort to combat global water problems”.
You can read more about water in the current issue of MIT Technology Review (can be ordered from the heise shop and is available in well-stocked station bookshops).
(jl)