
Google researchers used the Gemini large language model to analyze 5 million news articles and build a geo-tagged dataset of 2.6 million floods.
The work addresses a significant gap in weather forecasting: flash floods are difficult to predict because they are short-lived and localized. The resulting dataset, named Groundsource, provides a baseline for training machine learning models in regions that lack extensive meteorological infrastructure.
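To make the idea of turning many article-level reports into a smaller set of geo-tagged events concrete, here is a minimal sketch. It is not Google's pipeline, which the article does not describe in detail. The `FloodReport` schema, the `group_reports` helper, the rounding-based matching, and the sample values are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class FloodReport:
    """One flood mention extracted from a news article (hypothetical schema)."""
    article_id: str
    lat: float
    lon: float
    day: date


def group_reports(reports, precision=1):
    """Group reports that share a rounded location and a calendar day.

    Rounding coordinates to `precision` decimals is a crude stand-in for
    real geographic matching; it is an assumption, not the actual method.
    """
    events = defaultdict(list)
    for r in reports:
        key = (round(r.lat, precision), round(r.lon, precision), r.day)
        events[key].append(r.article_id)
    return dict(events)


# Made-up sample values: two articles describe the same place and day.
reports = [
    FloodReport("a1", -25.97, 32.58, date(2023, 2, 9)),
    FloodReport("a2", -25.96, 32.57, date(2023, 2, 9)),
    FloodReport("a3", -17.83, 31.05, date(2023, 2, 9)),
]
events = group_reports(reports)
print(len(events), "distinct events from", len(reports), "reports")
```

Grouping on a rounded key keeps the sketch short; a production system would more likely match on place names or distance thresholds, and would have to handle reports that straddle a rounding boundary.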
Researchers used Groundsource to train a Long Short-Term Memory neural network to generate flash flood probabilities from global weather forecasts. The model now highlights risks for urban areas in 150 countries on the Flood Hub platform and shares data with emergency response agencies.
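For readers unfamiliar with Long Short-Term Memory networks, the sketch below shows how such a model can turn a sequence of forecast values into a single probability. It is a toy with random, untrained weights, not the model Google deployed. The feature layout, the layer sizes, and the names `lstm_step` and `flood_probability` are assumptions for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h, c, W, b):
    """Advance one time step given input x, hidden state h, cell state c.

    W maps a gate name to its weight rows (one row per hidden unit) over
    the concatenation [x, h]; b maps a gate name to its bias list.
    """
    z = x + h  # list concatenation of input and previous hidden state

    def gate(name, activation):
        return [activation(sum(w * v for w, v in zip(row, z)) + bias)
                for row, bias in zip(W[name], b[name])]

    i = gate("i", sigmoid)    # input gate
    f = gate("f", sigmoid)    # forget gate
    o = gate("o", sigmoid)    # output gate
    g = gate("g", math.tanh)  # candidate cell values
    c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new


def flood_probability(sequence, W, b, w_out, b_out):
    """Run the LSTM over the sequence and squash the final state to (0, 1)."""
    hidden = len(w_out)
    h, c = [0.0] * hidden, [0.0] * hidden
    for x in sequence:
        h, c = lstm_step(x, h, c, W, b)
    return sigmoid(sum(w * v for w, v in zip(w_out, h)) + b_out)


# Untrained random weights; a real model would learn these from data.
rng = random.Random(0)
n_in, n_hidden = 2, 4
W = {k: [[rng.uniform(-0.5, 0.5) for _ in range(n_in + n_hidden)]
         for _ in range(n_hidden)] for k in "ifog"}
b = {k: [0.0] * n_hidden for k in "ifog"}
w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]

# Hypothetical forecast features per time step, e.g. scaled rainfall values.
forecast = [[0.1, 0.3], [0.6, 0.4], [0.9, 0.8]]
p = flood_probability(forecast, W, b, w_out, 0.0)
print(f"flash flood probability (untrained toy): {p:.3f}")
```

The cell state `c` is what lets the network carry information, such as accumulated rainfall, across time steps, which is one reason this family of models is a common choice for sequence data.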
According to TechCrunch, Gila Loike, a Google Research product manager, stated this is the first time the company has used language models for this type of work.
António José Beleza, an emergency response official at the Southern African Development Community, reported that the model helped his organization respond to floods more quickly. The project was designed to function in areas without expensive weather-sensing infrastructure or extensive meteorological data records.
The model has limitations. It operates at a low resolution, identifying risk across 20-square-kilometer areas. It is not as precise as the U.S. National Weather Service’s system because it does not incorporate local radar data for real-time precipitation tracking.
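To give a sense of how coarse that resolution is, the sketch below assumes each area is a square of 20 square kilometers, which gives a side of about 4.5 kilometers, and maps a coordinate to the cell that contains it. The square shape, the grid layout, and the `cell_index` helper are assumptions; the article does not say how the areas are actually drawn.

```python
import math

CELL_AREA_KM2 = 20.0     # figure reported in the article
KM_PER_DEG_LAT = 111.32  # approximate length of one degree of latitude


def cell_index(lat, lon, area_km2=CELL_AREA_KM2):
    """Return the (row, col) of the square cell containing a coordinate.

    Assumes square cells; longitude spacing is widened by 1/cos(latitude)
    so that cells keep roughly the same width away from the equator.
    This breaks down near the poles, where cos(latitude) approaches zero.
    """
    side_km = math.sqrt(area_km2)
    dlat = side_km / KM_PER_DEG_LAT
    row = math.floor(lat / dlat)
    row_center_lat = (row + 0.5) * dlat
    dlon = side_km / (KM_PER_DEG_LAT * math.cos(math.radians(row_center_lat)))
    col = math.floor(lon / dlon)
    return row, col


side_km = math.sqrt(CELL_AREA_KM2)
print(f"cell side: {side_km:.2f} km")
# Two points about 1.5 km apart fall in the same cell, so the model
# cannot tell their risk apart under this assumed grid.
print(cell_index(0.01, 0.01) == cell_index(0.02, 0.02))
```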
Juliet Rothenberg, a program manager on Google’s Resilience team, stated that aggregating millions of reports helps rebalance the data map. She said the approach makes it possible to extrapolate to regions where less information is available. The team hopes to apply the same method to build datasets for other phenomena, such as heat waves and mudslides.
Marshall Moutenot, CEO of Upstream Tech and a co-founder of dynamical.org, a group that curates machine learning-ready weather data for researchers, said Google’s work is part of a growing effort to assemble such data. He stated that data scarcity is a difficult challenge in geophysics.