Gro’s Black Sea and US Wheat Yield Models: An Inside Look

27 March 2019

Gro Intelligence has launched a series of new machine-learning-based yield models for winter wheat crops in the US, Russia, and Ukraine, some of the world’s most important wheat-producing regions.

Accurately forecasting crop yields has broad implications for commodities traders; food, beverage, and chemical companies; and governments and NGOs. Modeling can help predict future trends such as crop yields and production, agricultural input demand, and significant weather patterns.

Gro has developed expertise in combining geospatial and ground-based data to create effective and robust models in a wide range of subjects, geographies, and environmental conditions. Over the past few years, we have built reliable yield models for US corn and soybeans, Argentina soybeans, and India wheat.

Wheat is one of the most actively traded crops on the global market and an important ingredient in a variety of staple foods. In the United States, wheat planting and production rank third behind corn and soybeans. But US wheat acreage has declined in the face of strengthening production in Europe and the Black Sea region, which includes Russia and Ukraine. Today, Russia is the world’s top exporter of wheat and the second-largest producer, after China.

Besides models for Russia and Ukraine, we also launched a Black Sea yield model that aggregates outputs from both Russia and Ukraine. The Black Sea region is an important focus for world markets, and Black Sea futures and options contracts introduced in 2018 have been received positively and traded actively.
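The article does not say how the two country outputs are combined. One plausible approach is to weight each country's yield by its harvested area, so that the regional yield equals total production divided by total area. The sketch below assumes that approach; the function name and all figures are hypothetical and are not Gro's forecasts.

```python
def aggregate_yield(forecasts):
    """Combine per-country yields into one area-weighted regional yield.

    `forecasts` maps a country name to (yield in tonnes/ha, area in ha).
    """
    total_area = sum(area for _, area in forecasts.values())
    if total_area <= 0:
        raise ValueError("total harvested area must be positive")
    # Production = yield * area, summed across countries.
    total_production = sum(y * area for y, area in forecasts.values())
    return total_production / total_area


# Hypothetical inputs for illustration only (not real forecasts).
example = {
    "Russia": (3.0, 15_000_000),
    "Ukraine": (4.0, 5_000_000),
}
black_sea_yield = aggregate_yield(example)
print(f"{black_sea_yield:.2f} t/ha")
```

Weighting by area keeps the larger producer from being averaged down to parity with the smaller one, which a simple mean of the two yields would do.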

Each yield model Gro has developed involves a unique recipe that emphasizes different data and environmental factors in varying amounts. Depending on the part of the world being modeled, there may be more or less data available, and distinct challenges arise when large data gaps exist. In addition, different inputs are needed for different crops: What helps predict corn yields might not work for wheat. Backtesting a model to assess how it would have performed in previous years helps to confirm which factors are the most reliable predictors of a crop’s yield.
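To make the idea of backtesting concrete, here is a minimal walk-forward sketch: each year is predicted using only the years before it, and the prediction is compared with the observed yield. The single-predictor linear fit, the function names, and the sample records are assumptions chosen for brevity; the models described in this article use machine learning and many inputs.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predictor has no variation in the training years")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def walk_forward_backtest(records, min_train=3):
    """Predict each year from earlier years only; return {year: abs error}.

    `records` is a list of (year, predictor, observed_yield), sorted by year.
    """
    errors = {}
    for i in range(min_train, len(records)):
        train = records[:i]  # strictly earlier years, so no look-ahead
        a, b = fit_line([r[1] for r in train], [r[2] for r in train])
        year, x, observed = records[i]
        errors[year] = abs(a + b * x - observed)
    return errors


# Hypothetical (year, peak NDVI, yield in t/ha) records for illustration.
history = [
    (2014, 0.40, 1.8),
    (2015, 0.50, 2.0),
    (2016, 0.60, 2.2),
    (2017, 0.55, 2.1),
    (2018, 0.45, 1.9),
]
print(walk_forward_backtest(history))
```

Repeating the backtest with and without a candidate input shows whether that input reduces out-of-sample error, which is the sense in which backtesting confirms the most reliable predictors.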

In this Weekly Insight, we describe how we developed our newest crop yield models and the particular challenges we faced in each region. Gro is constantly working to build new models to predict different phenomena, from environmental events (e.g., a global drought index) to supply models (e.g., planting intentions) to demand models. With the help of Gro, users can create their own models using our frameworks, modeling expertise, and unique data, working either with our data alone or with their own data added.

Read more:

US Hard Red Winter Wheat
The Black Sea Wheat Boom

US Hard Red Winter Wheat

Winter wheat production is declining in the United States, and in 2019 US winter wheat acreage is projected to fall to its lowest level since 1909. Corn and soybeans are generally more profitable than wheat, and that gap, together with increasing competition from Black Sea countries (Russia and Ukraine) and the EU, is making US farmers reluctant to plant wheat. Despite this trend, the US remains a top-five wheat producer and is routinely the second-biggest exporter.

Hard red winter wheat is a high-protein variety typically used as a whole grain or in whole-wheat flour, breads, and rolls. It accounts for nearly 40 percent of total US wheat production; it is planted in September and October and harvested in midsummer. We trained our model on three key states (Texas, Oklahoma, and Kansas) that together account for nearly 60 percent of all hard red winter wheat production in the US. The Gro hard red winter wheat yield model started generating forecasts in late March.

Wheat consistently returns lower profits per acre than corn and soybeans as reported by the USDA Economic Research Service (left). In recent years, wheat acreage has been consistently shrinking in favor of soybeans and corn.

A first step in building a yield model is developing a crop mask, a spatial filter applied to satellite-derived data that filters out signals generated by nontarget plants. Essentially, a crop mask tells us which crops are growing where. Well-defined crop masks are instrumental in generating accurate satellite-based NDVI signals for any crop. For example, a winter wheat crop mask paired with NDVI data, a measure of vegetative health, excludes the signals from foliage of extraneous plants like trees, corn, or soybeans to capture only the greenness of the wheat.
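As a rough illustration of pairing a crop mask with NDVI, the sketch below computes NDVI per pixel from near-infrared and red reflectance using the standard formula, (NIR - Red) / (NIR + Red), and then averages only the pixels the mask flags as the target crop. The function names and the tiny 2x2 grids are hypothetical.

```python
import numpy as np


def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red), computed per pixel."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    total = nir + red
    out = np.full(total.shape, np.nan)  # pixels with no signal stay NaN
    np.divide(nir - red, total, out=out, where=total != 0)
    return out


def masked_mean_ndvi(ndvi_grid, crop_mask):
    """Average NDVI over the pixels the mask flags as the target crop."""
    selected = ndvi_grid[np.asarray(crop_mask, dtype=bool)]
    return float(np.nanmean(selected)) if selected.size else float("nan")


# Hypothetical reflectance values and mask for illustration only.
nir = [[0.6, 0.5], [0.8, 0.0]]
red = [[0.2, 0.5], [0.2, 0.0]]
wheat_mask = [[True, False], [True, False]]  # True = wheat pixel

grid = ndvi(nir, red)
print(masked_mean_ndvi(grid, wheat_mask))
```

Without the mask, the bare pixel in the top-right corner would pull the average down even though it says nothing about the wheat.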

The USDA National Agricultural Statistics Service, or NASS, compiles annual data sets showing which crop grows in each 30 m x 30 m pixel. We take this crop information, aggregate it over time, and create crop masks. We then incorporate these crop masks into our models to allow for a more granular and accurate analysis of the winter wheat crop. Using historical crop masks for Texas, Oklahoma, and Kansas, we were able to aggregate the NDVI of wheat at the district level. In our US hard red winter wheat model, NDVI, which is short for normalized difference vegetation index, proved to be the best predictor of yield.
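The two steps in this paragraph, aggregating annual crop layers into a mask and then aggregating NDVI by district, might look something like the sketch below. The frequency rule (a pixel counts as wheat if it was classified as wheat in at least half of the years), the class code, and the district IDs are all assumptions made for illustration; the article does not specify how the aggregation over time is done.

```python
import numpy as np


def frequency_mask(yearly_crop_codes, target_code, min_share=0.5):
    """Flag pixels classified as `target_code` in at least `min_share` of years.

    `yearly_crop_codes` has shape (years, rows, cols).
    """
    layers = np.asarray(yearly_crop_codes)
    share = (layers == target_code).mean(axis=0)
    return share >= min_share


def district_mean_ndvi(ndvi_grid, crop_mask, district_ids):
    """Mean NDVI of masked, non-missing pixels for each district ID."""
    ndvi_grid = np.asarray(ndvi_grid, dtype=float)
    district_ids = np.asarray(district_ids)
    keep = np.asarray(crop_mask, dtype=bool) & ~np.isnan(ndvi_grid)
    return {
        int(d): float(ndvi_grid[keep & (district_ids == d)].mean())
        for d in np.unique(district_ids[keep])
    }


WHEAT = 24  # placeholder class code, assumed for this example

# Three hypothetical years of per-pixel crop classifications.
years = [
    [[WHEAT, 1], [WHEAT, 5]],
    [[WHEAT, WHEAT], [1, 5]],
    [[WHEAT, 1], [WHEAT, 5]],
]
mask = frequency_mask(years, WHEAT)
ndvi_grid = [[0.6, 0.9], [0.4, 0.8]]
districts = [[1, 1], [2, 2]]  # hypothetical district ID per pixel
print(district_mean_ndvi(ndvi_grid, mask, districts))
```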

Another good predictor is potential evapotranspiration (PET). Wheat is not green throughout its entire life cycle; the crop browns as it reaches maturity. Past a certain point late in the season, new NDVI and weather information do not improve model accuracy. This is why it’s important to include a suite of variables, some of which are correlated with each other, to improve model accuracy. Potential evapotranspiration is one such variable. PET measures the capacity of the atmosphere to remove water from the surface when soil water content is not a limiting factor, and provides supplemental information on growing conditions. In general, evapotranspiration measures the sum of evaporation from soil and plant transpiration and can be used as a proxy measurement of plant health.

NDVI data gathered from satellites gives valuable insight into yield projections for hard red winter wheat. This NDVI image is overlaid with a general crop mask, not specific to wheat (left). Hard red winter wheat yield distribution in three of the top-producing US states (right).

Wheat is known to be particularly sensitive to heat and water stress, so temperature and drought indicators like land surface temperature (LST) and precipitation are also logical variables to take into account in wheat yield models. Although these factors were significant predictors of yield in our US corn and soybean models, they were less effective in our US wheat models.

The Black Sea Wheat Boom

The Black Sea region’s potential to serve as a global breadbasket is finally being recognized. Spurred by liberalization of markets in the former Soviet Union, Russia has more than doubled its winter wheat output over the past five years and is now the world’s second-biggest wheat producer and its top exporter. Much of the exported crop is shipped through the Black Sea to North Africa, the region with the greatest wheat demand. Wheat production and exports in Russia and Ukraine are expected to continue growing for the foreseeable future.

In recent years, the Black Sea region (composed mainly of Ukraine and Russia) has emerged as a major wheat production and trade center.

The Russia and Ukraine winter wheat crops are planted around the same time as in the US, in September and October; these Gro yield models started running in early January. The US has extensive documentation of which crops are planted on which plots of land, but many other regions lack good, objective data on where different crops are grown. As described above, this information is needed to construct a crop mask so that satellite data can be used to estimate the crop’s health and extent. We faced the same problem in creating our earlier Argentine soybean and India wheat models and solved it by estimating the crop’s distribution from the timing of its planting and growth. Given that knowledge, dated satellite data indicates which crop is planted where.
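A simplified version of this timing-based approach can be sketched as follows: a pixel is flagged as winter wheat if its greenness peaks in the months when winter wheat is expected to peak, rather than later in the summer. The peak window, the NDVI threshold, and the sample time series are hypothetical values chosen for illustration, not the parameters used in Gro's models.

```python
import numpy as np


def phenology_mask(monthly_ndvi, peak_months=(4, 5, 6), min_peak=0.5):
    """Flag pixels whose NDVI peaks inside the expected window.

    `monthly_ndvi` has shape (12, rows, cols), January first.
    """
    series = np.asarray(monthly_ndvi, dtype=float)
    peak_month = series.argmax(axis=0) + 1  # 1 = January
    peak_value = series.max(axis=0)
    return np.isin(peak_month, peak_months) & (peak_value >= min_peak)


# Two hypothetical pixels: one greens up in spring, one in late summer.
wheat_like = [0.2, 0.2, 0.3, 0.5, 0.8, 0.6, 0.3, 0.2, 0.2, 0.3, 0.3, 0.2]
summer_crop = [0.1, 0.1, 0.1, 0.2, 0.3, 0.5, 0.7, 0.8, 0.5, 0.2, 0.1, 0.1]

# Arrange as (12 months, 1 row, 2 columns).
monthly = np.array([wheat_like, summer_crop]).T.reshape(12, 1, 2)
print(phenology_mask(monthly))
```

A real mask would need more than peak timing, since pasture and other winter crops can follow a similar calendar, which is one reason such masks are less accurate than ground-truthed ones.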

Ukraine (left) and Russia (right) yield projections for the 2019 wheat crop as of March 2019.

A similar process was used to create crop masks for our Russia and Ukraine winter wheat yield models, but with some variations. In particular, we used a combination of satellite image types. Satellite-derived radar data, which removes much of the variation caused by cloud obstruction, was added for these models and used together with Landsat imagery to construct the crop masks.

Still, satellite-data-generated crop masks are inherently less accurate than ground-truthed masks, such as those in the US. Daily measurements of precipitation, temperature, and soil moisture, as well as evapotranspiration and evapotranspiration anomaly variables, played a role to varying degrees in predicting wheat yield in the two Black Sea countries.

In both the Russia and Ukraine models, potential evapotranspiration and precipitation were deemed important variables. Maximum and minimum temperature data provided by GHCN were included in the Russia model but not in the Ukraine model. Soil moisture data was also a significant predictive factor for Russia but not for Ukraine.

Both evapotranspiration (left) and soil moisture (right), among other environmental variables, can capture additional variations in wheat yields.

The Ukraine yield model benefited significantly from including evapotranspiration anomalies provided by USGS FEWS NET data. A high evapotranspiration rate is generally accepted as a good indicator of crop vitality throughout most of a plant’s life cycle and positively correlates with greater yields. Evapotranspiration anomalies compare current evapotranspiration rates with historical rates.
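The article does not define how the anomaly is calculated. One common convention expresses the current value as a percentage of the historical median for the same period, so 100 means typical conditions and values above 100 mean more evapotranspiration than usual. The sketch below assumes that convention; the function name and the sample values are hypothetical.

```python
from statistics import median


def et_anomaly_percent(current_et, historical_et):
    """Current evapotranspiration as a percentage of the historical median."""
    baseline = median(historical_et)
    if baseline <= 0:
        raise ValueError("historical median must be positive")
    return 100.0 * current_et / baseline


# Hypothetical values (mm) for the same period in earlier years.
past_years = [40.0, 50.0, 45.0, 55.0, 50.0]
anomaly = et_anomaly_percent(60.0, past_years)
print(f"{anomaly:.0f}% of the historical median")
```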


Gro harvests the world’s most important agricultural data from a wide range of sources and restructures it to create greater value. In this way, Gro allows users to assess a broad suite of variables to determine for themselves which factors to include in their own models.

During the growing season, forecasts are automatically updated daily and are available on the Gro web app. With Gro’s API, you can also explore our publicly available and licensed data in depth and combine it with your proprietary data to improve your existing models and build new ones. Our vast amounts of data make Gro ideal for advanced machine-learning forecast models.

As Gro continues to build new predictive models, we will continue to disseminate our information and the insights we gather through these processes to allow Gro users to make better informed decisions about which data to include in their own models.
