Because the USDA’s data is both readily available and of high quality, we at Gro decided to model US corn yield first. We applied our agronomic expertise to select the best variables on a county-by-county basis, then tested those variables on the 918 most relevant counties rather than on a single national number. After removing the linear yield trend, we performed a “leave one year out” analysis: for example, we fit a model to “forecast” 2011 using every year between 2000 and 2016 except 2011. As a result, our backtesting process ran 15 times on each county. Variables selected by our experts include:

  • Normalized difference vegetation index (NDVI)
  • Land surface temperature (LST)
  • Rainfall data (TRMM)
  • Crop condition surveys
  • Crop calendars
  • Acreage planted and harvested
  • Soil surveys (gSSURGO)
  • Cropland data (CDL)
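
The “leave one year out” backtest described above can be sketched for a single county as follows. This is a minimal illustration under stated assumptions, not the model itself: the helper names (fit_line, detrend, leave_one_year_out) are hypothetical, and a one-variable least-squares fit on a single feature stands in for whatever model is actually used.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def detrend(years, yields):
    """Remove the linear time trend; return (anomalies, trend values)."""
    slope, intercept = fit_line(years, yields)
    trend = [slope * year + intercept for year in years]
    anomalies = [obs - t for obs, t in zip(yields, trend)]
    return anomalies, trend


def leave_one_year_out(years, yields, feature):
    """For each year, fit on all other years and predict the held-out year.

    Assumption: one feature value per year (e.g. a seasonal NDVI summary)
    and a simple linear model relating it to the detrended yield.
    """
    anomalies, trend = detrend(years, yields)
    predictions = {}
    for i, held_out in enumerate(years):
        train_x = [x for j, x in enumerate(feature) if j != i]
        train_y = [a for j, a in enumerate(anomalies) if j != i]
        slope, intercept = fit_line(train_x, train_y)
        # Add the trend back so the prediction is in original yield units.
        predictions[held_out] = slope * feature[i] + intercept + trend[i]
    return predictions
```

Comparing each held-out prediction with the reported yield for that year gives one backtest error per year and county. The sketch detrends once over all years before holding any year out, following the order described above.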

We applied machine-learning and artificial intelligence techniques to this body of data to estimate the county yields that the USDA’s National Agricultural Statistics Service (NASS) reports each year. Then, using reported harvested acreage data, we aggregated the resulting county yields up to the state and national levels.
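
One common way to roll county yields up to larger regions is an acreage-weighted average: total production divided by total harvested acres. The sketch below assumes that approach; the function name aggregate_yield and the record layout are hypothetical, and every region is assumed to have positive harvested acreage.

```python
from collections import defaultdict


def aggregate_yield(records):
    """Acreage-weighted yield per state and for the nation.

    records: iterable of (state, county_yield, harvested_acres) tuples.
    """
    production = defaultdict(float)
    acres = defaultdict(float)
    for state, county_yield, harvested_acres in records:
        # Production is yield per acre times acres harvested.
        production[state] += county_yield * harvested_acres
        acres[state] += harvested_acres
    state_yields = {s: production[s] / acres[s] for s in production}
    national_yield = sum(production.values()) / sum(acres.values())
    return state_yields, national_yield
```

Weighting by harvested acreage means a large producing county moves the state and national numbers more than a small one, which a plain average of county yields would not capture.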

Running live for the first time in 2016, the resulting set of models produced higher-quality estimates than the USDA’s own, and did so at earlier dates. This is particularly impressive given that the USDA is merely attempting to estimate its own final yield number.

During the season, we make our weekly forecast and commentary publicly available on this website. Gro users can access daily forecasts and monitor specific inputs to the model (e.g., weekly NDVI updates, daily temperature shifts). For more technical information, you can download our yield model research paper here.