We don’t believe in bluffing or keeping things close to the vest. That’s why Gro is proud to announce the open release of our yield forecast model, with performance that we believe equals or exceeds anything previously available at any price. We want to get rid of the black-box mentality that has kept such models proprietary to a few firms and secret from the many who need to use them. We are making our suite of machine learning and process models freely available on our website because we believe a transparent, fluid exchange of data and knowledge is the best way to answer the most pressing food and agricultural challenges facing us.
But this is just the start. Our modeling process not only performs well now, but is fully specified, generalizable, and constantly improving. During the growing season, forecasts are automatically updated on a weekly basis and available on Gro. You can not only access our forecasts but also deconstruct and analyze each parameter behind them in real time within Gro, at a global scale and within minutes. This was never possible before the launch of Gro.
The data used in the forecasts ranges from economic data and crop growing cycles to satellite imagery, weather data, and plant health measurements. We started with US corn, but we plan to roll out similar models for the entire world, even in areas with poor data availability, and for crops that are central to the agricultural ecosystem but currently attract little financial market interest. The forecast parameters are already available for analysis at global scale within Gro.
This will fly in the face of some long-held beliefs about how to do business. Regardless, the scientists and engineers at Gro are excited to contribute these findings and help bring the proven benefits of cutting-edge analytic methods to global agriculture.
Why This Matters
A yield model capable of achieving a high level of accuracy months before the traditional window has enormous implications for food security, trade, and the global economy. The 2012 drought in the US led to record high prices for corn and a tragic increase in related hunger. When the record 2015 El Niño caused drought from India to Ethiopia, aid organizations scrambled to deal with potentially lethal local fallout. A model that could more quickly alert the global community to potential supply shortages would have eased the pressure. Food crises are happening more frequently due to population growth coupled with climate change. Any way to reduce that harm should be shared.
That’s why Gro’s yield model is being made available to the public. Continuous updates will increase its accuracy, and a community-based approach will further refine the model and identify areas for improvement. It’s a lesson learned from the citizen science movement and the technology industry: together, we can create something greater than any of us could alone.
What’s in the Forecast
Gro has developed a suite of machine learning-based models forecasting final corn yields. Each of those models generates a forecast. We then use a meta model based on our agronomic understanding of the plants in question to intelligently choose a weighting scheme for the sub-models. A key advantage to our approach is that we can optimize our predictive process to account for developments earlier in the season. We are able to better predict not only normal years but also anomalous years that lead to bumper harvests or drastic declines in yields. Gro’s full yield model updates at least every eight days and forecasts are made at the county level and aggregated to the national level. This compares favorably to the USDA’s and others’ monthly releases. As we incorporate new parameters and new data is generated for existing variables over time, the performance of our model relentlessly improves.
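As a rough illustration of the two steps described above (blending sub-model forecasts with a weighting scheme, then rolling county forecasts up to a national number), here is a minimal Python sketch. It is not Gro's implementation: the sub-model names, the fixed weights, and the use of harvested area to weight counties are all assumptions made for this example, and the numbers are invented.

```python
from dataclasses import dataclass


@dataclass
class CountyForecast:
    county: str
    harvested_area: float            # assumed aggregation weight (e.g. acres)
    submodel_yields: dict            # hypothetical sub-model name -> yield forecast


def blend(submodel_yields, weights):
    """Weighted average of the sub-model forecasts for one county."""
    total_weight = sum(weights[name] for name in submodel_yields)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted_sum = sum(weights[name] * y for name, y in submodel_yields.items())
    return weighted_sum / total_weight


def national_yield(counties, weights):
    """Area-weighted average of the blended county forecasts (an assumption)."""
    total_area = sum(c.harvested_area for c in counties)
    if total_area <= 0:
        raise ValueError("total harvested area must be positive")
    return sum(blend(c.submodel_yields, weights) * c.harvested_area
               for c in counties) / total_area


# Invented weights and forecasts, for illustration only.
weights = {"statistical": 0.25, "process": 0.75}
counties = [
    CountyForecast("A", 100.0, {"statistical": 160.0, "process": 180.0}),
    CountyForecast("B", 300.0, {"statistical": 200.0, "process": 200.0}),
]

county_a = blend(counties[0].submodel_yields, weights)
national = national_yield(counties, weights)
print(county_a, national)
```

In this sketch the weights are fixed by hand. In the approach described above, a meta model chooses the weighting scheme, so the weights would presumably change as the season develops.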
Yield models are traditionally either statistical or process-based. Pure statistical models operate strictly on historical data, while process models use agronomic and ecological insights to infer outcomes that may not have appeared in the data before. Each method has limitations that reduce forecast quality, which is why Gro’s yield model incorporates both.
The model was trained using verified historical USDA yield data from approximately 1,500 growing counties. Numerous input data sources were aggregated and weighted in a supervised machine learning process, including satellite-derived values of potential and actual evapotranspiration (evaporation of water from land and transpiration from plants), land surface temperature, and crop health measurements.
The resulting model features better anomaly and trend detection earlier in the season than the USDA’s official forecast. It achieves a high degree of accuracy even in very anomalous years. Running live through the 2016 growing season, the yield model came within 1.40% of the 2016 final yield by the end of August (final yield numbers are reported in January). In backtesting for the period 2001-2015, Gro’s model estimated national final yield with an average error of 2.69% by mid-August. When updated with 2016 data and backtested for the period 2001-2016, Gro’s model estimated national final yield with an average error of 2.44%. We were at least as good as, and in some cases better than, private models currently available for purchase, which provide no transparency into their methods to their users, let alone the broader public.
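For readers who want to run a similar backtest on their own forecasts, here is a minimal Python sketch of one common way to compute an average error across years. It assumes "average error" means the mean absolute percentage error against the final reported yield, which the text above does not spell out. The function name and the sample numbers are invented for illustration and are not Gro results.

```python
def mean_abs_pct_error(forecasts, actuals):
    """Mean absolute percentage error over the years present in both inputs.

    forecasts, actuals: dicts mapping year -> national yield.
    """
    years = sorted(set(forecasts) & set(actuals))
    if not years:
        raise ValueError("no overlapping years to compare")
    errors = [abs(forecasts[y] - actuals[y]) / actuals[y] * 100.0 for y in years]
    return sum(errors) / len(errors)


# Invented numbers, for illustration only.
sample_forecasts = {2001: 102.0, 2002: 95.0}
sample_actuals = {2001: 100.0, 2002: 100.0}

sample_error = mean_abs_pct_error(sample_forecasts, sample_actuals)
print(f"average error: {sample_error:.2f}%")
```

Because the measure is relative to each year's final yield, a miss in a low-yield drought year counts for more than the same absolute miss in a bumper year.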
The Next Steps
From our unique position within the global food and agriculture industry, Gro will foster collaboration among academic, commercial, and private interests. Every improvement made to the yield model will be publicly released and will form part of a continuous conversation leading to community-based innovation. The model will also always be running live within our product, and all of its components can be analyzed in real time and deconstructed on a historical basis.
We all want to address food security, sustainability, and climate change concerns. By this public release, Gro is hoping to create a larger discussion regarding agriculture, demystify machine learning for the agricultural sector, and help people understand the role that fundamental scientific knowledge continues to play in our ability to accurately predict real-world outcomes. We sincerely hope that through honest and informed critique of our methods, we can further improve our already world-class model. This is where the greater community comes in. Tell us what we’re missing, help us to get better, and feel free to use the model for your own forecasts.
Contribute to the discussion by sending us a message at firstname.lastname@example.org.