August 16, 2018

With the US growing season in full swing, a range of private companies and government organizations are using various data and modelling techniques to predict the outcome of the harvest.

At TellusLabs, our crop yield models are made from a variety of “ingredients” - primarily reflectance data from satellites and weather information from observatories. We use a hybrid of remote sensing expertise and machine learning to determine how to incorporate each ingredient.

The USDA yield figures take a set of ingredients into account too, albeit differently: their estimates are predominantly based on in-season surveys of farmers themselves. Surveys can be a powerful source of data, though they are particularly susceptible to sample bias.


The smallest unit of analysis the USDA uses for its yield estimates and forecasts is the US county. Within each state, there have always been a few counties that fall below the USDA's threshold for individual reporting. Some counties do not provide enough data, likely because of decreased survey participation or because the USDA is not able to send representatives to assess crop health and potential yield. These counties are combined into an “Other Counties” group.

Over the past fifteen years, the percentage of counties being put into this group has grown from approximately 1% in 2003-2006 up to 10% in 2014-2017.


The amount of unreported “Other Counties” has increased over the past few years.
The GIF below shows the shifting map of "Other Counties" over the past 15 years. All shaded areas are corn-growing areas, and the areas in dark blue are the ones that weren't individually tracked in the given year.
Map of unreported “Other Counties” (shown in dark blue) from 2003-2017.


This steady reduction in the number of individually reported counties would not be a problem if the unreported counties were producing average yields (i.e. yields in line with the reported counties). However, the counties that fall into the “Other Counties” category report, on average, lower yields than the individually reported counties, which creates a strong bias. This suggests that national yield models that only take into account the individually identified counties would be biased upwards.
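To make the direction of this bias concrete, here is a minimal sketch of the arithmetic in Python. The acreage and yield figures are hypothetical, chosen only for illustration (they are not USDA data), and the `weighted_yield` helper is our own name for a plain area-weighted average.

```python
def weighted_yield(groups):
    """Area-weighted mean yield (bu/ac) over (acres, yield_bu_per_ac) pairs."""
    total_acres = sum(acres for acres, _ in groups)
    total_bushels = sum(acres * yld for acres, yld in groups)
    return total_bushels / total_acres


# Hypothetical figures for illustration only (not USDA data).
tracked = (90.0, 180.0)  # individually reported counties: 90M acres at 180 bu/ac
other = (10.0, 150.0)    # "Other Counties" group: 10M acres at 150 bu/ac

tracked_only = weighted_yield([tracked])         # ignores "Other Counties"
all_counties = weighted_yield([tracked, other])  # includes "Other Counties"
upward_bias = tracked_only - all_counties

print(f"Tracked counties only: {tracked_only:.1f} bu/ac")
print(f"All counties:          {all_counties:.1f} bu/ac")
print(f"Upward bias:           {upward_bias:.1f} bu/ac")
```

With these made-up numbers, leaving out the lower-yielding group overstates the national yield by 3 bu/ac. The larger the share of acreage in “Other Counties” and the wider its yield gap, the larger the overstatement.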


"Other Counties" chronically underperform the Tracked Counties


Our modeling team searches for such biases in the reporting and corrects for them in our models. In this case, we adjust our national yield estimates based on the expected direction and magnitude of the bias.
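As a rough illustration of what such an adjustment could look like, the sketch below estimates the expected bias as the average historical gap between a tracked-counties-only yield and the all-counties yield, then subtracts it from a raw forecast. This is an assumed, simplified approach and not a description of our production models; the function names and all numbers are hypothetical.

```python
from statistics import mean


def expected_bias(history):
    """Mean gap (bu/ac) between tracked-only and all-county yields."""
    return mean(tracked_only - all_counties
                for tracked_only, all_counties in history)


def corrected_forecast(raw_forecast, history):
    """Shift a raw national forecast down by the expected upward bias."""
    return raw_forecast - expected_bias(history)


# Hypothetical (tracked_only, all_counties) yields in bu/ac for past seasons.
history = [(170.0, 168.0), (175.0, 172.0), (180.0, 176.0)]

raw = 182.0  # hypothetical raw national forecast, bu/ac
adjusted = corrected_forecast(raw, history)

print(f"Expected bias:      {expected_bias(history):.1f} bu/ac")
print(f"Corrected forecast: {adjusted:.1f} bu/ac")
```

A simple historical mean treats the bias as stable from year to year. Since the share of “Other Counties” has been growing, a real correction would likely need to account for that trend as well.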

Throughout the season (and in the wake of the NASS August Report), we've had plenty of questions about the key drivers of our below-consensus view on corn. We don’t think the “Other Counties” sample bias alone could explain such a wide difference between our current view and market consensus (in the current season, our correction for corn is to adjust our raw forecasts down by 3.9 bu/ac). However, we do suspect that it is one of the elements in play.

We think that the identification and remediation of data biases are pretty fascinating - to us, they're crucial parts of model construction.


If you’re interested in discussing our models further and trialling the Kernel product, please contact us.
