How Satellite Data and Machine Learning Could Help to Develop a Key Element of Agricultural Statistical Infrastructure
This post is written by Kiersten Johnson (USAID/Bureau for Resilience and Food Security), based on extensive discussions with USAID colleagues from across the Agency, in particular drawing from work and slides produced by Jim Verdin (USAID/Food for Peace/FEWS).
An agricultural field boundary dataset is a database composed of a comprehensive set of vector agricultural field boundaries. It allows one to define, with precision, the specific pieces of landscape over which to examine remote sensing data.
Such a dataset constitutes a critical piece of informational infrastructure that, when joined with satellite remote sensing data, is seen as a game-changer for enhancing food security via the development of improved crop masks, improved estimates of the planted area, and improved estimates of yield.
In particular, early warning systems need ongoing monitoring of growing conditions and productivity outcomes to estimate the availability of food and generate IPCs (integrated phase classifications) for food insecurity. It would be advantageous to be able to say with confidence that we are looking at the subset of the landscape that is cropped. If we could then relate satellite remote sensing signals to that land, we could get a good sense of agricultural yields before the harvest is in and predict whether a food shortage is likely. That would allow us to give an early warning if assistance or intervention is needed.
This type of dataset can also contribute to other key development outcomes by helping to do the following:
Improve land use planning and development: This dataset could be used as a framework for land use planning and updating agricultural development plans, whether by the private sector, small or large landholders, or at the ministry level, to try to encourage changes that promote national development. It can support quantifying the level of land concentration and fragmentation over time. It would also be an important input into agricultural censuses, and a robust sampling frame for agricultural surveys.
Better estimate total water demand: Knowing with precision where agricultural activities are ongoing or planned in relation to available water helps to balance land use with water availability.
Help farmers to:
- obtain agricultural/crop insurance and verify claims
- purchase land preparation services at fair prices
- obtain a loan
- better estimate the needed quantity of seeds and fertilizer
- establish land tenure
Improve development programming: Agricultural field boundaries can be used to:
- support decisionmaking at the program planning and design stages (for example, are the crops proposed as part of an agricultural development activity appropriate for the anticipated programmatic location?),
- support program monitoring and remote supervision during implementation (especially for remote locations or non-permissive environments), and
- provide a quantitative basis for evaluating programmatic impact.
Crop masks and crop maps, optimized by the availability of a database of vector field boundaries, are a critical piece of agricultural statistical infrastructure currently in use in places like the United States. With the use of remote sensing data, ground-referenced training and validation data, and machine learning algorithms, it may soon be possible to develop them with the countries that USAID partners with. Below are descriptions and examples of these masks, maps, and field boundaries, along with some special considerations to keep in mind when developing an agricultural field boundary database in countries characterized by extensive smallholder farms.
Crop masks (Figure 1): A map like this is put together every month under GEOGLAM. Colors are applied only in selected areas, in this case where maize is grown. These masks are at a fairly coarse resolution because field-scale data (delineations of individual fields) are lacking. A mask like this is adequate when working with other coarse indicators, such as 250m rainfall data, but improved field boundary data could improve these crop masks.
Crop maps (Figure 2): This crop map was produced by classifying pixels; USDA has some field boundary data but uses it only for ground-truthing. USDA trains its machine learning models using ground-truthed pixels; it is an exercise in supervised classification. The approach is subject to the “mixed-pixel” problem, which introduces a certain amount of bias into area estimates and is therefore not ideal. As with the crop masks, a field boundary database could improve crop maps.
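To make the mixed-pixel bias concrete, here is a minimal sketch in plain Python. It is not USDA's method: the field coordinates, the pixel sizes, and the "count a pixel if at least half of it is cropped" rule are all assumptions chosen for illustration. It compares the true area of one rectangular field with the area obtained by counting classified pixels.

```python
def covered_fraction(px, py, size, field):
    """Fraction of the square pixel with lower-left corner (px, py)
    that is covered by an axis-aligned rectangular field."""
    x0, y0, x1, y1 = field
    w = max(0.0, min(px + size, x1) - max(px, x0))
    h = max(0.0, min(py + size, y1) - max(py, y0))
    return (w * h) / (size * size)


def pixel_count_area(field, size, n_cols, n_rows, threshold=0.5):
    """Area estimate obtained by counting pixels labelled 'crop'.
    A pixel is labelled 'crop' when at least `threshold` of it is covered
    (an assumed, simplified stand-in for a pixel classifier)."""
    n_crop = 0
    for i in range(n_cols):
        for j in range(n_rows):
            if covered_fraction(i * size, j * size, size, field) >= threshold:
                n_crop += 1
    return n_crop * size * size


# Hypothetical field in metres: (xmin, ymin, xmax, ymax), 47 m x 30 m.
field = (10.0, 10.0, 57.0, 40.0)
true_area = (field[2] - field[0]) * (field[3] - field[1])

est_30m = pixel_count_area(field, 30.0, 4, 4)    # coarse pixels
est_10m = pixel_count_area(field, 10.0, 12, 12)  # finer pixels

print(true_area, est_30m, est_10m)
```

With these assumed numbers the field covers 1,410 square metres, the 30 m grid yields 900, and the 10 m grid yields 1,500. The error shrinks with finer pixels but does not vanish, whereas a vector boundary would give the area directly.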
Vector field boundaries (Figure 3): USDA has something it calls "Common Land Units," which it keeps up to date in collaboration with county officials. With this kind of field boundary dataset, routinely updated, in combination with satellite remote sensing data, Ministries of Agriculture could characterize what is happening within each field, watch how pixels inside each field change over time (cultivated or not, healthy crops or not, changes in type of crop cultivated), and also observe how the boundaries themselves change over time (for example, the extent to which the fields consolidate, remain stable, or fragment/split).
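Watching how pixels inside each field change over time amounts to summarizing a raster within each polygon, date by date. The sketch below illustrates the idea in plain Python under stated assumptions: the field names, the polygon coordinates, and the tiny vegetation-index grids are invented, and a pixel is assigned to a field when its centre falls inside the polygon. A production workflow would more likely rely on a GIS library.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon,
    given as a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for k in range(n):
        x1, y1 = polygon[k]
        x2, y2 = polygon[(k + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def field_mean(raster, polygon):
    """Mean of the raster cells whose centres fall inside the polygon.
    raster[row][col] has its centre at (col + 0.5, row + 0.5)."""
    values = [
        raster[r][c]
        for r in range(len(raster))
        for c in range(len(raster[0]))
        if point_in_polygon(c + 0.5, r + 0.5, polygon)
    ]
    return sum(values) / len(values) if values else None


# Hypothetical field boundaries in raster coordinates.
fields = {
    "field_a": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "field_b": [(2, 0), (4, 0), (4, 2), (2, 2)],
}

# Hypothetical vegetation-index grids for two dates (2 rows x 4 columns).
early_season = [[0.25, 0.25, 0.25, 0.25],
                [0.25, 0.25, 0.25, 0.25]]
peak_season = [[0.75, 0.75, 0.25, 0.25],
               [0.75, 0.75, 0.25, 0.25]]

series = {
    name: [field_mean(grid, polygon) for grid in (early_season, peak_season)]
    for name, polygon in fields.items()
}
print(series)
```

In this invented example field_a greens up between the two dates while field_b stays flat, which is the kind of per-field signal that could flag an uncultivated parcel.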
Using this field boundary dataset and satellite remote sensing data, US researchers were able to identify parcels of land that were left fallow in 2012 as a consequence of drought.
Before field boundaries were available, there were just guesses as to the extent of land area left fallow, which produced less accurate estimates of impact.
This information allowed decisionmakers to better understand and anticipate economic and food production impacts of drought in advance, as well as take ameliorative action.
Figure 5 shows another example of vector field boundaries from the US, in this case a map of the San Joaquin Valley showing summer conditions.
Economists used these data to estimate probabilities of economic impacts from leaving normally productive agricultural land fallow, like declining demand for equipment and farm labor. They allocated money for food banks county by county, depending on how much fallow land existed in each county.
Most of the above examples of vector agricultural field boundary datasets were compiled using manual/visual techniques, delineated in a very labor-intensive way. When you think about trying to do this in a developing country context, with very small parcels, it’s not feasible to do it by hand. You want the computer to do as much as possible, and this is where the training of machine learning algorithms with satellite imagery becomes important: one wants to be able to train the computer to "see" fields in the satellite imagery data. You train machine learning models with validated ground-referenced data, like that which is being collected and made publicly available by organizations like the Radiant Earth Foundation.
Examples of how computers can be used to delineate agricultural field boundaries include image segmentation. Figure 6 shows image segmentation used to discern cropland with eCognition, a commercial package that can be scripted for bulk processing. USGS applied eCognition batch processing to many thousands of images so that the results could be used for a variety of applications. The idea was to break out uniform entities on the landscape in order to correct problematic Landsat 7 data.
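The core idea of segmentation, breaking an image into spatially contiguous and internally uniform regions, can be illustrated with a toy region-growing routine. This is a deliberately simple sketch and not the algorithm used by the commercial package mentioned above; the image values and the tolerance are assumptions made up for the example.

```python
from collections import deque


def segment(image, tolerance):
    """Toy region growing: starting from each unlabelled pixel, flood-fill
    to 4-connected neighbours whose value is within `tolerance` of the seed.
    Returns a grid of integer segment labels starting at 1."""
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]
    next_label = 0
    for r0 in range(rows):
        for c0 in range(cols):
            if labels[r0][c0]:
                continue  # already assigned to a segment
            next_label += 1
            seed_value = image[r0][c0]
            labels[r0][c0] = next_label
            queue = deque([(r0, c0)])
            while queue:
                r, c = queue.popleft()
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    rr, cc = r + dr, c + dc
                    if (0 <= rr < rows and 0 <= cc < cols
                            and not labels[rr][cc]
                            and abs(image[rr][cc] - seed_value) <= tolerance):
                        labels[rr][cc] = next_label
                        queue.append((rr, cc))
    return labels


# Hypothetical single-band image with three visually uniform patches.
image = [
    [10, 11, 50, 52],
    [12, 10, 51, 50],
    [90, 91, 50, 49],
]
labels = segment(image, tolerance=5)
for row in labels:
    print(row)
```

Each labelled region could then be converted to a polygon and treated as a candidate field. Real imagery is noisier, so practical tools combine spectral similarity with shape and scale criteria.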
Computer vision is another machine learning method that is very promising for this type of application. Figure 7 shows an example of agricultural field boundaries delineated through the use of convolutional neural networks, a type of computer vision algorithm.
- The most promising method to accomplish the production of agricultural field boundary datasets as a machine learning task currently seems to be computer vision, which is viable in areas where farmers have small plots. It requires ground-referenced training data and is applied over very high-resolution images. Notably, high-resolution imagery can be challenging to access and process.
- Ground-referenced data needs to be available for “Team 1” to train the machine learning algorithms and, independently, for “Team 2” to validate the results generated by “Team 1.”
- It is critical when working with georeferenced data to ensure that privacy concerns are appropriately managed.
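The separation between “Team 1” and “Team 2” noted above can be sketched as a disjoint split of ground-referenced fields, followed by an overlap score on the held-out set. The function names, the 30 percent validation share, and the use of intersection over union as the score are all assumptions for illustration, not a prescribed protocol.

```python
import random


def split_reference_fields(field_ids, validation_share=0.3, seed=0):
    """Shuffle field IDs reproducibly and return (training_ids, validation_ids).
    The two lists never share an ID, so validation stays independent."""
    ids = sorted(field_ids)
    random.Random(seed).shuffle(ids)
    n_validation = max(1, round(len(ids) * validation_share))
    return ids[n_validation:], ids[:n_validation]


def intersection_over_union(predicted_pixels, reference_pixels):
    """Overlap between a predicted field and a reference field,
    each given as a collection of (row, col) pixel coordinates."""
    predicted, reference = set(predicted_pixels), set(reference_pixels)
    union = predicted | reference
    return len(predicted & reference) / len(union) if union else 1.0


# Ten hypothetical ground-referenced field IDs.
training_ids, validation_ids = split_reference_fields(range(10))

# Hypothetical predicted vs. reference footprint for one validation field.
predicted = {(0, 0), (0, 1), (1, 0), (1, 1)}
reference = {(0, 1), (1, 1), (0, 2), (1, 2)}
score = intersection_over_union(predicted, reference)

print(len(training_ids), len(validation_ids), round(score, 3))
```

Fixing the random seed keeps the split reproducible, so the validation fields can be withheld from the training team for the life of the project.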
In an era when data processing power and satellite remote sensing imagery are more widely available than ever before, there are increasing opportunities to use these resources to make measurable differences in the food security, livelihoods, nutrition, and well-being of people around the world.