Enhancing Earth Observation Solutions in Agriculture with Machine Learning
This post is written by Hamed Alemohammad, Radiant Earth Foundation.
Machine learning (ML) and Earth observation (EO) are complementary technologies. While EO helps us understand natural and anthropogenic changes on the Earth, ML empowers us to analyze vast amounts of imagery and build new models for EO data that would have been very difficult if not impossible using traditional physical models a few short years ago.
The promise ML and EO hold for agriculture is immense. EO satellites capture data at a global scale, and ML techniques can use these data to map croplands at local, regional, and continental levels, providing input for farmers and policymakers alike. In particular, the ability to estimate crop yield or detect pest and disease damage during the growing season will be game-changing in addressing food insecurity.
Moreover, ML applied to EO can provide near real-time insights and recommendations to farmers so they can take decisive action and improve crop productivity and their livelihoods. However, to make the most of ML and EO in agriculture, fundamental challenges in this emerging field must be addressed.
In this post, we dig deeper into these challenges and discuss how Radiant Earth Foundation is addressing these issues by empowering organizations and individuals globally with open ML and EO data, technology standards, and tools.
ML and EO challenges and Radiant Earth solutions
Radiant Earth has established Radiant MLHub, an open repository of thematic geospatial training datasets and technology standards, to enhance the use of ML on EO data. Radiant MLHub allows anyone to store, register, and share their open geospatial training datasets, and all datasets are made available under Creative Commons licenses.
For a wide range of applications, training data labels can be generated by directly annotating satellite imagery. But annotation tasks can only identify features observable in the imagery. For example, with Sentinel-2 imagery at a 10-meter spatial resolution, one cannot detect the crop type but would be able to distinguish croplands from other land cover classes. For more spatially demanding applications like agriculture, satellite imagery must be used in concert with ground-referenced information such as crop type to build a model.
Two critical issues must be resolved if the EO and data science communities are to drive innovation and insights that support agricultural productivity across the globe: first, the lack of representative training data, and second, the difficulty of generating new training data from ground referencing techniques. Below, we discuss these two issues and the solutions Radiant Earth is proposing to the community.
- Geographically diverse training data
Why does this matter?
ML results are only as good as the quality of the training data used and the model that is developed to conduct the analysis. The more representative and high-quality the training data, the more accurate the algorithms are at identifying patterns and anomalies. Globally representative data also enhance our ability to run ML models at larger spatial scales. Despite the importance of diverse data, there remain gaps in training data catalogs, especially for regions in the Global South.
Existing geospatial training data catalogs are skewed towards North America and Europe. These catalogs are often the basis for generating models that are then applied worldwide. Research compiled by Radiant Earth indicates that ML models built on training data from one part of the world do not transfer accurately to different regions. The results are either biased or plain wrong. Therefore, to improve the accuracy of ML models from satellite imagery worldwide, regionally curated training data is essential.
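One way to check whether a model transfers across regions is to hold out one region at a time, train on the rest, and evaluate on the held-out region. The following is a minimal sketch of that split only, not Radiant Earth's methodology. The `region` and `label` keys and the toy records are assumptions made up for illustration.

```python
from collections import defaultdict


def leave_one_region_out(samples):
    """Yield (held_out_region, train, test) splits, one per region.

    Each sample is a dict with a hypothetical "region" key.
    """
    by_region = defaultdict(list)
    for sample in samples:
        by_region[sample["region"]].append(sample)

    # Iterate regions in sorted order so the splits are reproducible.
    for held_out in sorted(by_region):
        test = by_region[held_out]
        train = [
            sample
            for region, group in by_region.items()
            if region != held_out
            for sample in group
        ]
        yield held_out, train, test


# Toy records; the values are invented purely for illustration.
samples = [
    {"region": "Kenya", "label": "maize"},
    {"region": "Kenya", "label": "cassava"},
    {"region": "Tanzania", "label": "maize"},
    {"region": "Uganda", "label": "sorghum"},
]

splits = list(leave_one_region_out(samples))
for held_out, train, test in splits:
    print(held_out, len(train), len(test))
```

A large drop in accuracy on the held-out region, compared with a random split, would suggest the kind of regional bias described above.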
What is the solution?
The key to geographically diverse training data is to invest in a) expanding high-quality regional training data and b) making the data discoverable in a permanent and open repository.
The lack of geo-diverse training data guided Radiant Earth to develop Radiant MLHub, which launched in December 2019. Radiant MLHub facilitates the curation and exchange of training data and tools for ML on EO. Currently, it hosts training data for major crops in Kenya, Tanzania, and Uganda, as well as some other regional land cover training datasets.
- Georeferenced data
Why does this matter?
Many agricultural applications require training data based on fieldwork (a.k.a. ground referencing). For example, to estimate crop yield from EO, one needs to collect large samples of yield estimates within identified fields and then use satellite imagery and ML techniques to predict crop yield in nearby and adjacent regions. The collection of ground-referenced data is time-consuming and expensive. However, research centers around the world conduct fieldwork and collect data regularly. Although these data are not collected for ML purposes, with proper georeferencing they can be used to create training datasets with minimal effort. If data collection is properly coordinated, there is a tremendous opportunity to put these data to work in ML on EO applications.
In a recent webinar Radiant Earth hosted on the topic of collecting and sharing ground-referenced data for ML applications, Dr. Kai Sonders from CGIAR revealed that approximately 20 percent of data collected by their research centers are georeferenced. If researchers were equipped with GPS devices and trained on the methodology to make survey data compatible, these data could be an excellent source for ground-referenced EO training data.
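To illustrate what making survey data compatible could look like, here is a minimal sketch that converts survey rows with GPS coordinates into a GeoJSON FeatureCollection using only the Python standard library. The field names (`crop`, `survey_date`, `lat`, `lon`) and the sample values are hypothetical; they are not the schema from any Radiant Earth or CGIAR specification.

```python
import json


def survey_to_geojson(records):
    """Convert survey rows with GPS coordinates to a GeoJSON FeatureCollection.

    Rows with missing or out-of-range coordinates are skipped, since they
    cannot be matched to satellite imagery.
    """
    features = []
    for row in records:
        lat, lon = row.get("lat"), row.get("lon")
        if lat is None or lon is None:
            continue
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            continue
        # Everything except the coordinates becomes a feature property.
        properties = {k: v for k, v in row.items() if k not in ("lat", "lon")}
        features.append({
            "type": "Feature",
            # GeoJSON orders coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": properties,
        })
    return {"type": "FeatureCollection", "features": features}


# Hypothetical survey rows; the second has no GPS fix and is dropped.
records = [
    {"crop": "maize", "survey_date": "2019-06-01", "lat": -1.29, "lon": 36.82},
    {"crop": "cassava", "survey_date": "2019-06-02", "lat": None, "lon": None},
]

collection = survey_to_geojson(records)
print(json.dumps(collection, indent=2))
```

The second row shows why georeferencing matters: without coordinates, an otherwise valid field observation cannot be paired with imagery and is lost as training data.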
What is the solution?
The key to increasing good quality georeferenced data is to streamline data collection. To that end, Radiant Earth in collaboration with practitioners around the world has developed a Ground Referencing Data Guide for collecting and sharing survey data for ML applications. The guide is an outcome of a year-long project that Radiant Earth conducted on ground-referenced data for agricultural applications. It includes best practices for the in-field data collection component, specifications for metadata standards, and an example of ground-referenced data in a well-structured geospatial file format (Figure 2).
The guide is an early framework for the global EO, ML, and agricultural community. Radiant Earth invites the broader community to provide feedback so the guide can be improved over time.
A global community effort
Collaboration is central to coordinating the growing volume of EO data and its analysis using ML techniques. The two challenges described above require partners across different sectors to come together and resolve some of the foundational problems of ML applications on EO.
The availability of benchmark training data for EO is a necessity for broader adoption of ML techniques to address agricultural problems. Targeted investment is required to build geographically diverse and robust agriculture training data. Moreover, a concerted effort to coordinate and streamline data generation, curation, and cataloging is essential to democratize access to these data.
Radiant Earth Foundation believes in collaborative innovation and welcomes partnership with all organizations that are interested in tackling these challenges collectively.