*October 7, 2024*
# Root Access
![[root_interface.png]]
[Cale](https://vedgie.net) and I developed this at the San Francisco event of the [NASA Space Apps hackathon](https://www.spaceappschallenge.org/nasa-space-apps-2024/2024-local-events/san-francisco/). (We won third place!!)
You can see our submission [here](https://www.spaceappschallenge.org/nasa-space-apps-2024/find-a-team/root-access/?tab=project), and code [here](https://github.com/calepayson/root-access).
## Background
A challenge faced by farmers is how to estimate the health of their crops so they can apply resources intelligently. On the whole, over-application is the name of the game because it's very bad to lose crops.
One reason is that the available metrics are poor: they tend to be superficial and don't necessarily indicate the underlying health of the crops. For example, the NDVI (Normalized Difference Vegetation Index) amounts to how green parts of a farm look from the air.
As you probably know, plants use photosynthesis to generate energy from sunlight. What you may not know is that there is a measurable signature of light emitted by photosynthesis. This light is called [SIF](https://appliedsciences.nasa.gov/our-impact/news/solar-induced-fluorescence-learn-new-approach-remote-sensing-vegetation) (Solar Induced Fluorescence).
*It's an excellent measure of plant health*. The main idea: if a plant is not emitting SIF, it is not producing energy.
When something threatens the health of the crops, like drought or a lack of nutrients, photosynthesis stops as the plants adapt to the conditions. If the plants are in distress *now*, it can take days or weeks to detect with superficial metrics, so the norm is to overwater and overspray, because the risk of losing crops is unacceptable.
Decent SIF measurements would make a tangible difference, but unfortunately they're hard to come by.
## What We Did
While quality SIF data is hard to come by, there's tons of free, high quality data on measurements closely related to SIF. If we can't measure SIF directly, it's reasonable to expect that we can predict it, even ahead of time, from measurements of the many other factors that are readily available.
We made a model to predict SIF values from open source data, and an accompanying interface that farmers could use. To make our prototype, we based our model on the most obvious variable: water.
We used the [SPL4SMGP](https://nsidc.org/data/spl4smgp/versions/7#anchor-documentation) dataset; it combines data from [SMAP](https://smap.jpl.nasa.gov) and the [GEOS-5](https://earthobservatory.nasa.gov/images/44246/geos-5-a-high-resolution-global-atmospheric-model) model to provide high quality estimates of moisture just below the surface of the earth.
| ![[surface.png]] | ![[root_zone.png]] |
|:----------------:|:------------------:|
| Surface moisture (5cm) | Root zone moisture (100cm) |
For our demo we trained a linear model to predict the SIF measurements from the [OCO3_L2_Lite_SIF](https://disc.gsfc.nasa.gov/datasets/OCO3_L2_Lite_SIF_10r/summary) dataset over three days of historical data. After filtering for quality (e.g. cloud coverage) we had about 10k datapoints for the model. Here's what it looks like:
| ![[measured_sif.png]] | ![[SIF.png]] |
|:---------------------:|:------------:|
| SIF over 3 days | Predicted SIF values |
The example data clearly does not cover the whole country, and earth science is complex, so we don't expect *this* model to be very good. With better data and more careful model selection, we see a lot of potential!
## How We Did It
In a nutshell:
- Pull the data
- Process individual datasets
- Combine the datasets
- Train the model
It turns out satellite data is pretty hairy and comes in lots of interesting formats, in this case HDF5 (`.h5`) and netCDF (`.nc4`). We queried and pulled data using [earthaccess](https://github.com/nsidc/earthaccess), and filtered the downloaded granules down into pandas dataframes.
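To give a flavor of the filtering step, here is a minimal sketch, not our actual code. It assumes the granule has already been read into a pandas dataframe, and the column names (`time`, `lat`, `lon`, `sif`, `quality_flag`) and the "lower flag is better" convention are hypothetical stand-ins for whatever the real dataset provides.

```python
import pandas as pd


def filter_sif(df: pd.DataFrame, max_flag: int = 0) -> pd.DataFrame:
    """Keep rows with an acceptable quality flag, valid coordinates, and a SIF value.

    Column names and the meaning of the flag are assumptions for this sketch.
    """
    ok = (
        (df["quality_flag"] <= max_flag)
        & df["lat"].between(-90, 90)
        & df["lon"].between(-180, 180)
        & df["sif"].notna()
    )
    return df.loc[ok, ["time", "lat", "lon", "sif"]].reset_index(drop=True)


# Made-up rows: one good, one flagged, one with a bad latitude, one missing SIF.
raw = pd.DataFrame(
    {
        "time": pd.to_datetime(
            ["2024-10-01", "2024-10-01", "2024-10-02", "2024-10-02"]
        ),
        "lat": [37.7, 36.1, 95.0, 38.2],
        "lon": [-122.4, -120.3, -121.0, -121.5],
        "sif": [0.8, 0.5, 0.6, None],
        "quality_flag": [0, 1, 0, 0],
    }
)

clean = filter_sif(raw)
print(clean)
```

Only the first row survives here; the others fail the flag, coordinate, and missing-value checks respectively.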
Then, for each datapoint in the SIF data, we queried moisture data over the three days prior to the measurement, and selected the moisture values with the nearest coordinates (accurate to ~5km).
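The nearest-coordinate matching can be sketched roughly like this. It is a brute-force illustration with numpy, not necessarily how our code does it: the function names are made up, and the `max_km` cutoff is an assumption that turns the "~5km" figure into an explicit threshold. It also ignores the time window and only matches on position.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (
        np.sin((lat2 - lat1) / 2) ** 2
        + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def nearest_moisture(sif_lat, sif_lon, grid_lat, grid_lon, grid_values, max_km=5.0):
    """For each SIF point, take the value of the closest moisture cell.

    Returns NaN where the closest cell is farther than max_km (assumed cutoff).
    """
    # Distance matrix of shape (n_sif_points, n_grid_cells) via broadcasting.
    dist = haversine_km(
        sif_lat[:, None], sif_lon[:, None], grid_lat[None, :], grid_lon[None, :]
    )
    idx = dist.argmin(axis=1)
    nearest = dist[np.arange(len(idx)), idx]
    return np.where(nearest <= max_km, grid_values[idx], np.nan)


# Made-up coordinates: the first SIF point sits next to a grid cell,
# the second is hundreds of km from any of them.
sif_lat = np.array([37.00, 40.00])
sif_lon = np.array([-120.00, -100.00])
grid_lat = np.array([37.01, 37.50, 36.00])
grid_lon = np.array([-120.01, -120.00, -119.00])
grid_values = np.array([0.21, 0.35, 0.10])

matched = nearest_moisture(sif_lat, sif_lon, grid_lat, grid_lon, grid_values)
print(matched)
```

The full distance matrix is fine for a few thousand points; for much larger datasets a spatial index would be the more sensible choice.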
Finally, we trained a humble scikit-learn model on the data.
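For the curious, the training step looks something like the sketch below. The data here is synthetic, with made-up coefficients and noise, and using surface and root zone moisture as the two features is an assumption for illustration; the real features and results live in the repo.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT real measurements): two moisture features
# and a SIF target built from an invented linear relationship plus noise.
rng = np.random.default_rng(0)
n = 1000
surface = rng.uniform(0.05, 0.45, n)
root_zone = rng.uniform(0.05, 0.45, n)
sif = 0.5 * surface + 1.5 * root_zone + rng.normal(0, 0.05, n)

X = np.column_stack([surface, root_zone])
X_train, X_test, y_train, y_test = train_test_split(
    X, sif, test_size=0.2, random_state=0
)

# Fit an ordinary least squares model and score it on held-out points.
model = LinearRegression().fit(X_train, y_train)
r2 = model.score(X_test, y_test)

print("coefficients:", model.coef_)
print("held-out R^2:", r2)
```

On real data the fit would be much noisier than this toy example, which is why the held-out score is worth checking before trusting any predictions.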