For this project I researched the relationship between ambient air temperature and MODIS land surface temperature (LST) measurements across Oregon. Public health research on heat stress and its negative health outcomes currently relies on point measurements from weather stations to estimate exposure. The problem is that these measurements are difficult to interpolate between monitoring stations, and the resulting estimates are not accurate enough to rely on. Using MODIS imagery to estimate exposure would allow researchers to look retrospectively and prospectively at exposure across nearly the entire planet over a long time span.

For this project I used 1 km resolution MODIS LST imagery for every day in 2014. I used only one swath of MODIS data, which covered approximately 85% of Oregon. These images were compared with more accurate air temperature measurements from 46 NOAA monitoring stations across Oregon.

To explore this relationship I used a Python script to extract the raster values at the locations of the 46 NOAA stations across the entire time frame. I then used two R scripts to combine the data sets, reshape them into long format, and run the analysis. I used a linear mixed effects (LME) model to account for the random effect of location and to incorporate the temporal structure of the data. I started the LME model with NLCD land-cover class, month of measurement, time of MODIS measurement, and elevation as covariates. I then removed covariates that were not significant to refine the model and minimize the standardized residuals.
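
The reshaping step was done in R. For readers following along in Python, the step might look like the sketch below. It assumes the extraction produced one row per station with one column per date (wide format) and that cloud-masked MODIS pixels are stored as `None`. The field names `station`, `date`, `modis` and `noaa` are hypothetical.

```python
# Hypothetical sketch: reshape per-station rows with one column per date
# (wide format) into one record per station-date (long format), then
# inner-join the MODIS and NOAA records on (station, date).

def wide_to_long(wide_rows, id_field="station"):
    """Turn {"station": id, date1: v1, date2: v2, ...} rows into long records."""
    long_rows = []
    for row in wide_rows:
        for key, value in row.items():
            if key == id_field:
                continue  # the ID column is not a measurement
            long_rows.append({"station": row[id_field], "date": key, "value": value})
    return long_rows


def join_sources(modis_long, noaa_long):
    """Keep only station-dates present in both sources that have a MODIS value."""
    noaa_lookup = {(r["station"], r["date"]): r["value"] for r in noaa_long}
    joined = []
    for r in modis_long:
        key = (r["station"], r["date"])
        # None is assumed to mark a missing (e.g. cloud-covered) MODIS pixel
        if key in noaa_lookup and r["value"] is not None:
            joined.append({"station": r["station"], "date": r["date"],
                           "modis": r["value"], "noaa": noaa_lookup[key]})
    return joined
```

Long format gives one row per observation, which is the layout mixed-model software generally expects. Each station then appears many times, so station can serve as the grouping factor for the random effect.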

NLCD and time of MODIS measurement were not found to be significant, and were therefore removed from the model. The results from the final model are shown below:

Parameter            Estimate   95% CI              P-value
(Intercept)             6.240   (3.007, 9.472)       0.0002
MODIS value             0.313   (0.302, 0.324)      <0.0001
Elevation              -0.004   (-0.005, -0.003)    <0.0001
NLCD (evgrForst)        2.272   (-1.288, 5.832)      0.2038
NLCD (lowIntDev)       -2.446   (-7.133, 2.240)      0.2968
NLCD (scrubBrsh)        0.961   (-2.661, 4.583)      0.5939
Month (Feb)            -1.341   (-1.944, -0.738)    <0.0001
Month (Mar)            -0.631   (-1.207, -0.053)     0.0322
Month (Apr)             0.301   (-0.255, 0.858)      0.2889
Month (May)             2.354   (1.795, 2.912)      <0.0001
Month (Jun)             3.261   (2.680, 3.840)      <0.0001
Month (Jul)             7.894   (7.298, 8.488)      <0.0001
Month (Aug)             6.790   (6.202, 7.378)      <0.0001
Month (Sep)             6.077   (5.501, 6.654)      <0.0001
Month (Oct)             4.085   (3.523, 4.645)      <0.0001
Month (Nov)            -2.161   (-2.724, -1.597)    <0.0001
Month (Dec)            -1.970   (-2.554, -1.385)    <0.0001

Month estimates are relative to January, and NLCD estimates are relative to a reference land-cover class. Neither reference level appears in the table.
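
The fixed effects in the table can be combined into a population-level prediction. The sketch below does this to make the coefficients concrete. It ignores the station random effect and leaves out the NLCD terms, which is the same as assuming the reference land-cover class. It also assumes MODIS values are in °C and elevation is in meters; the table does not state the elevation unit. The function name `predict_air_temp` is hypothetical.

```python
# Fixed-effect estimates copied from the table above.
INTERCEPT = 6.240
B_MODIS = 0.313       # change in air temperature per 1 °C of MODIS LST
B_ELEVATION = -0.004  # per unit of elevation (meters assumed, not stated)
# Month effects relative to January, the reference level.
MONTH_EFFECT = {
    "Jan": 0.0, "Feb": -1.341, "Mar": -0.631, "Apr": 0.301,
    "May": 2.354, "Jun": 3.261, "Jul": 7.894, "Aug": 6.790,
    "Sep": 6.077, "Oct": 4.085, "Nov": -2.161, "Dec": -1.970,
}


def predict_air_temp(modis_lst_c, elevation, month):
    """Population-level prediction: station random effect set to zero and
    NLCD held at its (unlisted) reference class."""
    return (INTERCEPT
            + B_MODIS * modis_lst_c
            + B_ELEVATION * elevation
            + MONTH_EFFECT[month])


# A July pixel reading 30 °C at 500 m:
# 6.240 + 0.313*30 - 0.004*500 + 7.894 = 21.524
print(round(predict_air_temp(30.0, 500.0, "Jul"), 3))
```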


Finally, to test how effective this model is, I plotted the average residual for each station on a map to look for any spatial pattern. No obvious pattern emerged; however, the model is still missing some explanatory factors.
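
Mapping residuals by station first needs one average per station. A minimal sketch of that step follows. It assumes each record carries a hypothetical `station` ID and a `residual` equal to observed minus fitted air temperature. The result can then be joined to station coordinates for mapping.

```python
from collections import defaultdict


def mean_residual_by_station(records):
    """Average model residual (observed - fitted air temperature) per station."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for record in records:
        totals[record["station"]] += record["residual"]
        counts[record["station"]] += 1
    return {station: totals[station] / counts[station] for station in totals}
```

Stations with consistently positive means are ones where the model underpredicts air temperature. Clusters of same-signed means on the map would point to a missing spatial covariate.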

These results were interesting, and I think this type of model shows promise for public health research. With some refinement, I believe a good approximation of ambient air temperature could be made from LST imagery.

This project was a great opportunity to combine the skills I have gained in R, Python, and using spatial data. Scripting the entire process made standardization and reproducibility easier, and will allow me to continue working on this model in the future (after a good break, that is).

As stated in previous posts, the goal of my project is to explore the difference between point-source temperature recordings at NOAA monitoring stations and the land surface temperature images made available daily from the NASA MODIS satellites. As a first step in comparing these datasets, hot-spot statistics and spatial autocorrelation were used to identify any areas where the difference between the data was significantly non-random. The steps below outline this process.


Data Selection and Cleaning

To begin exploring the data, I selected a single day (January 1, 2015) and linked the measurements from both sources into one shapefile. The NOAA data were downloaded as a CSV file from the NOAA National Climatic Data Center (http://www.ncdc.noaa.gov/). These data arrive mostly cleaned, and the only transformation required was converting the temperature readings from tenths of a degree Celsius to degrees Celsius. The MODIS raster image was downloaded from USGS EarthExplorer (http://earthexplorer.usgs.gov/) and required more work before it could be used. The raster was re-projected from the sinusoidal projection used for MODIS land products to WGS84, to match the point shapefile, using the ‘Project Raster’ tool in ArcGIS (http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#//00170000007q000000). Once re-projected, the image had to be rescaled and converted from Kelvin to Celsius [OutRaster = (InRaster * 0.02) - 273.15] before being used.
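
Both unit conversions are simple enough to express as two functions. The sketch below uses the 0.02 scale factor and the Kelvin offset from the text. It also assumes a stored value of 0 marks a missing MODIS pixel; check this against the product documentation for the layer you download. Function names are hypothetical.

```python
import math

MODIS_LST_SCALE = 0.02   # stored integer * 0.02 = temperature in Kelvin
KELVIN_OFFSET = 273.15   # K -> °C
MODIS_FILL_VALUE = 0     # assumed "no data" value for missing pixels


def noaa_to_celsius(tenths_of_degree):
    """NOAA daily values are reported in tenths of a degree Celsius."""
    return tenths_of_degree / 10.0


def modis_lst_to_celsius(stored_value):
    """Rescale a raw MODIS LST pixel to Kelvin, then convert to °C.
    Returns NaN for the assumed fill value so gaps are not read as -273.15 °C."""
    if stored_value == MODIS_FILL_VALUE:
        return math.nan
    return stored_value * MODIS_LST_SCALE - KELVIN_OFFSET


print(noaa_to_celsius(-35))                  # -3.5
print(round(modis_lst_to_celsius(15000), 2)) # 300 K -> 26.85
```

Masking the fill value before conversion matters. Otherwise cloud-covered pixels would enter the analysis as extremely cold readings.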


Joining the Data

Once both data sets were cleaned, they were joined for analysis. This was done with a custom Python script; however, the ‘Extract Values to Points’ tool in ArcGIS (http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#//009z0000002t000000.htm) or an equivalent tool in QGIS completes the same task. The final shapefile contained a point for each monitoring station, with fields for the NOAA and MODIS values and the difference between them, defined as the absolute value of the MODIS value minus the NOAA value (Difference = |MODIS - NOAA|). This difference is the value used in the analysis.
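
The custom script itself is not shown here. The core of any "extract values to points" step is converting a station's coordinates into a row and column of the raster grid. A minimal sketch follows. It assumes a north-up raster described by a GDAL-style six-element geotransform (rotation terms zero), with the grid held as a list of rows. Station fields `x`, `y` and `noaa` are hypothetical.

```python
import math


def pixel_value_at(raster, geotransform, x, y):
    """Return the value of the cell containing (x, y), or None if outside.

    geotransform = (x_origin, pixel_width, 0, y_origin, 0, pixel_height),
    with pixel_height negative for a north-up image (GDAL convention).
    """
    x0, pixel_width, _, y0, _, pixel_height = geotransform
    col = math.floor((x - x0) / pixel_width)
    row = math.floor((y - y0) / pixel_height)
    if 0 <= row < len(raster) and 0 <= col < len(raster[0]):
        return raster[row][col]
    return None


def station_differences(stations, raster, geotransform):
    """Attach the MODIS value and |MODIS - NOAA| to each station inside the image."""
    results = []
    for station in stations:
        modis = pixel_value_at(raster, geotransform, station["x"], station["y"])
        if modis is None or math.isnan(modis):
            continue  # station falls outside the swath or on a masked pixel
        results.append({**station, "modis": modis,
                        "difference": abs(modis - station["noaa"])})
    return results
```

Nearest-cell lookup like this matches what ArcGIS's tool does without interpolation. With 1 km pixels, a station near a cell edge can still be better represented by a neighboring cell.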


Hot-Spot Map in ArcGIS

The next step was to create a hot-spot map of the difference values to identify any areas with significantly larger or smaller differences. This was done using the ‘Hot Spot Analysis’ tool in ArcGIS (http://resources.arcgis.com/en/help/main/10.1/index.html#//005p00000010000000). In the image below, there is an area of larger difference in central Oregon and an area of smaller difference near Portland.
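
ArcGIS's Hot Spot Analysis tool is based on the Getis-Ord Gi* statistic. To show what the tool computes, here is a minimal sketch of Gi* z-scores. It assumes binary fixed-distance-band weights that include each point itself (the "star" in Gi*) and projected coordinates, so that Euclidean distance is meaningful. It is an illustration, not a replica of the tool's options. The function name is hypothetical.

```python
import math


def getis_ord_gi_star(points, values, distance_band):
    """Gi* z-score for each point using binary weights: w_ij = 1 if point j
    lies within distance_band of point i (including i itself), else 0."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    z_scores = []
    for xi, yi in points:
        weights = [1.0 if math.hypot(xi - xj, yi - yj) <= distance_band else 0.0
                   for xj, yj in points]
        w_sum = sum(weights)
        w_sq_sum = sum(w * w for w in weights)
        numerator = sum(w * v for w, v in zip(weights, values)) - mean * w_sum
        denominator = s * math.sqrt((n * w_sq_sum - w_sum ** 2) / (n - 1))
        # Undefined if every point is a neighbor or all values are equal
        z_scores.append(numerator / denominator if denominator > 0 else math.nan)
    return z_scores
```

A z-score above about 1.96 (or below about -1.96) corresponds to a hot (or cold) spot at roughly the 95% level. The choice of distance band largely determines which stations count as neighbors.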

[Figure: hot-spot map of the MODIS/NOAA difference values]

Neither of these clusters is statistically significant, and they most likely reflect differences in terrain. I suspect the larger difference in central Oregon arises because temperatures fluctuate much more there than in the valley, whereas temperatures in the area of smaller difference are more stable and therefore less prone to error. It is also important to remember that the NOAA data represent daily temperature averages, while the MODIS data represent the land surface temperature at the specific time the image was taken. To explore this idea further, future analysis will include a variable to account for location or land-use type.


Spatial Autocorrelation

Finally, the Moran’s I statistic was calculated to further explore if there is any significant spatial autocorrelation in the difference measurements. This was done using the ‘Spatial Autocorrelation’ tool in ArcGIS (http://resources.arcgis.com/en/help/main/10.1/index.html#//005p0000000n000000). The output is shown below:
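
Global Moran's I can be computed directly from the difference values and a spatial weights matrix. The sketch below assumes binary distance-band weights that exclude self-pairs. For the p-value it uses a permutation test, which shuffles the values across stations. This is a swapped-in technique: ArcGIS reports an analytic z-score instead. Function names are hypothetical.

```python
import math
import random


def distance_band_weights(points, distance_band):
    """Binary weights: 1 for distinct points within distance_band, else 0."""
    return [[1.0 if i != j and math.hypot(xi - xj, yi - yj) <= distance_band else 0.0
             for j, (xj, yj) in enumerate(points)]
            for i, (xi, yi) in enumerate(points)]


def morans_i(values, weights):
    """Global Moran's I = (n / W) * sum_ij w_ij d_i d_j / sum_i d_i^2."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_total = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return (n / w_total) * cross / sum(d * d for d in dev)


def morans_i_permutation_test(values, weights, n_permutations=999, seed=0):
    """Two-sided permutation p-value around the expected value -1/(n-1)."""
    rng = random.Random(seed)
    expected = -1.0 / (len(values) - 1)
    observed = morans_i(values, weights)
    shuffled = list(values)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(morans_i(shuffled, weights) - expected) >= abs(observed - expected):
            extreme += 1
    return observed, (extreme + 1) / (n_permutations + 1)
```

Values of I near -1/(n-1), together with a large p-value, mean the differences show no more spatial clustering than random placement would produce.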

[Figure: ArcGIS Spatial Autocorrelation (Moran's I) report and summary values]

This statistic showed no significant spatial autocorrelation. The combination of a very low Moran’s I and a high p-value indicates that the difference values are consistent with a random spatial distribution across the stations.


Some stations on the map show very large differences on this one day. The next step is to explore these data over a longer time frame to see whether the spatial pattern of the difference stays the same. I plan to download data for at least a few days from each month of 2014 and explore the spatial pattern of the difference between the two data sets.

I am interested in exploring the utility of MODIS Land Surface Temperature (LST) maps for creating heat stress maps for migrant farm workers in rural Oregon. Currently it is difficult to estimate heat exposure for field workers because point-source data are sparse. Monitoring stations collect information at a large scale; however, this information cannot be accurately interpolated over a large area. Often there is only one monitoring station for a very large area, which leads to serious problems when trying to create a continuous surface temperature map. Using remotely sensed data in these heat stress models would allow researchers to assess individual exposure more accurately. This is crucial for identifying areas that need more attention or resources, and it would greatly simplify the analysis of these data.

I would like to compare the temperature values estimated by MODIS LST for a given date with the temperature recorded by the National Weather Service (NWS) or another point source of temperature data. This will allow me to quantify the difference between the two sources and to identify any patterns in the distribution of error in the MODIS data. To do this, I will compare data for many dates across a variety of locations in order to identify any spatial or seasonal patterns. The main objectives of this project are as follows:

  1. Identify the magnitude of the difference between these two sources of data
  2. Create a regression model for comparing temperature data recorded by the NWS and MODIS LST maps for a set of given dates and locations
  3. If the difference from the point source data to the MODIS LST image is too great, explore other ways to use MODIS LST images to predict heat stress for migrant populations
  4. If necessary and/or possible, explore other remotely sensed data sources if MODIS does not work for this spatial problem
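
Objectives 1 and 2 can start from two simple summaries of paired readings. The sketch below computes a mean absolute difference, then an ordinary least squares fit of the point-source temperature on the MODIS value with its root-mean-square error. This is only the simplest possible regression. The later analysis used a mixed effects model with additional covariates. Function names are hypothetical.

```python
import math


def mean_absolute_difference(modis, noaa):
    """Objective 1: average magnitude of |MODIS - NOAA| over paired readings."""
    return sum(abs(m - n) for m, n in zip(modis, noaa)) / len(modis)


def fit_simple_regression(modis, noaa):
    """Objective 2 (simplest form): OLS of NOAA temperature on MODIS LST.
    Returns (intercept, slope, rmse)."""
    n = len(modis)
    mean_x = sum(modis) / n
    mean_y = sum(noaa) / n
    sxx = sum((x - mean_x) ** 2 for x in modis)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(modis, noaa))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(modis, noaa)]
    rmse = math.sqrt(sum(r * r for r in residuals) / n)
    return intercept, slope, rmse
```

The RMSE of the fit gives one concrete threshold for objective 3. If it is too large for heat-stress work, the alternatives in objectives 3 and 4 become relevant.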

The first hurdle will be collecting all of this data for the locations needed for the analysis. Also, the MODIS data and NWS data do not cover the same timeframe: the high and low temperatures reported by the NWS may not match the land surface temperature captured at the time of the MODIS overpass. I will need to find some way to compare these values and normalize them to each other. Currently my only idea is to create a function that takes the times of the NWS high and low readings, estimates the temperature throughout the day from these values, and compares the estimate with the MODIS reading at the time it was taken. Ideally I would like to look at data for 5 sites of different terrain over 5 days from each season (20 days total, 100 data points).
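
One simple way to build the proposed function is to interpolate between the daily low and high with half-cosine curves. The sketch below assumes the minimum occurs before the maximum on the same day. It reuses the same low for the next morning and treats times as decimal local hours. These are simplifying assumptions, not an established diurnal model. The function name is hypothetical.

```python
import math


def diurnal_temperature(hour, t_min, t_max, hour_min, hour_max):
    """Estimate air temperature at `hour` (0-24, local time) from one day's
    minimum and maximum. Assumes a half-cosine rise from t_min at hour_min
    to t_max at hour_max, then a half-cosine fall back to t_min at hour_min
    the next day. Requires hour_min < hour_max."""
    if hour < hour_min:
        hour += 24.0  # early-morning hours are on the cooling leg of the cycle
    if hour <= hour_max:
        fraction = (hour - hour_min) / (hour_max - hour_min)
        return t_min + (t_max - t_min) * (1 - math.cos(math.pi * fraction)) / 2
    fraction = (hour - hour_max) / (hour_min + 24.0 - hour_max)
    return t_max - (t_max - t_min) * (1 - math.cos(math.pi * fraction)) / 2


# Low of 10 °C at 6:00, high of 20 °C at 15:00; estimate at a 10:30 overpass.
print(round(diurnal_temperature(10.5, 10.0, 20.0, 6.0, 15.0), 2))  # 15.0
```

The overpass hour would come from the time of each MODIS acquisition, which is approximately mid-morning for Terra daytime passes. The estimate at that hour could then be compared directly with the LST value.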

Regarding my experience with the tools for this class, I have moderate experience with R and ArcGIS, a solid introduction to Python, and a rudimentary knowledge of ModelBuilder.