Final Post 2015 Archives - GEO599/GEO584-Advanced Spatial Statistics and GIS, 2013-2016

For this project I was researching the relationship between ambient air temperature and MODIS land surface temperature measurements across Oregon. Public health research around heat stress and negative health outcomes currently uses point source measurements of weather to estimate exposures. The problem with this is that it is difficult to interpolate measurements between monitoring stations and these estimates are not accurate enough to realistically depend on. Using MODIS imagery to estimate exposure would allow researchers to look retrospectively and prospectively at estimates of exposure across nearly the entire planet over a very large temporal resolution.

For this project I used 1km resolution MODIS LST imagery for every day in 2014. I used only one swath of MODIS data, which covered approximately 85% of Oregon. This was compared with more accurate temperature measurements from NOAA monitoring stations. I used 46 stations across Oregon for the comparison.

To explore this relationship I used a Python script to extract the raster values across the entire time frame at the points of the 46 NOAA stations. I then used 2 R scripts to combine the data sets, change them into long format, and do the analysis. I used a linear mixed effects model to account for the random effect of location and also to introduce a temporal relationship to the data. Using the LME model I started with NLCD value, month of measurement, time of MODIS measurement, and elevation as covariates. I then removed covariates that were not significant to refine the model and minimize the standardized residuals.

NLCD and time of MODIS measurement were not found to be significant, and were therefore removed from the model. The results from the final model are shown below:

Parameter	Estimate	95% CI	P-value
(Intercept)	6.240	(3.007 – 9.472)	0.0002
MODIS Value	0.313	(0.302 – 0.324)	0.0000
Elevation	-0.004	(-0.005 – -0.003)	0.0000
NLCD (evgrForst)	2.272	(-1.288 – 5.832)	0.2038
NLCD (lowIntDev)	-2.446	(-7.133 – 2.240)	0.2968
NLCD (scrubBrsh)	0.961	(-2.661 – 4.583)	0.5939
Month (Feb)	-1.341	(-1.944 – -0.738)	0.0000
Month (Mar)	-0.631	(-1.207 – -0.053)	0.0322
Month (Apr)	0.301	(-0.255 – 0.858)	0.2889
Month (May)	2.354	(1.795 – 2.912)	0.0000
Month (Jun)	3.261	(2.680 – 3.840)	0.0000
Month (Jul)	7.894	(7.298 – 8.488)	0.0000
Month (Aug)	6.790	(6.202 – 7.378)	0.0000
Month (Sept)	6.077	(5.501 – 6.654)	0.0000
Month (Oct)	4.085	(3.523 – 4.645)	0.0000
Month (Nov)	-2.161	(-2.724 – -1.597)	0.0000
Month (Dec)	-1.970	(-2.554 – -1.385)	0.0000

Finally, to test how effective this model is I plotted the average residual value for each location on a map to attempt to identify any spatial pattern. There was none obvious; however there are still some missing factors in the model.

These results were interesting and I think that this type of model shows promise for public health research. With some refinement I believe that a good approximation could be made for ambient air temp using LST imagery.

This project was a great opportunity to combine the skills I have gained in R, Python, and using spatial data. Scripting the entire process made standardization and reproducibility easier, and will allow me to continue working on this model in the future (after a good break, that is).

Research Question

My research objective was to investigate the spatial scale and locations of surface feeding sites of humpback whales (Megaptera novaeangliae) in the Stellwagen Bank National Marine Sanctuary (SBNMS) in the southern Gulf of Maine, USA, and to investigate movement parameters of the whales.

Dataset

The existing data for this analysis stems from a long-term study investigating humpback whale behavior and ecology in the southern Gulf of Maine, USA, (for more details, see Friedlaender et al. 2009). Almost every summer since 2004, whales were equipped with non-invasive tags that recorded detailed information on the underwater movement of the whales or collected video-footage of the behavior of the tagged animal and associated whales. During daylight deployments that usually lasted for up to 8 hours, focal follows were conducted from a small boat following the tagged whale, during which detailed information on the behavior of the tagged whale at the water surface was collected. Because the tags did not contain a GPS, range and bearing information on the whale were also collected at least once when the whale was observed at the surface in between consecutive dives, usually resulting in the collection of one location point every 3-5 min (temporal extent: 117-260 min), with a spatial resolution of tens or hundreds of meters (less when the whale is foraging, more when it is traveling) (extent: 10³-10⁴ m). Continuous GPS locations of the boat were automatically collected. Based on the time stamps of the range/bearing data and the boat GPS data, the GPS locations collected at or close to the time the range and bearing data was collected were identified and together this data was used to calculate the locations of the whale.

Hypotheses

Surface feeding areas (FPT radius) are larger than mean length of prey schools (139 m, Hazen et al. 2009)
Size of surface feeding areas (FPT radius) positively correlated with group size (larger groups search larger area)
Positive correlation between FPT (at FPT radius) and depth and slope (prey habitat preference)
No autocorrelation in FPT
No autocorrelation of absolute turning angles on spatial scale of surface feeding sites (unpredictable foraging movement)

Approaches

Scan focal follow data for deployments during which surface feeding was major behavior
For these deployments, use R package adehabitatLT (Calegne 2011) to implement first-passage time (FPT) metric
- Based on the location points of an animal and the time spend at/between these points, FPT radius identifies the spatial scale on which foraging effort along an animal’s path is concentrated (Fauchald & Tveraa 2003)
- The time spent within this radius of each location point is called FPT. Location points with high FPT identify important foraging areas along the animal’s path (Bailey & Thompson 2006)
Compare FPT radius for each deployment to the group size during the deployment
Map color-coded FPT (at FPT radius) for each deployment onto bathymetry and slope raster layer in ArcMap
Extract depth and slope values for each location point
Connect the location points for each deployment with lines to show path of the animal
Buffer each location point by the value of the FPT radius for the respective deployment
Dissolve buffer for better visibility of the range of foraging effort around the location points and relative to underlying depth and slope rasters
Perform linear regression of FPT against depth/slope for each deployment and calculate r2 value
Calculate autocorrelation in turning angles and first-passage time using Moran’s I coefficient of autocorrelation in the pgirmess R-package

Results

Three deployments (188a, 188b_f, 195b) were identified for which suitable location data existed and that contained a large amount of surface feeding. Because deployment 188b_f originally also contained a large amount of traveling, here, only the part of the path with surface feeding activity was included in the analysis. Table 1 summarizes the results of the analyses.

Table 1: FPT radius, group size and correlation coefficient r2 for the regression of FPT against depth and slope are shown for each of the three deployments.

There does not seem to be a fixed spatial scale of surface feeding as the values of FPT radius range between 90 and 347 m.
It is difficult to investigate a potential relationship between FPT radius and group size as the foraging groups were very dynamic with changes in group size throughout the deployments. However, the smallest FPT radius was calculated for the only group that consisted of group size 1 for some part of the deployment, and the largest FPT radius was observed for the group that had the largest group size during some part of the deployment.
Significant relationships between FPT and depth or slope were found for two of the three deployments. For deployment 188a, depth explained 17.2 % of the variability in FPT (p=0.001). For deployment 188b_f, depth explained 14.2 % (p<0.001) and slope 29.2 % (p<0.001) of the variability in FPT (Figures 1-3)
Autocorrelation in FPT was found for all three deployments at scales greater than the FPT radius for each deployment (Figure 4)
Autocorrelation in absolute turning angles was found for two of three deployments at spatial scales much greater than FPT radius.

Figures 1-3: Whale paths, FPT and FPT radius mapped on slope chart of SBNMS (USGS/SBNMS). Whale locations for each deployment (circles) are color-coded to represent high/low FPT (red/green) at FPT radius (purple buffer) and connected with lines to show temporal sequence of locations. From top to bottom: 188a, 188b_f, 195b.

Figure 4: Moran’s I autocorrelation coefficient calculating FPT autocorrelation plotted against lag distance for all deployments. Vertical lines represent FPT radius for each deployment. Red circles indicate significant autocorrelation in FPT.

Discussion and Significance

The three deployments investigated here showed a wide range of spatial scales of surface feeding. However, there is some indication that this variability could at least in part be explained by the differences in group size between the three groups, with larger groups having a greater spatial scale of surface feeding. The spatial scale of surface feeding is somewhat consistent with the mean size of prey schools. Bathymetry and bathymetric relief seem to have some influence on the locations of concentrated search effort (FPT). This is obvious from both the correlation coefficients of FPT and depth/slope as well as from the map. However, the results of this analysis are obscured by the autocorrelation in FPT. Since the deployments are opportunistic observations, no absence data was analyzed here and the sample size is small, it is unclear whether non-feeding whales also concentrate their activity on similar locations within SBNMS and whether surface feeding also occurs over other parts of SBNMS with different ranges of depth/slope. The spatial scale of autocorrelation in FPT (451-596 m) is more similar between the three whales than the spatial scale of foraging effort. This could suggest the existence of a prey searching mechanism that is similar for all three whales and works on a scale larger than the foraging mechanism captured by the FPT analysis.

A better understanding of the foraging mechanisms of the whales can help to improve management decisions aimed at mitigating ship strike and entanglement risks in the SBNMS.

Learning outcomes

Working with spatial data in R
Implementing FPT metric
Calculating spatial autocorrelation
Extracting raster values in ArcMap

References

Bailey, H. & Thompson P. (2006). “Quantitative Analysis of Bottlenose Dolphin Movement Patterns and Their Relationship with Foraging: Movement Patterns and Foraging.” Journal of Animal Ecology 75 (2): 456–65.

Calenge, C. (2011). “Analysis of Animal Movements in R: The adehabitatLT Package.” Saint Benoist, Auffargis, France: Office Nationale de La Chasse. http://cran.gis-lab.info/web/packages/adehabitatLT/vignettes/adehabitatLT.pdf.

Fauchald, P. & Tveraa T. (2003). “Using First-Passage Time in the Analysis of Area-Restricted Search and Habitat Selection.” Ecology 84 (2): 282–88.

Friedlaender, A. S., Hazen, E. L., Nowacek, D. P., Halpin, P. N., Ware, C., Weinrich, M. T., Hurst, T., Wiley, D. (2009). Diel changes in humpback whale Megaptera novaeangliae feeding behavior in response to sand lance Ammodytes spp. behavior and distribution. Marine Ecology Progress Series 395: 91–100.

Hazen E.L., Friedlaender A.S., Thompson M.A., Ware C.R., Weinrich M.T., Halpin P.N., Wiley D.N. (2009). “Fine-Scale prey aggregations and foraging ecology of humpback whales Megaptera Novaeangliae.” Marine Ecology Progress Series 395: 75–89.

Jennifer Allen and Michael Thompson calculated and provided the whale locations based on boat GPS and range and bearing data. Thank you!

The question I examined for this project is how does the frequency of reported red-tailed hawk observations from the eBird database change between 2000 and 2014 in NW Oregon? For most of this term, however, I was trying to determine patterns in red-tailed hawk residency (as defined by the ratio of days in a calendar year a red-tailed hawk was observed at a given location to the number of days any other species was observed). I attempted to do so using a multiple regression model. After multiple failed attempts to construct an accurate model, I determined that the eBird data I was using were not suited for such analyses or at least not suited to an approach reasonable for the scope of this course. By first investigating the patterns in red-tailed hawk observation frequency, I thought I might be able to figure out why the data were producing poor results and ways to mitigate these weaknesses.

The bird observations I used to examine this question are from the eBird database, an online citizen science monitoring and reporting system. For each eBird observation, an observer submits the species, the geographic location, and the date/time. These data exist at a range of spatial resolutions from several meters to several kilometers. Because I will analyze habitat colonization at the patch scale, I wrote a Python script to select observations where lat/long is reported at or above a specified precision. The extent of these data stretches beyond the continental US. Temporally, eBird data are reported with a timestamp specified to the minute, however, I analyzed observations by year. I used data from 2000 to 2014. Data exist for observations before 2000, but they are much more sparse.

For environmental covariate data, I used various land cover parameters, population data, and climate data. The land cover parameters were derived from the National Land Cover Database (NLCD) data from 2011, which exist at a 30m resolution. I also used US Census tract data from 2010 to estimate population density. Lastly I used PRISM climate data, including average minimum temperature and average precipitation. The PRISM data used in this project were produced at an 800m resolution and represent averages from 1981-2014.

I expected to find that yearly variation in red-tailed hawk frequency would be minimal. I anticipated that frequency would be highest on the edges of urban areas near large patches of open space such as agricultural land, sagebrush, urban parks, etc. However, I also expected that these effects would be tempered by spatial bias in the eBird data (i.e. high densities of observations in and near urban areas and relatively sparse observations in rural areas). This would be represented, in part, by negative relationship between canopy density to frequency and impervious surface coverage to frequency.

My analysis process to address this problem was as follows:

Create a grid of points at 500m intervals
Snap red-tailed hawk observations to the closest grid point
Count points within every cell of a 500m raster
Extract the value of the count raster and append it to the snapped observation points
Divide the count for each record by the maximum count for that year to get a relative value of frequency (values range between 0 and 1)
Extract the value of each independent variable and append it to the snapped observation points.
Select 1,000 stratified random samples from snapped observation points with a Python script
Run Ordinary Least Squares on random samples
Use coefficients from the regression equation to weight environmental variables in a Raster Calculator expression to produce a prediction of frequency for the entire study area.

The result from this process is a raster of predicted frequency values for every cell in the study area. I repeated this process for years 2000, 2005, and 2010–2014. The total number of observations for year 2000 was 172 and the total for year 2005 was 627 so all observation records were used for each. The OLS tool failed to run for year 2000 data due to multicolinearity (probably a result of the small sample size mostly from highly populated areas). For year 2005, the model performed poorly and the prediction raster was almost completely opposite of all other predictions.

For all other years, overall model performance was good with an R-squared value for years 2010-2013 >.9 and an R-squared valued of .38 for 2014. Akaike’s Information Criterion was < 500 for 2010-2014. However, the residuals for all years were not normally distributed for any of the models, so they are not necessarily reliable. Because the Raster Calculator expressions used to produce the prediction rasters are essentially the regression model equations, the following expressions indicate the strength and the nature of the relationship with each variable for each year. They also show which variables were significant for each year, as any variables that were not significant in a model were omitted from the corresponding Raster Calculator expression. (NOTE: dependent variable values ranged from 0-1 so coefficient values are proportionately small):

2010: -0.113637 + (0.109701 * “lowDev500m”) + (-0.058497* “shrub500m”) + (0.498112 * “water500m”) + (-0.00052 * “precip”) + (0.014086 * “minTemp”) + (0.095448 * “lcdivrsty_0_1”) + (0.001215 * “imperv500m”)

2011: -0.158069 + (0.169261 * “lowDev500m”) + (-0.039183 * “ag500m”) + (0.000008 * “pop”) + (-0.094665 * “highdev500m”) + (-0.035761* “shrub500m”) + (0.403930 * “water500m”) + (-0.00033 * “precip”) + (0.018631 * “minTemp”) + (0.03855 * “lcdivrsty_0_1”) + (0.002261 * “imperv500m”)

2012: -.212994 + (0.224267 * “lowDev500m”) + (-0.065403 * “ag500m”) + (0.000008 * “pop”) + (-0.172365 * “highdev500m”) + (-0.111544 * “shrub500m”) + (0.721691 * “water500m”) + (-0.00008 * “precip”) + (0.031621 * “minTemp”) + (0.117999 * “lcdivrsty_0_1”) + (0.002558 * “imperv500m”)

2013: -0.113637 + (0.109701 * “lowDev500m”) + (-0.009148 * “ag500m”) + (-0.058497 * “shrub500m”) + (0.498117 * “water500m”) + (-0.00052 * “precip”) + (0.01406 * “minTemp”) + (0.095448 * “lcdivrsty_0_1”) + (0.001768 * “can500m”) + (0.001215 * “imperv500m”)

2014: -0.116055 + (0.069739 * “lowDev500m”) + (0.118068 * “ag500m”) + (0.000056 * “pop”) + (0.092787 * “shrub500m”) + (0.020673 * “water500m”) + (-0.00028 * “precip”) + (0.089624 * “minTemp”) + (0.277722 * “lcdivrsty_0_1”) + (-0.003689 * “can500m”) + (-0.007453 * “imperv500m”)

Variable definitions:

lowDev500m – value of 1 if low intensity urban development is the dominant land cover type within 500m, 0 if not

ag500m – value of 1 if pasture or cultivated crops are the dominant land cover type within 500m, 0 if not

pop – population density from 2010 US Census tract data (calculated by total population/area)

highdev500m – value of 1 if high intensity urban development is the dominant land cover type within 500m, 0 if not

shrub500m – value of 1 if shrub is the dominant land cover type within 500m, 0 if not

water500m – value of 1 if water is the dominant land cover type within 500m, 0 if not

precip – average annual precipitation from 1981-2014

minTemp – average daily minimum temperature from 1981-2014

lcdivrsty_0_1 – value of 1 if 3 < diversity < 9, 0 otherwise

can500m – average canopy density within 500m

imperv500m – average impervious surface cover within 500m

Some expected patterns were present in all other prediction rasters. For example, predicted frequency was lowest in forested areas and high elevation areas (with low average minimum temperatures), and it was highest in low-lying open space and in areas of low urban development. At finer scales, however, the differences between years were more significant. For instance, local differences in urban development had a noticeably stronger influence for 2012 and 2014. The low values in forested areas were also less uniform for these years. Below are the prediction rasters for 2010-2014.

If the regression models can be trusted, these results suggest that any assessment of some other metric based on frequency values (e.g. species residency) is also likely to be highly variable from year to year. Additionally, the parameters used in my regression models are still not sufficient to properly explain the patterns in frequency of red-tailed hawks. A more complex model is likely necessary to account for complex spatiotemporal variation and eBird data biases.

Throughout this exploration of eBird data, I have learned a lot about the statistical tools available in ArcGIS and the limitations of some of them. I was also challenged to write a couple of Python scripts that forced me to expand my knowledge of the Pandas package for handling large data tables. While I used ModelBuilder to automate some ArcGIS processes, I don’t think I learned anything new. I did not use R at all for this project.

GEO599/GEO584-Advanced Spatial Statistics and GIS, 2013-2016

Just another blogs.oregonstate.edu site

Category Archives: Final Post 2015

MODIS Land Surface Temperature and Ambient Ait Temperature

Spatial scale and locations of surface feeding sites and movement parameters of humpback whales (Megaptera novaeangliae) in the the southern Gulf of Maine, USA

Final Project: Regression Analysis of Red-Tailed Hawk Frequency in Northwest Oregon

Contact Info