May 2015 - GEO599/GEO584-Advanced Spatial Statistics and GIS, 2013-2016

In my last blog post, I analyzed spatial autocorrelation in two whale movement parameters, swimming speed and the turning angle between consecutive segments of the trajectory, for a single whale. In this update, I expand that analysis to the trajectories of three whales foraging in the Stellwagen Bank National Marine Sanctuary, and examine autocorrelation in both parameters over a range of distances. Positive autocorrelation in either parameter means that, when two trajectory segments are compared, the values of that parameter are more similar than expected by chance; negative autocorrelation means that they are more dissimilar than expected by chance. A correlogram shows the value of the autocorrelation coefficient for a range of distances between trajectory segments. Here, I present results from the analysis of the whale trajectories using the adehabitatLT package for R, available on CRAN (Calenge 2011).

The correlogram below shows the Moran’s I autocorrelation coefficient for the swimming speeds of three whales. Two whales show significant autocorrelation in swimming speed over short distances (<1000 m) (p<0.05, indicated by red circles). This means that along segments of the whales’ trajectories that are within 1000 m of each other, the whales maintained similar speeds. This is not surprising, because it seems unlikely that the whales would abruptly change their swimming speed over such short distances.
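To make the idea of a correlogram concrete, here is a minimal sketch of Moran’s I computed per distance class. It is not the adehabitatLT implementation: the binary distance-band weights, the use of segment midpoints as locations, and the example coordinates and speeds are all assumptions made for illustration, and no significance test is included.

```python
import math

def morans_i(points, values, d_min, d_max):
    """Moran's I with binary weights: w_ij = 1 if d_min < dist(i, j) <= d_max."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    cross = 0.0   # sum of w_ij * dev_i * dev_j over all ordered pairs
    w_sum = 0.0   # sum of the weights
    for i in range(n):
        for j in range(n):
            if i != j and d_min < math.dist(points[i], points[j]) <= d_max:
                cross += dev[i] * dev[j]
                w_sum += 1.0
    ss = sum(x * x for x in dev)
    if w_sum == 0 or ss == 0:
        return None  # no pairs in this distance class, or constant values
    return (n / w_sum) * (cross / ss)

def correlogram(points, values, breaks):
    """Moran's I for each consecutive pair of distance breaks."""
    return [(lo, hi, morans_i(points, values, lo, hi))
            for lo, hi in zip(breaks, breaks[1:])]

# Hypothetical segment midpoints (m) and swimming speeds (m/s)
midpoints = [(0, 0), (500, 0), (1000, 0), (1500, 0)]
speeds = [1.0, 1.2, 1.4, 1.6]
for lo, hi, coefficient in correlogram(midpoints, speeds, [0, 1000, 2000]):
    print(lo, hi, coefficient)
```

Each tuple holds the lower and upper bound of a distance class and the coefficient for the pairs of segments that fall in it, which is what a correlogram plots.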

After converting the turning angles to radians, the analysis of autocorrelation in turning angles (Moran’s I) revealed that the turning angles of only one trajectory were significantly positively autocorrelated at distances of 1000 and 2000 m (p<0.05, indicated by red circles).
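For readers unfamiliar with these movement parameters, the sketch below derives per-segment swimming speed and relative turning angle (in radians) from a sequence of positions and times. The function name, the projected coordinates in meters, and the example track are assumptions for illustration; adehabitatLT computes comparable quantities itself.

```python
import math

def movement_parameters(xy, t):
    """Per-segment speed and relative turning angle (radians) from positions and times."""
    speeds, headings = [], []
    for (x0, y0), (x1, y1), t0, t1 in zip(xy, xy[1:], t, t[1:]):
        speeds.append(math.hypot(x1 - x0, y1 - y0) / (t1 - t0))
        headings.append(math.atan2(y1 - y0, x1 - x0))
    turns = []
    for h0, h1 in zip(headings, headings[1:]):
        # Wrap the change in heading into [-pi, pi): positive = left turn
        turns.append((h1 - h0 + math.pi) % (2 * math.pi) - math.pi)
    return speeds, turns

# Hypothetical track: positions in meters, times in seconds
track = [(0, 0), (100, 0), (100, 100), (0, 100)]
times = [0, 60, 120, 180]
speeds, turns = movement_parameters(track, times)
print(speeds, turns)
```

A track with n positions yields n-1 speeds and n-2 turning angles, so the two series are offset by one segment when they are compared.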

Next, I used the R CRAN package adehabitatLT (Calenge 2011) to calculate first-passage time (Fauchald & Tveraa 2003) as a metric for the search effort along each whale’s trajectory, and used linear regression to relate first-passage time to the environmental variables water depth and seafloor slope. The image below shows the three trajectories (in turquoise: 195b, purple: 188b_f, red: 188a) on a slope chart of the Stellwagen Bank National Marine Sanctuary area (USGS/NOAA).

Based on the description by Fauchald & Tveraa (2003), for each trajectory, first-passage time quantifies the spatial scale of the animal’s foraging effort. The values of first-passage time at this spatial scale distinguish areas with high foraging effort (long first-passage time) from areas with low foraging effort (short first-passage time). The color-coded figure below shows first-passage time for whale 188b_f relative to seafloor slope (red: long first-passage time, green: short first-passage time).
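As a rough illustration of the metric, the sketch below computes first-passage time for every position along a track: the time the animal needs to leave a circle of radius r centered on that position, summed over the backward and forward directions, with linear interpolation at the point where the track crosses the circle. This is a simplified reading of Fauchald & Tveraa (2003), not the adehabitatLT code; the function names and the example track are hypothetical.

```python
import math

def _crossing_fraction(center, p_in, p_out, r):
    """Fraction along p_in -> p_out at which the segment leaves the circle."""
    dx, dy = p_out[0] - p_in[0], p_out[1] - p_in[1]
    fx, fy = p_in[0] - center[0], p_in[1] - center[1]
    a = dx * dx + dy * dy
    b = 2 * (fx * dx + fy * dy)
    c = fx * fx + fy * fy - r * r
    # p_in lies inside the circle and p_out outside, so the larger root is in [0, 1]
    return (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)

def first_passage_times(xy, t, r):
    """First-passage time per position; None where the track ends inside the circle."""
    n = len(xy)
    result = []
    for i in range(n):
        sides = []
        for step in (1, -1):          # forward in time, then backward
            j, exit_time = i, None
            while 0 <= j + step < n:
                k = j + step
                if math.dist(xy[i], xy[k]) > r:
                    f = _crossing_fraction(xy[i], xy[j], xy[k], r)
                    exit_time = t[j] + f * (t[k] - t[j])
                    break
                j = k
            sides.append(None if exit_time is None else abs(exit_time - t[i]))
        result.append(None if None in sides else sides[0] + sides[1])
    return result

# Hypothetical straight track at constant speed (1 distance unit per time unit)
track = [(x, 0) for x in range(11)]
times = list(range(11))
print(first_passage_times(track, times, 2.5))
```

In Fauchald & Tveraa’s approach the radius is not fixed in advance: first-passage time is computed for a range of radii, and the radius at which the variance of log first-passage time peaks is taken as the scale of area-restricted search.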

Simple linear regression revealed that depth explained 17.2% of the variance in first-passage time for the trajectory of whale 188a (p=0.001). Separately, depth and slope explained 14.2% and 29.2%, respectively, of the variance in first-passage time for the trajectory of whale 188b_f (each p<0.0005) (see figures below). For the trajectory of whale 195b, neither depth nor slope was a significant predictor of first-passage time (each p>0.2).
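The "percent of variance explained" figures above are R-squared values from a regression with one predictor. A minimal sketch of that calculation follows; the depth and first-passage-time numbers are made up for illustration and are not the whale data.

```python
def simple_ols(x, y):
    """Least-squares slope, intercept and R-squared for one predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared

# Hypothetical values: water depth (m) and first-passage time (s)
depth = [30, 35, 42, 50, 61, 75]
fpt = [900, 820, 860, 700, 640, 610]
slope, intercept, r_squared = simple_ols(depth, fpt)
print(f"slope={slope:.2f}, R-squared={r_squared:.3f}")
```

An R-squared of 0.172 corresponds to the 17.2% reported for whale 188a; the p-values would come from the usual t-test on the slope, which this sketch omits.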

Some authors (see Calenge 2011 for details) have suggested standardizing the segment lengths before analyzing autocorrelation in the movement parameters of an animal’s trajectory. I will investigate this method in a follow-up analysis. Furthermore, I will re-analyze autocorrelation in turning angles using the absolute values of the turning angles rather than the signed angles in radians, to facilitate the interpretation of the results.

Literature cited:

Calenge, C. 2011. “Analysis of Animal Movements in R: The adehabitatLT Package.” Saint Benoist, Auffargis, France: Office Nationale de La Chasse. http://cran.gis-lab.info/web/packages/adehabitatLT/vignettes/adehabitatLT.pdf.

Fauchald, P. & T. Tveraa. 2003. “Using First-Passage Time in the Analysis of Area-Restricted Search and Habitat Selection.” Ecology 84 (2): 282–88.

My goal over the last few weeks has been to determine the relationship between red-tailed hawk residency and environmental variables. I realized though that before I could do so, there were some data quality issues that I needed to address. Specifically, I realized that the 0 values in my residency raster (a ratio of the number of days red-tailed hawks were observed to the number of days any other species were observed) represented locations where there weren’t any hawks. Since only the locations where hawks were observed are relevant to my analyses, I removed records with a ratio of 0.
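The filtering step can be sketched as follows. The record layout and the field names (`hawk_days`, `bird_days`) are hypothetical stand-ins for the raster cells, not the actual attribute names in the data.

```python
def residency_ratios(cells):
    """Map cell id -> hawk days / days with any bird, skipping unsurveyed cells."""
    return {c["id"]: c["hawk_days"] / c["bird_days"]
            for c in cells if c["bird_days"] > 0}

def keep_between(ratios, lower=0.0, upper=1.0):
    """Keep cells with lower < ratio < upper (both bounds exclusive)."""
    return {k: v for k, v in ratios.items() if lower < v < upper}

# Hypothetical cells
cells = [
    {"id": "a", "hawk_days": 0, "bird_days": 12},   # birds seen, never a hawk
    {"id": "b", "hawk_days": 3, "bird_days": 12},
    {"id": "c", "hawk_days": 8, "bird_days": 8},    # hawk seen on every survey day
]
ratios = residency_ratios(cells)
with_hawks = {k: v for k, v in ratios.items() if v > 0}   # drop the 0 values
print(with_hawks)
print(keep_between(ratios))
```

The second helper is there because the same pattern applies to any other range of the ratio one might want to examine.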

In removing these 0 values from my data (which cut the data roughly in half), I realized that they probably had a significant influence on the hotspot analyses I ran in the first few weeks of the class. I re-ran the hotspot analysis, and as expected, the hotspots were much more finely articulated. I decided to see what influence other portions of the data might have on the hotspot analysis, so I tried iterations with ratios <100% (100% meaning a hawk was seen on every day that any bird was seen), >0 and <100%, and >0 and <50%. The latter two produced nearly identical hotspot maps, so it seemed appropriate to use all data >0 and <100%.

Hotspot analyses from upper right, clockwise: all data, <100%, 0< and <100%, and 0%<.

Next I prepared my environmental variable data. For my regression model, I used 8 variables:

Population – value of containing census tract from US Census data
Average precipitation – value of cell from 2014 PRISM data
Minimum Temperature – ibid
Percent open space within 1 km radius – reclassified NLCD data (Herbaceous Upland, Grasslands/Herbaceous, Planted/Cultivated, Pasture/Hay, Row Crops, Small Grains, Fallow, Urban/Recreational Grasses = 1, everything else 0) > Focal statistics mean
Dominant land cover in 500m radius – focal statistics majority on NLCD
Avg percent canopy cover in 500m radius – focal statistics mean
Avg percent impervious surface in 500m radius – ibid
Land cover diversity in 500m radius – focal statistics variety
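The focal statistics used for several of these variables can be illustrated on a toy grid. This is a plain-Python sketch under stated assumptions, not the ArcGIS tool: the circular neighborhood measured in cells, the handling of ties in the majority, and the land cover codes are all hypothetical.

```python
from collections import Counter

def focal(grid, radius, stat):
    """Apply stat to the values within `radius` cells (circular window) of each cell."""
    rows, cols = len(grid), len(grid[0])
    out = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [grid[rr][cc]
                      for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                      for cc in range(max(0, c - radius), min(cols, c + radius + 1))
                      if (rr - r) ** 2 + (cc - c) ** 2 <= radius ** 2]
            out[r][c] = stat(window)
    return out

def mean(values):
    return sum(values) / len(values)

def majority(values):
    # Ties go to the value met first here; a GIS tool may handle ties differently
    return Counter(values).most_common(1)[0][0]

def variety(values):
    return len(set(values))

# Hypothetical land cover codes: 1 = grassland, 2 = forest
land_cover = [[1, 1, 2],
              [1, 1, 2],
              [2, 2, 2]]
open_space = [[1 if v == 1 else 0 for v in row] for row in land_cover]  # reclassify to 1/0

print(focal(open_space, 1, mean))      # share of open space around each cell
print(focal(land_cover, 1, majority))  # dominant land cover
print(focal(land_cover, 1, variety))   # land cover diversity
```

Taking the focal mean of a 1/0 raster is what turns the reclassified land cover into a "percent open space" surface.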

I then ran the Ordinary Least Squares regression tool. My R-squared value was .214. From the report the tool produced, I concluded that the residuals were not randomly distributed.

To be sure, I also ran the Spatial Autocorrelation tool on the residuals, which indicated only a 1% chance that their spatial pattern could be the result of random chance. I also ran a hotspot analysis on the residuals at two scales, 1,000 ft (the minimum distance band that wouldn’t produce an error) and 47,891 ft (the calculated distance band from my previous analyses). While the 1,000 ft distance band did not produce anything interpretable, the 47,891 ft distance band may point to flaws in model design. That is, the distinct locations where the model over-predicted and under-predicted may suggest other environmental variables I should include, or ones I should modify or drop from my model. I haven’t figured out what these are yet, though.

Hotspot analyses on residuals at 1,000 ft (left) and 47,891 ft (right).

The Koenker (BP) statistic indicated that my model is heteroscedastic (i.e. the model does not fit equally well for high and low values of the dependent variable). To try to understand why, I re-ran the OLS tool on two subsets of my data: one where the residency ratio was < 5% and another where it was > 50%. Each subset contained about 3,500 records. The R-squared value was 0.29 for the > 50% subset and 0.14 for the < 5% subset. The differences in the histograms are also telling.
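For intuition about what the Koenker (BP) statistic measures, here is a minimal sketch for a model with a single predictor. It regresses the squared OLS residuals on the predictor and uses n times the R-squared of that auxiliary regression as the test statistic, compared against a chi-square distribution with one degree of freedom. The one-predictor simplification and the synthetic data are assumptions for illustration; the model above has eight predictors and was fitted with the OLS tool.

```python
import math

def _fit(x, y):
    """Fitted values and R-squared of a least-squares line y ~ x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    fitted = [intercept + slope * a for a in x]
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    ss_res = sum((b - f) ** 2 for b, f in zip(y, fitted))
    return fitted, (1 - ss_res / ss_tot if ss_tot else 0.0)

def koenker_bp(x, y):
    """Studentized Breusch-Pagan statistic and p-value for one predictor."""
    fitted, _ = _fit(x, y)
    squared_residuals = [(b - f) ** 2 for b, f in zip(y, fitted)]
    _, r2_aux = _fit(x, squared_residuals)
    stat = max(0.0, len(x) * r2_aux)
    p_value = math.erfc(math.sqrt(stat / 2))   # chi-square tail, 1 degree of freedom
    return stat, p_value

# Synthetic data whose scatter grows with x (hypothetical, not the hawk data)
x = list(range(1, 21))
y = [xi + 0.5 * xi * (-1) ** xi for xi in x]
print(koenker_bp(x, y))
```

A small p-value means the residual variance changes systematically with the predictor, which is the condition the statistic flagged in the full model.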

< 5%, population, avg precip, min temperature, percent open space/km, dominant land cover

> 50%, population, avg precip, min temperature, percent open space/km, dominant land cover

< 5%, avg percent canopy per 500m, avg percent impervious surface per 500m, land cover diversity

> 50%, avg percent canopy per 500m, avg percent impervious surface per 500m, land cover diversity

From these plots, I will try to develop other environmental variables that may be better predictors of residency.

Since my last update I’ve made significant progress in estimating the foraging ranges and overlap between Adelie and Gentoo penguins at Palmer Station over the 2014/15 breeding season.

With the help of a classmate (thanks Steven!) and a few online forums (GIS in Ecology & GIS 4 Geomorphology), I was able to figure out how to calculate kernel density estimates (KDE) without Arc’s outdated Animal Movement Extension or Hawth’s Analysis Tools.

Objective: Quantify the geographical extent of the distribution of Adelie and Gentoo penguins foraging around Palmer Station

Create kernel density estimates to identify areas used for foraging (95% KDE) and core use areas (50% KDE)
Calculate the area (km²) within 95% and 50% kernel density contours
Calculate the % overlap between the ranges of Adelie and Gentoo penguins

Methods:

Filter out data points whose estimated error is >1500 m
Combine location data points for all Adelie (ADPE) individuals n=15 (522 data points) and all Gentoo individuals (GEPE) n=5 (147 data points)
Create kernel density estimates using the kernel density tool and default parameters
Extract values by points from the output obtained above, determine 50% and 95% of observations using values of extracted points from attribute tables
Reclassify kernel density raster so values >50th percentile have a new value of 50 and all others have a new value of NoData, use the same steps to create additional rasters representing 95% of points
Convert rasters to polygons, calculate area of each polygon using calculate geometry tool
Use union function to determine area of overlap between polygons
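The steps above can be sketched in plain Python to show the logic outside of Arc. This is an illustration under several assumptions: a Gaussian kernel with a hand-picked bandwidth instead of the kernel density tool’s defaults, grid cells instead of polygons, and made-up point locations. All function names are hypothetical.

```python
import math

def kde(points, x, y, h):
    """Gaussian kernel density estimate at (x, y) with bandwidth h."""
    total = sum(math.exp(-((x - px) ** 2 + (y - py) ** 2) / (2 * h * h))
                for px, py in points)
    return total / (2 * math.pi * h * h * len(points))

def density_threshold(points, h, share):
    """Density value that `share` of the observed points reach or exceed."""
    at_points = sorted((kde(points, px, py, h) for px, py in points), reverse=True)
    k = max(1, math.ceil(share * len(at_points)))
    return at_points[k - 1]

def range_cells(points, h, share, cell, extent):
    """Grid cells (col, row) whose center density reaches the threshold."""
    xmin, ymin, xmax, ymax = extent
    threshold = density_threshold(points, h, share)
    cells = set()
    for row in range(int((ymax - ymin) / cell)):
        for col in range(int((xmax - xmin) / cell)):
            cx, cy = xmin + (col + 0.5) * cell, ymin + (row + 0.5) * cell
            if kde(points, cx, cy, h) >= threshold:
                cells.add((col, row))
    return cells

def area(cells, cell):
    return len(cells) * cell * cell

def overlap_percent(cells_a, cells_b):
    """Percent of range A that also lies inside range B."""
    return 100.0 * len(cells_a & cells_b) / len(cells_a) if cells_a else 0.0

# Hypothetical locations (km) for two species
adpe = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
gepe = [(1, 1), (2, 1), (2, 2), (3, 2)]
extent = (-5, -5, 10, 10)
adpe_core = range_cells(adpe, 1.0, 0.50, 0.5, extent)
gepe_core = range_cells(gepe, 1.0, 0.50, 0.5, extent)
print(area(adpe_core, 0.5), area(gepe_core, 0.5))
print(overlap_percent(adpe_core, gepe_core), overlap_percent(gepe_core, adpe_core))
```

Note that the overlap is not symmetric: the same shared area is a different percentage of each species’ range, which is why it is reported separately for each species.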

Results:

Table 1. Estimates of core use (50% KDE) and total (95% KDE) foraging areas used by Adelie and Gentoo penguins with associated overlap between species.

Figure 1. Visual representation of Adelie core use (red) and total foraging area (pink) and Gentoo core use (dark blue) and total (light blue) foraging areas. Despite the poor image quality, it is clear that these ranges are closely associated with the colonies that the respective species are from, and there appears to be some association with bathymetry, as the Gentoo range is dense at the head of the Palmer Deep canyon.

Figure 2. Close-up visual representation of Adelie core use (red) and total foraging area (pink) and Gentoo core use (dark blue) and total (light blue) foraging areas. Despite the poor image quality, it is clear that these ranges are closely associated with the colonies that the respective species are from, and there appears to be some association with bathymetry, as the Gentoo range is clustered at the head of the Palmer Deep canyon. Note the overlap between species.

Discussion:

The results of this analysis indicate that Gentoo penguins occupy a larger foraging range (core use and overall) and because of this, the portion of their range that overlaps with that of the Adelie penguins is minimal to moderate. The opposite is seen in Adelie penguins, who appear to have a smaller foraging range and thus a higher proportion of it overlaps with Gentoo penguins. Also notable is the fact that Gentoo penguins appear to be foraging farther away from their colony than Adelie penguins, which is surprising as the opposite is usually true. The main caveat of these results is the difference in sample size between data points of Gentoo (n=147) and Adelie (n=522) penguins. This was not accounted for in this analysis and is likely skewing these results. The fact that Gentoos have a larger range could be because there were fewer data points used in the creation of the KDEs.

The next step in this process will be to research methods that take sample size into account. One possibility is taking a random sample of Adelie location points from the total sample so that Adelies are represented equally to Gentoo penguins.
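A random subsample of this kind is straightforward to draw. The sketch below uses placeholder coordinates in place of the real Adelie locations, and the seed is an arbitrary choice that simply makes the draw repeatable.

```python
import random

def subsample(points, n, seed=0):
    """Draw n points without replacement; a fixed seed keeps the draw repeatable."""
    rng = random.Random(seed)
    return rng.sample(points, n)

# Placeholder coordinates standing in for the 522 Adelie locations
adelie_points = [(i * 0.01, i * 0.02) for i in range(522)]
adelie_subset = subsample(adelie_points, 147)   # match the Gentoo sample size
print(len(adelie_subset))
```

Repeating the draw with different seeds and recomputing the KDE each time would show how much the estimated range depends on which points happen to be sampled.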

I will also be experimenting with KDE in R. This will allow me to compare results between the two methods (and R should speed this process up down the road)!

I am also in the process of determining whether a bathymetric layer and/or accurate basemap exists for this region. So far I’ve had difficulty finding these, but they would be very useful for comparing these results to covariates such as bathymetry and distance to shore.

I conducted a hot spot analysis for the cackling goose use of the Willamette Valley. As previously mentioned, I was curious how the geese are using the area throughout the winter season (October – April). I conducted a hot spot analysis for each month using location points from the entire time series, 1997-2011, to attempt to discern any changes in landscape level use of the valley throughout the winter. All the maps passed the common sense test (the clusters were right over the refuges, no floating hot spots, etc.) which was somewhat heartening.

My first step was separating the data into month files and creating maps of each month in Arc. I also had not seen the data before, so it was nice to pull the points into Arc finally.

Cackling goose flock locations throughout the Willamette Valley, Oregon from 1997-2011.

Second, I ran a hot spot analysis for each month and compared the results. The hot spots were centered on the four federal refuges in the valley, Finley, Ankeny, Baskett, and Sauvie Island (from south to north). Most of the winter months looked more or less the same, except the beginning and end of the winter season.
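For readers curious about what a hot spot analysis computes, the sketch below implements the Getis-Ord Gi* statistic with a fixed distance band, which is the statistic behind this kind of tool. It is a simplified illustration rather than the Arc implementation: the binary weights, the band width, and the flock counts are assumptions.

```python
import math

def gi_star(points, values, band):
    """Getis-Ord Gi* z-score per location, binary weights within `band` (self included)."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(max(0.0, sum(v * v for v in values) / n - mean * mean))
    scores = []
    for i in range(n):
        w = [1.0 if math.dist(points[i], points[j]) <= band else 0.0
             for j in range(n)]
        sum_w = sum(w)
        sum_w2 = sum(wj * wj for wj in w)
        sum_wx = sum(wj * xj for wj, xj in zip(w, values))
        denom = s * math.sqrt(max(0.0, (n * sum_w2 - sum_w ** 2) / (n - 1)))
        scores.append((sum_wx - mean * sum_w) / denom if denom > 0 else 0.0)
    return scores

# Hypothetical flock counts at ten locations along a line (distance units arbitrary)
locations = [(i, 0) for i in range(10)]
counts = [10, 10, 10, 0, 0, 0, 0, 0, 0, 0]
print(gi_star(locations, counts, 1))
```

Large positive scores mark locations surrounded by high values (hot spots) and large negative scores mark cold spots, so the result depends strongly on the distance band chosen.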

October vs. November

March vs. April

The addition of two hot spots between October and November, and the loss of one (Sauvie Island) in April, likely reflect the timing of agricultural activity on the refuges, but I am still exploring why these particular locations are or are not used in those months.


Update: Autocorrelation in humpback whale movement parameters, relationship of foraging effort with environmental variables

Regression Analysis of Red-tailed hawk residency

Estimating the foraging ranges of Adelie and Gentoo penguins: update

Cackling Goose Hot Spot Analysis
