Introduction & Background

Despite the growing number of scientists, federal and state agencies, private citizens, and non-profit organizations working to restore damaged ecosystems in the Great Basin, intact native plant communities continue to decline. The shift away from native-perennial to invasive annual-grass dominated systems has reduced biodiversity, increased wildfire severity and frequency, and has expedited desertification.

To combat this ecosystem overhaul, the most up-to-date and relevant science must be used to guide the restoration of Great Basin plant communities. Improving native plant establishment rates in the restoration setting is one of the biggest challenges faced by land managers. Bluebunch wheatgrass (Pseudoroegneria spicata) is a commonly used native species in restoration but seedling establishment is modest. The goal of our study is to fill knowledge gaps surrounding seedling adaptation to  soils.

The information from a study by St. Clair et al. (2013) was used to delineate seed transfer zones for bluebunch wheatgrass. A seed zone is a geographic area across which native seed can be collected and planted without risking maladaptation. To delineate seed zones, seed is collected from wild populations across the geographic range of a species. Plants are grown to adulthood in several common garden locations that span this geographic range. Climatic data are matched with phenotypes. Each phenotype is defined by observed adult plant traits measured in each common garden (e.g. leaf width, leaf length, pubescence, crown width). Phenotype and climate data are mapped together using a spatial model (Westfall 1992). The final step in creating a seed zone is to empirically verify whether using delineated seed zone maps to guide plant material selection actually enhances plant fitness.

An important knowledge gap I have identified is the role of soils in the adaptation of bluebunch wheatgrass. There exists only a moderate correlation between population-level phenotypic trait variability and seed-source climates, even though traits associated with size, phenology, and leaf morphology varied considerably among the populations sampled (St. Clair et al 2013). Adaptive phenotypic traits are visible expressions of genetic variation caused by environmental factors (i.e. leaf length, flowering phenology, pubescence etc.). This moderate correlation between phenotypic traits and climate may suggest that other factors contribute significantly to local adaptation in bluebunch wheatgrass.

Soils are natural bodies that consist of living organisms, organic matter, minerals, air, water, are comprised of horizons, and have the ability to support plant life (NRCS 2016). The complex array of factors inherent in soils makes generalizing about them challenging. Many studies link plant survival to soil texture class (i.e. percentage sand, silt, and clay) and soil-water dynamics (Letey 1958, Ullah and Hulbert 1969). Others such as Jensen et al. (1990) used soil traits and discriminant analysis to predict sagebrush-dominated plant community types using soil traits such as soil depth, subsoil clay content, total water holding capacity, and A-horizon thickness. Still other studies correlate seedling emergence and germination to aggregate size and bulk density (Nasr and Selles 1995). I aim to determine whether observed phenotypic traits of bluebunch wheatgrass vary with soil traits such as soil texture, soil depth, pH, aggregate structure and water holding capacity.

My study will utilize existing phenotypic trait data and soils maps to explore links between soil order, soil series, plant traits, seed zones, and ecoregions. I predict that phenotypic trait divergence will be correlated to soil gradients that exist across seed zones. Information obtained from this work will either support current seed zones for bluebunch wheatgrass or create better seed zones, and help land managers to achieve higher success rates in restoration.

Description of Datasets

  1. The seed zones for bluebunch wheatgrass have been delineated into shapefiles. Each polygon represents a seed zone. The boundaries of these polygons were delineated using ArcMap spatial analysis of climate (precipitation / temperature), elevation, and plant traits (leaf length, leaf width, crown width etc.). This dataset / map was developed in 2013 and is based on two years of data collection at common-garden sites in the Great Basin.
  2. The Phenotypic trait dataset contains measurements of individual plants at sixteen different common gardens spread throughout the study area. Each seed zone is represented by two common gardens, and each common garden contains 4-5 populations from each seed zone. Measurements such as leaf length, width, and reproduction stage scores were gathered from each individual plant in all sixteen gardens. Each population of bluebunch wheatgrass growing in the common gardens is associated with a collection site (UTM / elevation / approximate area in acres). Each common garden also has an associated (UTM, elevation / and dimensions in meters).

SeedZonesMap

  1. “The gSSURGO (soils) database is derived from the official Soil Survey Geographic (SSURGO) database. SSURGO generally has the most detailed level of soil geographic data developed by the National Cooperative Soil Survey (NCSS) in accordance with NCSS mapping standards. The tabular data represent the soil attributes and are derived from properties and characteristics stored in the National Soil Information System (NASIS). The gSSURGO data were prepared by merging the traditional vector-based SSURGO digital map data and tabular data into statewide extents, adding a statewide gridded map layer derived from the vector layer, and adding a new value-added look up table (valu) containing “ready to map” attributes. The gridded map layer is in an ArcGIS file geodatabase in raster format. The raster and vector map data have a statewide extent. The raster map data have a 10-meter cell size that approximates the vector polygons in an Albers Equal Area projection. Each cell (and polygon) is linked to a map unit identifier called the map unit key. A unique map unit key is used to link the raster cells and polygons to attribute tables” (Soil survey staff 2015).

Hypotheses

I hypothesize that some phenotypic trait variability in bluebunch wheatgrass can be explained by adaptation to certain soil traits. My objective is to determine if phenotypic trait variability in bluebunch wheatgrass corresponds to soil order, soil series, or other distinct soil trait groupings, and if the existing seed zones for bluebunch wheatgrass and/or ecoregion account for the spatial distribution of these soil trait groups.

Approaches

  1. Soil maps containing soil order polygons will be compared to existing bluebunch wheatgrass seed zone polygons to determine the extent to which soil order is contained within existing seed zones.
  2. Plant trait data from all 16 common gardens will be compared to the soil traits (i.e. order, series, texture class or other) to look for correlations between source population soils and genetic expressions in bluebunch wheatgrass.
  3. Soil maps containing soil series polygons will be compared to existing bluebunch wheatgrass seed zone polygons.
  4. Available soil maps containing soil order and series will be compiled for our study area. Soil series and pedon descriptions will be used to assemble a set of soil traits that are relevant to plant growth and that exist in the study area. These traits will include, depth to bedrock, A and B horizon texture classes, percent gravel, pH, aggregate structure, and depth to impermeable layers. These data will be organized into a database for statistical analysis in PC-Ord (a multivariate statistical analysis software). Phenotypic trait data (such as leaf length, basal width, and seed production) will be gathered from a previous study involving sixty bluebunch wheatgrass populations grown in sixteen common gardens spread across the study area (figure 1).

Expected Outcomes

I expect to find that bluebunch wheatgrass phenotypes are correlated with soil order and may be correlated with other soil traits such as texture and aggregate structure.

Significance

A relationship between soil and bluebunch wheatgrass phenotypic expression suggests that soils information should factor into the seed-zone delineation process.

Level of preparation

I have moderate experience with ArcMap but have never used the program to do spatial analysis. Similarly, I have two terms of introductory statistics that used R but have never explored multivariate datasets.

References:

Predeville, Holly. 2016. Depiction of data gathered in a study by St. Clair et al. 2013 in google maps. https://www.google.com/maps/d/edit?mid=zxFzk9yulKc0.klEzA-cxckLY

Jensen, M. E., G. H. Simonson, and M. Dosskey. 1990. “Correlation between Soils and Sagebrush-Dominated Plant Communities of Northeastern Nevada.” Soil Science Society of America Journal 54 (3): 902. doi:10.2136/sssaj1990.03615995005400030049x.

Letey, J. 1958. “Relationship between Soil Physical Properties and Crop Production.” In Advances in Soil Science, edited by B. A. Stewart, 277–94. Advances in Soil Science 1. Springer New York. http://link.springer.com.ezproxy.proxy.library.oregonstate.edu/chapter/10.1007/978-1- 4612-5046-3_8.

Nasr, H. M., and F. Selles. 1995. “Seedling Emergence as Influenced by Aggregate Size, Bulk Density, and Penetration Resistance of the Seedbed.” Soil and Tillage Research 34 (1): 61–76. doi:10.1016/0167-1987(94)00451- J.

NRCS 2016. What Is Soil? | NRCS Soils. 2016. Accessed March 3. http://www.nrcs.usda.gov/wps/portal/nrcs/detail/soils/edu/?cid=nrcs142p2_054280.

Soil Survey Staff. Gridded Soil Survey Geographic (gSSURGO) Database for the Conterminous United States. United States Department of Agriculture, Natural Resources Conservation Service. Available online at https://gdg.sc.egov.usda.gov/. November 16, 2015 (FY2016 official release).

St. Clair Bradley John, Francis F. Kilkenny, Richard C. Johnson, Nancy L. Shaw, and George Weaver. 2013. “Genetic Variation in Adaptive Traits and Seed Transfer Zones for Pseudoroegneria Spicata (bluebunch Wheatgrass) in the Northwestern United States.” Evolutionary Applications 6 (6): 933–48. doi:10.1111/eva.12077.

Ullah Alizai Hamid, and Lloyd Hulbert. 1969. “Effects of soil texture on evaporative loss and available water”, Soil Science.

Westfall, R. D. 1992. “Developing Seed Transfer Zones.” In Handbook of Quantitative Forest Genetics, edited by Lauren Fins, Sharon T. Friedman, and Janet V. Brotschol, 313–98. Forestry Sciences 39. Springer Netherlands. http://link.springer.com.ezproxy.proxy.library.oregonstate.edu/chapter/10.1007/978-94-015-7987-2_9.

Research Question
Are magnetic properties such as susceptibility related to heavy metal concentrations at the Formosa Mine Superfund Site? What sort of environmental factors contribute to the distribution of heavy metals in the affected area?
Data Set
The data consist of heavy metal concentrations, obtained with a portable X-Ray fluorescence gun (pXRF) as well as magnetic susceptibility at both low and high frequencies (0.4 and 4.7mT). Samples were taken semi-randomly from the Formosa Superfund Mine site and nearby streams, Middle Creek and Cow Creek. Samples were taken from areas that were accessible but also at random along trails and near site features such as the adit diversion system and identifiable seeps and drainage routes along roadsides at the site. Samples were divided into 4 fractions; bulk, >63µm, between 20-63 µm and <20 µm. These samples were then dried in an oven at 40C and then prepared for pXRF and magnetics measurements.
Hypothesis
Magnetic properties will show correlation with heavy metal concentrations because heavy metals tend to associate with magnetic minerals. Various environmental factors related to generation of acid rock drainage and hydrology will account for the distribution of certain metals in the affected area.
Past research has shown strong correlation between certain magnetic parameters and heavy metal concentrations. Magnetic techniques have been used to identify and delineate polluted areas in a number of applications from mapping atmospheric deposition of fly ashes to determining sources of contamination in urban environments (Lu and Bai, 2006). Furthermore, a Pollution Loading Index (PLI) can be calculated from the cumulative addition of all metals. PLIs are often well correlated with magnetic parameters. At this particular site, the main concern is acid rock drainage and the subsequent transport of heavy metals to streams nearby.

Approach
As this is the exploratory stage with respect to this data set, various methods were employed to categorize and organize the data. Much time was spent combining data from a number of sources including the susceptibility meter, pXRF and GPS used to store waypoints. Once in a usable format, waypoints and data were transferred into ArcGIS for hot spot analysis and general mapping needs.
Results
Hotspot Analysis
The hotspot analysis revealed that heavy metals are indeed found in greater concentrations at the Formosa Superfund Site. At this point, there is only one point that is not ‘hot’, and as such it represents a benchmark for comparison of other samples. Further sampling is required to properly delineate the zone of influence and to assess the degree and extent of contamination to nearby surface waters; Upper Middle Creek and South Fork Middle Creek.
Exploratory Regression
Once the magnetics data were properly normalized for mass and volume, they revealed some interesting correlations with heavy metals. In particular, Mn and V showed strong positive correlations with both low and high frequency susceptibility.
It is expected that additional data will smooth out some of the relationships. With so few points, it is difficult to assess variability and error in the data.
Pearson Correlation Coefficients
Correlation coefficients were calculated for all metals and magnetic susceptibility in both low and high field magnetization. The following pairs of variables were highly correlated (ie. had p-values less than 0.05): Ca and Zr, V and Ti, Fe and Cu, Fe and Zn, Fe and As, Cu and Zn, Cu and As, Zn and As and Zr and Ta. This suggests association of various metals with each other and in the case of Mn with high frequency susceptibility. Of particular interest are the metals correlated with Fe. Most magnetic minerals have Fe and hence these relationships are the most likely to be further elucidated by magnetic measurements. Additional data may yield stronger or weaker correlations between variables. This is yet to be determined.

Principal Component Analysis
PCA is meant to group components (factors) in a way that describes the maximum variance in a data set and hence each factor carries a weight associated with that variability. In this initial analysis, the majority of the variance is accounted for in the first 4 components (see Table 1). The weightings of factors for each component are listed in Table 2.
Table 1= Eigenvalues and percent variance covered by each principal component in analysis of sediments from Formosa Superfund Site

pca_components

The first component is difficult to interpret. The weightings are low and tend to the negative, showing inverse relationships explain much of the variability in the data. In the second component there is evidence for explanation of variance based on magnetic parameters. The third component shows even stronger evidence of variance explained by high field susceptibility. This has implications towards understanding the mineralogy that drives the magnetic signal. Further correlations with heavy metals signifies important relationships between magnetic minerals and metal contaminants.

Table 2- Weighting of factors for first 4 components of PCA of sediments from Formosa Mine Superfund site
pca_eigen
This third table associates the various components with their spatial location, identified by the labels in column 2. Further analysis would map these components to show the spatial aggregation of variables, giving further clues on environmental factors that drive them.
Table 3- Association of sample points with component drivers

pca_site

Significance and Further Direction
The significance of this research lies in its capability to quickly assess the degree and extent of anthropogenic pollution. To date, magnetic techniques have been employed over a diverse range of applications and landscapes. Further expansion of applications is desirable from many perspectives. Having quick, easy ways of determining hot spots of contamination focuses reconnaissance on areas that are most affected and/or vulnerable and better affords important resources to be allocated towards clean up and mitigation.
At this point, it is impossible to make definitive conclusions about this analysis. More data are needed and there are some considerations to be made with respect to the methodology in analyzing the samples to begin with. Aggregation of particles and association of magnetic materials with organic matter are just two considerations that need to be addressed before further processing and analyses are done. Cross-reference with EPA and BLM data would be a useful endeavor as well. There are many opportunities for expansion and collaboration on this project which should be pursued at this time.

Learning Outcomes
-ArcGIS likes easy to read files- be careful with file names, pathways and column headings
-I learned Statgraphics—It’s an easy tool to use with no language barriers. It’s like a stand-in for quick analysis
– I made zero progress on Python, Modelbuilder or R (which I haven’t used in so long that I feel I need to start from basics again)

Reference
Lu SG, Bai SQ. 2006. Study on the correlation of magnetic properties and heavy metals content in urban soils of Hangzhou City, China. Journal of Applied Geophysics 60 : 1–12. DOI: 10.1016/j.jappgeo.2005.11.002

For this project I was researching the relationship between ambient air temperature and MODIS land surface temperature measurements across Oregon. Public health research around heat stress and negative health outcomes currently uses point source measurements of weather to estimate exposures. The problem with this is that it is difficult to interpolate measurements between monitoring stations and these estimates are not accurate enough to realistically depend on. Using MODIS imagery to estimate exposure would allow researchers to look retrospectively and prospectively at estimates of exposure across nearly the entire planet over a very large temporal resolution.

For this project I used 1km resolution MODIS LST imagery for every day in 2014. I used only one swath of MODIS data, which covered approximately 85% of Oregon. This was compared with more accurate temperature measurements from NOAA monitoring stations. I used 46 stations across Oregon for the comparison.

To explore this relationship I used a Python script to extract the raster values across the entire time frame at the points of the 46 NOAA stations. I then used 2 R scripts to combine the data sets, change them into long format, and do the analysis. I used a linear mixed effects model to account for the random effect of location and also to introduce a temporal relationship to the data. Using the LME model I started with NLCD value, month of measurement, time of MODIS measurement, and elevation as covariates. I then removed covariates that were not significant to refine the model and minimize the standardized residuals.

NLCD and time of MODIS measurement were not found to be significant, and were therefore removed from the model. The results from the final model are shown below:

Parameter Estimate 95% CI P-value
(Intercept) 6.240 (3.007 – 9.472) 0.0002
MODIS Value 0.313 (0.302 – 0.324) 0.0000
Elevation -0.004 (-0.005 – -0.003) 0.0000
NLCD (evgrForst) 2.272 (-1.288 – 5.832) 0.2038
NLCD (lowIntDev) -2.446 (-7.133 – 2.240) 0.2968
NLCD (scrubBrsh) 0.961 (-2.661 – 4.583) 0.5939
Month (Feb) -1.341 (-1.944 – -0.738) 0.0000
Month (Mar) -0.631 (-1.207 – -0.053) 0.0322
Month (Apr) 0.301 (-0.255 – 0.858) 0.2889
Month (May) 2.354 (1.795 – 2.912) 0.0000
Month (Jun) 3.261 (2.680 – 3.840) 0.0000
Month (Jul) 7.894 (7.298 – 8.488) 0.0000
Month (Aug) 6.790 (6.202 – 7.378) 0.0000
Month (Sept) 6.077 (5.501 – 6.654) 0.0000
Month (Oct) 4.085 (3.523 – 4.645) 0.0000
Month (Nov) -2.161 (-2.724 – -1.597) 0.0000
Month (Dec) -1.970 (-2.554 – -1.385) 0.0000

 

Finally, to test how effective this model is I plotted the average residual value for each location on a map to attempt to identify any spatial pattern. There was none obvious; however there are still some missing factors in the model.

These results were interesting and I think that this type of model shows promise for public health research. With some refinement I believe that a good approximation could be made for ambient air temp using LST imagery.

This project was a great opportunity to combine the skills I have gained in R, Python, and using spatial data. Scripting the entire process made standardization and reproducibility easier, and will allow me to continue working on this model in the future (after a good break, that is).

Hello, Geo 584 readers!

In my previous blog post, I mentioned my main goal was to use statistical analysis including environmental variable mapping and spatial interpolation methods to quantify correlations between harbor porpoise movement patterns and distributions with biophysical variables.

However, to do all of this, I had to see exactly what kind of data I was working with. Over the previous two years I have collected spread sheet after spread sheet of survey effort, ship transect lines, marine mammal sighting data, oceanography data, acoustic data, and stranding data. This was my first chance to dive in and look at what all I was working with.

Step 1: Sort out all marine mammal sightings – this proved to be much more tedious than I thought J

Step 2: Plot all longitude and latitude gps points of sightings on a base map

Note: This is where the ship was when the marine mammal was spotted, not necessarily the exact location of the animal.

Step 3: Map a typical day of transect surveys from each of the three survey sites

Step 4: Add course bathymetry layer to the map – and add a 200m depth contour line

Note: Much as the name implies, Harbor Porpoise tend to be a near-shore species not typically inhabiting waters deeper than 200m.

Step 5: Distribution analysis using Arcmap tools – goal was to see if harbor porpoise had an overall different distribution vs all other marine mammals seen. My hypothesis was that harbor porpoise would have a more inshore distribution.

Results:

points

This was a map of my sightings based on the boat gps, again, this is not exactly where the animal was. This requires more triangulations and calculations which I will be taking a course on in August! So I wanted to save that analysis for then!

Picture3

This second image is a map of a typical day of transect surveys in each of the three sites. I decided to do this because if you look at the sightings along you tend to see a funky pattern, but this is mainly due to the layout of the transects.

hotspot

This third image has an added 200m depth contour line, again, this is because harbor porpoise ecology states that they tend to be a near shore species. The two ovals represented in this figure are the distributions of harbor porpoise in yellow vs all other species in green. The odd shape is due to the NH line of the surveys going about 15 more miles offshore than the other two. But it is easy to differentiate that harbor porpoise generally have a more inshore distribution as I predicted.

Walking through these steps was exciting for me this was my first chance to see visual representations of my data as well as learning GIS with using correct projections and distribution calculators.

What’s next – Plans for the next few weeks!

  1. Begin to focus only on harbor porpoises. I chose harbor porpoise for my indicator species for my thesis because they are abundant, sound sensitive, and most likely to overlap with marine renewable energy.
  2. Find fine scale layers: bathymetry, bottom type, etc. Using a basemap is great to look at the data visually, but it is hard to make any interpolations or statistical analysis without environmental covariates.
  3. Coordinating sightings vs effort: take into account unequal transects, length of transect line vs. odds of seeing porpoise
  4. Organize in-situ flow through oceanographic data collected concurrently with transect lines and then use spatial interpolation to create a fluid shape file of sea surface temperature, salinity, and chlorophyll a.
  5. Are environmental covariates determining distributions? SST, Distance to Shore, Depth, Season? What are driving these porpoise occurrences?

As you can see, I have plenty to work on! Thanks for reading!

Research Question

My research objective was to investigate the spatial scale and locations of surface feeding sites of humpback whales (Megaptera novaeangliae) in the Stellwagen Bank National Marine Sanctuary (SBNMS) in the southern Gulf of Maine, USA, and to investigate movement parameters of the whales.

 

Dataset

The existing data for this analysis stems from a long-term study investigating humpback whale behavior and ecology in the southern Gulf of Maine, USA, (for more details, see Friedlaender et al. 2009). Almost every summer since 2004, whales were equipped with non-invasive tags that recorded detailed information on the underwater movement of the whales or collected video-footage of the behavior of the tagged animal and associated whales. During daylight deployments that usually lasted for up to 8 hours, focal follows were conducted from a small boat following the tagged whale, during which detailed information on the behavior of the tagged whale at the water surface was collected. Because the tags did not contain a GPS, range and bearing information on the whale were also collected at least once when the whale was observed at the surface in between consecutive dives, usually resulting in the collection of one location point every 3-5 min (temporal extent: 117-260 min), with a spatial resolution of tens or hundreds of meters (less when the whale is foraging, more when it is traveling) (extent: 103-104 m). Continuous GPS locations of the boat were automatically collected. Based on the time stamps of the range/bearing data and the boat GPS data, the GPS locations collected at or close to the time the range and bearing data was collected were identified and together this data was used to calculate the locations of the whale.

 

Hypotheses

  • Surface feeding areas (FPT radius) are larger than mean length of prey schools (139 m, Hazen et al. 2009)
  • Size of surface feeding areas (FPT radius) positively correlated with group size (larger groups search larger area)
  • Positive correlation between FPT (at FPT radius) and depth and slope (prey habitat preference)
  • No autocorrelation in FPT
  • No autocorrelation of absolute turning angles on spatial scale of surface feeding sites (unpredictable foraging movement)

 

Approaches

  • Scan focal follow data for deployments during which surface feeding was major behavior
  • For these deployments, use R package adehabitatLT (Calegne 2011) to implement first-passage time (FPT) metric
    • Based on the location points of an animal and the time spend at/between these points, FPT radius identifies the spatial scale on which foraging effort along an animal’s path is concentrated (Fauchald & Tveraa 2003)
    • The time spent within this radius of each location point is called FPT. Location points with high FPT identify important foraging areas along the animal’s path (Bailey & Thompson 2006)
  • Compare FPT radius for each deployment to the group size during the deployment
  • Map color-coded FPT (at FPT radius) for each deployment onto bathymetry and slope raster layer in ArcMap
  • Extract depth and slope values for each location point
  • Connect the location points for each deployment with lines to show path of the animal
  • Buffer each location point by the value of the FPT radius for the respective deployment
  • Dissolve buffer for better visibility of the range of foraging effort around the location points and relative to underlying depth and slope rasters
  • Perform linear regression of FPT against depth/slope for each deployment and calculate r2 value
  • Calculate autocorrelation in turning angles and first-passage time using Moran’s I coefficient of autocorrelation in the pgirmess R-package

 

Results

Three deployments (188a, 188b_f, 195b) were identified for which suitable location data existed and that contained a large amount of surface feeding. Because deployment 188b_f originally also contained a large amount of traveling, here, only the part of the path with surface feeding activity was included in the analysis. Table 1 summarizes the results of the analyses.

 

Table 1: FPT radius, group size and correlation coefficient r2 for the regression of FPT against depth and slope are shown for each of the three deployments.

final_results

  • There does not seem to be a fixed spatial scale of surface feeding as the values of FPT radius range between 90 and 347 m.
  • It is difficult to investigate a potential relationship between FPT radius and group size as the foraging groups were very dynamic with changes in group size throughout the deployments. However, the smallest FPT radius was calculated for the only group that consisted of group size 1 for some part of the deployment, and the largest FPT radius was observed for the group that had the largest group size during some part of the deployment.
  • Significant relationships between FPT and depth or slope were found for two of the three deployments. For deployment 188a, depth explained 17.2 % of the variability in FPT (p=0.001). For deployment 188b_f, depth explained 14.2 % (p<0.001) and slope 29.2 % (p<0.001) of the variability in FPT (Figures 1-3)
  • Autocorrelation in FPT was found for all three deployments at scales greater than the FPT radius for each deployment (Figure 4)
  • Autocorrelation in absolute turning angles was found for two of three deployments at spatial scales much greater than FPT radius.

 

Figures 1-3: Whale paths, FPT and FPT radius mapped on slope chart of SBNMS (USGS/SBNMS). Whale locations for each deployment (circles) are color-coded to represent high/low FPT (red/green) at FPT radius (purple buffer) and connected with lines to show temporal sequence of locations. From top to bottom: 188a, 188b_f, 195b.

final_188a_track

final_188b_track

final_195b_track

 

Figure 4: Moran’s I autocorrelation coefficient calculating FPT autocorrelation plotted against lag distance for all deployments. Vertical lines represent FPT radius for each deployment. Red circles indicate significant autocorrelation in FPT.

final_AC fpt

 

 

Discussion and Significance

The three deployments investigated here showed a wide range of spatial scales of surface feeding. However, there is some indication that this variability could at least in part be explained by the differences in group size between the three groups, with larger groups having a greater spatial scale of surface feeding. The spatial scale of surface feeding is somewhat consistent with the mean size of prey schools. Bathymetry and bathymetric relief seem to have some influence on the locations of concentrated search effort (FPT). This is obvious from both the correlation coefficients of FPT and depth/slope as well as from the map. However, the results of this analysis are obscured by the autocorrelation in FPT. Since the deployments are opportunistic observations, no absence data was analyzed here and the sample size is small, it is unclear whether non-feeding whales also concentrate their activity on similar locations within SBNMS and whether surface feeding also occurs over other parts of SBNMS with different ranges of depth/slope. The spatial scale of autocorrelation in FPT (451-596 m) is more similar between the three whales than the spatial scale of foraging effort. This could suggest the existence of a prey searching mechanism that is similar for all three whales and works on a scale larger than the foraging mechanism captured by the FPT analysis.

A better understanding of the foraging mechanisms of the whales can help to improve management decisions aimed at mitigating ship strike and entanglement risks in the SBNMS.

 

Learning outcomes

  • Working with spatial data in R
  • Implementing FPT metric
  • Calculating spatial autocorrelation
  • Extracting raster values in ArcMap

 

 References

Bailey, H. & Thompson P. (2006). “Quantitative Analysis of Bottlenose Dolphin Movement Patterns and Their Relationship with Foraging: Movement Patterns and Foraging.” Journal of Animal Ecology 75 (2): 456–65.

Calenge, C. (2011). “Analysis of Animal Movements in R: The adehabitatLT Package.” Saint Benoist, Auffargis, France: Office Nationale de La Chasse. http://cran.gis-lab.info/web/packages/adehabitatLT/vignettes/adehabitatLT.pdf.

Fauchald, P. & Tveraa T. (2003). “Using First-Passage Time in the Analysis of Area-Restricted Search and Habitat Selection.” Ecology 84 (2): 282–88.

Friedlaender, A. S., Hazen, E. L., Nowacek, D. P., Halpin, P. N., Ware, C., Weinrich, M. T., Hurst, T., Wiley, D. (2009). Diel changes in humpback whale Megaptera novaeangliae feeding behavior in response to sand lance Ammodytes spp. behavior and distribution. Marine Ecology Progress Series 395: 91–100.

Hazen E.L., Friedlaender A.S., Thompson M.A., Ware C.R., Weinrich M.T., Halpin P.N., Wiley D.N. (2009). “Fine-Scale prey aggregations and foraging ecology of humpback whales Megaptera Novaeangliae.” Marine Ecology Progress Series 395: 75–89.

 

Jennifer Allen and Michael Thompson calculated and provided the whale locations based on boat GPS and range and bearing data. Thank you!

The spatial problem I explored this quarter was about quantifying the extent of the foraging ranges of Adelie and Gentoo penguins breeding at Palmer Station, Antarctica. My original research question was whether interspecific competition could be a possible mechanism driving penguin population trends at Palmer Station. In retrospect, this question was a bit beyond the scope of the spatial analysis I proposed to conduct. However, the approach I used to test my hypothesis (that the foraging ranges of these two species would overlap) is an important first step in starting to answer this question.

The dataset I used to conduct this analysis consisted of location data (Lat/Long coordinates) obtained from platform terminal transmitters (PTTs). Over the course of the 2015 breeding season (5 January-2 February), 20 penguins (n=5 Adelie, n=15 Gentoo) were outfitted with PTT tags for roughly 3 days each. Over these three days, tags transmitted location data to ARGOS satellite system. With the specific purpose of learning spatial analysis techniques in mind, all datapoints were treated as foraging locations. Further analysis of PTT data combined with TDR (time depth recorder) data would need to be conducted in order to separate foraging locations from travelling locations. Location data from individual birds were grouped together by species (n=522 Adelie, n=147 Gentoo). The purpose of this was to analyze each species foraging distribution as a whole rather than look at individual tracks.

I used a kernel density (KD) approach to answer my question of interest. I chose this approach because it is one of the most widely used techniques to apply to tracking data for hot spot analysis, and because it appeared to be relatively easy and quick to learn. My goal was to create isopleths of utilization in order to identify areas used for foraging (95% KDE) and core use areas (50% KDE). The general idea being that the area contained within the 50% contour line would be the smallest area encompassing 50% of the datapoints used to create the entire KDE. I also sought to determine the area (km²) within of each of these contour lines and calculate the proportion of overlap between the two species ranges.

My results are summarized in table 1 (below). Gentoo penguins have a larger foraging range (core use and overall) concentrated around the colony where they were tagged, as well as near the head of Palmer deep canyon (figure 1). Adelie penguins have a more densely concentrated (near shore) range centered around the colonies where they were tagged. These results provide evidence to support my hypothesis that the ranges of the two species overlap. A greater percentage of Adelie foraging area overlaps with Gentoo area, due to the fact that their range is smaller.

1

Table 1. Estimates of core use (50% KDE) and total (95% KDE) foraging areas used by Adelie and Gentoo penguins with associated overlap between species.

 

 

Picture1

 

Figure 1. Map depicting kernel density contours for 95% and 50% KDEs. Dark blue and red symbolize overall and core use areas of Adelies and lighter blue and red represent Gentoo ranges.

The significance of these results is questionable due to issues I was unable to address by the end of the quarter. Kernel density estimates are influenced significantly by the smoothing factor (search radius) used, which in turn is influenced by the density of datapoints considered. Therefore, sampling size, or the number of datapoints used in each kernel density estimate, has a big effect on the final KDEs. In this analysis, I used a much larger sample of Adelie locations then I did Gentoo locations. I have begun testing the effect of sample size on these KDEs, but have yet to come to any conclusions about the appropriate number of datapoints to use in order to gain an accurate estimation of foraging range.

Once I’ve addressed this issue of unbalanced sampling, I will be more confident in drawing conclusions about the foraging ranges of these two species. In the future I intend to use this information to make comparisons of these ranges between species and across years of variable prey. Ultimately, this knowledge will inform larger questions of the Palmer Long Term Ecological Research (LTER) project (e.g. how do changes in the marine environment affect the behavior and distribution of penguins? Are penguins competing with each other and/or other krill predators (e.g. whales) in the Palmer area? How does prey variability affect these relationships?).

Over the quarter I’ve gained more knowledge of ArcMap, specifically the spatial analyst toolbox and the kernel density tool. I’ve also begun to learn these same techniques in R and I hope to continue to expand on that in the future.  Thanks Julia and Mark!

The question I examined for this project is how does the frequency of reported red-tailed hawk observations from the eBird database change between 2000 and 2014 in NW Oregon? For most of this term, however, I was trying to determine patterns in red-tailed hawk residency (as defined by the ratio of days in a calendar year a red-tailed hawk was observed at a given location to the number of days any other species was observed). I attempted to do so using a multiple regression model. After multiple failed attempts to construct an accurate model, I determined that the eBird data I was using were not suited for such analyses or at least not suited to an approach reasonable for the scope of this course. By first investigating the patterns in red-tailed hawk observation frequency, I thought I might be able to figure out why the data were producing poor results and ways to mitigate these weaknesses.

 

The bird observations I used to examine this question are from the eBird database, an online citizen science monitoring and reporting system. For each eBird observation, an observer submits the species, the geographic location, and the date/time. These data exist at a range of spatial resolutions from several meters to several kilometers. Because I will analyze habitat colonization at the patch scale, I wrote a Python script to select observations where lat/long is reported at or above a specified precision. The extent of these data stretches beyond the continental US. Temporally, eBird data are reported with a timestamp specified to the minute, however, I analyzed observations by year. I used data from 2000 to 2014. Data exist for observations before 2000, but they are much more sparse.

 

For environmental covariate data, I used various land cover parameters, population data, and climate data. The land cover parameters were derived from the National Land Cover Database (NLCD) data from 2011, which exist at a 30m resolution. I also used US Census tract data from 2010 to estimate population density. Lastly I used PRISM climate data, including average minimum temperature and average precipitation. The PRISM data used in this project were produced at an 800m resolution and represent averages from 1981-2014.

 

I expected to find that yearly variation in red-tailed hawk frequency would be minimal. I anticipated that frequency would be highest on the edges of urban areas near large patches of open space such as agricultural land, sagebrush, urban parks, etc. However, I also expected that these effects would be tempered by spatial bias in the eBird data (i.e. high densities of observations in and near urban areas and relatively sparse observations in rural areas). This would be represented, in part, by negative relationship between canopy density to frequency and impervious surface coverage to frequency.

 

My analysis process to address this problem was as follows:

  • Create a grid of points at 500m intervals
  • Snap red-tailed hawk observations to the closest grid point
  • Count points within every cell of a 500m raster
  • Extract the value of the count raster and append it to the snapped observation points
  • Divide the count for each record by the maximum count for that year to get a relative value of frequency (values range between 0 and 1)
  • Extract the value of each independent variable and append it to the snapped observation points.
  • Select 1,000 stratified random samples from snapped observation points with a Python script
  • Run Ordinary Least Squares on random samples
  • Use coefficients from the regression equation to weight environmental variables in a Raster Calculator expression to produce a prediction of frequency for the entire study area.

 

The result from this process is a raster of predicted frequency values for every cell in the study area. I repeated this process for years 2000, 2005, and 2010–2014. The total number of observations for year 2000 was 172 and the total for year 2005 was 627 so all observation records were used for each. The OLS tool failed to run for year 2000 data due to multicolinearity (probably a result of the small sample size mostly from highly populated areas). For year 2005, the model performed poorly and the prediction raster was almost completely opposite of all other predictions.

 

For all other years, overall model performance was good with an R-squared value for years 2010-2013 >.9 and an R-squared valued of .38 for 2014. Akaike’s Information Criterion was < 500 for 2010-2014. However, the residuals for all years were not normally distributed for any of the models, so they are not necessarily reliable. Because the Raster Calculator expressions used to produce the prediction rasters are essentially the regression model equations, the following expressions indicate the strength and the nature of the relationship with each variable for each year. They also show which variables were significant for each year, as any variables that were not significant in a model were omitted from the corresponding Raster Calculator expression. (NOTE: dependent variable values ranged from 0-1 so coefficient values are proportionately small):

2010:     -0.113637 + (0.109701 * “lowDev500m”) +  (-0.058497* “shrub500m”) + (0.498112 * “water500m”) + (-0.00052 * “precip”) + (0.014086 * “minTemp”) + (0.095448 * “lcdivrsty_0_1”) + (0.001215 * “imperv500m”)

2011:   -0.158069 + (0.169261 * “lowDev500m”) + (-0.039183 * “ag500m”) + (0.000008 * “pop”) + (-0.094665 * “highdev500m”) + (-0.035761* “shrub500m”) + (0.403930 * “water500m”) + (-0.00033 * “precip”) + (0.018631 * “minTemp”) + (0.03855 * “lcdivrsty_0_1”) + (0.002261 * “imperv500m”)

2012:  -.212994 + (0.224267 * “lowDev500m”) + (-0.065403 * “ag500m”) + (0.000008 * “pop”) + (-0.172365 * “highdev500m”) + (-0.111544 * “shrub500m”) + (0.721691 * “water500m”) + (-0.00008 * “precip”) + (0.031621 * “minTemp”) + (0.117999 * “lcdivrsty_0_1”) + (0.002558 * “imperv500m”)

2013:  -0.113637 + (0.109701 * “lowDev500m”) + (-0.009148 * “ag500m”) + (-0.058497 * “shrub500m”) + (0.498117 * “water500m”) + (-0.00052 * “precip”) + (0.01406 * “minTemp”) + (0.095448 * “lcdivrsty_0_1”) + (0.001768 * “can500m”) + (0.001215 * “imperv500m”)

2014:  -0.116055 +  (0.069739 * “lowDev500m”) + (0.118068 * “ag500m”) + (0.000056 * “pop”) + (0.092787 * “shrub500m”) + (0.020673 * “water500m”) + (-0.00028 * “precip”) + (0.089624 * “minTemp”) + (0.277722 * “lcdivrsty_0_1”) + (-0.003689 * “can500m”) + (-0.007453 * “imperv500m”)

 

Variable definitions:

lowDev500m – value of 1 if low intensity urban development is the dominant land cover type within 500m, 0 if not

ag500m – value of 1 if pasture or cultivated crops are the dominant land cover type within 500m, 0 if not

pop – population density from 2010 US Census tract data (calculated by total population/area)

highdev500m – value of 1 if high intensity urban development is the dominant land cover type within 500m, 0 if not

shrub500m – value of 1 if shrub is the dominant land cover type within 500m, 0 if not

water500m – value of 1 if water is the dominant land cover type within 500m, 0 if not

precip – average annual precipitation from 1981-2014

minTemp – average daily minimum temperature from 1981-2014

lcdivrsty_0_1 – value of 1 if 3 < diversity < 9, 0 otherwise

can500m – average canopy density within 500m

imperv500m – average impervious surface cover within 500m

 

Some expected patterns were present in all other prediction rasters. For example, predicted frequency was lowest in forested areas and high elevation areas (with low average minimum temperatures), and it was highest in low-lying open space and in areas of low urban development. At finer scales, however, the differences between years were more significant. For instance, local differences in urban development had a noticeably stronger influence for 2012 and 2014. The low values in forested areas were also less uniform for these years. Below are the prediction rasters for 2010-2014.

2014
2013

 

 

 

 

 

 

 

 

 

2012
2011

 

 

 

 

 

 

 

 

 

2010

 

If the regression models can be trusted, these results suggest that any assessment of some other metric based on frequency values (e.g. species residency) is also likely to be highly variable from year to year. Additionally, the parameters used in my regression models are still not sufficient to properly explain the patterns in frequency of red-tailed hawks. A more complex model is likely necessary to account for complex spatiotemporal variation and eBird data biases.

 

Throughout this exploration of eBird data, I have learned a lot about the statistical tools available in ArcGIS and the limitations of some of them. I was also challenged to write a couple of Python scripts that forced me to expand my knowledge of the Pandas package for handling large data tables. While I used ModelBuilder to automate some ArcGIS processes, I don’t think I learned anything new. I did not use R at all for this project.

In my last blog post, I analyzed spatial autocorrelation in the whale movement parameters swimming speed and turning angles between consecutive segments of the whale’s trajectory for a single whale. In this update, I am expanding on this analysis by analyzing over a range of distances the spatial autocorrelation in swimming speed and turning angles in the trajectories of three foraging whales in the Stellwagen Bank National Marine Sanctuary. Positive autocorrelation in either parameter would mean that, when comparing two trajectory segments, the values for this parameter are similar between the two segments, and negative autocorrelation would mean that they are not similar. A correlogram shows the values of the autocorrelation coefficient for a range of distances between the trajectory segments. Here, I am presenting results from the analysis of the whale trajectories using the R CRAN adehabitatLT package (Calenge 2011).

The correlogram below shows the Moran’s I autocorrelation coefficient for the swimming speeds of three whales. Two whales show significant autocorrelation in swimming speed over short distances (<1000 m) (p<0.05, indicated by red circles). This means that during segments of the whale’s trajectories that are within 1000 m of each other, the whales maintained similar speeds. This is not surprising because generally it does not seem likely that the whales would abruptly change their swimming speed over such short distances.

AC speed

After converting the turning angles to radians, the analysis of autocorrelation in turning angles (Moran’s I) revealed that the turning angles of only one trajectory were significantly positively autocorrelated at distances of 1000 and 2000 m (p<0.05, indicated by red circles).

AC angles

 

 

Next, I used the R CRAN package adehabitatLT (Calenge 2011) to calculate first-passage time (Fauchald & Tveraa 2003) as metric for the search effort along each whale’s trajectory, and used linear regression to relate first-passage time to the environmental variables water depth and seafloor slope. The image below shows the three trajectories (in turquoise: 195b, purple: 188b_f, red: 188a) on a slope chart of the Stellwagen Bank National Marine Sanctuary area (USGS/NOAA).

slope_traj

 

 

Basend on the description by Fauchald & Tveraa (2003), for each trajectory, first-passage time quantifys the spatial scale of the animal’s foraging effort. The values of first-passage time at this spatial scale distinguish areas with high foraging effort (long first-passage time) from areas with low foraging effort (short first-passage time). The color-coded figure below shows first-passage time for whale 188b_f relative to seafloor slope (red: long first-passage time, green: short first-passage time).

slope_fpt

Simple linear regression revealed that depth explained 17.2% of the variance in first-passage time for the trajectory of whale 188a (p=0.001). Separately, depth and slope explained 14.2 and 29.2 %, respectively, of the variance in first-passage time for the trajectory of whale 188b_f (each p<0.0005) (see figures below). For the trajectory of whale 195b, neither depth nor slope were significant predictors of first-passage time (each p>0.2).

regressionSome authors (see Calenge 2011 for details) have suggested the analysis of autocorrelation of movement parameters of an animal’s trajectory following the standardization of the segment lengths. I will investigate this method in a follow-up analysis. Furthermore, I will re-analyze autocorrelation in turning angles using the absolute values of the turning angles instead of radians to facilitate the interpretation of the results.

 

Literature cited:

Calenge, C. 2011. “Analysis of Animal Movements in R: The adehabitatLT Package.” Saint Benoist, Auffargis, France: Office Nationale de La Chasse. http://cran.gis-lab.info/web/packages/adehabitatLT/vignettes/adehabitatLT.pdf.

Fauchald, P. & T. Tveraa. 2003. “Using First-Passage Time in the Analysis of Area-Restricted Search and Habitat Selection.” Ecology 84 (2): 282–88.

My goal over the last few weeks has been to determine the relationship between red-tailed hawk residency and environmental variables. I realized though that before I could do so, there were some data quality issues that I needed to address. Specifically, I realized that the 0 values in my residency raster (a ratio of the number of days red-tailed hawks were observed to the number of days any other species were observed) represented locations where there weren’t any hawks. Since only the locations where hawks were observed are relevant to my analyses, I removed records with a ratio of 0.

 

In removing these 0 values from my data (which cut the data roughly in half), I realized that they probably had a significant influence on the hotspot analyses I ran in the first few weeks of the class. I re-ran the hotspot analysis, and as expected, the hotspots were much more finely articulated. I decided to see what influence other portions of the data might have on the hotspot analysis so I tried iterations with <100%(meaning a hawk was seen on every day that any bird was seen), 0< and <100%, and 0< and <50%. The latter two produce nearly identical hotspot maps so it seemed appropriate to use all data 0< and <100%.

Screen Shot 2015-05-10 at 4.48.20 PM  Screen Shot 2015-05-10 at 4.50.04 PMScreen Shot 2015-05-10 at 4.49.25 PM Screen Shot 2015-05-10 at 4.52.24 PM

Hotspot analyses from upper right, clockwise: all data, <100%, 0< and <100%, and 0%<.

 

Next I prepared my environmental variable data. For my regression model, I used 8 variables:

  • Population – value of containing census tract from US Census data
  • Average precipitation – value of cell from 2014 PRISM data
  • Minimum Temperature – ibid
  • Percent open space with 1km radius – reclassified NLCD data (Herbaceous Upland, Grasslands/Herbaceous, Planted/Cultivated, Pasture/Hay, Row Crops. Small Grains, Fallow, Urban/Recreational Grasses = 1, everything else 0) > Focal statistics mean
  • Dominant land cover in 500m radius– focal statistics majority on NLCD
  • Avg percent canopy cover in 500m radius – focal statistics mean
  • Avg percent impervious surface in 500m radius – ibid
  • Land cover diversity in 500m radius – focal statistics variety

 

I then ran the Ordinary Least Squares regression tool. My R-squared value was .214. From the report the tool produced, I concluded that the residuals were not randomly distributed.

Screen Shot 2015-05-12 at 9.40.37 PM

 

To be sure, I also ran the Spatial Autocorrelation tool on the residuals, and there is a 1% chance that the distribution could be random. I also ran a hotspot analysis on the residuals at two scales, 1,000 ft (the minimum distance band that wouldn’t produce an error) and 47,891 ft (the calculated distance band from my previous analyses). While the 1,000 ft distance band did not produce anything interpretable, the 47,891 ft distance band may point to flaws in model design. That is, the distinct locations where the model over-predicted and under-predicted may suggest other environmental variables I should include or ones I should modify/drop from my model. I haven’t figured out what these are yet though.

Screen Shot 2015-05-10 at 6.15.52 PM Screen Shot 2015-05-10 at 6.16.14 PM

Hotspot analyses on residuals at 1,000 ft (left) and 47,891 ft (right).

 

 

The Koenker (BP) Statistic indicated that my model is heterscedastic (i.e. the model is not evenly fit for high and low dependent variable values). To try to understand why, I re-ran the OLS tool on a subset of my data where the residency ratio < 5% and another subset where residency > %50. Both of these subsets totaled about 3500 records each. The R-square value for > 50% was .29 and .14 for < 5%. The differences in the histograms are also telling.

 

Screen Shot 2015-05-12 at 9.15.20 PM
< 5%, population, avg precip, min temperature, percent open space/km, dominant land cover
Screen Shot 2015-05-12 at 9.49.40 PM
> 50%, population, avg precip, min temperature, percent open space/km, dominant land cover
Screen Shot 2015-05-12 at 9.16.10 PM
< 5%, avg percent canopy per 500m, avg percent impervious surface per 500m, land cover diversity

 

 

 

 

Screen Shot 2015-05-12 at 9.48.43 PM
> 50%, avg percent canopy per 500m, avg percent impervious surface per 500m, land cover diversity

 

From these plots, I will try to develop other environmental variables that may be better predictors of residency.

 

Since my last update I’ve made significant progress in estimating the foraging ranges and overlap between Adelie and Gentoo penguins at Palmer Station over the 2014/15 breeding season.

With the help of a classmate (thanks Steven!) and a few online forums (GIS in Ecology & GIS 4 Geomorphology), I was able to figure out how to calculate kernel density estimates (KDE) without Arc’s outdated Animal Movement Extension or Hawth’s Analysis Tools.

Objective: Quantify the geographical extent of the distribution of Adelie and Gentoo penguins foraging around Palmer Station

  • Create kernel density estimates to identify areas used for foraging (95% KDE) and core use areas (50% KDE)
  • Calculate the area (km²) within 95% and 50% kernel density contours
  • Calculate the % overlap between the ranges of Adelie and Gentoo penguins

Methods:

  1. Filter data points whose estimated error is >1500m
  2. Combine location data points for all Adelie (ADPE) individuals n=15 (522 data points) and all Gentoo individuals (GEPE) n=5 (147 data points)
  3. Create kernel density estimates using the kernel density tool and default parameters
  4. Extract values by points from the output obtained above, determine 50% and 95% of observations using values of extracted points from attribute tables
  5. Reclassify kernel density raster so values >50th percentile have a new value of 50 and all others have a new value of NoData, use the same steps to create additional rasters representing 95% of points
  6. Convert rasters to polygons, calculate area of each polygon using calculate geometry tool
  7. Use union function to determine area of overlap between polygons

Results:

1

 

 

 

 

 

Table 1. Estimates of core use (50% KDE) and total (95% KDE) foraging areas used by Adelie and Gentoo penguins with associated overlap between species.

2Figure 1. Visual representation of Adelie core use (red) and total foraging area (pink) and Gentoo core use (dark blue) and total (light blue) foraging areas. Despite poor image quality it is obvious that these ranges are closely associated with the colonies that the respective species are from, and there appears to be some association with bathymetry as the range of Gentoo’s is dense at the head of Palmer deep canyon.

3

Figure 2. Close up visual representation of Adelie core use (red) and total foraging area (pink) and Gentoo core use (dark blue) and total (light blue) foraging areas. Despite poor image quality it is obvious that these ranges are closely associated with the colonies that the respective species are from, and there appears to be some association with bathymetry as the range of Gentoo’s is clustered at the head of Palmer deep canyon. Note overlap between species.

Discussion:

The results of this analysis indicate that Gentoo penguins occupy a larger foraging range (core use and overall) and because of this, the portion of their range that overlaps with that of the Adelie penguins is minimal to moderate. The opposite is seen in Adelie penguins, who appear to have a smaller foraging range and thus a higher proportion of it overlaps with Gentoo penguins. Also notable is the fact that Gentoo penguins appear to be foraging farther away from their colony than Adelie penguins, which is surprising as the opposite is usually true. The main caveat of these results is the difference in sample size between data points of Gentoo (n=147) and Adelie (n=522) penguins. This was not accounted for in this analysis and is likely skewing these results. The fact that Gentoo’s have a larger range could be because there were fewer data points used in the creation of the KDEs.

The next step in this process will be to research methods that take sample size into account. One possibility is taking a random sample of Adelie location points from the total sample so that Adelie’s are represented equally to Gentoo penguins.

I will also be experimenting with KDE in R. This will allow me to compare results between the two methods (and R should speed this process up down the road)!

I am also in the process of determining whether a bathymetric layer and/or accurate basemap exists for this region. So far I’ve had difficulty finding these things but they would be very useful to compare these results to co-variates such as bathymetry and distance to shore.