Humpback whales feed in the temperate high latitudes along the North Pacific Rim from California to Japan during the spring, summer, and fall, and in winter migrate to the near-tropical waters of Mexico, Hawaii, Japan, and the Philippines to give birth and mate (Calambokidis et al. 2001; Calambokidis et al. 2008). Although whales show strong site fidelity to feeding and breeding grounds, genetic analysis of maternally inherited DNA (mitochondrial DNA or mtDNA) reveals greater mixing of individuals on the feeding grounds (Baker et al. 2008). This mixing makes it difficult to determine regional population structure and may complicate management decisions. For example, should the feeding grounds be managed as one population unit, or is there evidence that more than one management unit is present? If more than one, are they affected differently by coastal anthropogenic activities, and do they therefore require population-specific management strategies?

With this in mind, I decided to explore the spatial pattern of humpback whales from the Western and Northern Gulf of Alaska, a subset of data collected during the SPLASH Project (Structure of Populations, Levels of Abundance, and Status of Humpbacks; http://www.cascadiaresearch.org/splash/splash.htm).  Specifically, I am interested in the following questions:

  1. Do whales form clusters? Do whales that are more closely related (have the same mtDNA haplotype) cluster together?
  2. Are there spatial patterns in whale distribution based on depth?  Do more closely related whales cluster together based on depth?
  3. Are there spatial patterns in whale distribution based on slope?  Do more closely related whales cluster together based on slope?

The bathymetry layer, GEBCO_08 Grid, version 20091120 (http://www.gebco.net), was used for the depth and slope analyses in questions 2 and 3. Depth values were extracted using the ArcGIS 10.1 Extract Values to Points tool in the Spatial Analyst toolbox. Slope values were derived from the bathymetry data using the ArcGIS 10.1 Slope tool and then extracted with the same Extract Values to Points tool.
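For readers curious about what the Slope tool does with the bathymetry grid, here is a minimal re-implementation of the 3x3 finite-difference formula described in Esri's "How Slope works" help topic. It is a sketch for illustration, not Esri's code. It assumes a projected grid whose cell sizes are in the same units as the elevations (metres); GEBCO_08 is distributed in geographic coordinates (degrees), so in practice it would need to be projected, or given an appropriate z-factor, first. The function name `slope_degrees` is my own.

```python
import math

def slope_degrees(elev, cell_x, cell_y):
    """Slope in degrees for the interior cells of a projected elevation grid
    (a list of rows), using the 3x3 finite-difference method from Esri's
    'How Slope works' topic. Edge cells lack a full window and are None."""
    nrows, ncols = len(elev), len(elev[0])
    out = [[None] * ncols for _ in range(nrows)]
    for row in range(1, nrows - 1):
        for col in range(1, ncols - 1):
            # 3x3 window labelled  a b c / d e f / g h i  (e is the centre cell)
            a, b, c = elev[row - 1][col - 1], elev[row - 1][col], elev[row - 1][col + 1]
            d, f = elev[row][col - 1], elev[row][col + 1]
            g, h, i = elev[row + 1][col - 1], elev[row + 1][col], elev[row + 1][col + 1]
            # Weighted east-west and north-south rates of change
            dzdx = ((c + 2 * f + i) - (a + 2 * d + g)) / (8.0 * cell_x)
            dzdy = ((g + 2 * h + i) - (a + 2 * b + c)) / (8.0 * cell_y)
            out[row][col] = math.degrees(math.atan(math.hypot(dzdx, dzdy)))
    return out

# A plane that rises 10 m per 10 m cell eastward has a 45-degree slope
plane = [[10.0 * col for col in range(5)] for _ in range(5)]
print(slope_degrees(plane, 10.0, 10.0)[2][2])
```

Extract Values to Points then simply reads the raster cell under each whale sighting, so each sighting inherits the slope of the cell it falls in.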

**Results presented here are strictly for the purposes of exploring the functionality of the ArcGIS tools found in the Spatial Statistics Toolbox.  They should be considered preliminary and should not be reproduced elsewhere.**

 

Part 1: Average Nearest-Neighbor Analysis

This tool is based on the null hypothesis of complete spatial randomness and calculates a nearest neighbor index from the average distance between each feature and its nearest neighboring feature. The nearest-neighbor ratio is the Observed Mean Distance divided by the Expected Mean Distance (the average distance expected if the same number of features were randomly distributed over the study area), so it is expected to be 1 under complete spatial randomness. Values greater than 1 indicate a dispersed pattern, while values less than 1 indicate a clustered pattern.
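To make the calculation concrete, here is a minimal sketch that re-implements the formulas published in Esri's "How Average Nearest Neighbor works" help topic: expected mean distance D_E = 0.5 / sqrt(n / A), standard error SE = 0.26136 / sqrt(n² / A), and z = (D_O - D_E) / SE. It assumes projected (x, y) coordinates in metres and a study area A in square metres; the function name is my own, and this is not Esri's code.

```python
import math

def average_nearest_neighbor(points, area):
    """Average Nearest Neighbor summary using Esri's published formulas.

    points: sequence of (x, y) tuples in projected units (e.g., metres)
    area:   study-area size in the same units squared
    Returns (observed mean distance, expected mean distance, ratio, z, p).
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    # Observed mean distance: average distance from each point to its nearest other point
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        total += min(math.hypot(xi - xj, yi - yj)
                     for j, (xj, yj) in enumerate(points) if j != i)
    d_obs = total / n
    # Expected mean distance if the same n points were randomly scattered over `area`
    d_exp = 0.5 / math.sqrt(n / area)
    # Standard error of the mean distance under complete spatial randomness
    se = 0.26136 / math.sqrt(n * n / area)
    z = (d_obs - d_exp) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p-value from the normal distribution
    return d_obs, d_exp, d_obs / d_exp, z, p

# A regular 10 x 10 grid with 1-unit spacing inside a 10 x 10 area is dispersed
grid = [(x, y) for x in range(10) for y in range(10)]
print(average_nearest_neighbor(grid, area=100.0))
```

The grid example gives a ratio of 2.0 (dispersed). Because SE can be rewritten as 0.52272 · D_E / sqrt(n), the z-score can also be checked from a results row alone: for the "All" row (n = 788, D_O = 1913.54 m, D_E = 19537.25 m) this gives z ≈ -48.44, matching the tool output.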

 

Clustering in whales

 

| Haplotype | n | Observed mean distance (m) | Expected mean distance (m) | NN ratio | z-score | p-value | Pattern |
|---|---|---|---|---|---|---|---|
| All | 788 | 1913.54 | 19537.25 | 0.0979 | -48.443 | 0.00000 | Clustered |
| A- | 202 | 5962.24 | 33738.56 | 0.1767 | -22.385 | 0.00000 | Clustered |
| A+ | 220 | 8753.78 | 34046.85 | 0.2571 | -21.080 | 0.00000 | Clustered |
| A3 | 91 | 7404.13 | 34929.24 | 0.2120 | -14.381 | 0.00000 | Clustered |
| E1 | 73 | 16789.95 | 50701.04 | 0.3312 | -10.932 | 0.00000 | Clustered |
| E3 | 46 | 12843.30 | 34918.80 | 0.3679 | -8.203 | 0.00000 | Clustered |
| F2 | 83 | 14402.10 | 51259.82 | 0.2810 | -12.532 | 0.00000 | Clustered |

The output of this tool indicates that whales, regardless of mtDNA haplotype, are significantly clustered in the Western and Northern Gulf of Alaska. This result is not entirely surprising, given that humpback whales tend to form small groups on the feeding grounds. However, the results of this tool are very sensitive to changes in the study area, so it is best used with a fixed study area. That approach was not used in the current analysis. Instead, the tool used the area of the minimum enclosing rectangle around the input features, and this area varied for each haplotype, so the ratios are not strictly comparable across haplotypes.
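The sensitivity to study area follows directly from the expected mean distance, D_E = 0.5 / sqrt(n / A): the same points judged against a larger area yield a larger D_E and therefore a smaller ratio. The sketch below compares a bounding rectangle with a fixed study area for one set of hypothetical sightings. The coordinates, the fixed area, and the function names are all made up for illustration, and the axis-aligned rectangle is a simplification that may differ from the rectangle ArcGIS actually computes.

```python
import math

def mean_nn_distance(points):
    """Observed mean distance from each point to its nearest other point."""
    return sum(min(math.dist(p, q) for j, q in enumerate(points) if j != i)
               for i, p in enumerate(points)) / len(points)

def nn_ratio(points, area):
    """Observed / expected mean distance, with expected = 0.5 / sqrt(n / area)."""
    return mean_nn_distance(points) / (0.5 / math.sqrt(len(points) / area))

def enclosing_rectangle_area(points):
    """Axis-aligned bounding-box area: a simplification of the tool's default
    minimum enclosing rectangle, which may be oriented differently."""
    xs, ys = [p[0] for p in points], [p[1] for p in points]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))

# Hypothetical sightings for one haplotype (projected metres)
points = [(0.0, 0.0), (500.0, 200.0), (800.0, 900.0),
          (60000.0, 10000.0), (20000.0, 40000.0)]
fixed_area = 1.0e10  # hypothetical study area shared by every haplotype
print("enclosing rectangle:", round(nn_ratio(points, enclosing_rectangle_area(points)), 3))
print("fixed study area:   ", round(nn_ratio(points, fixed_area), 3))
```

Because D_E scales with sqrt(A), quadrupling the area halves the ratio. Passing one fixed area for every haplotype (the tool's optional Area parameter) removes this source of variation.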

Based on these results, the average nearest neighbor tool may not be the most appropriate tool for discovering spatial patterns in humpback whales. However, it would be worth running the tool again with a fixed study area before discarding it completely for this data set.

Alternatively, it would be worth conducting a refined nearest-neighbor analysis for each mtDNA haplotype, in which the statistic of interest is the complete cumulative distribution function of all observed nearest-neighbor distances (not just the mean nearest-neighbor distance), evaluated at specified distances to test for departure from complete spatial randomness. This method is not currently available in the ArcGIS Spatial Statistics Toolbox and would need to be conducted in another software package such as R.
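A minimal sketch of that refined analysis: the empirical nearest-neighbor distribution G(d) (the share of points whose nearest neighbor lies within distance d) is compared with its expectation under complete spatial randomness, G_csr(d) = 1 - exp(-λπd²), where λ = n / A. This version applies no edge correction and no significance test; in R, the spatstat package's `Gest()` and `envelope()` functions provide edge-corrected estimates and Monte Carlo envelopes. The function names here are my own.

```python
import math

def nearest_neighbor_distances(points):
    """Distance from each point to its nearest other point."""
    return [min(math.dist(p, q) for j, q in enumerate(points) if j != i)
            for i, p in enumerate(points)]

def g_function(points, area, distances):
    """Empirical G(d) versus its expectation under complete spatial randomness,
    G_csr(d) = 1 - exp(-lambda * pi * d**2), with lambda = n / area.
    No edge correction is applied."""
    nn = nearest_neighbor_distances(points)
    n = len(points)
    lam = n / area
    table = []
    for d in distances:
        g_obs = sum(1 for x in nn if x <= d) / n          # share of NN distances <= d
        g_csr = 1.0 - math.exp(-lam * math.pi * d * d)    # expected share under CSR
        table.append((d, g_obs, g_csr))
    return table

# Regular grid: no neighbour closer than 1 unit, then every neighbour at exactly 1
grid = [(float(x), float(y)) for x in range(10) for y in range(10)]
for d, g_obs, g_csr in g_function(grid, 100.0, [0.5, 1.0]):
    print(d, g_obs, round(g_csr, 3))
```

An observed curve that rises faster than the CSR curve at short distances suggests clustering; one that lags behind suggests dispersion.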

 

Part 2: Hot Spot Analysis

This tool uses the Getis-Ord Gi* statistic to identify statistically significant hot spots (clusters of high values) and cold spots (clusters of low values) in a set of weighted features. For each feature in the data set, the Gi* statistic is returned as a z-score. The larger the positive z-score, the more intense the clustering of high values (hot spot). The more negative the z-score, the more intense the clustering of low values (cold spot).
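The Gi* z-score compares the weighted local sum around each feature with what the global mean would predict. The sketch below follows the formula in Esri's "How Hot Spot Analysis (Getis-Ord Gi*) works" topic, using binary fixed-distance-band weights in which each feature counts as its own neighbor. The band distance, the toy coordinates, and the function name are assumptions for illustration, and no multiple-testing correction is applied.

```python
import math

def getis_ord_gi_star(points, values, band):
    """Gi* z-score for every feature, using binary fixed-distance-band weights
    (w_ij = 1 when d_ij <= band, else 0; each feature is its own neighbour)."""
    n = len(values)
    mean = sum(values) / n
    # Global standard deviation (clamped at 0 to guard against rounding error)
    s = math.sqrt(max(0.0, sum(v * v for v in values) / n - mean * mean))
    scores = []
    for p in points:
        w = [1.0 if math.dist(p, q) <= band else 0.0 for q in points]
        sw = sum(w)                       # sum of weights (number of neighbours)
        sw2 = sum(wj * wj for wj in w)    # sum of squared weights
        local = sum(wj * vj for wj, vj in zip(w, values))
        den = s * math.sqrt((n * sw2 - sw * sw) / (n - 1))
        # den is 0 if every feature is a neighbour or all values are equal
        scores.append((local - mean * sw) / den if den > 0 else float("nan"))
    return scores

# Toy transect: five sightings in shallow water (-20 m), five in deep water (-500 m)
pts = [(float(k), 0.0) for k in range(10)]
depths = [-20.0] * 5 + [-500.0] * 5
print([round(z, 2) for z in getis_ord_gi_star(pts, depths, band=1.5)])
```

With depths stored as negative elevations, the shallow end of the toy transect scores positive (hot) and the deep end negative (cold).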

Figure 1. The output scale for the hot spot analysis tool. When interpreting the results, remember that a feature may be mapped bright red not because its own value is particularly large but because it is part of a spatial cluster of high values. Conversely, a feature may be mapped bright blue not because its own value is particularly small but because it is part of a spatial cluster of low values. Thus, the more positive the z-score, the hotter the hot spot (darker red), and the more negative the z-score, the colder the cold spot (darker blue).

 

Spatial patterns in whale distribution based on depth

 


Figure 2. Results of hot spot analysis for all whales (n=799) based on depth (m), no mtDNA considered.

 

The output from this tool shows several hot and cold spots regardless of mtDNA haplotype (Figure 2). The statistically significant hot spots (red) indicate areas where whales cluster at shallower depths (GEBCO stores ocean depths as negative elevations, so higher values correspond to shallower water). There are also several statistically significant cold spots (blue) where whales are found in deeper water, often beyond the continental shelf.

 


Figure 3. Results of hot spot analysis by haplotype based on depth (m).

 

The output by haplotype also shows several hot and cold spots, although their locations vary by haplotype (Figure 3). The A+ and A- haplotypes show statistically significant hot spots in the Northern Gulf of Alaska, while the E1 and F2 haplotypes show a less intense, but still significant, cluster in the same region. The E1 haplotype also shows a significant hot spot in the Western Gulf of Alaska. These hot spots reflect whales of a given haplotype clustering at shallower depths. The A3 and E3 haplotypes show relatively little clustering: no hot spots and only a very small cold spot in the western region. In general, for all haplotypes, cold spots are located in the western region or beyond the continental shelf, where whales cluster at greater depths.

 

Spatial patterns in whale distribution based on slope

 


Figure 4. Results of hot spot analysis for all whales (n=799) based on slope (degrees), no mtDNA considered.

 

The output from this tool shows several hot and cold spots regardless of mtDNA haplotype (Figure 4). The statistically significant hot spots (red) indicate areas where whales occur on steeper slopes. There are also several statistically significant cold spots (blue) where whales are found on flatter slopes.

 


Figure 5. Results of hot spot analysis by haplotype based on slope (degrees).

 

The output by haplotype also shows several hot and cold spots, although their locations vary by haplotype (Figure 5). The A+, A-, A3, and F2 haplotypes show statistically significant hot spots in the Northern Gulf of Alaska, while the A3, F2, and (to a lesser extent) E3 haplotypes also show hot spots in the western region. These hot spots reflect whales of a given haplotype clustering on steeper slopes. The A+, A-, A3, and F2 haplotypes have statistically significant cold spots in the northern region, while a cold spot for the E1 haplotype occurs in the western region. These cold spots reflect whales of a given haplotype clustering on flatter slopes.

Reflecting on my results, I initially thought the hot/cold spot patterns might be influenced by uneven sampling effort and differences in sample size. However, on 23 May 2013 Lauren Scott from Esri commented on this very subject in response to a post by Jen Bauer (http://blogs.oregonstate.edu/geo599spatialstatistics/2013/04/24/discerning-variables-spatial-patterns-within-a-clustered-dataset/#comment-1393). Lauren stated that even if sampling is uneven (e.g., many samples are taken in some areas and fewer in others), the impact on the results of a hot spot analysis will be minimal. She explained that in areas with many samples, the tool has more information with which to compute its result: it will “compare the local mean based on lots of samples to the global mean based on ALL samples for the entire study area and decide if the difference is statistically significant or not”. In areas with fewer samples, “the local mean will be computed from only a few observations/samples… the tool will compare the local mean (based on only a few pieces of information) to the global mean (based on ALL samples) and determine if the difference is significant”. Thus, my concern seems to be unwarranted.

 

In general, the hot spot tool seems to be more useful than the average nearest neighbor tool for the humpback whale data set used here.  Statistically significant clustering of whales occurs with and without consideration of mtDNA for both depth and slope.  Although preliminary, the results from this tool highlight areas for further investigation using additional spatial analysis techniques.

 

Challenges discovered with the ArcGIS Spatial Statistics Toolbox

I encountered two main challenges using the ArcGIS Spatial Statistics Toolbox. First, many of the tools require a numeric variable (either continuous or discrete) and do not support categorical variables, such as mtDNA haplotype, “out of the box”. Thus, to look for spatial patterns in haplotypes, I had to split the data by haplotype, create a separate feature class for each haplotype, and then run each tool several times. Because I was working with a small data set, the repetition was relatively painless, but it would be useful to automate this process (perhaps using ModelBuilder or Python scripting). Automation would not only speed up processing but also reduce the risk of human error. Second, the hot spot analysis accepts only one variable at a time. What if one suspected that the spatial pattern of humpback whales (with or without mtDNA consideration) is related to depth and another environmental variable (e.g., sea surface temperature, productivity, or currents)? I believe this type of analysis would need to be conducted in another software package such as R.
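The split-then-rerun workflow is essentially a group-by followed by a loop. The plain-Python sketch below shows the pattern outside ArcGIS: group records by a categorical field, then apply the same analysis to each group in one pass. The record layout, the field names `haplotype`, `x`, and `y`, and all function names are hypothetical; inside ArcGIS the same idea would be expressed with ModelBuilder iterators or an arcpy script.

```python
import math
from collections import defaultdict

def split_by_attribute(records, field):
    """Group records (dicts) by a categorical field such as mtDNA haplotype,
    mirroring the manual 'one feature class per haplotype' step."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[field]].append(rec)
    return dict(groups)

def mean_nn_distance(points):
    """Example per-group analysis: observed mean nearest-neighbour distance."""
    return sum(min(math.dist(p, q) for j, q in enumerate(points) if j != i)
               for i, p in enumerate(points)) / len(points)

def run_by_group(records, field, analysis, min_size=2):
    """Run the same analysis on every group; groups smaller than min_size are skipped."""
    results = {}
    for key, recs in split_by_attribute(records, field).items():
        if len(recs) >= min_size:
            results[key] = analysis([(r["x"], r["y"]) for r in recs])
    return results

# Hypothetical sightings; the field names "haplotype", "x", "y" are illustrative
sightings = [
    {"haplotype": "A+", "x": 0.0, "y": 0.0},
    {"haplotype": "A+", "x": 3.0, "y": 4.0},
    {"haplotype": "E1", "x": 10.0, "y": 10.0},
    {"haplotype": "E1", "x": 10.0, "y": 12.0},
    {"haplotype": "E1", "x": 10.0, "y": 20.0},
    {"haplotype": "F2", "x": 50.0, "y": 50.0},
]
print(run_by_group(sightings, "haplotype", mean_nn_distance))
```

Passing the analysis in as a function keeps the loop independent of which statistic is being run, so the same loop could drive a nearest-neighbor or hot spot calculation.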

~~~~~~~~~~~~~~~~~~~~~

Baker, C. S., D. Steel, J. Calambokidis, J. Barlow, A. M. Burdin, P. J. Clapham, E. Falcone, J. K. B. Ford, C. M. Gabriele, U. González-Peral, R. LeDuc, D. Mattila, T. J. Quinn, L. Rojas-Bracho, J. M. Straley, B. L. Taylor, J. Urbán Ramírez, M. Vant, P. R. Wade, D. Weller, B. H. Witteveen, K. Wynne, and M. Yamaguchi. 2008. geneSPLASH: an initial, ocean-wide survey of mitochondrial (mt) DNA diversity and population structure among humpback whales in the North Pacific. Final Report for contract 2006-0093-008, submitted to National Fish and Wildlife Foundation.

Calambokidis, J., E. A. Falcone, T. J. Quinn, A. M. Burdin, P. J. Clapham, J. K. B. Ford, C. M. Gabriele, R. LeDuc, D. Mattila, L. Rojas-Bracho, J. M. Straley, B. L. Taylor, J. Urbán, D. Weller, B. H. Witteveen, M. Yamaguchi, A. Bendlin, D. Camacho, K. Flynn, A. Havron, J. Huggins, N. Maloney, J. Barlow, and P. R. Wade. 2008. SPLASH: Structure of Populations, Levels of Abundance and Status of Humpback Whales in the North Pacific. Final report for Contract AB133F-03-RP-00078 from U.S. Dept of Commerce.

Calambokidis, J., G. H. Steiger, J. M. Straley, L. M. Herman, S. Cerchio, D. R. Salden, J. Urbán R., J. K. Jacobsen, O. von Ziegesar, K. C. Balcomb, C. M. Gabriele, M. E. Dahlheim, S. Uchida, G. Ellis, Y. Miyamura, P. Ladrón de Guevara P., M. Yamaguchi, F. Sato, S. A. Mizroch, L. Schlender, K. Rasmussen, J. Barlow, and T. J. Quinn II. 2001. Movements and population structure of humpback whales in the North Pacific. Marine Mammal Science 17:769–794.

 

As we are discovering, there are often things we want to do that ArcGIS cannot do. Esri has created a Tool Gallery where people can share tools they have built to fill these gaps. If you are thinking about creating a tool to do something you need, it is worth checking there first so that you don’t have to reinvent the wheel.

http://resources.arcgis.com/en/communities/analysis/

http://resources.arcgis.com/gallery/file/geoprocessing

 

 

If you have a shapefile or geodatabase feature class that you want to split into several shapefiles or feature classes based on a specific attribute, you can do so relatively painlessly with XTools Pro. XTools is an extension that should be installed on any OSU-owned computer that also has ArcGIS (at least this is the case for all computers in Digital Earth).

Once you have added the XTools toolbar to your map, you can find the ‘Split Layer by Attributes’ tool under ‘Feature Conversions’. Caution: the tool requires the same input and output file types to work correctly (i.e., shapefile –> shapefiles or geodatabase feature class –> geodatabase feature classes).

There are many other useful tools worth exploring in XTools Pro (www.xtoolspro.com).

I am working with a humpback whale dataset collected across the North Pacific from 2004 to 2006. Given the large spatial extent, I have selected a subset of data from the Gulf of Alaska (GOA) and would like to look for spatial patterns in the genetic diversity of whales sighted in the GOA in relation to their environment. Complicating this problem is the fact that most of the data were collected opportunistically, so the spatial distribution of whale sightings better reflects where researchers collected data than whether environmental variables influence humpback whale habitat use.


Figure 1. North Pacific humpback whale sightings from SPLASH.  The data include > 18,000 photo-identification records and 2,700 DNA profiles for 8,000+ unique individuals.


Figure 2. A subset of the SPLASH data for the Northern and Western Gulf of Alaska. The data subset includes 2,622 records (both photo-identification and DNA profiles) for 1,448 unique individuals.

Ultimately, I need a method that will allow me to get beyond the uneven (non-systematic) sampling effort to determine whether there is any spatial pattern in the data based on genetics and environmental features (e.g., depth, slope). Two (among many) working hypotheses:

  1. Humpback whales are found in clusters at a particular depth or slope range.
  2. Humpback whales that share the same haplotype (maternally inherited mitochondrial DNA) cluster together.

In class today we discussed topics of interest within the ArcGIS Spatial Statistics toolbox, using the Spatial Statistics Blog as a starting point (http://blogs.esri.com/esri/arcgis/2010/07/13/spatial-statistics-resources/). Most students looked for concepts or tools that would be useful for their specific research needs. I was particularly interested in the discussion of modeling spatial relationships and analyzing patterns, and how these might apply to the humpback whale data I am using for my own project.

Of particular interest was the “Conceptualization of Spatial Relationships” webpage (http://help.arcgis.com/en/arcgisdesktop/10.0/help/#/Modeling_spatial_relationships/005p00000005000000/). This concept is important for most of the tools in the Spatial Statistics toolbox and is critical for data with some degree of locational uncertainty: what is the best spatial conceptualization for your data so that the tool output makes sense?
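To see how the choice of conceptualization changes who counts as a neighbor, here is a small sketch that builds weight matrices in the spirit of three ArcGIS options (FIXED_DISTANCE_BAND, INVERSE_DISTANCE, and K_NEAREST_NEIGHBORS). It is an illustration, not Esri's implementation: the function name and parameters are my own, the diagonal is left at 0 (Gi* itself would include each feature as its own neighbor), and coincident sightings are rejected for inverse distance because 1/0 is undefined. With uncertain whale positions, one might, for example, choose a band wider than the expected positional error of a sighting.

```python
import math

def spatial_weights(points, method, band=None, k=None, power=1.0):
    """Build an n x n weight matrix for one of three conceptualizations:
      'fixed'   - w_ij = 1 if d_ij <= band, else 0
      'inverse' - w_ij = 1 / d_ij**power (only within band, if band is given)
      'knn'     - w_ij = 1 for the k nearest neighbours of i, else 0
    The diagonal (a feature paired with itself) is left at 0."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others = [j for j in range(n) if j != i]
        if method == "knn":
            for j in sorted(others, key=lambda m: d[i][m])[:k]:
                w[i][j] = 1.0
        elif method == "fixed":
            for j in others:
                w[i][j] = 1.0 if d[i][j] <= band else 0.0
        elif method == "inverse":
            for j in others:
                if d[i][j] == 0.0:
                    raise ValueError("coincident points: inverse distance undefined")
                if band is None or d[i][j] <= band:
                    w[i][j] = 1.0 / d[i][j] ** power
        else:
            raise ValueError(f"unknown method: {method}")
    return w

pts = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
for method, kwargs in [("fixed", {"band": 1.5}), ("inverse", {}), ("knn", {"k": 1})]:
    print(method, spatial_weights(pts, method, **kwargs))
```

In the toy example the isolated third point has no neighbors under a 1.5-unit fixed band, but it always receives one under k-nearest neighbors, which is one reason the conceptualization matters for sparsely sampled areas.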

Other interesting points made in class today include:

  1. The discussion of regression and of measuring geographic distributions.