Areas of interest

1. Using ModelBuilder to manage data downloaded from the Internet.

http://www.arcgis.com/home/item.html?id=7180ba6e9d8845128eaadf70a4b6bf7e

This tutorial piqued my interest because my data will come from a variety of sources. I will likely encounter differences in formatting, labeling, and quality among datasets, so standardizing the import process would be beneficial. The tutorial illustrates some of the pertinent considerations when importing data into ArcGIS, such as avoiding spaces in field names, as well as how to use ModelBuilder to plan and automate tasks.
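ModelBuilder itself is a graphical tool, so the following is only a stand-alone Python sketch of the kind of standardization step such a model might automate: cleaning up field names before import. The rules it enforces are assumptions for illustration (letters, digits, and underscores only; a leading letter; the 10-character limit of shapefile/dBASE field names). `sanitize_field_name` and `sanitize_all` are hypothetical helpers, not ArcGIS functions.

```python
import re

# Assumed naming rules (check them against your target format): only letters,
# digits, and underscores; must start with a letter; shapefile (dBASE) field
# names are limited to 10 characters.
MAX_SHAPEFILE_FIELD_LEN = 10


def sanitize_field_name(name, max_len=MAX_SHAPEFILE_FIELD_LEN):
    """Replace spaces/punctuation with underscores and enforce the assumed rules."""
    cleaned = re.sub(r"[^A-Za-z0-9_]", "_", name.strip())
    cleaned = re.sub(r"_+", "_", cleaned).strip("_")  # collapse repeated underscores
    if not cleaned or not cleaned[0].isalpha():
        cleaned = "f_" + cleaned  # hypothetical prefix for names starting with a digit
    return cleaned[:max_len]


def sanitize_all(names, max_len=MAX_SHAPEFILE_FIELD_LEN):
    """Sanitize a list of names, appending a counter so the results stay unique."""
    seen = set()
    result = []
    for name in names:
        base = sanitize_field_name(name, max_len)
        candidate, i = base, 1
        while candidate.lower() in seen:
            suffix = str(i)
            candidate = base[: max_len - len(suffix)] + suffix
            i += 1
        seen.add(candidate.lower())
        result.append(candidate)
    return result
```

For example, `sanitize_all(["Depth (m)", "Depth m"])` returns `["Depth_m", "Depth_m1"]`, so two columns that collapse to the same name stay distinct.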

2. Using R in ArcGIS 10.

http://www.arcgis.com/home/item.html?id=a5736544d97a4544aa47d06baf910f6d

Extending ArcGIS with R: presentation from the 2010 User Conference

http://www.arcgis.com/home/item.html?id=547085ee428f4141b2cacb338f8f61a3

Because ArcGIS can struggle with large datasets, and because some spatial statistics needs extend beyond its built-in tools, being able to integrate it with more capable statistical software such as R could be very useful.

To Do:

  1. I am still early in my thesis development, but one of the things I would like to investigate is the habitat use of melon-headed whales around French Polynesia and how it compares with their habitat use around other islands. I would like to keep exploring the available spatial statistics tools to determine the best approach for my project.
  2. I am also interested in looking at the spatial distributions of small cetaceans in the Pacific and testing for relationships between these distributions and the presence or absence of melon-headed whales. So, again, investigating the spatial statistics relevant to this type of study is on the to-do list.

Regression analysis can help you dive deeper into spatial relationships and the factors behind spatial patterns. At a slightly more advanced level, it can help you make predictions based on your data. The ArcGIS Resource Center has a very nice page called “Regression Analysis Basics” that introduces both regression and the related tools available. It describes the components of a model, such as dependent and independent variables and regression coefficients. One of my favorite parts of the page is the table “Common regression problems, consequences, and solutions”, which lists problems and links to solutions that could help make your regression model stronger. Even if your skills are beyond the basics of regression analysis, this page is a good refresher and an introduction to how ArcGIS can help in telling a story.

Another helpful page is titled “What they don’t tell you about regression analysis”. Whatever you are trying to model is likely a complex phenomenon (especially in this class) and may not have a simple set of answers. Models often need revision, and Esri has created a step-by-step protocol for increasing the validity of your analysis and model: the page guides you through six checks your model should pass before you can have confidence in it.

In my data, for example, I have several layers that could help me identify where wetlands lie within the valley, including elevation, hydrology (streams and flood inundation), vegetation, and soils. GIS users often simply stack these layers and create polygons where all, or a majority, of the layers overlap. This technique may rest on ecologically sound logic, but it does not address the strength of the relationships between layers or the degree to which one or more layers may influence the others, positively or negatively.
A regression analysis using known areas of wetland as the dependent variable and a variety of GIS layers as explanatory variables could help me predict places where wetlands exist but have not been mapped. Even better, it could help me predict where wetlands were in the past. The two pages listed above are useful for guiding me through the individual decisions involved in building a model, for example, whether to use Ordinary Least Squares (OLS) or Geographically Weighted Regression (GWR).
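To make the OLS versus GWR distinction concrete, here is a toy sketch in plain Python (not Esri's implementation). OLS fits one set of coefficients for the whole study area; GWR refits the model at each location, weighting nearby observations more heavily. The Gaussian distance-decay kernel and the fixed `bandwidth` are assumptions for illustration; the ArcGIS GWR tool has its own kernel and bandwidth options and reports diagnostics this sketch leaves out.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def weighted_least_squares(X, y, w):
    """Coefficients minimizing sum_i w_i (y_i - X_i b)^2; an intercept column is added."""
    rows = [[1.0] + list(xi) for xi in X]
    k = len(rows[0])
    xtwx = [[sum(wi * r[i] * r[j] for r, wi in zip(rows, w)) for j in range(k)]
            for i in range(k)]
    xtwy = [sum(wi * r[i] * yi for r, wi, yi in zip(rows, w, y)) for i in range(k)]
    return solve(xtwx, xtwy)


def ols(X, y):
    """Global OLS: every observation gets weight 1."""
    return weighted_least_squares(X, y, [1.0] * len(y))


def gwr_local(X, y, coords, target, bandwidth):
    """GWR-style local fit at `target`, using an assumed Gaussian distance-decay kernel."""
    w = [math.exp(-0.5 * (math.dist(c, target) / bandwidth) ** 2) for c in coords]
    return weighted_least_squares(X, y, w)
```

If half of the observations follow y = 1 + x and the other half, located far away, follow y = 5 + 2x, `ols` blends them into a single compromise line (intercept 3, slope 1.5), while `gwr_local` evaluated near the first group recovers roughly intercept 1 and slope 1.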

Take a look at the two introductory pages and consider whether your data could be used in a regression analysis and whether the tools in the Spatial Statistics toolbox could be useful. You could even bring just three variables (e.g., hydrology, soils, and elevation) to try out.
There are three resources to explore further if you’re interested in using your data to perform regression analysis:

  1. Lauren Scott’s presentation on regression analysis
  2. The seminar on regression analysis titled “Beyond Where: Using Regression Analysis to Explore Why”
  3. The regression analysis tutorial (the same one used in Scott et al.’s presentation), where you can “Learn how to build a properly specified OLS model and improve that model using GWR, interpret regression results and diagnostics, and potentially use the results of regression analysis to design targeted interventions”

 

For my “first take” on the Spatial Statistics Resources blog, I learned more about the statistics underlying the tools in the Spatial Statistics toolbox. I quickly realized that the tools can be grouped by common mathematical principle. For example, the Hot Spot Analysis tool is built on something called the Getis-Ord Gi* statistic. On the Desktop 10 Help website’s list of sample applications, most tools are listed with their associated statistic (usually in parentheses). For example:

Question: Is the data spatially correlated?

Tool: Spatial Autocorrelation (Global Moran’s I)
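As a worked example of what the Spatial Autocorrelation tool computes, here is a minimal Python sketch of Global Moran’s I using the standard formula I = (n / S0) * sum_i sum_j w_ij z_i z_j / sum_i z_i^2, where z_i is a value’s deviation from the mean and S0 is the sum of all spatial weights. The 0/1 neighbor weights for points along a line come from a hypothetical helper chosen for illustration, and the sketch skips the z-score and p-value that the ArcGIS tool also reports.

```python
def line_contiguity(n):
    """Hypothetical helper: 0/1 weights where each point on a line neighbors the adjacent points."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]


def morans_i(values, weights):
    """Global Moran's I for a list of values and an n x n weights matrix (zero diagonal)."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]          # deviations from the mean
    s0 = sum(sum(row) for row in weights)   # sum of all weights
    cross = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(zi * zi for zi in z)


def expected_morans_i(n):
    """Expected value of I when there is no spatial autocorrelation."""
    return -1.0 / (n - 1)
```

Clustered values such as `[1, 1, 5, 5]` give I = 1/3, above the expected value of -1/(n - 1) = -1/3 for four features, while alternating values `[1, 5, 1, 5]` give I = -1.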

Some of the mathematical concepts, like ordinary least squares, I am fairly well acquainted with. Others, such as the Getis-Ord statistic, I had never encountered before. Using one of my primary research tools, the internet, I found that the statistic was developed in the 1990s by its namesakes, Arthur Getis and J. Keith Ord.

Link to the 1995 paper on the Getis-Ord statistic

But one need not always consult the internet at large. ESRI explains each tool in articles scattered throughout the Spatial Statistics section of the ArcGIS Desktop 10.0 Help. I’ve begun assembling a list of links to these articles, organized by toolset, below. I would like to learn about these statistics, their strengths and weaknesses, and especially when it is not appropriate to use them (what are their assumptions?).

List of Mathematical Principles/Statistics Underlying the Suite of Available Spatial Statistics

Analyzing Patterns:

How Multi-Distance Spatial Cluster Analysis (Ripley’s K-function) works

How Spatial Autocorrelation (Global Moran’s I) works

How High/Low Clustering (Getis-Ord General G) works

Mapping Clusters:

How Hot Spot Analysis (Getis-Ord Gi*) works

How Cluster and Outlier Analysis (Anselin Local Moran’s I) works

Measuring Geographic Distributions:

How Directional Distribution (Standard Deviational Ellipse) works

Modeling Spatial Relationships:

Geographically Weighted Regression (GWR) (Spatial Statistics)

Ordinary Least Squares (OLS) (Spatial Statistics)
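Since the Getis-Ord Gi* statistic was new to me, here is a minimal Python sketch of it, following the form given in ESRI’s “How Hot Spot Analysis (Getis-Ord Gi*) works” page as I understand it: for feature i, Gi* = (sum_j w_ij x_j - Xbar * sum_j w_ij) / (S * sqrt((n * sum_j w_ij^2 - (sum_j w_ij)^2) / (n - 1))), where Xbar and S are the mean and standard deviation of all values. Because this is the “star” version, each feature counts as its own neighbor (w_ii is nonzero). The result is a z-score: large positive values suggest hot spots, large negative values cold spots. The 0/1 weights come from a hypothetical helper chosen for illustration.

```python
import math


def self_and_adjacent(n):
    """Hypothetical 0/1 weights: each point on a line neighbors itself and the adjacent points."""
    return [[1.0 if abs(i - j) <= 1 else 0.0 for j in range(n)] for i in range(n)]


def getis_ord_gi_star(values, weights):
    """Return a Gi* z-score for each feature."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    scores = []
    for row in weights:
        sum_w = sum(row)
        sum_w2 = sum(w * w for w in row)
        local = sum(w * v for w, v in zip(row, values))  # weighted local sum
        num = local - mean * sum_w
        den = s * math.sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
        scores.append(num / den)
    return scores
```

For five points on a line with values `[10, 10, 1, 1, 1]`, each neighboring itself and its adjacent points, the first point gets Gi* = 2.0 and the last about -1.33.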

 

The class today discussed topics of interest within the ArcGIS Spatial Statistics toolbox, using the Spatial Statistics blog as a starting point (http://blogs.esri.com/esri/arcgis/2010/07/13/spatial-statistics-resources/). Most students looked for concepts or tools that would be useful for their specific research needs. I was interested in the discussion surrounding modeling spatial relationships and analyzing patterns, and how these might apply to the humpback whale data I am using for my own project.

Of particular interest was the “Conceptualization of Spatial Relationships” webpage (http://help.arcgis.com/en/arcgisdesktop/10.0/help/#/Modeling_spatial_relationships/005p00000005000000/). This concept is important for most of the tools in the Spatial Statistics toolbox and is critical for data with some degree of locational uncertainty: what is the best spatial conceptualization for your data, so that the tool output makes sense?
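To see how the choice of conceptualization changes which features count as neighbors, here is a small Python sketch of three common options (inverse distance, a fixed distance band, and k nearest neighbors), plus row standardization. These are simplified stand-ins for options the ArcGIS tools offer, not ESRI’s code; the function names are hypothetical, and the sketch assumes every point has a distinct location (inverse distance is undefined at zero distance). For data with locational uncertainty, comparing results under several conceptualizations is one way to check whether conclusions depend on this choice.

```python
import math


def inverse_distance(points, power=1.0, cutoff=None):
    """Weight 1 / d**power between distinct points, optionally only within cutoff."""
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d = math.dist(points[i], points[j])
                if cutoff is None or d <= cutoff:
                    w[i][j] = 1.0 / d ** power
    return w


def fixed_distance_band(points, band):
    """Weight 1 for every other point within the band, 0 otherwise."""
    n = len(points)
    return [[1.0 if i != j and math.dist(points[i], points[j]) <= band else 0.0
             for j in range(n)] for i in range(n)]


def k_nearest(points, k):
    """Weight 1 for each point's k closest other points."""
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(points[i], points[j]))
        for j in others[:k]:
            w[i][j] = 1.0
    return w


def row_standardize(w):
    """Divide each row by its total so a feature's weights sum to 1."""
    out = []
    for row in w:
        total = sum(row)
        out.append([v / total if total else 0.0 for v in row])
    return out
```

For points at x = 0, 1, and 3, a 1.5-unit band leaves the point at x = 3 with no neighbors, whereas `k_nearest(points, 1)` still assigns it one (the point at x = 1).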

Other interesting points made in class today include:

The discussion on regression and measuring geographic distributions.

TOOL: Generate Network Spatial Weights

URL: http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#//005p0000001z000000

Toolset: Modeling Spatial Relationships

Summary: This tool supports analysis of the spatial relationships between features whose connections are restricted to a network, meaning that movement between two points can take place only along specific routes. Consequently, if one wants to analyze the shortest distance between two points, the Euclidean (straight-line) distance might not be the appropriate measurement.

The Generate Network Spatial Weights tool generates a spatial weights matrix that quantifies the relationships between features based on their neighbor relationships, with those relationships constrained by a network dataset.
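The difference between network and straight-line distance can be sketched in plain Python: shortest-path (Dijkstra) distances along a hypothetical stream network replace Euclidean distances when deciding which features are neighbors and how strongly they are related. The graph format, the inverse-distance weighting, and the cutoff are assumptions for illustration. The ArcGIS tool itself works on a network dataset and writes a spatial weights matrix file, which this sketch does not reproduce. It also assumes features sit on network nodes at distinct locations.

```python
import heapq
import math


def build_graph(edges):
    """Undirected adjacency list from (node_a, node_b, length) tuples."""
    graph = {}
    for a, b, length in edges:
        graph.setdefault(a, []).append((b, length))
        graph.setdefault(b, []).append((a, length))
    return graph


def network_distances(graph, source):
    """Dijkstra shortest-path lengths from source to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale queue entry
        for nbr, length in graph.get(node, []):
            nd = d + length
            if nd < dist.get(nbr, math.inf):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist


def network_weights(graph, features, cutoff):
    """Inverse network-distance weights between features reachable within cutoff."""
    weights = {}
    for f in features:
        dist = network_distances(graph, f)
        weights[f] = {g: 1.0 / dist[g] for g in features
                      if g != f and g in dist and dist[g] <= cutoff}
    return weights
```

If sites A and C are joined only through B by two 5-unit reaches, their network distance is 10 no matter how close they are in a straight line, so a cutoff of 8 leaves them unconnected while a cutoff of 12 gives each a weight of 0.1 for the other.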

 

TOOL: Linear Directional Mean

URL: http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#//005p00000017000000

Toolset: Measuring Geographic Distributions

Summary: This tool measures the trend of a set of lines by identifying their mean direction, mean length, and geographic center. It can calculate either mean direction or mean orientation. For mean direction, the start and end points of the lines matter; for mean orientation, they don’t.

The output of the Linear Directional Mean tool is a single line centered on the calculated mean center, with a length equal to the mean length and a direction (or orientation) equal to the mean direction (or mean orientation) of all input vectors.
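Mean direction and mean orientation are circular statistics, so they cannot be found by simply averaging angles (the average of 350° and 10° should be 0°, not 180°). A minimal Python sketch: average the unit vectors of each line’s angle, and for orientation double the angles first so that a line drawn from A to B and one drawn from B to A count the same. Angles here are degrees counterclockwise from east, whereas ArcGIS reports compass angles, and using line midpoints for the mean center is my assumption. The function is a hypothetical illustration, not the tool’s code.

```python
import math


def linear_directional_mean(lines, orientation_only=False):
    """Mean direction (or orientation), mean length, and mean center of line segments.

    lines: list of ((x1, y1), (x2, y2)) start/end pairs.
    Angles are degrees counterclockwise from east (math convention).
    """
    sum_cos = sum_sin = 0.0
    lengths, mids = [], []
    for (x1, y1), (x2, y2) in lines:
        theta = math.atan2(y2 - y1, x2 - x1)
        if orientation_only:
            theta *= 2  # doubling maps A->B and B->A to the same angle
        sum_cos += math.cos(theta)
        sum_sin += math.sin(theta)
        lengths.append(math.hypot(x2 - x1, y2 - y1))
        mids.append(((x1 + x2) / 2, (y1 + y2) / 2))
    mean_theta = math.atan2(sum_sin, sum_cos)
    if orientation_only:
        angle = math.degrees(mean_theta / 2) % 180
    else:
        angle = math.degrees(mean_theta) % 360
    n = len(lines)
    center = (sum(m[0] for m in mids) / n, sum(m[1] for m in mids) / n)
    return angle, sum(lengths) / n, center
```

Two 2-unit lines from the origin, one pointing east and one north, give a mean direction of 45°, a mean length of 2, and a center of (0.5, 0.5).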

 

VIDEO: Performing Proper Density Analysis

http://video.arcgis.com/watch/401/performing-proper-density-analysis

Duration: 12:11

Summary: The purpose of this video is to explain the importance of user decisions (such as input parameters) when performing a density analysis and to raise awareness of the subjective aspects of the results.
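One of the most consequential user decisions in a density analysis is the search radius (bandwidth). The plain-Python quartic (biweight) kernel density sketch below shows why: the same points produce very different values as the radius changes. I’m assuming a quartic kernel scaled so each point’s contribution integrates to 1, similar in spirit to the ArcGIS Kernel Density tool, but this is not ESRI’s implementation and it ignores population fields and output cell size.

```python
import math


def quartic_kernel_density(x, y, points, radius):
    """Density at (x, y): sum of quartic kernels of the given radius centered on each point."""
    total = 0.0
    for px, py in points:
        d = math.dist((x, y), (px, py))
        if d < radius:
            # each kernel integrates to 1 over its disc of area pi * radius**2
            total += (3 / math.pi) * (1 - (d / radius) ** 2) ** 2
    return total / radius ** 2


def density_profile(points, radii, location=(0.0, 0.0)):
    """Density at one location for several candidate search radii."""
    return {r: quartic_kernel_density(location[0], location[1], points, r) for r in radii}
```

With a single point at the origin, the density at the origin is 3/π (about 0.955) for a radius of 1 but only a quarter of that for a radius of 2, because the same “mass” is spread over four times the area.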

Since one of my interests in this class is understanding the principles underlying ArcGIS’s spatial statistics tools, I was interested in this link (http://blogs.esri.com/esri/arcgis/2010/04/07/check-out-our-chapter-on-spatial-statistics-in-arcgis-in-the-handbook-of-applied-spatial-analysis/) to additional materials on spatial statistics. The chapter by Lauren Scott and Mark Janikas in the Handbook of Applied Spatial Analysis, edited by Manfred M. Fischer and Arthur Getis, relates directly to the tools in the Spatial Statistics toolbox that Esri provides within ArcGIS. I’d like to track down a copy of this handbook to see what topics are discussed and how others are using spatial statistics, and to learn more about the principles and ideas behind the spatial statistics used. I will send Julia Jones the information about the book so she can request it through the Valley Library.

In my initial exploration of the ESRI spatial statistics website, I focused on tools that might be useful in my proposed research on the population structure and behavioral ecology of humpback whales (Megaptera novaeangliae) in Glacier Bay/Icy Strait, Alaska. One objective of my master’s thesis is to investigate the mechanisms behind the population increase within Glacier Bay/Icy Strait since the early 1970s/1980s. I was initially struck by hot spot analysis, thinking it might be informative for visualizing the habitat use of humpback whales within Glacier Bay/Icy Strait. This region has undergone massive geological change and has deglaciated relatively recently, i.e., over the past 200 years. Visualizing the habitat use (depth, slope, distance from shore, etc.) of the contemporary population of humpback whales in Glacier Bay/Icy Strait might help explain why abundance has increased in this region. This would be done by overlaying layers of oceanographic features on humpback whale encounter locations to detect patterns of habitat use.

Links:

How to do it:

http://resources.arcgis.com/gallery/file/geoprocessing/details?entryID=604B4BD9-1422-2418-A0F3-77076337D488

http://www.arcgis.com/home/item.html?id=dea008bcc77d4fd485abdf8121190b82

How it works:

http://help.arcgis.com/en/arcgisdesktop/10.0/help/#/How_Hot_Spot_Analysis_Getis_Ord_Gi_works/005p00000011000000/

TO DO: After visualizing my humpback whale encounters in ArcGIS, it occurred to me that what appear to be hot spots within Glacier Bay/Icy Strait might actually be areas of increased field effort. My data were not collected along random transect lines, which will complicate any potential hot spot analysis.
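A common way to reduce this kind of bias is to analyze encounter rates (encounters per unit of survey effort) rather than raw encounter counts, so that heavily surveyed areas do not look like hot spots simply because more time was spent there. The sketch below assumes encounters and effort (e.g., survey hours) have already been summarized per grid cell; the cell IDs, the effort units, and the `min_effort` threshold for excluding barely surveyed cells are all hypothetical choices.

```python
def encounter_rates(encounters, effort, min_effort=0.0):
    """Encounters per unit effort for each surveyed cell.

    encounters: {cell_id: number of encounters}
    effort:     {cell_id: survey effort, e.g. hours}
    Cells with effort <= min_effort get None, since rates there are unreliable.
    """
    rates = {}
    for cell, e in effort.items():
        rates[cell] = encounters.get(cell, 0) / e if e > min_effort else None
    return rates
```

In a made-up example, a cell with the most encounters (10) but 20 hours of effort has a lower rate (0.5 per hour) than a cell with 2 encounters in 2 hours (1.0 per hour).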

The class reviewed the content of ESRI’s ArcGIS spatial statistics blog, and each student reported on areas of interest and potential future uses from their own perspective. With regard to statistical predictions involving three-dimensional problems spanning subsurface, groundwater, surface, and atmospheric systems, one challenge is how to use ArcGIS statistics to evaluate the connectivity and interaction of these systems in order to predict or estimate the relationships between them.

ArcMarine has a 3D component, but it is still under development and does not directly address spatial statistical analysis of 3D systems.

Jen’s find, the chapter on spatial statistics in ArcGIS in the Handbook of Applied Spatial Analysis, was interesting and may be useful for identifying appropriate tools and analyses.

Peggy’s identification of the externally developed tool for statistically evaluating flow through networks (rivers and streams) may also be useful.  See her post for link to this tool.

Dori’s discussion of the Generate Network Spatial Weights tool is also relevant to our approach; generating network spatial weights is something Jen and I have done previously in our research.

Finally, the tools in the “Assess Overall Spatial Patterns” and “Model Relationships” portions of the ArcGIS blog also look promising and worthy of further investigation.

The two posts I expect to benefit from the most are:

  1. Supplemental Spatial Statistics Toolbox: http://www.arcgis.com/home/item.html?id=694e0f97355740d7bba6b8b356c0b925

The tools for incremental spatial autocorrelation and exploratory regression analysis seem like they would be useful for investigating spatial relationships and identifying important explanatory variables for spatial models.

  2. Integrating R and ArcGIS:

http://www.arcgis.com/home/item.html?id=a5736544d97a4544aa47d06baf910f6d

I’ve spent much more time running spatial models in R than in Arc, bringing model outputs into Arc for mapping after the analysis is complete. For more complex models, this is probably still the most efficient method, but for simpler analyses it may be easier to run the analysis and produce maps in Arc.
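Returning to the Supplemental Spatial Statistics Toolbox post, one way to investigate how spatial autocorrelation changes with scale is to compute Global Moran’s I over a series of increasing distance bands, which as far as I understand is the idea behind ESRI’s incremental spatial autocorrelation analysis. The plain-Python sketch below is self-contained and hypothetical: it uses 0/1 fixed-distance-band weights and reports only I for each band, whereas the ArcGIS tooling reports z-scores and looks for peaks among them.

```python
import math


def band_weights(points, band):
    """0/1 weights: two distinct points are neighbors if within `band` of each other."""
    n = len(points)
    return [[1.0 if i != j and math.dist(points[i], points[j]) <= band else 0.0
             for j in range(n)] for i in range(n)]


def morans_i(values, weights):
    """Global Moran's I, or None if no feature has a neighbor under these weights."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(map(sum, weights))
    if s0 == 0:
        return None
    cross = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(zi * zi for zi in z)


def incremental_morans_i(points, values, start, step, count):
    """Moran's I at distance bands start, start + step, ..., as (band, I) pairs."""
    results = []
    for k in range(count):
        band = start + k * step
        results.append((band, morans_i(values, band_weights(points, band))))
    return results
```

For six points spaced 1 unit apart on a line with values `[1, 1, 1, 5, 5, 5]`, a 0.5-unit band gives no neighbors (None) and a 1-unit band gives I = 0.6.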

I also found the regression analysis pages very useful as a reference, in addition to the page on ‘Finding a Meaningful Model’ (http://www.esri.com/news/arcuser/0111/findmodel.html). The tutorials for hot spot analysis, regression analysis, and ModelBuilder seem worthwhile to run through and of general benefit to others in the class.

Kevin Buffington