Lauren suggested that I use the above-mentioned tools. Here is what I learned about them from the ArcGIS 10 Help.

Generate Spatial Weights Matrix: Constructs a spatial weights matrix (.swm) file to represent the spatial relationships among features in a dataset.

Generate Network Spatial Weights: Constructs a spatial weights matrix file (.swm) using a Network dataset, defining feature spatial relationships in terms of the underlying network structure.

Note: you have to turn on the Network Analyst extension to use this tool.

It seems like I have to manually assign the relationship for each network segment, which sounds very cumbersome given that there are more than 100,000 streams to deal with. I may be able to utilize fdr (the output of FlowDirection) to expedite the process.
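In the meantime, here is a minimal arcpy sketch of the first tool, just to record the syntax. The workspace, file names, and the unique ID field are placeholders, and k-nearest-neighbors stands in for whatever conceptualization I end up using:

import arcpy

arcpy.env.workspace = "C:/data"  # hypothetical workspace

# Build an .swm for the streams; "StreamID" is an assumed unique integer ID field.
arcpy.GenerateSpatialWeightsMatrix_stats(
    "streams.shp",              # input feature class
    "StreamID",                 # unique ID field
    "C:/data/streams.swm",      # output spatial weights matrix file
    "K_NEAREST_NEIGHBORS",      # conceptualization of spatial relationships
    "EUCLIDEAN",                # distance method
    "#", "#",                   # exponent and threshold distance (defaults)
    8,                          # number of neighbors
    "ROW_STANDARDIZATION")

The help also describes a table-based conceptualization option, which, if I read it correctly, might let me feed in relationships derived from fdr instead of assigning them by hand.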

Stay tuned!

Hello,

This post comes out of a series of discussions around the general topic of “What I’d really like to know about each Spatial Statistics tool.” The current help section contains a wealth of information, but there are still some lingering questions:

1) In layman’s terms, what does each tool do?
– There’s a great post in the help section on sample applications, but the examples are grouped by spatial question, with the relevant tool listed under each. It’d be wonderful if a similar set of simple-to-understand examples was listed by tool. So, for example, I could look up the Incremental Spatial Autocorrelation explanation and read that it answers the question “At which distances is spatial clustering most pronounced?”
2) What type of data sets are appropriate for use with a given tool and why?
   – Each of the tools is based on a mathematical statistic of some kind. You have thoughtfully included a Best Practices section with most of the tools, but there’s no mention of the reasoning behind each suggestion. I realize this is a big ask, but some explanation of the mathematical theory behind what goes wrong when best practices aren’t followed would really help users gain a deeper understanding of tool results. For example, for Hot Spot Analysis there’s a suggestion to use at least 30 input features. But why? Is this because the tool is built on the principle of the Central Limit Theorem?
3) A tool-picking flowchart. There are so many great tools out there, and what the above questions really come down to is “How do I pick a tool?” I’d love to be able to load up a flowchart that tries to assess my spatial question. Am I concerned with spatial patterns in and of themselves, or do I want to learn about the spatial distribution of values associated with features? Once I find several tools of interest, I’d like to read about their potential weaknesses. Will the tool’s results vary greatly if I change the sample extent? Will strongly spatially clustered data skew results? Is zero inflation a problem? A lot of this is the responsibility of the user to figure out, but these are the types of questions we’re asking a lot in our class, which often works with non-ideal data sets.

Thanks,
– Max Taylor

As we are discovering, there are often things we want to do that ArcGIS is not able to do.  Esri has created a Tool Gallery where people can share tools they have created when ArcGIS cannot do what they want.  If you are thinking about creating a tool to do something you need, it is worth checking there first so that you don’t have to reinvent the wheel.

http://resources.arcgis.com/en/communities/analysis/

http://resources.arcgis.com/gallery/file/geoprocessing


I have worked on plotting the observed values of speed and turning angle for each bird against the time of day, to see whether any of the patterns observed in the Incremental Autocorrelation plots can be traced back to relationships between the individual points. As far as I can see, there don’t seem to be any. I am attaching the output for five of my birds, along with an image of the area where they have been moving (where green is forest and pink is agricultural land).

(Note: The point plots correspond to a single day of observations, while the autocorrelation plots were made using all the observation days. I couldn’t run the analysis on data from single days because there weren’t enough points to meet the minimum required by the tool.)

[Images: speed/turning-angle plots and area minimaps for birds 606, 509, 626, 531, and 495]

I am thinking that I should make the same type of plot using distance on the x-axis rather than time, because there is no strict relationship between the distance moved between two points and the time taken to move it: a 30-second interval between two points could reflect either 10 meters or 100 meters.

My new dilemma is that I am not sure what the distance on the x-axis should represent. The distance of each point to an arbitrary reference (e.g., the site of capture)? The distance along a movement path defined by joining consecutive points? Suggestions are welcome!
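While I wait for suggestions, here is a quick Python sketch of both candidate measures, assuming the fixes are already sorted by time and expressed in projected (meter) coordinates; the coordinates below are made up:

import math

# Hypothetical GPS fixes as (x, y) map coordinates, sorted by time.
points = [(478512.0, 4975310.0), (478540.0, 4975355.0), (478601.0, 4975398.0)]
capture_site = points[0]  # arbitrary reference, e.g. the site of capture

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

# Option 1: straight-line distance from each point to the reference point.
ref_distance = [dist(p, capture_site) for p in points]

# Option 2: cumulative distance along the path joining consecutive points.
path_distance = [0.0]
for prev, curr in zip(points, points[1:]):
    path_distance.append(path_distance[-1] + dist(prev, curr))

print(ref_distance)
print(path_distance)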

If you have a shapefile or geodatabase feature class that you want to separate into several shapefiles or feature classes based on a specific attribute, you can do so relatively painlessly via XTools Pro.  XTools is an extension that should be loaded on any OSU-owned computer that also has ArcGIS (at least this is the case for all computers in Digital Earth).

Once you have the XTools toolbar added to your map, you can find the ‘Split Layer by Attributes’ tool under ‘Feature Conversions’.  Caution: the tool requires the same input and output file types to work correctly (i.e., shapefile –> shapefiles or geodatabase feature class –> geodatabase feature classes).

There are many other useful tools worth exploring in XTools Pro (www.xtoolspro.com).
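For scripted workflows, a rough arcpy equivalent of the split is sketched below; the file and field names are invented for illustration:

import arcpy

in_fc = "C:/data/sites.shp"  # hypothetical input shapefile
field = "SPECIES"            # hypothetical attribute to split on

# Collect the unique values of the attribute, then export one shapefile
# per value with an attribute query (arcpy.da requires ArcGIS 10.1+).
values = {row[0] for row in arcpy.da.SearchCursor(in_fc, [field])}
for val in values:
    where = "\"{0}\" = '{1}'".format(field, val)  # shapefile SQL delimiters
    arcpy.Select_analysis(in_fc, "C:/data/split_{0}.shp".format(val), where)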

I am working with a humpback whale dataset collected across the North Pacific from 2004-2006.  Given the large spatial extent, I have selected a subset of data from the Gulf of Alaska (GOA) and would like to look for spatial patterns in the genetic diversity of the whales sighted in the GOA in relation to their environment.  Complicating this problem is the fact that most of the data were collected opportunistically, making the spatial distribution of whale sightings a better reflection of where researchers collected data than an indication of whether environmental variables influence humpback whale habitat use.

[Image: Splash_All]

Figure 1. North Pacific humpback whale sightings from SPLASH.  The data include > 18,000 photo-identification records and 2,700 DNA profiles for 8,000+ unique individuals.

[Image: SPLASH_GOA]

Figure 2. A subset of the SPLASH data for the Northern and Western Gulf of Alaska. The data subset includes 2,622 records (both photo-identification and DNA profiles) for 1,448 unique individuals.

Ultimately, I need to figure out a method that will allow me to get beyond the uneven (non-systematic) sampling effort to determine whether there is any sort of spatial pattern in the data based on genetics and environmental features (e.g., depth, slope).  Two (among many) working hypotheses:

  1. Humpback whales are found in clusters at a particular depth or slope range.
  2. Humpback whales that share the same haplotype (maternally inherited mitochondrial DNA) cluster together.

When analyzing data it is important to have a basic familiarity with the data structure.  With tabular data this often means creating histograms and scatter plots to visualize the structure of, and relationships between, point values.  It is also useful to know descriptive statistics such as minimum, maximum, mean, and standard deviation values.  Familiarity with spatial data should include measures of geographic dispersion, autocorrelation, and value aggregation.  Within ArcGIS these characteristics can be measured using the “Average Nearest Neighbor”, “Spatial Autocorrelation (Global Moran’s I)”, and “Hot Spot Analysis (Getis-Ord Gi*)” tools, respectively.  In this example I look at the spatial structure of a sample of satellite image-mapped forest disturbances in Oregon’s west Cascades.  The data are polygons representing unique disturbance events, with attributes including: year of disturbance detection, magnitude of disturbance, and duration.

1.  Average nearest neighbor.

Magnitude of disturbance was divided into three classes (low, medium, and high).  Each class was run through the Average Nearest Neighbor tool to determine whether the spatial pattern is clustered, random, or dispersed.  The pattern for low-magnitude disturbance is random, whereas medium and high are clustered.  This pattern of disturbance severity and its distribution is possibly a function of the disturbance agent: low-magnitude disturbances are typically natural, which may be more random than anthropogenic disturbances, like clearcuts, which dominate the medium- and high-magnitude classes.  Note that nearest neighbor analysis is highly sensitive to the data extent.  A larger or smaller extent would likely change the result; therefore the stated results are only meaningful for the area and extent used, not an indication of a universal pattern.
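For reference, a hedged sketch of the call used for each class (the path is a placeholder); the nearest neighbor ratio, z-score, and p-value appear in the geoprocessing messages:

import arcpy

# Run Average Nearest Neighbor on one severity class at a time.
nn = arcpy.AverageNearestNeighbor_stats(
    "C:/data/disturbance_high.shp",  # hypothetical per-class feature class
    "EUCLIDEAN_DISTANCE",            # distance method
    "NO_REPORT")                     # skip the graphical report
print(nn.getMessages())

# Passing an explicit study-area value (the optional fourth argument)
# is one way to control the extent sensitivity noted above.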

2.  Spatial autocorrelation (Global Moran’s I)

Global Moran’s I was applied to disturbance magnitude (without classification based on severity).  Global Moran’s I indicated that the disturbances are clustered by magnitude, meaning there is autocorrelation within the data: disturbances close to one another have similar magnitudes.  The results agree with the nearest neighbor evaluation by severity class, except that magnitude was explicit in the Global Moran’s I analysis (no classification needed).  The interpretation is the same as that for nearest neighbor.
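The corresponding call, again as a sketch with an assumed path and field name:

import arcpy

moran = arcpy.SpatialAutocorrelation_stats(
    "C:/data/disturbances.shp",  # hypothetical disturbance polygons
    "MAGNITUDE",                 # assumed magnitude field
    "NO_REPORT",
    "INVERSE_DISTANCE",          # conceptualization of spatial relationships
    "EUCLIDEAN_DISTANCE",
    "ROW")                       # row standardization
print(moran.getMessages())       # Moran's I, z-score, and p-value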

3.  Hot spot analysis tool (Getis-Ord Gi*)

Getis-Ord Gi* calculates a z-score that relates to the clustering of either high- or low-valued features.  The results, based on the entire range of magnitudes, show significant clustering of high values, but not of low values, which is consistent with the nearest neighbor analysis.  The areas showing the most significant high-magnitude clustering have relatively large gaps between neighbors, which could be a consequence of the “look-to-distance” of the analysis.
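A sketch of the Gi* call, with the fixed distance band made explicit since that “look-to-distance” drives the neighbor gaps; the path, field, and 2,000 m band are placeholders:

import arcpy

arcpy.HotSpots_stats(
    "C:/data/disturbances.shp",      # hypothetical input polygons
    "MAGNITUDE",                     # assumed magnitude field
    "C:/data/disturb_hotspots.shp",  # output with GiZScore/GiPValue fields
    "FIXED_DISTANCE_BAND",
    "EUCLIDEAN_DISTANCE",
    "NONE",                          # no standardization
    2000)                            # the "look-to-distance" in map units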

[Image: Picture1]

An issue that most researchers tend to have is the problem of getting data.  At times our data seem so close, yet so far away.  We as researchers often know what type of data we want, and we may also know that it already exists.  However, we may not always know how to get the data.  Even more frustrating is finding the data you need and realizing that it is not in a usable form.  Finding the correct data in a usable form has been my number one problem.  Thankfully, a past student has come to my rescue.  She suggested using the National Historical Geographic Information System (NHGIS) to access census data.  The NHGIS site provides, free of charge, aggregate census data and GIS-compatible boundary files for the United States between 1970 and 2011.

I intend to carry out a geographical approach to understand and predict how the local spatial structure of new environmental amenities will influence and shape the way in which environmental justice communities evolve.  This research aims to develop a novel framework/approach to understand the evolution of environmental justice communities in relation to the incorporation and management of natural amenities.  To achieve this objective I will complete several benchmark activities, including:

Observe spatial and temporal variation and patterns of neighborhood characteristics (educational attainment, income, racial composition, household tenure, renters) over a 70-year period

  • There are many issues that will arise as I attempt to accomplish this task.  For instance, the temporal resolution of my data will be in 10-year increments, which may not entirely capture the patterns that I am looking for.
  •  Assessing variables temporally will prove to be difficult.  For example, educational attainment is a variable that is not available in all years of the census data.
  • I will also consider how the census tracts and census blocks change over time, which could complicate comparisons across decades.

Quantitatively assess the spatial and temporal variation and patterns of natural amenities over a 70-year period, using satellite imagery and aerial photography.

  • There is a lot of uncertainty associated with using aerial photography and satellite imagery.
  • One technique I am considering for looking at green space is to calculate NDVI, the Normalized Difference Vegetation Index.  In short, it is a remote sensing technique for assessing whether the target being observed contains live green vegetation (see the sketch after this list).
  • Another technique I am considering is an unsupervised k-means classification to explore and assess the change from open/green space to impervious surface.
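As a minimal NDVI sketch with Spatial Analyst map algebra, assuming the red and near-infrared bands have already been extracted to separate rasters with made-up names:

import arcpy
from arcpy.sa import Raster, Float

arcpy.CheckOutExtension("Spatial")  # requires the Spatial Analyst extension

red = Raster("C:/data/landsat_b3.tif")  # hypothetical red band
nir = Raster("C:/data/landsat_b4.tif")  # hypothetical near-infrared band

# NDVI = (NIR - Red) / (NIR + Red); values near 1 suggest dense green
# vegetation, values near 0 or below suggest bare or impervious surfaces.
ndvi = (Float(nir) - Float(red)) / (Float(nir) + Float(red))
ndvi.save("C:/data/ndvi.tif")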

There are a number of things that I still need to consider when trying to carry out this project, but this is a start.  My plan for next week is to continue exploring my data and to run some tools that will help describe the distribution of certain neighborhood characteristics.


The following screenshots are the results I have generated using Hot Spot Analysis, Anselin Local Moran’s I, and Global Moran’s I to investigate the clustering of soils with high clay content in the six sub-AVAs (Chehalem Mountains, Ribbon Ridge, Dundee Hills, Yamhill-Carlton, McMinnville, and Eola-Amity Hills) of the northern Willamette Valley. I have created quite a few data sets and am in the process of identifying useful methods for further interrogation of my data. Along those lines, I need some feedback regarding the interpretation of these results – any comments would be greatly appreciated.

[Image: Percent_clay_Location_Map_of_the_entire_Willamette_Valley_AVA]

Percent clay location map of the entire Willamette Valley AVA

[Image: Percent_clay_of_the_entire_Willamette_Valley_AVA]

Percent clay of the entire Willamette Valley AVA (including the six sub-AVAs in the northern portion of the Willamette Valley)

[Image: Percent_clay_DETAIL]

Percent clay detail of the northern Willamette Valley

[Image: Hot_Spot_clay_ZScore]

Hot Spot Analysis (GiZScore) of percent clay; detailed

[Image: Hot_Spot_clay_PValue]

Hot Spot Analysis (GiPValue) of percent clay; detailed

[Image: Anselin_Morans_clay_cluster_outlier_type]

Anselin Local Moran’s I (Cluster/Outlier Type) of percent clay; detailed

[Image: Anselin_Morans_clay_ZScore]

Anselin Local Moran’s I (LMiZScore) of percent clay; detailed

[Image: anselin_morans_PValue]

Anselin Local Moran’s I (LMiPValue) of percent clay; detailed

[Images: global_morans_I_clay_1000, global_morans_I_clay_5000, global_morans_I_clay_10000, global_morans_I_clay_15000]

Global Moran’s I using fixed distances of 1,000, 5,000, 10,000, and 15,000 meters
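For what it’s worth, the four fixed-distance runs can be scripted in one pass; this sketch assumes a soil polygon layer and a percent-clay field with made-up names:

import arcpy

# Hypothetical soil polygons with a percent-clay attribute.
in_fc = "C:/data/ava_soils.shp"

for band in (1000, 5000, 10000, 15000):
    result = arcpy.SpatialAutocorrelation_stats(
        in_fc, "PCT_CLAY", "NO_REPORT",
        "FIXED_DISTANCE_BAND", "EUCLIDEAN_DISTANCE", "ROW",
        band)
    print(band, result.getMessages())  # Moran's I, z-score, p-value per band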