GEO599/GEO584-Advanced Spatial Statistics and GIS, 2013-2016 « Just another blogs.oregonstate.edu site

Coastal salt marshes are at great risk from a large number of factors, especially climate change and sea-level rise. In an effort with the USGS, I’m working to determine how different salt marshes along the Pacific Coast will respond to changes in sea level. Part of our approach is collecting fine-scale, baseline field data in the form of RTK GPS elevation points and vegetation surveys. Through analysis of data from a range of sites (up to 15 along the coast), I hope to better characterize plant habitat requirements, with the ultimate goal of producing improved community response projections under sea-level rise scenarios. In this class, and in Jim Graham’s Spatial Modeling/Big Data class, I will be working with the elevation & veg data to characterize spatial relationships of plant species against plot-level factors (inundation frequency, distance to channel, elevation) and site factors (temperature, salinity, tidal range). I have hundreds of vegetation plots per site, with about 2000 survey plots completed across our PNW sites.

Currently, I’m still in data processing mode, combining databases and gathering environmental data; field data collection wrapped up in January. However, the inundation data needs to be developed, first by kriging the elevation data into DEMs and then using site-specific water logger data to determine flooding frequency. The water logger data itself needs to be corrected for barometric pressure and elevation. Marsh channels need to be digitized before a distance-to-channel raster can be created. There’s a lot of work still to be done to get the data in shape for analysis; however, by focusing on one or two sites, I’ll be able to explore the spatial statistics toolbox and push forward with this project.
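
As a rough illustration of the flooding-frequency step, here is a minimal sketch that assumes the water logger readings have already been corrected for barometric pressure and referenced to the same vertical datum as the plot elevations. The function name and the sample readings are hypothetical, not part of the project's actual workflow.

```python
def inundation_frequency(plot_elevation_m, water_levels_m):
    """Fraction of logger readings in which the water level exceeds the plot elevation."""
    if not water_levels_m:
        raise ValueError("need at least one water-level reading")
    flooded = sum(1 for level in water_levels_m if level > plot_elevation_m)
    return flooded / len(water_levels_m)


# Hypothetical, evenly spaced water-level readings in meters.
water_levels = [0.5, 1.0, 1.5, 2.0, 1.5, 1.0, 0.5, 0.0]

low_plot = inundation_frequency(0.75, water_levels)   # low marsh plot
high_plot = inundation_frequency(1.75, water_levels)  # high marsh plot
print(low_plot, high_plot)
```

Because the readings are assumed to be evenly spaced in time, the fraction of readings doubles as the fraction of time flooded; unevenly spaced records would need time weighting.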

I am working with hummingbird location points obtained through radiotelemetry, and want to figure out their patterns of space use and how they are affected by forest fragmentation.

I need to find ways of assessing which areas are preferred by the birds as well as the movement patterns they follow.

The points were recorded within a short time period, so they are not independent. Autocorrelation functions will help me evaluate the degree of this dependence in space and time. Rather than a problem, the autocorrelated nature of my data presents an opportunity to study activity patterns of the birds.
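
To make the idea of an autocorrelation function concrete, the sketch below computes the sample autocorrelation of a single coordinate series at a chosen lag. It assumes fixes are evenly spaced in time, and the easting values are made up for illustration.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of an evenly spaced series at the given lag."""
    n = len(series)
    if lag < 0 or lag >= n:
        raise ValueError("lag must be between 0 and len(series) - 1")
    mean = sum(series) / n
    denominator = sum((value - mean) ** 2 for value in series)
    if denominator == 0:
        raise ValueError("series has no variation")
    numerator = sum(
        (series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag)
    )
    return numerator / denominator


# Hypothetical eastings (m) of consecutive telemetry fixes for one bird.
eastings = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
lag_one = autocorrelation(eastings, 1)
print(lag_one)
```

A value near 1 at short lags means consecutive fixes are strongly dependent; the lag at which the value drops toward 0 gives a rough sense of how far apart in time fixes need to be before they behave independently.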

My research seeks to quantify and explain patterns of variability as they relate to specific soil properties (such as nutrients, physical structure, etc.). There are patterns in the data itself (distribution shapes such as normal or bimodal, skewness, and variance) and in the spatial distribution of those data values.

I wish to learn more about tools that can characterize these data distributions and spatial patterns (emphasis on spatial for this class). It is especially a challenge because soil variables often don’t follow well-known distributions such as Gaussian or exponential. This leaves me wary about the use of certain mathematical tools that require assumptions such as a normal distribution. The central limit theorem does not apply when we move beyond questions about the mean.
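
One simple way to check distribution shape before reaching for a tool that assumes normality is to compute the sample skewness. The sketch below uses the moment-based definition (third central moment divided by the variance to the power 1.5); the two sample lists are invented values, not soil measurements.

```python
def sample_skewness(values):
    """Moment-based skewness: 0 for symmetric data, positive for a long right tail."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three values")
    mean = sum(values) / n
    second_moment = sum((v - mean) ** 2 for v in values) / n
    third_moment = sum((v - mean) ** 3 for v in values) / n
    if second_moment == 0:
        raise ValueError("values have no variation")
    return third_moment / second_moment ** 1.5


symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]     # hypothetical, evenly spread
right_tailed = [1.0, 1.0, 1.0, 2.0, 10.0]  # hypothetical, one large value

print(sample_skewness(symmetric), sample_skewness(right_tailed))
```

Skewness alone will not reveal bimodality, so a histogram is still worth plotting alongside it.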

At this point I do not have a specific question in mind, and I should also mention I haven’t collected any data for this project yet (I’m just starting lab analysis this spring). I do not have a specific spatial question; rather, I’d like to learn about various classification and interpolation methods.

Some background info on soil for the curious

A blurb from the Soil Science Society of America on the importance of soil:

Soil provides ecosystem services critical for life; soil acts as a water filter and a growing medium; provides habitat for billions of organisms, contributing to biodiversity; and supplies most of the antibiotics used to fight diseases. Humans use soil as a holding facility for solid waste, filter for wastewater, and foundation for our cities and towns. Finally, soil is the basis of our nation’s agroecosystems which provide us with feed, fiber, food and fuel.

On the source of soil variability:

Soil is HIGHLY heterogeneous. It is a mix of weathered rock minerals, plant organic matter, liquid, and gas. It’s been forming for thousands of years. A multitude of environmental variables affect that formation at spatial scales from nanometer bacterial interaction to varying climate across landscapes. The real challenge is that variability increases as a function of spatial area under consideration. The variability of a 0.5m X 0.5m plot is different than that of a 5m x 5m and is different than a 50m x 50m plot and so on.
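
The scale effect can be illustrated with a toy surface. The sketch below assumes a synthetic grid with nothing but a steady west-to-east trend (real soil would also have patchy structure) and computes the variance inside nested square windows of increasing size.

```python
def window_variance(grid, size):
    """Population variance of the values in the size x size window at the grid origin."""
    values = [grid[row][col] for row in range(size) for col in range(size)]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


# Synthetic 8 x 8 surface whose value equals its column index (hypothetical data).
grid = [[float(col) for col in range(8)] for _ in range(8)]

variances = {size: window_variance(grid, size) for size in (2, 4, 8)}
for size, variance in variances.items():
    print(size, variance)
```

Each larger window spans more of the trend, so its variance is larger, which is the same behavior described above for plots of increasing area.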

A graphical look at shifting soil scale and methods of characterizing variability

My data consists of points derived from a GPS track log, which contains spatial information for GPS points taken at 30-second intervals along with a time stamp for each point. I also have a spreadsheet of field data containing location information for the start and end of an encounter with a species of cetaceans, the time the encounter started and when it ended and other important information such as the species, the number of animals, etc. In order to pair species’ encounters with the GPS tracklog, I use the time information of the encounter and associate those points in the tracklog that correspond with the beginning and ending times.
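
The time-based pairing described above can be sketched in a few lines. This assumes the track log is held as (timestamp, latitude, longitude) tuples and that encounter times use the same clock as the GPS; the coordinates and times below are made up.

```python
from datetime import datetime, timedelta


def points_during_encounter(track, start, end):
    """Return the track points whose timestamps fall within [start, end]."""
    return [point for point in track if start <= point[0] <= end]


# Hypothetical track log: one fix every 30 seconds.
first_fix = datetime(2013, 4, 1, 10, 0, 0)
track = [
    (first_fix + timedelta(seconds=30 * i), -17.500 + 0.001 * i, -149.800)
    for i in range(6)
]

# Hypothetical encounter times taken from the field data spreadsheet.
encounter_start = datetime(2013, 4, 1, 10, 0, 30)
encounter_end = datetime(2013, 4, 1, 10, 1, 30)

matched = points_during_encounter(track, encounter_start, encounter_end)
print(len(matched))
```

If encounter times were recorded to the minute while fixes are every 30 seconds, widening the window slightly on each side may be needed so the first and last fixes are not dropped.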

I am interested in a couple of spatial aspects of this data that are pertinent to this class:

Patterns in the environmental and oceanographic characteristics of the encounter locations that may explain melon-headed whale (Peponocephala electra) utilization of these locations.
The spatial distribution of melon-headed whales and other small cetaceans and the patterns in the presence or absence of melon-headed whales and the presence or absence of other species.

These areas of interest bring up the following spatial statistics related questions:

Do environmental and oceanographic characteristics differ significantly between locations?
Which variables are significant predictors of melon-headed whale utilization of these areas?
Do encounter locations differ significantly from locations where melon-headed whales were not seen?
Is there a relationship between the presence (or absence) of melon-headed whales and the presence of other species of small cetaceans?
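
For the last question, a first look could be a chi-square test on a 2 x 2 table of encounters. The sketch below is only that, a first look: it assumes encounters are independent, which spatial data often violate, and the counts are invented.

```python
def chi_square_2x2(both, only_first, only_second, neither):
    """Pearson chi-square statistic (1 degree of freedom) for a 2 x 2 table."""
    n = both + only_first + only_second + neither
    row_1, row_2 = both + only_first, only_second + neither
    col_1, col_2 = both + only_second, only_first + neither
    if 0 in (row_1, row_2, col_1, col_2):
        raise ValueError("every row and column total must be non-zero")
    difference = both * neither - only_first * only_second
    return n * difference ** 2 / (row_1 * row_2 * col_1 * col_2)


# Hypothetical counts of survey locations:
# melon-headed whales and another species, melon-headed only, other only, neither.
statistic = chi_square_2x2(20, 5, 5, 20)

# 3.841 is the 5% critical value of chi-square with 1 degree of freedom.
print(statistic, statistic > 3.841)
```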

I am sure there will be more questions that present themselves once I begin delving into the data.

Today, I solved my first problem (thanks Jen!) and successfully projected my data so that I could begin running spatial statistics analyses. My data went from GCS_WGS_1984 (unprojected) to NAD_1983_StatePlane_Alaska_1_FIPS_5001, which will allow for improved accuracy in spatial calculations for whale sighting data in southeastern Alaska. I ran an Average Nearest Neighbor analysis on humpback whale sightings in southeastern Alaska and found that the observed mean nearest neighbor distance was significantly smaller than the expected value. This significant difference is most likely due to the complex geography of southeastern Alaska, which creates a clustering of individuals. I also learned that results of my spatial statistics analyses will be presented in meters. I look forward to running additional analyses next week!
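
For readers who want to see what an average nearest neighbor comparison involves, here is a minimal sketch of the textbook form of the statistic: the observed mean nearest neighbor distance is compared with the value expected for the same number of randomly placed points in the same area. The coordinates and study area are hypothetical and assumed to be in projected units (meters), and this is not a reproduction of the ArcGIS tool itself.

```python
import math


def average_nearest_neighbor(points, area):
    """Return (observed mean NN distance, expected distance, z-score)."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    nearest = []
    for i, (xi, yi) in enumerate(points):
        nearest.append(
            min(
                math.hypot(xi - xj, yi - yj)
                for j, (xj, yj) in enumerate(points)
                if j != i
            )
        )
    observed = sum(nearest) / n
    expected = 0.5 / math.sqrt(n / area)               # random-pattern expectation
    standard_error = 0.26136 / math.sqrt(n * n / area)
    return observed, expected, (observed - expected) / standard_error


# Hypothetical sightings packed into one corner of a 100 m x 100 m study area.
sightings = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
observed, expected, z_score = average_nearest_neighbor(sightings, area=100.0 * 100.0)
print(observed, expected, z_score)
```

A negative z-score means points sit closer together than expected under randomness. The result is sensitive to the area value, which is one reason a study area with a convoluted coastline can produce strong apparent clustering.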

My spatial problem is that my data were not collected randomly and field efforts were influenced by predicted habitat use or confirmed sightings of whales. Thus, what appear to be hot spots or patterns of habitat use within southeastern Alaska, might actually be areas of increased field effort. This will undoubtedly complicate my analyses and I continue to turn to the Arc Blog (and Dori) for answers.

We have talked about creating a random sample of whales in southeastern Alaska and comparing their patterns of habitat “use” to what we actually have in our data. Stay tuned for more on that…
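
One way such a comparison against random locations could be set up is a small Monte Carlo test, sketched below. It makes a strong simplifying assumption: random points are drawn from a plain rectangle, whereas a real test would draw them only from surveyed water. All names and coordinates are hypothetical.

```python
import math
import random


def mean_nn_distance(points):
    """Mean distance from each point to its nearest neighbor."""
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        total += min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
    return total / len(points)


def clustering_p_value(points, width, height, simulations=199, seed=0):
    """Share of random patterns at least as tightly packed as the observed one."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    observed = mean_nn_distance(points)
    as_small = 0
    for _ in range(simulations):
        simulated = [
            (rng.uniform(0, width), rng.uniform(0, height)) for _ in points
        ]
        if mean_nn_distance(simulated) <= observed:
            as_small += 1
    return (as_small + 1) / (simulations + 1)


# Hypothetical sightings bunched together inside a 100 x 100 unit rectangle.
sightings = [(10.0, 10.0), (10.5, 10.0), (10.0, 10.5), (10.5, 10.5)]
p_value = clustering_p_value(sightings, width=100.0, height=100.0)
print(p_value)
```

To account for uneven field effort, the random points could instead be drawn in proportion to effort (for example, along the survey tracks), so that apparent hot spots are compared against where the boats actually went.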

The data I will be using for this class are water quality data that I have been collecting from 21 locations in the upper Willamette River Basin. The water quality parameters that I will use for this class are dissolved organic carbon and nitrate.

I would like to learn 1) what the ArcGIS add-in toolboxes of SSN & STARS and FLoWS can do, 2) the concepts of each function offered by those toolboxes, and 3) run them on my data to answer one of my questions: are my water quality data varying spatially?

As I was exploring the Spatial Statistics Resources web-page, I quickly realized most of the spatial statistical tools offered by ESRI are not applicable to my project. My project explores spatial and temporal variations of water quality (dissolved organic carbon sources to be precise) in rivers of the Willamette River Basin. Those ESRI spatial statistical tools are not applicable to my project because 1) points are not representing actual observation points of organisms or diseases for my project but rather representing water quality sampling locations that were selected by me and 2) not only Euclidean distance but also in-stream distances, flow directions, and stream networks affect statistical significance.
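
The difference between Euclidean and in-stream distance is easy to see on a toy network. The sketch below uses a hypothetical three-reach network with two sampling sites on separate tributaries; it ignores flow direction, which tools built for stream networks do take into account.

```python
import heapq
import math


def network_distance(edges, source, target):
    """Shortest along-channel distance between two nodes (Dijkstra, undirected)."""
    graph = {}
    for a, b, length in edges:
        graph.setdefault(a, []).append((b, length))
        graph.setdefault(b, []).append((a, length))
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        distance, node = heapq.heappop(queue)
        if node == target:
            return distance
        if distance > best.get(node, math.inf):
            continue  # stale queue entry
        for neighbor, length in graph.get(node, []):
            candidate = distance + length
            if candidate < best.get(neighbor, math.inf):
                best[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return math.inf


# Hypothetical node coordinates (km) and reach lengths (km).
coords = {"site_a": (0.0, 0.0), "site_b": (6.0, 0.0), "confluence": (3.0, 4.0)}
reaches = [
    ("site_a", "confluence", 5.0),
    ("site_b", "confluence", 5.0),
    ("confluence", "outlet", 2.0),
]

euclidean = math.hypot(
    coords["site_a"][0] - coords["site_b"][0],
    coords["site_a"][1] - coords["site_b"][1],
)
in_stream = network_distance(reaches, "site_a", "site_b")
print(euclidean, in_stream)
```

The two sites are 6 km apart in a straight line but 10 km apart along the channel, and water from one never reaches the other, which is why straight-line tools can mislead on stream data.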

I found add-in toolboxes for SSN & STARS and FLoWS that address those two issues mentioned above. These toolboxes were developed by the U.S. Forest Service (USFS). Unfortunately the currently available toolboxes are for ArcGIS 9.3, but the USFS states they are planning to publish new toolboxes for ArcGIS10 later this year.

http://webcache.googleusercontent.com/search?q=cache:5SIzWb38eREJ:blogs.esri.com/esri/arcgis/2013/01/29/ssn-stars-tools-for-spatial-statistical-modeling-on-stream-networks/+spatial+statistics+arcgis+water&cd=1&hl=en&ct=clnk&gl=us

Things I would like to accomplish by the next class period are to 1) download those two toolboxes and 2) see if they seem to work with ArcGIS10. Note, I am not planning on publishing data modified using those toolboxes developed for ArcGIS 9.3; however, these goals will help me explore what kinds of tools are available through these toolboxes and learn the concepts behind the tools that I am interested in using.

As a general introduction to what I can expect from spatial statistics I searched for a webpage that would define what spatial statistics are, what kinds of questions they can answer, and how they are different from a-spatial statistics. I found a document entitled “Understanding Spatial Statistics in ArcGIS 9” (http://www.utsa.edu/lrsg/Teaching/EES6513/ESRI_ws_SpatialStatsSlides.pdf) that answers these questions.

The document begins by answering the question “What are spatial statistics?” The author defines them as “exploratory tools that help you measure spatial processes, spatial distributions, and spatial relationships.”

There are two categories of spatial measurements:

1) Identifying characteristics of a distribution. This first category of measurements is descriptive and answers questions like: where is the center, or how are the features distributed around the center?

2) Quantifying geographic pattern, i.e., are the data random, clustered, or evenly dispersed?
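
The first category can be illustrated with two of the simplest descriptive measures, the mean center and the standard distance. This is a minimal, unweighted sketch with made-up coordinates, assumed to be in a projected coordinate system.

```python
import math


def mean_center(points):
    """Average x and average y of a set of points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def standard_distance(points):
    """Root mean squared distance of the points from their mean center."""
    center_x, center_y = mean_center(points)
    squared = sum((x - center_x) ** 2 + (y - center_y) ** 2 for x, y in points)
    return math.sqrt(squared / len(points))


# Hypothetical feature locations.
features = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
center = mean_center(features)
spread = standard_distance(features)
print(center, spread)
```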

Spatial statistics are different from a-spatial or non-spatial statistics in that spatial statistics include some measure of space in their mathematics. In most cases, neighboring observations are considered in the statistics regarding a focal observation or global measurement.
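
Moran's I is a common example of a statistic that builds neighbors into the math. The sketch below uses the simplest possible neighbor definition, a binary weight of 1 for any pair of points within a distance threshold, and hypothetical values along a transect.

```python
import math


def morans_i(points, values, threshold):
    """Global Moran's I with binary weights for pairs within the threshold distance."""
    n = len(values)
    mean = sum(values) / n
    deviations = [value - mean for value in values]
    denominator = sum(d * d for d in deviations)
    if denominator == 0:
        raise ValueError("values have no variation")
    weight_sum = 0.0
    cross_products = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            distance = math.hypot(
                points[i][0] - points[j][0], points[i][1] - points[j][1]
            )
            if distance <= threshold:
                weight_sum += 1.0
                cross_products += deviations[i] * deviations[j]
    if weight_sum == 0:
        raise ValueError("no pairs fall within the threshold")
    return (n / weight_sum) * (cross_products / denominator)


# Hypothetical transect: four points 1 unit apart with steadily rising values.
locations = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
measurements = [1.0, 2.0, 3.0, 4.0]
statistic = morans_i(locations, measurements, threshold=1.5)
print(statistic)
```

Positive values indicate that neighbors tend to have similar values; values near 0 suggest no spatial pattern. The choice of threshold changes who counts as a neighbor and can change the result.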

The document describes a few examples of problems or questions addressed using spatial statistics available in ArcGIS:

1) How does the distribution of Dengue Fever for a village in India change during the first three weeks after the outbreak?

2) Does bobcat movement between preferred habitat areas coincide with natural land features such as valleys, rivers, or ridgelines?

3) Are there persistent areas in the United States where people are either dying earlier, or living longer, than the average American?

Areas of interest

1. Using ModelBuilder to manage data downloaded from the Internet.

http://www.arcgis.com/home/item.html?id=7180ba6e9d8845128eaadf70a4b6bf7e

This tutorial piqued my interest because my data will come from a variety of sources. I will likely encounter a variety of formatting, labeling, and quality differences among datasets so standardizing the process would be beneficial. This tutorial illustrates some of the pertinent considerations, such as no spaces in field names, when importing data into ArcGIS as well as how to use ModelBuilder to plan and automate tasks.
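
As a small example of the kind of standardizing step that could be automated, the sketch below cleans column names so they contain no spaces or punctuation. The function is hypothetical and not taken from the tutorial; the optional length cap is included because shapefile field names are limited to 10 characters.

```python
import re


def clean_field_name(name, max_length=None):
    """Replace runs of non-word characters with underscores; avoid a leading digit."""
    cleaned = re.sub(r"\W+", "_", name.strip()).strip("_")
    if not cleaned:
        cleaned = "field"
    if cleaned[0].isdigit():
        cleaned = "f_" + cleaned
    return cleaned if max_length is None else cleaned[:max_length]


# Hypothetical column headers from downloaded spreadsheets.
headers = ["Site Name", "DOC (mg/L)", "2013 nitrate"]
cleaned_headers = [clean_field_name(header) for header in headers]
print(cleaned_headers)
```

Truncating with `max_length` can produce duplicate names, so a real workflow would also need to check for collisions.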

2. Using R in ArcGIS 10.

http://www.arcgis.com/home/item.html?id=a5736544d97a4544aa47d06baf910f6d

Extending ArcGIS with R – presentation from the 2010 Users Conference

http://www.arcgis.com/home/item.html?id=547085ee428f4141b2cacb338f8f61a3

Since ArcGIS can experience limited functionality working with large datasets and spatial statistics needs can extend beyond its capabilities, being able to integrate with software that is more capable, such as R, could be very useful.

To Do:

I am still early in my thesis development but one of the things that I would like to investigate is habitat use of melon-headed whales around French Polynesia and compare that to habitat use around other islands. I would like to continue to investigate the spatial statistics tools that are out there and see what the best approach will be for my project.
I am also interested in looking at spatial distributions of small cetaceans in the Pacific and testing for relationships between these distributions and the presence or absence of melon-headed whales. So again, investigation into the spatial statistics relevant to this type of study is on the to-do list.

Regression analysis can help you dive deeper into the spatial relationships and the factors behind spatial patterns. At a slightly more advanced level, regression analysis can help you make predictions based on your data. The ArcGIS Resource Center has a very nice page called “Regression Analysis Basics” and gives users an introduction to both regression and the related tools available. It notes the different components of models such as dependent and independent variables and regression coefficients. One of my favorite components of the page is the table “Common regression problems, consequences, and solutions”. This lists problems and links to solutions that could potentially help you make your regression model stronger. Even if your skill set is beyond the basics of regression analysis, this page is a good refresher and introduction to how Arc can aid in telling a story.

Another helpful page is titled “What they don’t tell you about regression analysis”. Whatever you are trying to model is likely a complex phenomenon (especially in this class) and may not have a simple set of answers. Models often need revision and Arc has created a step-by-step protocol for increasing the validity of your analysis and model; this page guides you through six questions/check-marks that you’ll want to pass before you can have confidence in your model.

In my data, for example, I have several layers that could potentially help me identify where wetlands lie within the valley; examples include elevation, hydrology (stream and flood inundation), vegetation, and soils. Often, GIS users simply stack these layers together and create polygons based on areas that contain all, or a majority, of the layers. This technique may be based in ecologically sound logic, but it does not address the strength of the relationships between layers or the degree to which one or more layers may influence (both positively and negatively) others.
A regression analysis using known areas of wetland as the dependent variable and a variety of GIS layers as explanatory variables could help me predict places where wetlands are located but may not have been mapped. Or, even better, it could help me predict where wetlands were in the past. The two pages listed above are useful in guiding me through the individual decisions I need to make when building a model, for example whether to use Ordinary Least Squares or Geographically Weighted Regression.
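
To show what an Ordinary Least Squares fit does under the hood, here is a minimal sketch that solves the normal equations in plain Python. The variable names and values are hypothetical (a made-up wetland cover fraction explained by two made-up layer indices), and the sketch does nothing about spatial autocorrelation in the residuals, which is part of what the diagnostics on the pages above are meant to catch.

```python
def ols_fit(rows, response):
    """Return [intercept, slope_1, slope_2, ...] from the normal equations."""
    design = [[1.0] + list(row) for row in rows]  # prepend the intercept column
    k = len(design[0])
    # Build (X^T X) and (X^T y).
    lhs = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    rhs = [sum(r[i] * y for r, y in zip(design, response)) for i in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(lhs[r][col]))
        if abs(lhs[pivot][col]) < 1e-12:
            raise ValueError("explanatory variables are collinear")
        lhs[col], lhs[pivot] = lhs[pivot], lhs[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for row in range(col + 1, k):
            factor = lhs[row][col] / lhs[col][col]
            for c in range(col, k):
                lhs[row][c] -= factor * lhs[col][c]
            rhs[row] -= factor * rhs[col]
    # Back substitution.
    coefficients = [0.0] * k
    for row in range(k - 1, -1, -1):
        known = sum(lhs[row][c] * coefficients[c] for c in range(row + 1, k))
        coefficients[row] = (rhs[row] - known) / lhs[row][row]
    return coefficients


# Hypothetical cells: (inundation index, hydric soil index) and wetland cover fraction.
layers = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
wetland_fraction = [0.1, 0.3, 0.4, 0.6, 0.8]

intercept, inundation_slope, soil_slope = ols_fit(layers, wetland_fraction)
print(intercept, inundation_slope, soil_slope)
```

The coefficients give one global relationship for the whole study area; Geographically Weighted Regression instead fits local relationships that are allowed to vary across space.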

Take a look at the two introduction pages and consider if your data could be used in a regression analysis and if the tools available in the Spatial Statistics toolbox could be useful. You could even just bring three different variables (ex: hydro, soils, and elevation) to try out.
There are three resources to explore further if you’re interested in using your data to perform regression analysis:

Lauren Scott’s presentation on regression analysis
The seminar on regression analysis titled “Beyond Where: Using Regression Analysis to Explore Why”
The regression analysis tutorial (the same used in Scott et al.’s presentation) where you can “Learn how to build a properly specified OLS model and improve that model using GWR, interpret regression results and diagnostics, and potentially use the results of regression analysis to design targeted interventions”

Salt Marsh Veg Community Response to Sea-Level Rise

Studying hummingbird movement

Soil Variability At Multiple Spatial Scales

Environmental/oceanographic characteristics, habitat utilization, and species distribution

Non-random field effort

My data and objective

Spatial Statistics for Stream Network

Spatial Statistics Introduction

ModelBuilder for automating tasks and using ArcGIS with R

Regression Analysis and Modeling Using the Spatial Statistics Toolbox
