Uncategorized Archives - Page 4 of 6 - GEO599/GEO584-Advanced Spatial Statistics and GIS, 2013-2016

As described in my previous blog post, my original intent was to investigate successional patterns of urban re-colonization by diurnal raptors using eBird data. I soon realized that Exercises 2 and 3 would not be possible with the point data I have, because the data alone do not have any quantitative attributes that vary in any meaningful way in relation to my research question. After discussing other potential research questions with Julia, we decided that analyzing differing patterns of year-round residency across an urban/rural gradient would be a reasonable alternative that still addresses some of the same overarching themes.

My general approach to conducting this analysis was to calculate the ratio of the number of days a particular species was observed within a given year to the number of days observations of any species were made. This ratio is calculated per cell in a raster covering the extent of the study area. If the ratio is closer to 1 in certain places, then that species of bird is likely staying in that area for more of the year. A value of 0 would indicate that at least one other species was observed, but the species of interest was not observed at all at that location.
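The ratio described above can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the two small grids and their counts are made up, and the function name residency_ratio is hypothetical. It is not the Raster Calculator workflow itself.

```python
def residency_ratio(species_days, all_days):
    """Per-cell ratio of species observation days to all observation days.

    Both arguments are equally sized 2-D lists of per-cell day counts.
    Cells with no observations of any species get None (no data), which
    keeps them distinct from a true ratio of 0.
    """
    ratio = []
    for sp_row, all_row in zip(species_days, all_days):
        ratio.append([
            sp / total if total > 0 else None
            for sp, total in zip(sp_row, all_row)
        ])
    return ratio


# Hypothetical 2 x 3 rasters of observation-day counts (made-up numbers).
hawk_days = [[0, 3, 5],
             [0, 0, 2]]
any_days = [[0, 6, 5],
            [4, 0, 8]]

ratio = residency_ratio(hawk_days, any_days)
for row in ratio:
    print(row)
```

Keeping "no observations at all" separate from "observed other species but not this one" matters later, because only the second case is a genuine 0.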

I initially thought I would get more meaningful results with a larger sample size so for this initial analysis, I chose to analyze the residency of red-tailed hawks (n = 6,607), one of the most commonly reported species of raptor. The year-round range of red-tailed hawks, however, overlaps with my study site in NW Oregon which might have influenced my results. I therefore repeated the analysis with observations of merlins (n = 589), a species without a year-round range overlapping the study site, to see if patterns in the data were different. Of course, merlins and red-tailed hawks may respond to urban and rural environmental characteristics differently. For both species, I used observations from 2014 only.

An overview of the workflow for this analysis is shown below, but I will also describe each step in detail.

Step 1 Dissolve the all-observations dataset and the species-observation dataset by the date and observer ID fields.

Step 2 Point Density on both sets of dissolved observations with a cell size of 200 m and a neighborhood of 1 cell

Step 3 Raster Calculator on both point density rasters to multiply each by 200. This produces a raster where each cell is a count of the number of points within that cell. (Not shown in workflow schematic due to space limitations.)

Observations of all species per 200 m

Red-tailed hawk observations per 200 m

Step 4 Raster Calculator to compute raster where each cell is a ratio of species observation count to total observation count

Step 5 Extract Values to Points with all observation dataset as input points and ratio raster as values to sample.

Step 6 Getis-Ord Gi* Hotspot analysis on sampled points with Fixed Distance Band as the Conceptualization of Spatial Relationships parameter.
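To make Step 6 more concrete, here is a minimal sketch of the Getis-Ord Gi* z-score with a binary fixed-distance band, written from the standard published formula. The function name gi_star, the coordinates, the values, and the band distance are all assumptions for illustration; the ArcGIS tool has additional options that are not reproduced here.

```python
import math


def gi_star(points, values, band):
    """Getis-Ord Gi* z-scores with binary fixed-distance-band weights.

    points: list of (x, y); values: list of floats; band: distance threshold.
    Each point counts as its own neighbor (the "star" in Gi*).
    """
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean * mean)
    scores = []
    for xi, yi in points:
        # Binary weights: 1 inside the band (self included), 0 outside.
        w = [1.0 if math.hypot(xi - xj, yi - yj) <= band else 0.0
             for xj, yj in points]
        sum_w = sum(w)
        sum_w2 = sum(wj * wj for wj in w)
        local_sum = sum(wj * vj for wj, vj in zip(w, values))
        denom = s * math.sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
        # A zero denominator (constant values, or a band covering every
        # point) carries no local information, so report 0.
        scores.append((local_sum - mean * sum_w) / denom if denom > 0 else 0.0)
    return scores


# Made-up sample points along a transect with hypothetical ratio values.
pts = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0), (20, 0), (21, 0)]
vals = [0.9, 0.8, 0.9, 0.1, 0.0, 0.1, 0.0, 0.1]
z = gi_star(pts, vals, band=2.5)
print([round(score, 2) for score in z])
```

Because the statistic sums values over the whole neighborhood, a point whose own value is 0 can still receive a high z-score when its neighbors within the band have high values.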

Hot and cold spots of red-tailed hawk residency. Black polygons are 2010 Census delineated urbanized areas.

While the results seem to reflect a pattern I would expect, I’m not sure that I trust them entirely. This is for several reasons:

The merlin hotspot map is very similar despite the fact that the merlin observations were much sparser than the red-tailed hawk observations. The merlin map also shows hotspots in locations where there aren’t any merlin observations, and the sampled value of many of the points is 0 (meaning other birds were observed but no merlins were).

Merlin hotspots

I conducted the same sampling procedure using Extract Values to Points but with a grid of points evenly spaced 1 km apart. The hotspot map is very different and seems to be mostly noise. There is only faint evidence of a discernible pattern.

Hot spot analysis on a 1 km grid of sampled red-tailed hawk residency ratio

There are some data quality issues that I did not address here. For instance, some observers may have only been looking for certain species and might not have reported the species of interest even if it were present. The converse could be true as well. This is usually reported with those kinds of observations but I didn’t filter these out.
Also, the data are very clustered to begin with and I may not have selected the right Conceptualization of Spatial Relationships for the data.

This week I began exploring the tools and techniques I proposed using in “My Spatial Problem” blog. The goal of my project is to investigate the foraging ranges of Adelie and Gentoo penguins over the course of a breeding season on the Western Antarctic Peninsula. Specifically, I’d like to calculate the total area each species utilizes for foraging, identify core foraging areas, and calculate the percent overlap between the ranges of these two species.

I began this process by importing XY Data (latitudinal and longitudinal coordinates in decimal degrees) into ArcMap. To start I’ve randomly chosen data from three Adelie and three Gentoo penguins, representing at-sea areas where PTT tags were able to successfully transmit location data to a satellite.

I utilized an online database called the Antarctic Digital Database (©1993-2015 Scientific Committee on Antarctic Research) to obtain basemaps for general orientation and reference. They aren’t perfect, but useful enough to show the general location of each species’ breeding colony, and the location of a nearby marine canyon that may be of ecological importance. I found these basemaps to be important right away. They aren’t directly necessary for spatial analysis, but they are critical in terms of initially assessing the space these species are utilizing and determining whether things make sense! I discovered two important things.

In my first blog post, I briefly glossed over the importance of filtering these datapoints to eliminate poor quality data. Each datapoint (downloaded from ARGOS) comes with an associated estimation of accuracy. I decided to initially skip this step while I use this data to practice spatial analysis in Arc. This explains why some datapoints are on land, and it might explain outliers, like those seen in the Neumayer Channel (unlabeled, but at right in Figure 1 and 2 below).
Upon visually inspecting these datapoints I realized there must be something wrong, because my initial map of Gentoo foraging locations showed a lot of clustering around Torgersen Island (Figure 1). Torgersen is the location of the Adelie colony, and Biscoe is the location of the Gentoo colony. It was unusual that the points of two individuals did not originate from Biscoe. In fact, the original data were from Adelies and were mixed up in the original datasource. This is corrected in Figure 2.

Figure 1. Erroneous Gentoo locations originating from Torgersen Island

Figure 2. Corrected Gentoo locations originating from Biscoe Island

Next, I began to research kernel density estimation techniques. Much of the literature I’ve read where similar techniques have been used has alluded to kernel density estimation, percent volume contours, and a Spatial Analyst extension called Animal Movement. I was dismayed to find out that the Animal Movement Extension is no longer in commission and not available for Arc 9 and 10. The next extension/software I researched that provided these tools was called Hawth’s Tools, which is also discontinued. Its replacement is called Geospatial Modeling Environment (http://www.spatialecology.com/gme/). I am still considering using this software; however, it would require learning/using an entirely different program.

While considering these things, I attempted to search for tools in the ArcToolbox that might be useful. I used the Kernel Density tool to create kernel density layers for each species. I combined the individual datapoints from each Adelie into one layer (Fig 3), and the three Gentoo individuals into another (Fig 4), and then calculated separate kernel density layers for each. It was encouraging to find that the output of this was a pretty good visual representation of “hotspots”, however, I’ve since been stuck attempting to understand exactly what Arc did here. Specifically, I don’t understand the values that are associated with each contour.

Figure 3. Combined Adelie locations with kernel density layer, note grid-like structure of the points in the center

Figure 4. Combined Gentoo locations with kernel density layer and legend at left describing kernel density values for Fig 3 and 4

If I can figure this out I will be able to determine whether this tool will work for the purpose of my project. I’d like to determine the area within 50% and 95% contour lines. To do this I need to accurately create these contours, and this will require more knowledge about how the kernel density tool works. So far I’ve experimented changing different things associated with “Classification” in the Symbology tab of Layer Properties. Break Values seem to determine each contour, and there is an option to change these values. There is an option to specify %, but the units/area calculated by the % values do not seem right (Figure 5). The legend contains the values associated with 25, 50 and 95% breakpoints (0-25, 25-50, 50-95). I will continue to explore this function, as well as the Geospatial Modeling Environment program described above.
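One common way to define 50% and 95% contours from a density surface is the percent volume approach: take cells from densest to sparsest until they account for the requested share of the total density. The sketch below illustrates that idea only. It is not what the Symbology break values in ArcMap compute, and the grid values, cell area, and function names are assumptions.

```python
def volume_threshold(density, fraction):
    """Density value whose exceedance region holds `fraction` of total volume.

    density: 2-D list of non-negative kernel density values on equal-area cells.
    Cells are taken from densest to sparsest until their cumulative sum
    reaches the requested share of the raster total.
    """
    cells = sorted((v for row in density for v in row), reverse=True)
    total = sum(cells)
    running = 0.0
    for v in cells:
        running += v
        if running >= fraction * total:
            return v
    return cells[-1]


def area_above(density, threshold, cell_area):
    """Area of all cells whose density is at or above the threshold."""
    count = sum(1 for row in density for v in row if v >= threshold)
    return count * cell_area


# Hypothetical 4 x 4 kernel density raster; cell_area is in arbitrary units.
kde = [[0, 1, 2, 1],
       [1, 4, 8, 2],
       [0, 2, 4, 1],
       [0, 0, 1, 0]]
cell_area = 1.0

core = volume_threshold(kde, 0.50)
home_range = volume_threshold(kde, 0.95)
print("50% core area:", area_above(kde, core, cell_area))
print("95% range area:", area_above(kde, home_range, cell_area))
```

Note that ties at the threshold value are all included, so the enclosed volume can slightly exceed the requested percentage on a coarse grid.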

Looking closely at the Adelie datapoints (Fig 3) it appears that they are way too grid-like. It turns out that the original XY data (decimal degrees) is only to four decimal places. Eventually I will need to return to the original datasource for more fine-scale points (hopefully they exist).
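To judge how coarse four decimal places really are, the ground distance of one 0.0001 degree step can be estimated. This is a rough sketch on a spherical Earth; the latitude of -64.8 is an assumed value for the study area, not a number taken from the data, and the function name is hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; spherical approximation


def degree_step_in_meters(step_deg, latitude_deg):
    """Ground distance (north-south, east-west) of a coordinate step in degrees."""
    meters_per_degree = math.pi * EARTH_RADIUS_M / 180
    north_south = step_deg * meters_per_degree
    # Meridians converge toward the poles, so east-west steps shrink with latitude.
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


# 0.0001 degrees is the smallest step at four decimal places.
# -64.8 is an assumed latitude, used here only for illustration.
ns, ew = degree_step_in_meters(0.0001, -64.8)
print(f"north-south: {ns:.1f} m, east-west: {ew:.1f} m")
```

Comparing these step sizes with the spacing of the visible grid in the map is one way to check whether rounding alone could explain the pattern.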

My next steps include deciphering the kernel density output, and learning how changing factors such as grid cell size and search radius affect kernel density calculations. After that I will need to determine which tool/calculation will allow me to compute % overlap between the two species ranges.
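Once each species' range is expressed as a set of raster cells (for example, the cells inside a 95% contour), percent overlap reduces to counting shared cells. A minimal sketch, assuming equal-area cells identified by (row, column) and made-up ranges; percent_overlap is a hypothetical helper name.

```python
def percent_overlap(range_a, range_b):
    """Share of each range (set of cell ids) that falls inside the other, in percent."""
    shared = len(range_a & range_b)
    return 100 * shared / len(range_a), 100 * shared / len(range_b)


# Hypothetical ranges as sets of (row, column) cells.
adelie_cells = {(0, 0), (0, 1), (1, 0), (1, 1)}
gentoo_cells = {(1, 1), (1, 2)}

adelie_pct, gentoo_pct = percent_overlap(adelie_cells, gentoo_cells)
print(f"{adelie_pct:.0f}% of the Adelie range overlaps the Gentoo range")
print(f"{gentoo_pct:.0f}% of the Gentoo range overlaps the Adelie range")
```

Overlap is reported in both directions because the two ranges usually differ in size, so a single percentage would hide which species is more affected.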

Figure 5. Adelie locations with a kernel density layer where breakpoints were manually entered, pink is supposed to represent 25-50% and blue is supposed to represent 50-75%?

Lauren suggested that I use the above-mentioned tools. Here is what I learned about those tools through the ArcGIS 10 Help.

Generate Spatial Weights Matrix: Constructs a spatial weights matrix (.swm) file to represent the spatial relationships among features in a dataset.

Generate Network Spatial Weights: Constructs a spatial weights matrix file (.swm) using a Network dataset, defining feature spatial relationships in terms of the underlying network structure.

Note, you have to turn on Network Analyst Extensions to use this tool.

It seems like I have to manually assign the relationship of each network, which sounds like very cumbersome work as there are more than 100,000 streams to deal with. I may be able to utilize fdr (the output of FlowDirection) to expedite the process.

Stay tuned!

I have worked on plotting the observed values of speed and turning angle for each bird versus the time of day, to see if any of the patterns observed in the Incremental Autocorrelation plots can be traced back to relationships between the individual points. As far as I can see, there doesn’t seem to be any. I am attaching the output for four of my birds, along with an image of the area where they have been moving (where green is forest and pink is agricultural land).

(Note: The point plots correspond to a single day of observations, while the autocorrelation ones were made using all the observation days. I couldn’t run the analysis with the data from single days because there weren’t enough points to meet the minimum required by the tool.)

I am thinking that I should do the same type of plot using distance in the X axis rather than time, because there’s not a strict direct relationship between distance moved between two points and time taken to move that distance. Thus, a 30-second time interval between two points could either be reflecting 10 meters or 100 meters.

My new dilemma is that I am not sure what that distance on the X axis should represent. The distance of all points to an arbitrary point (e.g.: site of capture)? The distance along a movement path defined by joining consecutive points? Suggestions are welcome!
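The two candidate definitions of distance raised above can be compared directly. The sketch below assumes projected coordinates in meters and a made-up track; both function names are hypothetical.

```python
import math


def cumulative_path_distance(points):
    """Running distance along a track of projected (x, y) points in meters."""
    distances = [0.0]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        distances.append(distances[-1] + math.hypot(x1 - x0, y1 - y0))
    return distances


def distance_from_origin(points, origin):
    """Straight-line distance of every point to one fixed reference point."""
    ox, oy = origin
    return [math.hypot(x - ox, y - oy) for x, y in points]


# Made-up track; the first point stands in for the site of capture.
track = [(0, 0), (3, 4), (3, 10), (0, 14)]
along_path = cumulative_path_distance(track)
from_capture = distance_from_origin(track, track[0])
print(along_path)
print(from_capture)
```

Distance along the path always increases, so it preserves the order of observations, whereas distance from a fixed point can shrink again when the bird doubles back.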

When analyzing data it is important to have a basic familiarity with the data structure. With tabular data this often means creating histograms and scatter plots to visualize the structure and relationship between point values. Descriptive statistics such as minimum, maximum, mean, and standard deviation values are also useful. Familiarity with spatial data should include measures of their geographic dispersion, autocorrelation, and value aggregation. Within ArcGIS these characteristics can be measured using the “Average Nearest Neighbor”, “Spatial Autocorrelation (Global Moran’s I)”, and “Hot Spot Analysis (Getis-Ord Gi*)” tools, respectively. In this example I look at the spatial structure of a sample of satellite image-mapped forest disturbances in Oregon’s west Cascades. The data are polygons representing unique disturbance events, with attributes including: year of disturbance detection, magnitude of disturbance, and duration.

1. Average nearest neighbor.

Magnitude of disturbance was divided into three classes (low, medium, and high). Each class was run through the average nearest neighbor tool to determine if the spatial pattern is clustered, random, or dispersed. The pattern for low magnitude disturbance is random, whereas medium and high are clustered. This pattern of disturbance severity and its distribution is possibly a function of the disturbance agent. Low magnitude disturbances are typically natural, which may be more random than anthropogenic disturbances, like clearcuts, which dominate the medium and high magnitude classes. Note that nearest neighbor analysis is highly sensitive to the data extent. A larger or smaller extent would likely change the result; therefore the stated results are only meaningful for the area and extent used, and are not an indication of a universal pattern.
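The sensitivity to extent can be seen in a small sketch of the nearest neighbor ratio, using the standard expected distance under complete spatial randomness. The points and the two study areas are made up, and the function name is hypothetical; the ArcGIS tool operates on feature centroids and offers options not shown here.

```python
import math


def average_nearest_neighbor(points, area):
    """Nearest neighbor ratio and z-score for points in a study area.

    A ratio below 1 suggests clustering; above 1 suggests dispersion.
    """
    n = len(points)
    nearest = []
    for i, (xi, yi) in enumerate(points):
        nearest.append(min(math.hypot(xi - xj, yi - yj)
                           for j, (xj, yj) in enumerate(points) if j != i))
    observed = sum(nearest) / n
    expected = 0.5 / math.sqrt(n / area)  # mean distance under spatial randomness
    std_error = 0.26136 / math.sqrt(n * n / area)
    return observed / expected, (observed - expected) / std_error


# Two made-up groups of four points; the same points are tested twice.
sites = [(1, 1), (1, 2), (2, 1), (2, 2), (5, 5), (5, 6), (6, 5), (6, 6)]

ratio_tight, z_tight = average_nearest_neighbor(sites, area=25)   # 5 x 5 bounding box
ratio_wide, z_wide = average_nearest_neighbor(sites, area=100)    # 10 x 10 study window
print(f"tight extent: ratio {ratio_tight:.2f}, z {z_tight:.2f}")
print(f"wide extent:  ratio {ratio_wide:.2f}, z {z_wide:.2f}")
```

The identical set of points reads as dispersed within its tight bounding box and as clustered within the wider window, which is why the area used should always be reported with the result.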

2. Spatial autocorrelation (Global Moran’s I)

Global Moran’s I was applied to disturbance magnitude (without classification based on severity). Global Moran’s I indicated that the disturbances are clustered by magnitude. This means that there is autocorrelation within data, where disturbances close to one another have similar magnitudes. The results are the same as nearest neighbor evaluated by severity classes, except that magnitude was explicit in the analysis with Global Moran’s I (no classification needed). The interpretation is the same as that for nearest neighbor.
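For reference, Global Moran's I itself is short to compute. This sketch uses binary fixed-distance weights and made-up values along a line; the function name and the band distance are assumptions, and the significance test that ArcGIS reports is not included.

```python
import math


def morans_i(points, values, band):
    """Global Moran's I with binary fixed-distance weights (self excluded)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    cross, s0 = 0.0, 0.0
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            if i != j and math.hypot(xi - xj, yi - yj) <= band:
                cross += dev[i] * dev[j]  # similar neighbors add, dissimilar subtract
                s0 += 1.0                 # sum of all weights
    if s0 == 0:
        raise ValueError("no point has a neighbor within the distance band")
    return (n / s0) * cross / sum(d * d for d in dev)


# Six made-up locations along a line, one unit apart.
line = [(x, 0) for x in range(6)]
clustered = morans_i(line, [1, 1, 1, 5, 5, 5], band=1)
alternating = morans_i(line, [1, 5, 1, 5, 1, 5], band=1)
print(clustered, alternating)
```

Positive values indicate that neighbors tend to have similar magnitudes, and negative values indicate that neighbors tend to differ.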

3. Hot spot analysis tool (Getis-Ord Gi*)

Getis-Ord Gi* calculates a z-score that relates to the clustering of either high or low valued features. The results, based on the entire range of magnitudes, show significant clustering of high values, but not of low values, which is consistent with nearest neighbor analysis. The areas showing greatest significance of high magnitude clustering have relatively large gaps between neighbors, which could be a consequence of the “look-to-distance” of the analysis.

An issue that most researchers tend to have is the problem of getting the data. At times our data seems so close yet it is so far away. We as researchers often know what type of data we want and we may also know that it already exists. However, we may not always know how to get the data. Even more frustrating is finding the data that you need and realizing that it is not in a usable form. Finding the correct data in a usable form has been my number one problem. Thankfully a past student has come to my rescue. She suggested using the National Historical Geographical Information System to access census data. The NHGIS site provides, free of charge, aggregate census data and GIS-compatible boundary files for the United States between 1970 and 2011. I intend to carry out a geographical approach to understand and predict how the local spatial structure of new environmental amenities will influence and shape the way in which environmental justice communities will evolve. This research aims to develop a novel framework/approach to understand the evolution of environmental justice communities in relation to the incorporation and management of natural amenities. To achieve this objective I will complete several benchmark activities including:

Observe spatial and temporal variation and patterns of neighborhood characteristics (educational attainment, income, racial composition, household tenure, renters) over a 70-year period

There are many issues that will arise as I attempt to accomplish this task. For instance, the temporal resolution of my data will be in 10-year increments, which may not entirely capture the patterns that I will be looking for.
Assessing variables temporally will prove to be difficult. For example, educational attainment is a variable that is not available in all years of the census data.
I will also consider how the census tracts and census blocks change over time which could

Quantitatively assess the spatial and temporal variation and patterns of natural amenities over a 70 year period, using satellite imagery and aerial photography.

There is a lot of uncertainty that is associated with using aerial photography and satellite imagery.
One technique that I considered using to look at green space in an area is to calculate NDVI, the Normalized Difference Vegetation Index. In short, it is a remote sensing technique to assess whether the target being observed contains live green vegetation or not.
Another technique I am considering is to use an unsupervised k-means classification to explore and assess the change from open/greenspace to impervious surface.
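NDVI is computed per pixel from the near-infrared and red bands as (NIR - Red) / (NIR + Red). A minimal sketch with made-up reflectance values follows; the function names and the sample pixels are assumptions for illustration.

```python
def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red) for one pixel; None if both bands are 0."""
    total = nir + red
    return (nir - red) / total if total != 0 else None


def ndvi_grid(nir_band, red_band):
    """Apply ndvi cell by cell to two equally sized 2-D lists."""
    return [[ndvi(n, r) for n, r in zip(nir_row, red_row)]
            for nir_row, red_row in zip(nir_band, red_band)]


# Hypothetical reflectances: a vegetated pixel and a paved pixel.
vegetated = ndvi(nir=0.50, red=0.10)
paved = ndvi(nir=0.30, red=0.25)
print(round(vegetated, 3), round(paved, 3))
```

Healthy vegetation reflects strongly in the near-infrared and absorbs red light, so vegetated pixels score closer to 1 while bare or impervious surfaces sit near 0.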

There are a number of things that I still need to consider when trying to carry out this project, but this is a start. My plan for the next week is to continue to explore my data and run some tools that will help to better describe the distribution of certain neighborhood characteristics.

The following screenshots are the results that I have generated using Hot Spot Analysis, Anselin Moran’s and Global Moran’s I to investigate the clustering of soils with high clay content in the six sub-AVAs (Chehalem Mountains, Ribbon Ridge, Dundee Hills, Yamhill-Carlton, McMinnville, and Eola-Amity Hills) of the northern Willamette Valley. I have created quite a few data sets, and am in the process of identifying useful methods for further interrogation of my data. Along those lines, I need some feedback regarding the interpretation of these results, and any comments would be greatly appreciated.

Percent clay Location Map of the entire Willamette Valley AVA

Percent clay of the entire Willamette Valley AVA (including the six sub-AVAs in the northern portion of the Willamette Valley)

Percent Clay detail of the northern Willamette Valley

Hot Spot Analysis (GiZScore) of Percent Clay; detailed

Hot Spot Analysis (GiPValue) of Percent Clay; detailed

Anselin Moran’s (Cluster/Outlier Type) of Percent Clay; detailed

Anselin Moran’s (LMiZScore) of Percent Clay; detailed

Anselin Moran’s (LMiPValue) of Percent Clay; detailed

Global Moran’s I using a fixed distance of 1,000 meters, 5,000 meters, 10,000 meters, and 15,000 meters

My objective was to see if the displacement of the birds showed particular patterns. For this, I decided to analyze the distribution of speed and rotation angles in space. Speed at a particular point is calculated as distance to previous point over time taken to move between points. Rotation angle refers to the angle between two consecutive movement lines (i.e., lines joining point A to B and B to C).
I first tried the Spatial Autocorrelation function, which indicated a clustered distribution of the values.
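The two movement variables defined above can be sketched as follows. The code assumes projected coordinates in meters and times in seconds, and it interprets rotation angle as the signed change in heading between consecutive movement lines, which is one reading of the definition rather than a confirmed one. The track and function names are made up.

```python
import math


def step_speeds(points, times):
    """Speed at each point after the first: distance to previous point / elapsed time."""
    speeds = []
    for (x0, y0), (x1, y1), t0, t1 in zip(points, points[1:], times, times[1:]):
        speeds.append(math.hypot(x1 - x0, y1 - y0) / (t1 - t0))
    return speeds


def turning_angles(points):
    """Signed change in heading (degrees) between consecutive movement lines."""
    angles = []
    for a, b, c in zip(points, points[1:], points[2:]):
        heading_ab = math.atan2(b[1] - a[1], b[0] - a[0])
        heading_bc = math.atan2(c[1] - b[1], c[0] - b[0])
        turn = math.degrees(heading_bc - heading_ab)
        angles.append((turn + 180) % 360 - 180)  # wrap into [-180, 180)
    return angles


# Made-up track: three 10 m steps, 30 seconds apart, turning left twice.
fixes = [(0, 0), (10, 0), (10, 10), (0, 10)]
stamps = [0, 30, 60, 90]
speeds = step_speeds(fixes, stamps)
turns = turning_angles(fixes)
print(speeds)
print(turns)
```

A track of n points yields n - 1 speeds and n - 2 turning angles, so the first and last fixes have to be handled explicitly when the values are joined back to the points.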

Example of output of the Spatial Autocorrelation tool applied to rotation angles.

These results weren’t meaningful for me though, as I was interested in the variability within the observations. Studies on different animal species have shown that the analysis of variability within movement patterns can be used to infer behavioral patterns. I expected the birds would show varying speeds and rotation angles in response to the habitat where they were living (e.g., move slower inside the forest and quicker between forest patches; straighter movement lines in non-forest habitat). Thus, I decided to apply the Incremental Spatial Autocorrelation function, as this tool would indicate if the spatial clustering of values varied in the study area.

The results show mixed responses from each bird, with no clear interpretation for the observed patterns.

Incremental Spatial Autocorrelation (Speed) — Example of output of the Incremental Spatial Autocorrelation tool applied to speed.

Incremental Spatial Autocorrelation (Angle) — Example of output of the Incremental Spatial Autocorrelation tool applied to rotation angles.

Most of them have non-significant z-scores, and those that do have no clear relationship to any environmental factor. Hot spot analyses don’t show a particular concentration of values at any point either.

Hot Spot Analysis: Example of output of the Hot Spots tool as applied to rotation angles.

In conclusion, speed and rotation angles are either A) not affected by the disposition of forest or B) bad indicators of behavioral changes associated with space use.

For this week’s assignment, we were tasked to begin exploring our dataset with some basic exploratory spatial statistics tools from ArcGIS (average nearest neighbor and/or spatial autocorrelation and hot-spot analysis). Since my underlying problem is to interpret subsurface geologic characteristics throughout the Northern Gulf of Mexico to fill spatial gaps, I need to understand both the distribution of my sampling points (n=13625) as well as the spatial distribution of the subsurface geologic characteristics associated with each sampling point, such as average porosity, initial temperature (°F), and initial pressure (psi). Therefore, to get at the initial distribution of my sampling points, I used average nearest neighbor to identify if the distribution of my sampling points tended to be clustered, random, or dispersed. Results (Table 1) showed that my sampling points were significantly clustered, which agrees with the patterns observed visually (Figure 1).

Table 1. Resulting z score and p value from average nearest neighbor and spatial autocorrelation tests for entire sampling data and subsampled datasets

Gulf of Mexico data samples — Figure 1. Location of sampling points (boreholes; n=13625) throughout the Northern Gulf of Mexico

However, since I know the distribution of my sampling points, how could I be sure that the spatial pattern of the subsurface geologic characteristics wouldn’t just reflect the clustered sampling distribution? Therefore, I decided to subsample my data points, first to a smaller geographic area, the Mississippi Canyon Outer Continental Shelf (OCS) lease block (n=367; Figure 2A), and then further subsample those points (using the Create Random Points tool in ArcGIS) to select data points (n=50) to give them a clustered, random, and dispersed spatial distribution (determined using average nearest neighbor; Figure 2B). Then, I ran the spatial autocorrelation tool for each subsample (all Mississippi Canyon, and the clustered, random, and dispersed samples within Mississippi Canyon), which identified that despite the distribution of my sampling points, the values for temperature, pressure, and porosity are significantly clustered (Table 1). The next spatial statistic tool requested to test with our dataset was the hot spot analysis tool. I ran this tool on temperature values for the Mississippi Canyon (n=367) data subsample to identify if there are significant spatial clusters of high and low temperature values. Results show (Figure 2C) that there are significant clusters of high temperatures (red dots and blue triangles) and low temperature (blue dots and blue triangles). Now, the next step is to begin exploring the relationships between different subsurface geologic characteristics and different environmental conditions, such as water depth, subsurface depth, geologic age, etc. to identify any correlations that can be used to help fill in spatial gaps of subsurface geologic characteristics throughout the Northern Gulf of Mexico.

Figure 2. Location of the subsampled data points in Mississippi Canyon (n=367; A), the subselected data samples with clustered, random, and dispersed distributions (n=50; B), and the results of the hot spot analysis tool for temperatures from the Mississippi Canyon subsample dataset (n=367;C)

One of my spatial problems is examining the spatial distribution of mitigated wetlands in the Willamette Valley to assess the quality of the locations chosen for restoration. The data set I used to test the hot spot tool is a point file of wetland mitigation sites (i.e. sites that have been restored or created based on intentional disturbance elsewhere).

The mitigation data look clustered when examined visually, and average nearest neighbor confirms this hypothesis.

It seems intuitive that wetlands would be clustered towards streams, so I ran average nearest neighbor on the valley’s streams to examine their spatial distribution. This showed that the streams are less clustered than mitigated wetlands, indicating there are other factors that explain the locations of mitigated wetland sites.

Categorical data is largely unusable in the spatial statistics toolbox. However, I wanted to examine the spatial distribution of mitigated wetlands compared to historic vegetation cover. In order to work around the categorical data, I first created a layer that only contained historic wetland vegetation; I then ran the “near” tool to calculate distance between the mitigated wetlands and the historic wetland polygons. Lastly, I ran the hot spot analysis on this distance.

Red indicates increased distance from a historic wetland. The results show that since most of the valley was once floodplain wetlands, most sites are situated on historic wetlands; an area near Portland, however, shows a hot spot of mitigated wetlands that are located further from historic wetland vegetation.


Red-tailed hawk residency in Northwest Oregon

Investigating the extent of foraging range overlap between Adelie and Gentoo penguins at Palmer Station, Antarctica (week two update).

Generate Spatial Weights Matrix & Generate Network Spatial Weights

Contrasting Incremental Autocorrelation output with data points

Spatial Pattern of Forest Disturbance Magnitude

Getting Data/Research Ideas

Using Hot Spot Analysis, Anselin Moran’s, and Global Moran’s I to investigate clustering of clay content in northern Willamette Valley AVAs

Autocorrelation and movement patterns

Discerning a variable’s spatial pattern within a clustered dataset

Proxies for Using Categorical Data in Hot Spot Analysis to Examine Mitigation Patterns
