Though diving into R can be a bit intimidating, only a basic understanding of R is required to begin analyzing your spatial data. First you need to install the right packages; there are many spatial analysis packages in R, which can be found here: https://cran.r-project.org/web/views/Spatial.html . However, the ones I'll be using are "rgdal", "rgeos", "maptools", "dplyr", "tidyr", and "tmap".
You can install them by running the following line (without the # sign):
#install.packages(c("rgdal", "rgeos", "maptools", "dplyr", "tidyr", "tmap"))

In this example we'll be playing with my data, which you will not have access to, so here is a tutorial that will definitely help you get started if you are really interested: https://cran.r-project.org/doc/contrib/intro-spatial-rl.pdf
First we need to make sure the right libraries (packages) are loaded. If you Google each one of these followed by "in R", it will bring up the help documentation behind the package and what it does. However, I will mention that the package 'sp' is quite important: it is the basis behind all the other spatial analysis packages here, since it defines the spatial data classes in R. It allows R to recognize and deal with spatial vector data and raster data.
x <- c("rgdal", "rgeos", "maptools", "dplyr", "tidyr", "tmap")
lapply(x, library, character.only=T)

Now that we have the packages, let's go ahead and read in my data. The first file I have is just a simple .csv file on different soil properties. read.csv() is one way to import .csv data into R.
##reading in a .csv file with all the attributes
NRCS_data <- read.csv ("Data/NRCS_Data/Agg_NRCS_Data.csv", header=T, stringsAsFactors = F)

Next we are going to use the readOGR() function from the rgdal package.
##This is a polygon shapefile
CRA <- readOGR(dsn="Data/NRCS_Data", layer="Final_CRA")
## OGR data source with driver: ESRI Shapefile
## Source: "Data/NRCS_Data", layer: "Final_CRA"
## with 60 features
## It has 4 fields
CRA
## class : SpatialPolygonsDataFrame
## features : 60
## extent : -124.566, -116.463, 41.99208, 46.28576 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=NAD83 +no_defs +ellps=GRS80 +towgs84=0,0,0
## variables : 4
## names : OBJECTID, Final_CRA, Shape_Leng, Shape_Area
## min values : 1, 1.1, 2.338222, 0.03185210
## max values : 60, 9.9, 53.866012, 3.83380590

It is important to note that readOGR() takes two parameters: dsn= is the folder where the shapefile is saved, and layer= is the name of the shapefile that you want to read in. We are reading in the shapefile "Final_CRA", which is a polygon shapefile, and storing it in R under the name CRA. You can see it is a SpatialPolygonsDataFrame. There are two really important parts of this data format: CRA@polygons and CRA@data. @polygons holds the information on the polygons and their coordinates, while @data holds their attribute table.
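If you're curious, you can list every slot the object carries, not just @polygons and @data (a quick sketch; the output shown is what sp typically reports for this class):

slotNames(CRA)
## [1] "data"        "polygons"    "plotOrder"   "bbox"        "proj4string"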
head(CRA@data, n=3)
## OBJECTID Final_CRA Shape_Leng Shape_Area
## 0 1 1.1 8.832671 0.458048
## 1 2 1.6 17.469240 1.406368
## 2 3 10.1 24.579325 1.115243

You can see there isn't any useful information about soil parameters associated with each polygon; however, the NRCS_data has interesting information that we want to link to these polygons.
##aggregating my attributes by pH, finding the mean pH for each CRA
agg_NRCS_data <- aggregate(ph_1_to_1_H20~CRA, FUN=mean, data=NRCS_data)
agg_NRCS_data <- rename(agg_NRCS_data, Final_CRA = CRA)
head(agg_NRCS_data, n=3)
## Final_CRA ph_1_to_1_H20
## 1 1.1 5.150000
## 2 1.6 5.153333
## 3 10.1 7.153333

This data has pH values and their corresponding CRA.
Using a package called dplyr, I am able to attach the pH data to the CRA polygon data.
##merging the 2 datasets together by CRA
CRA@data <- left_join(CRA@data, agg_NRCS_data)
## Joining by: "Final_CRA"
## Warning in left_join_impl(x, y, by$x, by$y): joining character vector and
## factor, coercing into character vector
CRA@data <- rename(CRA@data, pH_water = ph_1_to_1_H20)
head(CRA@data, n=3)
## OBJECTID Final_CRA Shape_Leng Shape_Area pH_water
## 1 1 1.1 8.832671 0.458048 5.150000
## 2 2 1.6 17.469240 1.406368 5.153333
## 3 3 10.1 24.579325 1.115243 7.153333
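Note that left_join() guessed the join key from the shared column name, which is why it printed "Joining by: Final_CRA" above. If you want to make the key explicit (and silence that message), you can name it yourself:

CRA@data <- left_join(CRA@data, agg_NRCS_data, by = "Final_CRA")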

Simple. Now the polygon dataframe has some interesting metadata associated with each polygon. However, suppose you want to create a point-layer spatial dataframe instead of merging the data into your polygon dataframe; you can do this with the sp package. The function SpatialPointsDataFrame() is quite particular in its required parameters: it requires the coordinates to be in matrix format, so I am pulling the corresponding columns from the NRCS_data dataframe and converting them into a matrix.
##Creates a matrix of lat long coord
NRCS_mat <- with(NRCS_data, cbind(longitude, latitude))
row.names(NRCS_mat)<- 1:nrow(NRCS_mat)
##Takes the projection from the CRA shapefile
CRA.projection <- CRS(print(proj4string(CRA)))
## [1] "+proj=longlat +datum=NAD83 +no_defs +ellps=GRS80 +towgs84=0,0,0"
##Creates a spatial points dataframe with the same projection as our CRA shapefile
NRCS <- SpatialPointsDataFrame(coords= NRCS_mat, data = NRCS_data, proj4string = CRA.projection)

These last two lines of code take the projection my CRA polygon data has and use it for my newly formed point dataframe.
Since both spatial dataframes are unprojected, I use an EPSG code (found by Googling) that corresponds to Oregon, and create new, projected spatial dataframes with the function spTransform():
NRCS.projected <- spTransform(NRCS, CRS("+init=epsg:2992"))
CRA.projected <- spTransform(CRA, CRS("+init=epsg:2992"))
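As a quick sanity check (optional), you can confirm the transformation took effect:

proj4string(CRA.projected)
is.projected(CRA.projected)
## should now return TRUE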

Lastly, we want to see what these layers look like, so I use the package tmap for plotting, which works a lot like ggplot2. qtm() is just like qplot() in ggplot2: I tell it the layer I want to plot and that I want to fill the polygons with the different pH values in each.
qtm(shp=CRA, fill="pH_water")

[Figure: qtm_pH]

tm_shape(CRA.projected) + tm_fill("pH_water") + tm_borders() +
  tm_shape(NRCS.projected) + tm_bubbles(col = "red", size = 0.1, border.col = "black")

[Figure: pH_map_2]

The second line of code actually plots two layers: the first layer is the polygon dataframe and the second is the point dataframe. You do this by playing with the customization of tmap; typing ?tmap will help show just how customizable the tmap package is.
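For example, a lightly customized version of the same map might look like this (a sketch; the legend and layout titles are my own additions):

tm_shape(CRA.projected) +
  tm_fill("pH_water", title = "Mean pH (1:1 water)") +
  tm_borders() +
  tm_layout(title = "Mean soil pH by CRA")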
Hopefully, this tutorial helped you get started with spatial data in R.

Question:

Where are the statistically significant regions of high and low Alaska Skate catch rate from commercial longliners in the Bering Sea?

[Figure: AK_skate_lgl_rate]

Name of the tool or approach used.

ArcGIS: Hot Spot Analysis

Brief description of steps you followed to complete the analysis.

The dataset that I used to complete this analysis includes over 200 species of fish and four gear types, so I chose to limit the scope of this exercise to one species and one gear type. I narrowed the dataset by selecting Alaska Skates caught by longliners, and ran a Hot Spot Analysis on this selection. Within the Hot Spot tool in Arc, I selected catch "RATE" as the input field for comparison, retained the default selections for the spatial relationship ("FIXED_DISTANCE_BAND") and distance method ("EUCLIDEAN_DISTANCE"), and ran the analysis.
Narrowing the temporal scope of the analysis, I ran it again on two subsets of the data: one on points from 1998-2007 and the second on points from 2008-2014.
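For readers working outside of Arc, roughly the same Getis-Ord Gi* calculation can be sketched in R with the spdep package. Everything here is a hypothetical stand-in for the skate data (the point layer name, the RATE field, and the 50 km band are all assumptions):

library(spdep)
## neighbors within a fixed distance band (analogous to FIXED_DISTANCE_BAND)
coords <- coordinates(catch_points)          ## catch_points: a hypothetical SpatialPointsDataFrame
nb <- dnearneigh(coords, d1 = 0, d2 = 50000) ## 50 km band, an assumed value
nb <- include.self(nb)                       ## Gi* includes each point in its own neighborhood
lw <- nb2listw(nb, style = "B", zero.policy = TRUE)
## Gi* z-scores for catch rate; large positive = hot spot, large negative = cold spot
gi <- localG(catch_points$RATE, lw, zero.policy = TRUE)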

Brief description of results obtained.

Alaska Skate catch rate hot spot analysis 1998-2014

The original analysis includes AK skate catch rates on Bering Sea longliners from 1998-2014 and reveals some interesting patterns about fishing behavior, as well as where skate catch rates are higher. There is a clear line of cold spots along the southern edge of the oceanic shelf that runs north of the Aleutian Islands. A few distinct hot spots are also revealed in the western Aleutians near Adak and Atka islands.

[Figure: Hot spot analysis of Skate catch rate, 1998-2007]
[Figure: Hot spot analysis of Skate catch rate, 2008-2014]

Changing the temporal scope of the analysis revealed some very different patterns from those of the full time period. There are a number of hypotheses that can be developed for further exploration as to why the patterns change.

Critique of the method – what was useful, what was not?

For this dataset, I think that this tool is a quick and easy way to identify patterns at different scales, and it could be an approach to hypothesis generation. In this dataset the sample points are independent of the researcher and span a large temporal and spatial range; these characteristics seem to make the dataset well suited to this tool.

My dependent variable will be drop, developed in Han and Goetz (2015). It is the reciprocal of resistance, which is calculated as the amount of impulse that a county experiences from a shock (the percentage deviation of actual employment from expected employment during the Great Recession).

[Figure: drop and rebound]

[Figure: drop equation]
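Since the equation image did not survive, here is a reconstruction based on the verbal definition above (my notation, not necessarily Han and Goetz's exact formula):

\[
\text{drop}_i = \frac{E_i^{\text{expected}} - E_i^{\text{actual}}}{E_i^{\text{expected}}}
\]

where \(E_i\) is county i's employment during the Great Recession.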

I want to find what factors affect drop. I run an OLS regression using drop as the dependent variable and the following independent variables:

Income inequality: the Gini coefficient

Income distribution: poverty rate and the share of aggregate income held by households earning $200,000 or more

Control variables: population growth rate 2001-2005, % Black or African American (2000), % Hispanic or Latino (2000)

Capital Stock variables:

Human Capital: % population with Bachelor’s degree or higher, age group (20-29, 30-49), female civilian labor force participation rate (2000)

Natural Capital: natural amenity scale (1999)

Social Capital: social capital index (2005)

Built Capital(2000): median housing value (2000)

Financial Capital (2000): share of dividends, interest and rent(2000)

Economic structure:

Employment shares of 20 two-digit NAICS industries (manufacturing, construction, etc.; other services (except public administration) is the omitted category)

 

Significant variables and diagnostic test results are shown in the following table.

Ind. Var: Coefficient
Population growth rate 2001-2005: 0.205***
% Black or African American: 0.0542*
% Hispanic or Latino: -0.113***
% pop with Bachelor's degree or higher: -0.132*
Natural amenity scale: 0.00728***
Manufacturing: -0.112*
Wholesale trade: -0.580***
Retail trade: -0.620***
Transportation and warehousing: -0.337***
Information: -0.590**
Finance and insurance: -0.710***
Educational services: -0.894***
Health care and social assistance: -0.215***
Accommodation and food services: -0.244**
Government and government enterprises: -0.231***

Adjusted R-squared: 0.199; AICc: -4693.152798; # obs: 2770
Koenker BP Statistic ***; Joint Wald Statistic ***; Jarque-Bera Statistic ***

Among the significant variables, I want to explore the population growth rate from 2001 to 2005 because it has the largest coefficient in size among the control variables.

Moreover, since the Koenker BP statistic is significant, the relationships are non-stationary. I therefore run GWR for population growth rate 2001-2005.
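The GWR was presumably run in ArcGIS; an equivalent sketch in R using the spgwr package (object and variable names here are hypothetical) would be:

library(spgwr)
## choose an adaptive bandwidth by cross-validation, then fit the local model
bw  <- gwr.sel(drop ~ pop_growth_0105, data = county_pts, adapt = TRUE)
fit <- gwr(drop ~ pop_growth_0105, data = county_pts, adapt = bw)
## local coefficients for mapping are stored in fit$SDF, a Spatial*DataFrame
summary(fit$SDF$pop_growth_0105)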

GWR

The following maps show the distribution of population growth rate 2001-2005 and its GWR coefficients.

[Figures: distribution of population growth rate 2001-2005; GWR coefficients of population growth rate]

For the distribution of population growth rate 2001-2005, blue regions show negative population growth, while the remaining counties are all growing in population. Orange and red counties are high in population growth rate.

For the GWR coefficients, a negative relationship between population growth rate and drop exists in the blue, grey, and yellow regions: higher population growth, lower drop (employment loss).

A positive relationship exists in the orange and red regions: higher population growth, higher drop (employment loss).

My explanations are:

In an expanding economy (regions high in population growth), there are more people marginally attached to the labor market. They were easily laid off in the Great Recession.

In regions with negative population growth, there are more deaths than births and population aging rises.

  • Older people have higher accumulated savings per head than younger people, but spend less on consumer goods.
  • Less available active labor
  • An increase in population growth will decrease employment loss.

However, my local R-squared values range from 0 to 0.620 with a mean of 0.0247. Only in Orange County and San Diego County, CA is the local R-squared higher than 0.5 (0.603 and 0.620 respectively). This situation (low local R-squared) occurred with all my other significant control and capital stock variables. There might be misspecification problems in my model, so I will try adding new explanatory variables.

Population growth rate

  • Local R-squared, 0-0.620, mean 0.0247
  • Only in 06059 Orange County and 06073 San Diego County, CA is the local R-squared higher than 0.5 (0.603 and 0.620 respectively)

Natural amenity scale

  • Local R-squared, 0-0.519, mean 0.0216
  • 12087 Monroe County, Florida 0.519

% Black or African American

  • Local R-squared, 0-0.347, mean 0.0326
  • 26095 Luce County, Michigan, 0.347

% Hispanic or Latino

  • Local R-squared, 0-0.379, mean 0.0276
  • 35029, Luna County, New Mexico, 0.379

% pop with Bachelor’s degree or higher

  • Local R-squared, 0-0.391, mean 0.055
  • 06073 San Diego, CA, 0.391

[Figure: lomipopgr]

Introduction:

Hotspot analysis is a method of statistical analysis for spatially distributed data whereby both the magnitude of a point and its relationship to the points around it are taken into consideration to identify statistically significant extreme values in a data set. While a single spatial data point may represent an extreme value, in the context of weed management what often matters is being able to identify aggregates of values which, taken as a whole, represent an extreme condition. Where one large weed may have some impact, many medium and smaller weeds are likely to have a disproportionate impact. This has implications for weed management and weed science: of the variety of techniques available to users for mapping the spatial distribution of invasive weeds, most rely on expert knowledge to decide what constitutes a "weedy" versus a "non-weedy" condition. This decision is made independently of the geographic context of a value, usually by visually evaluating the distribution of measured values and choosing some cutoff, real or perceived, in that distribution. These cutoffs are often subjective, and typically yield poor results when applied to independently generated data sets. Hotspot analysis offers us a road forward where statistically significant extreme regions can be identified, and various mapping techniques compared by how well they predict these extreme regions.

My goal for this project was to see if hotspot analysis provided a better technique for mapping weed populations in wheat fields at the time of harvest. Previously, we visually evaluated the histograms and used some knowledge of how the weeds were distributed to determine cutoffs for "weedy" and "non-weedy" values. Here I wanted to see if I could generate more accurate maps of weed density relative to a reference data set.

Sources of data:


Fig. 1: Sampling for weed density was done at the time of harvest using a spectrometer in the grain stream of a combine, and with a visual evaluator watching for green weeds during harvest.
In Summer 2015 we harvested a field of winter wheat NE of Pendleton, OR using a combine outfitted with a spectrometer in the grain stream. A ratio of red to near-infrared reflectance was taken, along with visual observations made by an observer from inside the cab of the combine. Prior to harvest, we walked the field on gridded transects, recording observations of all weeds at the species level. These data were all re-gridded to a common 7m x 7m grid using bilinear interpolation, and brought into ArcGIS for analysis.
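For anyone repeating this step outside of Arc, the re-gridding might be sketched in R with the raster package (object names are hypothetical, and the template grid is an assumption):

library(raster)
## a 7 m x 7 m template grid covering the field; extent and CRS taken from the source raster
template <- raster(ext = extent(spectral_raster), resolution = 7, crs = crs(spectral_raster))
spectral_7m <- resample(spectral_raster, template, method = "bilinear")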

Classification:

In ArcGIS, hotspot analysis was conducted using the Hot Spot Analysis tool. The Getis-Ord Gi* statistic identifies statistically significant hot and cold spots in whatever spatially referenced value you use as a feature class for classification. For the purposes of this exercise, positive, significant Getis-Ord Gi* statistics were considered "weedy", while non-significant and negative scores were considered "non-weedy". For the histogram method, cutoffs based on a visual evaluation of the histogram, and on knowledge of what should represent a weedy condition, were used to distinguish between "weedy" and "non-weedy" values, and the associated maps were classified accordingly.
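As a sketch of that classification rule in R (assuming a vector gi_z of Gi* z-scores exported from the tool):

## positive and significant Gi* -> "weedy"; everything else -> "non-weedy"
## 1.645 corresponds to roughly 90% confidence, an assumed cutoff
weed_class <- ifelse(gi_z > 1.645, "weedy", "non-weedy")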

 


Fig 2: Ground reference data set classified using Hotspot analysis and histogram classification. All green values were considered positive for weeds.


Fig 3: Spectral assessment of weeds distribution classified using Hotspot analysis and histogram classification. All green values were considered positive for weeds.

 


Fig 4: Visual assessment of weeds distribution classified using Hotspot analysis and histogram classification.  All green values were considered positive for weeds.

Comparing techniques
Classified maps were then compared for accuracy using a confusion matrix approach. Accuracy here is defined as the number of true positives and true negatives divided by the total number of samples. Classified maps were also compared for their precision and sensitivity, where precision is the rate of true positives over the predicted positives, and sensitivity is the rate of true positives over the condition positives. Precision addresses the question: if I predict it positive, how often am I correct? Sensitivity addresses the question: when the reference data says it is positive, how often did my prediction catch it?
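As a sketch, all three scores can be computed from the four cells of a confusion matrix (the counts below are placeholders, not our results):

tp <- 120; fp <- 30; fn <- 45; tn <- 305  ## placeholder counts
accuracy    <- (tp + tn) / (tp + fp + fn + tn)
precision   <- tp / (tp + fp)  ## of my predicted positives, how many were correct?
sensitivity <- tp / (tp + fn)  ## of the reference positives, how many did I catch?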


Fig 5: Error assessment comparing spectral assessment of weediness to ground reference data using the histogram classification.

Fig 6: Results of the error assessments for all mapping techniques compared with their ground reference data.

Results:

These results show that hotspot analysis was a superior method for image classification when compared with a visual evaluation of the distribution. This is probably because hotspot analysis takes into consideration the neighborhood of values a measurement resides in. Hotspot analysis increased the accuracy of both the spectral and the on-combine visual assessments. There were mixed results regarding its impacts on sensitivity and precision, however. Hotspot classification increased the precision of the visual assessment, but did so at a cost to sensitivity. Conversely, the hotspot technique increased the sensitivity of the spectral assessment, but did so at a cost to its precision. It may be that precision comes at a cost to sensitivity using this method for comparing classifications. In general, however, the most substantial gains were in accuracy, which is the most important factor for this mapping effort.

Question

How is the occurrence of landslides in western Oregon related to the rate and timing of rainfall? The Northwest River Forecast Center (NWRFC) archives 6-hour rainfall accumulation from more than 2000 recording stations throughout the Pacific Northwest (Figure 1). The challenge is that landslides do not tend to occur in the immediate proximity of these stations, and spatial interpolation must be performed to estimate rainfall amounts at landslide locations.


Figure 1: Location of rainfall recording stations provided by the NWRFC.

The Tool and its Implementation

Kriging was selected over inverse distance weighting to better account for variations in rainfall from east to west. The kriging was performed in Matlab to allow for better selection of inputs and to simplify the task, which involved kriging every 6-hour measurement for December (124 times).

Kriging works by weighting measured values using a semivariogram model. Several semivariograms were examined to better identify which model would best fit the dataset (Figure 2).

[Figure panels: Spherical, Gaussian, Exponential]

Figure 2: Examples of semivariogram fits to NWRFC rainfall data.

Based on Figure 2, the exponential semivariogram appeared to be the best choice. Another option is the search radius (i.e. how far the model looks for data points). This value was also varied to illustrate the notion of a poor semivariogram fit (Figure 3).
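The kriging here was done in Matlab; for reference, an equivalent sketch in R with the gstat package (object and field names are hypothetical, and the initial variogram values are placeholders) looks like:

library(sp); library(gstat)
## empirical semivariogram for one 6-hour accumulation, then an exponential fit
ev  <- variogram(rain_6hr ~ 1, stations)  ## stations: a hypothetical SpatialPointsDataFrame
evm <- fit.variogram(ev, vgm(psill = 1, model = "Exp", range = 1))  ## initial guesses only
## ordinary kriging onto a prediction grid, limiting the search radius
pred <- krige(rain_6hr ~ 1, stations, newdata = grid, model = evm, maxdist = 1)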


Figure 3: Examples of varying the search radius (lag distance). Search radii from left to right: 1 degree, 5 degrees, 0.2 degrees.

Once each of the 124 surfaces was produced, values were extracted at the locations of major landslides from this past winter. The extracted values were later used to produce rainfall versus time plots, which are described in the next section.

Results

To simplify results for this project, only one landslide is shown. The Highway 42 landslide occurred on December 23, 2015, closing the highway for nearly one month and costing the Oregon Department of Transportation an estimated $5 million to repair. Rainfall versus time profiles were produced for three popular semivariograms (spherical, exponential, and Gaussian) to gauge the influence of choosing one method over another (Figure 4).

[Figure panels: Accumulation42, Intensity42]

Figure 4: Comparison of results obtained from the different semivariograms and PRISM.

Figure 4 shows little effect due to changing the semivariogram model, which is likely a result of having limited variability in rainfall measurements and the distribution of recording stations near the location of the Highway 42 landslide.

To verify the results of this exercise, PRISM daily rainfall estimates were downloaded for the corresponding time period, and compared (Figure 4). This comparison shows that, while the PRISM data does not capture spikes in rainfall amount, the overall accumulation of rainfall appears to be similar, implying that kriging was effective for this application.

 

Question

The Statewide Landslide Information Database for Oregon (SLIDO, Figure 1) is a GIS compilation of point data representing past landslides. Each point is associated with a number of attributes, including repair cost, dimensions, and date of occurrence. For this exercise, I asked whether or not SLIDO could be subjected to a hot spot analysis, and if so, could the results be insightful.


Figure 1: SLIDO (version 3.2).

The Tool

Hot spot analysis is a tool in ArcGIS that spatially identifies areas of high and low concentrations of a user-supplied weighting parameter. The required input is either a vector layer with a numerical attribute that can serve as the weighting parameter, or a vector layer whose features indicate an equal weight of occurrence. Outputs are a z-score, p-value, and confidence level bin for each feature of the input layer.

Description of Exercise

Performing the hot spot analysis was more than simply running the tool with SLIDO as an input and a weighting field selected. Selecting an input field was easier said than done, as the SLIDO attribute table is only partially complete. Based on a survey of fields in the SLIDO attribute table, it was clear that repair cost was the best choice. All points having a repair cost were then exported to a new layer, which was then input to the hot spot analysis. An important note is that this step greatly reduced the number of points and their scatter, so the output map looks slightly different from Figure 1.

Outputs

The output of this exercise is a comparison of SLIDO points colored by their repair cost with SLIDO points colored by confidence level bin (Figure 2).


Figure 2: Comparison of coloring by hot spots to simply coloring by cost.

Discussion

The second map in Figure 2 shows the presence of a major hot spot and a major cold spot regarding landslide costs in Oregon. The figure shows that, on average, landslides in the northwest corner of the state are more expensive. This observation can only be made because there appears to be a similar density of points, located at similar distances away from their neighbors, across the entire network of points. The figure also shows that single high-cost landslides do not play a major role in the positioning of hot spots, which is a nice advantage of the hot spot analysis.

In general, I think that the hot spot analysis did a good job illustrating a trend that may not have been obvious in the original data.

Bonus Investigation

In the hot spot documentation, it is stated that the analysis is not conducive to small datasets. An example of performing a hot spot analysis on a small dataset is provided in Figure 3. While there may be a trend in points colored by normalized infiltration rate, the hot spot map shows no significant points.


Figure 3: Hot spot analysis on a small dataset.

Goal: Where do statistically significant spatial clusters of high values (hot spots) and low values (cold spots) of drop lie?

Variable: drop (employment loss)

  • Monthly employment, Jan. 2003 - Dec. 2014
  • Data available: 2840 counties
  • drop

Tool: Hot spot analysis

Steps:

  • Add Mean center of population.shp (3141 counties), U.S. county.shp (3141 counties), drop value (2840 counties)
  • Join the drop to the mean center, export as a new layer file.
  • Project the data
  • Run Hot Spot Analysis (Default for Conceptualization of Spatial Relationship, and Distance Method)
  • Make a Map
  1. Distribution of drop, using Natural Breaks

[Map: distribution of drop, 3141 counties]

2. Hot spot analysis using 3141 counties.

  • Low employment loss clusters in the northern part
  • High employment loss lies in the western and southeastern parts

[Map: hot spot analysis of drop, 3141 counties]

 

Questions:

  • The previous map is made of 3141 counties; however, there are only 2840 counties with drop values. Missing values are noted as 0. Will this affect the result?
  • Hawaii and Alaska are included. Will exclusion of HI and AK affect the result?

Both: Yes

  • Redraw the map
  • Only keep the matched records – 2838 counties
  • Exclude Hawaii and Alaska

 

3. Hot spot analysis using 2838 counties.

[Map: hot spot analysis of drop, 2838 counties]

4. Hot spot analysis using U.S. contiguous counties, Hawaii and Alaska removed.

[Map: hot spot analysis of drop, contiguous U.S. counties]

Conclusion: Hot spot analysis is sensitive to spatial outliers. It only identifies low-value clusters or high-value clusters. What if I want to find an outlier, for example a low-value unit within a high-value cluster?

I should use the Anselin Local Moran's I tool (Cluster and Outlier Analysis).
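A sketch of that follow-up in R, using spdep's localmoran() (object names are hypothetical):

library(spdep)
## Local Moran's I flags high-low and low-high outliers as well as clusters
lw   <- nb2listw(poly2nb(counties))  ## counties: a hypothetical SpatialPolygonsDataFrame
lm_i <- localmoran(counties$drop, lw)
head(lm_i)  ## Ii, its expectation and variance, z-score, and p-value per county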

Background:

In August 2015, the BLM released the National Seed Strategy which calls for the use of genetically appropriate seed for the restoration of damaged ecosystems. Bluebunch wheatgrass is a long-lived native perennial species that grows in most western states in the United States as well as British Columbia. This species has been shown to be genetically adapted to different climate patterns across its range. Adaptation is evident because bluebunch wheatgrass plants from distinct climates exhibit different phenotypic (physical) traits. Bluebunch wheatgrass seeds should therefore only be dispersed in areas where they are adapted to the local climate conditions.

In order to determine how far seed from this species can be moved across its range, species-specific seed zones have been developed for the Great Basin ecoregion (St. Clair et al. 2013). These seed zones were delineated using spatial analysis of climate patterns and observed plant traits from many populations of bluebunch wheatgrass in different climates in the Great Basin.

In order to test the efficacy of the seed zones for bluebunch wheatgrass, common gardens have been established. Common gardens are a way of minimizing the effect of site history (i.e., grazing, fire, and management practices) on the growth habits of bluebunch wheatgrass and illuminating the climate-caused genetic adaptations. Wild seeds are collected from dispersed locations in the Great Basin, reared in a greenhouse to the young adult stage, and co-located into a common garden (see figure 1). Once the plants are placed within the common garden they can be monitored for phenotypic trait variability, and these variations can be "mapped" back to their home climate.

Although the relationship between bluebunch wheatgrass phenotypes and climate is fairly well understood, there is a knowledge gap surrounding bluebunch phenotypes and soils. I aim to use data gathered from existing common garden studies and soil maps to determine if relationships between soils and bluebunch phenotypes exist.

Study Design:

Two common gardens are situated within each of 8 seed zones resulting in 16 gardens total. At each common garden, 4-5 populations from each zone are represented. Twice per growing season, phenotypic trait data such as leaf length and width, crown width, and number of reproductive stalks are gathered for each individual plant at all common gardens. The resulting dataset contains population level means for each trait at each garden.

Generalized Common Garden Schematic

[Figure 1: generalized common garden schematic]

Research Question:

Do phenotypic traits of bluebunch wheatgrass vary with soil order?

Tools and Approaches:

Soils Dataset:

  • Soils data were gathered from the Geospatial Data Gateway and were downloaded separately for each state in the study region to reduce the file size.
  • Soils raster layers for each state were loaded into ArcMap.
  • Tabular data for soil order was joined to raster data using the “Join Field” tool in ArcMap.
  • Soil order layers were symbolized using graduated colors by category.

Bluebunch Trait Dataset:

  • Latitude and longitude decimal degrees (X Y coordinates) were added to each population in the common garden plant trait dataset.
  • Blank cells were replaced with NA values.
  • X Y locations for each row in the dataset were "jittered" to remove duplicate coordinates (see figures 2 & 3; an R sketch of this step follows the list).
    • Using the rand() function in Excel, two new columns with random positive decimal values were added.
    • The difference between the two random value columns was subtracted from the X and Y coordinates for each row in the dataset.
    • The resulting “jittered” X and Y coordinates were stored in new columns.
  • The bluebunch trait dataset was loaded in to ArcMap and visualized using the jittered X and Y coordinates.
  • Hot spot analysis was performed separately for each of four plant traits (leaf width, leaf length, crown width, and number of reproductive stalks) and visualized with soil order.
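A minimal R equivalent of the Excel rand() jitter described above (the offset range is a placeholder, and traits is a hypothetical dataframe):

set.seed(42)  ## for reproducibility
traits$x_jit <- traits$longitude + runif(nrow(traits), -0.01, 0.01)
traits$y_jit <- traits$latitude  + runif(nrow(traits), -0.01, 0.01)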

(figure 2) Un-jittered population X Y locations


 

(figure 3) Jittered population X Y locations


Results:

In the hot spot analysis of leaf width, one hot spot and three cold spots were revealed. This indicates that a significant number of plants sourced from those areas had either wide or narrow leaves.

 

In the hot spot analysis of leaf length, one hot spot and two cold spots were revealed. This indicates that a significant number of plants sourced from those areas had either long or short leaves. In this analysis, the hot spot was in a different location than previously, indicating that wider leaves were not necessarily longer or vice versa. Contrastingly, the two cold spots in this analysis were in a similar location as with leaf width, indicating that short, narrow leaves tend to co-occur.

 

 

In the analysis of crown width, the hot spot occurred in the same location as in the leaf width analysis. This leads me to believe that plants from this vicinity tend to be larger in general than plants from other areas.

 

 

In the analysis of reproductive stalks, only one large hot spot was found. This indicates that the larger plants in this region also tend to produce significantly more flowering stalks per plant than in other areas.

Critique:

■  The jittering effect could be reduced to make it easier to distinguish the populations. It may also be that this type of analysis is not appropriate for this data set, because the phenotypic trait data are for one population being grown at multiple common gardens, which creates a situation where multiple data values share the same X Y coordinate. Even though the jittering was done at random, there is still a chance that the results we are seeing are a product of the data transformation itself.

■  A 90-95% confidence interval might be too stringent for reasonable genetic inference; in many ecological contexts, confidence below 95% is a regular occurrence. In this case, leaf width/length, crown width, and the number of reproductive stalks of each plant probably co-vary, and it does not appear that this type of analysis is meant to deal with co-variance.

■  Soil traits are de-emphasized. Although soil order was used as the background, it was difficult to see a pattern between soils and plant trait hot spots. This may be partially due to the large scale of the analysis, or may be a product of the jittering.

■  Only one phenotypic trait can be visualized at a time. Since only one trait can be mapped in the hot spot analysis at once, there is no way to know if the observed similarity in hot spot locations is truly significant.

■  Hot spot analysis does not work with raster data. In the future, I would like to find a way to look at the clustering patterns within soils, and it appears that this analysis does not recognize raster data.

■  A less regional, more zoomed-in analysis might improve the interpretation. Since the soils are highly variable in this study area, it may be more productive to narrow the scope of inference to smaller areas. Doing this may make the soil-plant trait interpretation easier.

References:

St. Clair Bradley John, Francis F. Kilkenny, Richard C. Johnson, Nancy L. Shaw, and George Weaver. 2013. “Genetic Variation in Adaptive Traits and Seed Transfer Zones for Pseudoroegneria Spicata (bluebunch Wheatgrass) in the Northwestern United States.” Evolutionary Applications 6 (6): 933–48. doi:10.1111/eva.12077.

For the ArcGIS hot spot analysis I used the tutorial available at https://www.arcgis.com/home/item.html?id=6626d5cc81a745f1b737028f7a519521. Note that the tutorial must be downloaded using the "open" button below the image on the web page. This will download a folder with a PDF file of instructions and a dataset for their analysis.

I followed the tutorial, but substituted a dataset of fire ignitions logged by the Oregon Department of Forestry between the years 1960 and 2009 (Fig. 1). This dataset does not include some fires that occurred in the state during that time, for instance some on federal land and many on non-forested land.


Figure 1: Oregon Department of Forestry logged fire occurrences from 1963 to 2009.

The tutorial was straightforward and easy to follow. The only “problem” I had was with the spatial autocorrelation step. My data did not produce any peaks in the z-score graph, so I arbitrarily chose a distance towards the lower end of the graph (tutorial steps 10 f and g).

My results from the hot spot analysis step (Fig. 2) showed areas with hot spots. These tended to be near roads and cities in most cases, with a few instances at high elevation in northeastern Oregon.


Figure 2: Results from hot spot analysis.

Interpolating the hot spot results using the IDW tool (tutorial step 12) produced anomalous edge effects (Fig. 3). Note the areas of high values spreading from hot spot areas into areas with no data along the coast. A close-up image of the Medford area (Fig. 4) shows edge effects spreading into areas with no data.


Figure 3: Hot spot analysis interpolated surface produced by the application of the Arc IDW tool. Note the edge effects, especially along the coast and northern state border.


Figure 4: Interpolated hot spot surface produced by the Arc IDW tool along with the original ODF ignitions dataset points. Note the edge effects in the southwest corner of the image spreading from the area of high fire occurrence into areas with no data.

As part of a study I did for a master's thesis, I used the ODF data along with another dataset to do a logistic regression analysis of ignition occurrences in the Willamette watershed, excluding the valley floor. Within this area, the hot spot analysis placed hot spots along some roads and population centers in several instances (Fig. 5A). These results are consistent with the results from my study, which showed that human-caused fires were more likely to occur along roads and near areas of high population, and that lightning-caused fires were likely to occur at high elevations (Fig. 5B). The lack of a match between the two methods at high elevation is due to data points that were used in the logistic regression but not in the hot spot analysis. However, one disadvantage of the hot spot analysis is that it is univariate, or strictly occurrence based. The logistic method allows for correlation and the application of that correlation to the prediction of future occurrence.
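For context, a logistic model of this kind has the general form sketched below; the predictor names are hypothetical stand-ins, not the actual thesis variables:

## probability of ignition as a function of hypothetical covariates
fit <- glm(ignition ~ dist_to_road + pop_density + elevation,
           family = binomial, data = cells)
## predicted ignition probability for each observation
p_hat <- predict(fit, type = "response")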


Figure 5: Results from A) hot spot analysis and B) logistic regression analysis for fire occurrence in the hills and mountains of the Willamette watershed. Redder areas are “hotter” in the hot spot analysis and have a higher ignition probability in the logistic regression analysis.

In conclusion, hot spot analysis is a good technique for finding areas of higher occurrence, but it has no predictive power beyond analyzing past occurrences. The ArcGIS IDW tool can produce an interpolated surface, but it has an edge-effect issue that could lead to questionable results. Regression methods, for example logistic regression, may produce results better suited to analyzing the correlation of independent variables with event occurrence.

1. Background and Research Question:

The goal of my work in this course is to assess the influence of forest governance on spatial patterns of forest disturbance. Forest governance can be understood as forest-related decisions, their implementation and resulting effects within a given institutional setting, whereas forest disturbances are events that cause change in the structure and composition of a forest ecosystem. For this course, I’ll be focusing my analysis on disturbances associated with timber production, e.g., clear-cutting and partial harvests. My study area is Willamette National Forest, which encompasses roughly 6,800 square km in the central portion of Oregon’s West Cascades.

Governance of the Willamette and other federally managed forests of the Pacific Northwest is shaped largely by the Northwest Forest Plan (NWFP). A key aspect of the NWFP is a system of land use designation (LUD) in which spatially explicit zones are managed according to a single or dominant management priority (Charnley, 2006). For example, wilderness areas are designated as “Congressionally Withdrawn” and are thus protected from timber harvest, while “Matrix Lands” are those on which timber harvest is concentrated. The various LUDs, along with interspersed state, private and other lands, form a mosaic of ownership and management priorities that are manifested as disturbance patterns in the forest landscape. And so, the research question I’ll be addressing is: How do land use designations and ownership influence patterns of disturbance in Willamette National Forest?

2. Datasets:

I’ll be relying primarily on a Landsat imagery time-series (30 meter spatial resolution, 1985-2012) which has been processed using a change-detection algorithm called LandTrendr (Kennedy et al., 2010). Outputs from LandTrendr include disturbance patches categorized by agent of change (e.g., clear-cut, partial harvest, etc.) as well as their timing, duration and magnitude. This data will be clipped to the boundary of Willamette National Forest. To structure my analysis, I’ll be using vector data representing the administrative forest boundaries, and the LUDs and ownerships within them.

[Figure: Willamette National Forest LUDs and disturbances]

3. Hypotheses:

My general hypothesis is that patterns of disturbance will vary according to land use designation across space and time. For example, on Matrix Lands, I expect to see spatial clustering of clear-cuts that will increase during the period of the NWFP’s implementation (from 1994 onward), while on Adaptive Management Areas, I expect to see significantly fewer, more dispersed clearcuts, but increases in partial harvests. 

4. Approach:

I will analyze spatial patterns of clear-cut and partial harvest disturbance patches within each LUD (e.g., clustering), as well as the spatial characteristics of these disturbance patches (e.g., edge-to-area ratio). My analysis will be done primarily in ArcGIS, but may also include Python scripting, statistical analysis in R, or “patch analysis” using FRAGSTATS.
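As a sketch of one such patch characteristic in R, the edge-to-area ratio can be computed with rgeos (the disturbance layer name is hypothetical):

library(rgeos)
## perimeter and area of each disturbance patch polygon, then their ratio
edge <- gLength(patches, byid = TRUE)  ## patches: SpatialPolygonsDataFrame of disturbance patches
area <- gArea(patches, byid = TRUE)
edge_to_area <- edge / area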

5. Expected outcome:

I will produce maps and graphs of disturbance patterns by LUD over time. Hopefully this will help visualize whether or not the NWFP has been implemented as intended. 

6. Significance:

In the Pacific Northwest, forest governance is shaped largely by the federally mandated Northwest Forest Plan (NWFP), which initiated a momentous shift in forest management priorities; from the provision of sustained timber harvest to protection of ecosystems (Thomas, 2006). The NWFP is implemented through an adaptive management strategy that must identify high priority inventory and monitoring objectives needed to assess the plan’s effectiveness over time (FEMAT, 1993). Ideally, this project and my overall research will contribute to this ongoing assessment.

7. Experience levels with…

ArcGIS = high

Modelbuilder, Python = medium

R = low

References:

Charnley, S., & Pacific Northwest Research Station. (2006). Northwest Forest Plan, the first 10 years (1994-2003) : Socioeconomic monitoring results (General technical report PNW ; 649). Portland, OR: U.S. Dept. of Agriculture, Forest Service, Pacific Northwest Research Station.

Kennedy, R. E., Z. Yang, and W. B. Cohen. 2010. Detecting trends in forest disturbance and recovery using yearly Landsat time series: 1. LandTrendr – Temporal segmentation algorithms. Remote Sensing of Environment 114:2897-2910.

Thomas, J., Franklin, J., Gordon, J., & Johnson, K. (2006). The Northwest Forest Plan: Origins, Components, Implementation Experience, and Suggestions for Change. Conservation Biology, 20(2), 277-287.

FEMAT. (1993). Report of the Forest Ecosystem Management Assessment Team.