
Fire Refugia’s Effects on Clustering of Infected and Uninfected Western Hemlock Trees


For Exercise 1, I wanted to know about the spatial pattern of western hemlock trees infected with western hemlock dwarf mistletoe. I used a hotspot analysis to determine where clusters of infected and uninfected trees were in my 2.2 ha study area (Map 1). I discovered a hot spot and a cold spot, indicating two clusters, one of high values (infected) and one of low values (uninfected).

Two fires burned in my study site: one in 1829 that burned most of the stand, and another in 1892 that burned everywhere except the fire refugia (polygons filled in blue). This created a multi-storied forest with remnant trees located in the fire refugia. One component of the remnant forest is infected western hemlocks. These remnant hemlocks serve as the source of inoculum for the hemlocks that regenerated after the 1892 fire.

For Exercise 2, my research question was: How does the spatial pattern of fire refugia affect the spatial pattern of western hemlock dwarf mistletoe?

I predicted that clusters of infected western hemlocks are more likely than clusters of uninfected trees to be next to fire refugia. To assess this relationship, I used the Geographically Weighted Regression tool in ArcMap.

Geographically Weighted Regression

Geographically weighted regression (GWR) works by fitting a local regression equation for each feature in the dataset you want to analyze. It uses one or more explanatory variables to predict values of the response variable by the method of least squares. The Ordinary Least Squares (OLS) tool differs from GWR because OLS creates a global regression model (one model for all features), whereas GWR creates local models (one model per feature) that account for the spatial relationships among features. Because least squares is still used, its assumptions should still be met for statistically rigorous testing. The output of the GWR tool is a feature class of the same type as the input, with a variety of attributes for each feature. These attributes summarize how well the local regression model predicts the observed value at that feature's location. If an explanatory variable explains a significant amount of the variation in the response variable, GWR is useful for seeing how that variable's coefficient varies spatially.
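To make the "one local model per feature" idea concrete, here is a minimal Python sketch. Python is used only for illustration; this is not the ArcMap tool. It assumes a single explanatory variable and a fixed Gaussian distance kernel with a user-chosen bandwidth, and ArcMap's own kernel options and bandwidth selection are not reproduced. Each feature gets its own weighted least-squares line, in which nearby features count more than distant ones.

```python
import math

def gwr_local_fits(coords, x, y, bandwidth):
    """Minimal GWR sketch: one explanatory variable, fixed Gaussian kernel (assumed).

    coords: list of (easting, northing) tuples, one per feature
    x, y: explanatory and response values for each feature
    bandwidth: kernel bandwidth in the same units as coords
    Returns one (intercept, slope, fitted_value) tuple per feature.
    """
    fits = []
    for here, x_here in zip(coords, x):
        # Gaussian kernel: weight near 1 for nearby features, near 0 for distant ones
        w = [math.exp(-0.5 * (math.dist(here, c) / bandwidth) ** 2) for c in coords]
        sw = sum(w)
        # Weighted means of x and y around this feature
        xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
        ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
        # Weighted least-squares slope and intercept for this feature's local model
        sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
        sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
        slope = sxy / sxx
        intercept = ybar - slope * xbar
        fits.append((intercept, slope, intercept + slope * x_here))
    return fits
```

Consider two distant groups of features, one where y rises with x and one where it falls. The local slopes come out near +1 and -1, a contrast that a single global OLS line would average away.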

Execution of GWR

To use this tool, I first quantified the relationship between the trees and the fire refugia. I used the "Near" tool to calculate each tree's distance to the nearest fire refugia polygon edge, and this was my explanatory variable. My response variable was the z-score output for each tree by the Optimized Hot Spot Analysis. I then ran the GWR tool and used the Moran's I tool to check the residuals for spatial autocorrelation. Clustered residuals indicate that I may have left out a key explanatory variable. The figure below displays my process.
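The Near tool computes this distance internally. The hypothetical Python sketch below shows the underlying geometry, assuming refugia are simple polygons given as vertex lists in projected (planar) coordinates. It measures distance to the nearest boundary segment whether or not the tree lies inside a refugium. How the Near tool itself treats points inside polygons is not reproduced here.

```python
import math

def point_segment_distance(p, a, b):
    """Shortest distance from point p to the line segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0:  # degenerate segment: a and b coincide
        return math.dist(p, a)
    # Project p onto the segment's line, clamping the projection to the segment
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.dist(p, (ax + t * dx, ay + t * dy))

def near_edge_distance(p, polygons):
    """Distance from p to the closest edge of any polygon (each a list of vertices)."""
    best = math.inf
    for poly in polygons:
        for i in range(len(poly)):
            # Each edge joins consecutive vertices; the last vertex wraps to the first
            a, b = poly[i], poly[(i + 1) % len(poly)]
            best = min(best, point_segment_distance(p, a, b))
    return best
```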

I tested the relationship between distance to the nearest fire refugia polygon edge and each tree's Optimized Hot Spot Analysis z-score using OLS, which is a necessary step in developing a well-specified model. The R² value for this global model was 0.005, which is extremely small. Normally I would have stopped here and sought other variables to explain this pattern, but for this exercise I continued the process.
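For contrast with GWR, the global OLS fit and its R² can be sketched in a few lines of Python. This is a hypothetical illustration, not the ArcMap OLS tool. One intercept and one slope are shared by every feature, and R² is one minus the ratio of the residual sum of squares to the total sum of squares.

```python
def ols_fit(x, y):
    """Global OLS with one explanatory variable: one model for all features.

    Returns (intercept, slope, r_squared).
    """
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    # R² = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    return intercept, slope, 1.0 - ss_res / ss_tot
```

Consider toy data in which half the features follow y = x and the other half follow y = -x. This global model has a slope of 0 and an R² of 0, even though each half is perfectly linear. That is one way a tiny global R² can coexist with strong local relationships.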


This GWR produced a high global R² value of 0.98 (adjusted R² 0.98), suggesting that distance to refugia does a good job of explaining variance in the spatial pattern of infected and uninfected trees. However, the metrics for local model performance give a different picture.

Map 2 displays the coefficients for the explanatory variable, distance to nearest refugia. As this variable changes, the z-score increases or decreases, and these changes in z-scores indicate clustering of high or low values. The range of coefficient values is quite small (-0.513 to 0.953), meaning that across my study site the coefficient changes only slightly between negative and positive. In the northwestern corner, we see a cluster of positive coefficient values. Here, the z-scores of trees increase as distance to refugia increases, predicting a clustering of infected trees. These values are associated with high local R² values (Map 4). Elsewhere in the stand we see slight clustering of negative coefficients. These indicate that z-scores decrease as distance to refugia increases, predicting a clustering of uninfected trees.

Map 3 displays the standardized residuals for each tree. Blue values indicate where the local model over-predicted the observed value, and red values indicate under-predictions. When residuals from the local regression models are distributed randomly (i.e., neither clustered nor dispersed) over the study area, the GWR model is well specified. The residuals of my local regression models were significantly clustered (Moran's Index = 0.265, z-score = 24.344, p < 0.001). Because the residuals are clustered, another phenomenon is likely driving the changes in z-scores; in other words, something else is driving the clustering of infected and uninfected trees.

From these two maps I saw that distance to fire refugia was not the only explanatory variable needed to explain why infected and uninfected trees clustered. Map 4 displays the local R² values for each feature; areas in red have high local R² values. The northwestern corner has many high values, which correspond to a cluster of small residuals and positive coefficients. Here, distance to fire refugia explains the clustering of infected trees well. The reverse is observed in several other places (clusters of blue), where distance to fire refugia does not explain why infected or uninfected trees cluster. In fact, the majority of observations had a local R² of 0.4 or less. From this evaluation, I believe this GWR model using distance to refugia does a good job of explaining the clustering of infected trees, but not much else.


GWR is useful for determining how the coefficient of an explanatory variable changes across an area. One feature may have a slightly different coefficient from another, indicating that the two features are experiencing different conditions in space. This allows the user to decide where the explanatory variable has the most positive or negative impact, a result you cannot derive from a simple global OLS model. This local regression process could be done manually, but the ArcMap tool makes it easy, and the output of GWR is easy to interpret visually.

One drawback is that you need to run an OLS model on your data first to determine which variables are significant predictors of your response variable. Otherwise, a poorly specified model can lead to inappropriate conclusions about the explanatory variable (e.g., misleadingly high R² values). Also, how the features interact in space is not entirely clear. Features are evaluated within a fixed distance or number of neighbors, but there is no description of how weights are applied to each neighboring feature. Lastly, if you want to determine what is driving the spatial pattern of incidence data, this tool is much harder to use. Some other continuous metric (in my case, a z-score) must be used as the response variable, making the results harder to interpret.

Model Results Follow-Up

After finding that distance to refugia was not a significant driver for the majority of trees, I examined my data for other spatial relationships. A hotspot analysis of only the infected trees showed that their distribution lined up slightly with the fire refugia drawn on the map (Map 5).

Among other measures, forest structure was used to determine where fire refugia were located. Old forest structure is typically more diverse vertically and less clustered spatially. Infected western hemlocks are also good indicators of fire refugia boundaries. Western hemlock is a fire-sensitive species that would not survive most fire damage, and the presence of dwarf mistletoe indicates the trees have been on the landscape for a while. From the map we can see that the distribution of infected trees lines up with the refugia in only a few places. Mis-drawn refugia boundaries are a potential explanation for the under-performance of the GWR model.

Spatial Patterns of Salmonella Rates in Oregon

  1. Question Asked

At this stage I asked several questions about the spatial distribution of population characteristics across all Oregon counties in 2014. What are the county-level spatial patterns of reported age-adjusted Salmonella rates in Oregon in 2014? Of the proportion of females? Of median age? Of the proportion of infants and young children aged 0-4 years?

To answer these questions I used several datasets. The first is a collection of all reported Salmonella cases in Oregon from 2008-2017, which includes information such as sex, age group, the county in which the case was reported, and onset of illness. This dataset was deidentified by the Oregon Health Authority. The second is a collection of Oregon population estimates over the same period, with sex- and age-group-specific county-level populations. I also obtained county-level median ages from American FactFinder. The last dataset is a shapefile from the Oregon Spatial Data Library containing polygons for all Oregon counties.

  2. Names of analytical tools/approaches used

I used direct age adjustment (with the 2014 statewide population as the standard population) to obtain county-level age-adjusted Salmonella rates. First I calculated county-level summary data: proportion of females, proportion of children aged 0-4, median age, and age-adjusted Salmonella rates. I then merged this information with a spatial dataframe containing the polygons of every Oregon county. Next, I computed both local (0-150 km) and global (statewide) spatial autocorrelation to get a Moran's I statistic for each of the variables listed above. I also produced choropleth maps of each variable for Oregon. Finally, I produced a heatmap of county-level age-adjusted Salmonella rates using the Getis-Ord Gi* local statistic to evaluate statistically significant clustering of high or low rates of reported Salmonella cases.

  3. Description of the analytical process

After extensive reformatting, I organized the 2014 Salmonella cases by age group and by county, and then formatted the 2014 county-level population estimates the same way. I divided the case dataframe by the population dataframe to get age-specific rates. To get county age-adjusted rates, I created a "standard population": Oregon's 2014 statewide population, broken into the same age groups. I then multiplied each county's age-specific rates by the standard population's matching age groups to create a dataframe of hypothetical cases. This dataframe represents the number of cases we would expect in each county if it had the same population size and age distribution as Oregon as a whole. I summed the expected cases by county and divided by the 2014 statewide population, which yielded age-adjusted reported Salmonella rates by county.
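The direct age adjustment described above can be sketched as a short Python function. The age-group names, the dictionary inputs, and the per-100,000 scaling are assumptions for illustration; this is not the original code.

```python
def age_adjusted_rate(cases, population, standard, per=100_000):
    """Direct age adjustment for one county (hypothetical sketch).

    cases: dict mapping age group -> county case count (missing groups count as 0)
    population: dict mapping age group -> county population
    standard: dict mapping the same age groups -> standard (statewide) population
    per: rate multiplier; per 100,000 is an assumed convention
    """
    expected_cases = 0.0
    for group, standard_n in standard.items():
        age_specific_rate = cases.get(group, 0) / population[group]
        # Hypothetical cases if this county's rate applied to the standard population
        expected_cases += age_specific_rate * standard_n
    # Divide the summed expected cases by the total standard population
    return expected_cases / sum(standard.values()) * per
```

Take a toy county whose cases all fall in an age group that makes up a quarter of the standard population. Its crude rate is 500 per 100,000, but its age-adjusted rate is 250 per 100,000.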

Because the population data contained county-level populations broken down by age group and by sex, I calculated the proportion of each county's population that was female and the proportion that was young children aged 0-4 years. I did this by dividing those group populations by the total county population.

Next, I computed local and global spatial autocorrelation with Moran's I for county-level median age, proportion of children, proportion of females, and age-adjusted Salmonella rates, each associated with the centroid point of its county. The global Moran's I used the entire extent of the state, while the local Moran's I was calculated by limiting the analysis to locations within 150 km of each centroid. Both statistics were evaluated using the Monte Carlo method with 599 simulations.
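A Monte Carlo test of this kind shuffles the observed values among the fixed locations many times and compares the observed Moran's I with the simulated values. The Python sketch below is a hypothetical illustration rather than the code used here. It assumes projected (planar) coordinates and binary distance-band weights (1 for pairs within max_dist, e.g. 150,000 m for 150 km). It also assumes a one-sided pseudo p-value for positive autocorrelation.

```python
import math
import random

def distance_band_weights(coords, max_dist):
    """Binary weights: 1 for distinct points within max_dist of each other, else 0."""
    n = len(coords)
    return [[1.0 if i != j and math.dist(coords[i], coords[j]) <= max_dist else 0.0
             for j in range(n)] for i in range(n)]

def morans_i(values, weights):
    """Global Moran's I for a list of values and a square weights matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all weights
    num = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / s0) * num / den

def moran_mc(values, weights, nsim=599, seed=1):
    """Permutation test: reshuffle values among locations nsim times.

    Returns (observed I, pseudo p-value), where the p-value is
    (number of simulated I >= observed + 1) / (nsim + 1).
    """
    rng = random.Random(seed)
    observed = morans_i(values, weights)
    shuffled = list(values)
    count = 0
    for _ in range(nsim):
        rng.shuffle(shuffled)
        if morans_i(shuffled, weights) >= observed:
            count += 1
    return observed, (count + 1) / (nsim + 1)
```

With 599 simulations the smallest attainable pseudo p-value is 1/600 (about 0.0017), which is one reason counts such as 599 or 999 are common.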

Finally, I completed a hot spot analysis using Getis-Ord Gi* to look for statistically significant hot or cold spots in Oregon. This was done only for the age-adjusted Salmonella rates, using the same county centroid points as above. I used a local weights matrix with queen adjacency for neighbor connectivity. The weights were row-standardized so that each county's neighbor weights summed to 1.
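The Gi* z-score for each county can be sketched as below. The sketch assumes the neighbor lists (for example, from queen contiguity) are already computed, and that each county counts as its own neighbor, as the star in Gi* indicates. With row-standardized weights, each of a county's k members (itself plus its neighbors) receives weight 1/k. This is a hypothetical Python illustration of the standard Gi* formula, not the original code.

```python
import math

def getis_ord_gi_star(values, neighbors):
    """Gi* z-scores with row-standardized weights that include each feature itself.

    values: one value per feature (e.g., county age-adjusted rates)
    neighbors: dict mapping feature index -> list of adjacent feature indices
               (assumed precomputed, e.g., from queen contiguity)
    """
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    z_scores = []
    for i in range(n):
        members = [i] + list(neighbors[i])  # Gi* includes the feature itself
        w = 1.0 / len(members)  # row-standardized: this row's weights sum to 1
        sum_w = w * len(members)
        sum_w2 = w * w * len(members)
        local_sum = sum(w * values[j] for j in members)
        denom = s * math.sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
        z_scores.append((local_sum - mean * sum_w) / denom)
    return z_scores
```

Large positive z-scores mark hot spots and large negative ones mark cold spots. For example, |z| > 1.96 corresponds to 95% confidence in a two-sided test.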

  4. Brief description of results you obtained

Choropleth Maps of Oregon:

The median age map shows clusters of older counties in the northeastern portion of the state and along the west coast. Overall, western Oregon is younger than eastern Oregon.

The proportion of children map shows a few clusters of counties in the northern portion of the state with high proportions of children compared to the rest of the state. Overall, the counties surrounding the Portland metro area have higher proportions of children than the rest of the state.

From the proportion of females map, we can see that the counties with the highest proportion of females are clustered in the western portion of the state.

Finally, the age-adjusted county Salmonella rate map shows that the highest rates occur mostly in the western portion of the state, with a few counties in the northeast also having high rates. Overall, the counties surrounding Multnomah County have the highest rates of Salmonella.

The global Moran’s I statistics:

  • County proportions of females: 0.053 with a p-value of 0.15. This suggests slight clustering that is not statistically significant.
  • County median age: 0.175 with a p-value of 0.02. This provides evidence of significant mild clustering.
  • County proportions of children: 0.117 with a p-value of 0.05. This provides evidence of significant mild clustering.
  • County age-adjusted Salmonella rates: -0.007 with a p-value of 0.32. This suggests slightly more dispersion than would be expected, though not statistically significant.

Local Moran’s I Statistics:

  • County proportions of females: 0.152 with a p-value of 0.02. This suggests significant mild clustering.
  • County median age: 0.110 with a p-value of 0.07. This suggests mild clustering that is not statistically significant.
  • County proportions of children: 0.052 with a p-value of 0.1617. This suggests slight clustering that is not statistically significant.
  • County age-adjusted Salmonella rates: -0.032 with a p-value of 0.5083. This suggests slightly more dispersion than would be expected, though not statistically significant.

Getis-Ord Gi*:

  • The heatmap shows a significant hotspot (95% confidence) in Clackamas County and another hotspot (90% confidence) in Hood River County. Three cold spots (90% confidence) appear in Malheur, Crook, and Morrow counties.

  5. Critique of Methods

The choropleth maps were very useful for showing areas with high or low values. However, this method could not detect counties with values significantly different from their neighbors. Overall, it was useful as an exploratory tool. The global and local Moran's I calculations detected whether high and low values were more closely clustered or more dispersed than expected. However, I am unsure whether this method was completely appropriate given the coarseness of county-level data. At the local scale, only the proportion of females showed significant clustering. At the global scale, median age and proportion of children showed some significant clustering. Because most of the Moran's I statistics were not significant, I don't believe this analytical method highlighted a particularly meaningful spatial pattern in my data. The heatmap provided evidence of some significant hot and cold spots in Oregon. However, it was based on immediate-neighbor weights, and global weights might be more appropriate. Overall, this tool was very useful for detecting significantly high or low Salmonella rates.

Ex 1: Mapping the stain: Using spatial autocorrelation to look at clustering of infection probabilities for black stain root disease

My questions:

I am using a simulation model to analyze spatial patterns of black stain root disease of Douglas-fir at the individual tree, stand, and landscape scales. For exercise 1, I focused on the spatial pattern of probability of infection, asking:

  • What is the spatial pattern of probability of infection for black stain root disease in the forest landscape?
  • How does this spatial pattern differ between landscapes where stands are clustered by management class and landscapes where management classes are randomly distributed?

    Fig 1. Left: Raster of the clustered landscape, where stands are spatially grouped by each of the three forest management classes. Each management class has a different tree density, making the different classes clearly visible as three wedges in the landscape. Right: Raster of the landscape where management classes are randomly assigned to stands with no predetermined spatial clustering. The color of each cell represents the value for infection probability of that cell. White cells in both landscapes are non-tree areas with NA values.

Tool or approach that you used: Spatial autocorrelation analysis, Moran’s I, correlogram (R)

My model calculates probability of infection for each tree based on a variety of tree characteristics, including proximity to infected trees, so I expected to see spatial autocorrelation (when a variable is related to itself in space) with the clustering of high and low values of probability of infection. Because some management practices (i.e., high planting density, clear-cut harvest, thinning, shorter rotation length) have been shown to promote the spread of infection, there is reason to hypothesize that more intensive management strategies – and their spatial patterns in the landscape – may affect the spread of black stain at multiple scales.

I plan to use hotspot analysis later to examine how the spatial pattern of infection hotspots maps onto different forest management approaches and forest ownerships. As a first step, however, I needed to show that there is some clustering of infection probabilities (spatial autocorrelation) in my data. I used the "Moran" function in the "raster" package (Hijmans 2019) in R to calculate the global Moran's I statistic. Moran's I ranges from about -1 (perfect dispersion, e.g., a checkerboard) to about +1 (perfect clustering), with values near 0 indicating a random pattern.
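For reference, the standard global Moran's I for N cells with values x_i, mean x̄, and spatial weights w_ij is given below. It is stated here from the general definition, not from the raster documentation. Under the null hypothesis of no spatial autocorrelation, its expected value is -1/(N-1), which approaches 0 for a large raster.

```latex
I = \frac{N}{W}\,
    \frac{\sum_{i=1}^{N}\sum_{j=1}^{N} w_{ij}\,(x_i - \bar{x})(x_j - \bar{x})}
         {\sum_{i=1}^{N} (x_i - \bar{x})^2},
\qquad
W = \sum_{i=1}^{N}\sum_{j=1}^{N} w_{ij},
\qquad
E[I] = -\frac{1}{N-1}
```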

[Figure: example patterns with Moran’s I = -1, Moran’s I = 0, and Moran’s I = 1]

I calculated this statistic at multiple lag distances, h, and plotted the values of Moran's I against h. You can think of the lag distance as the size of the window of neighbors considered for each cell in the raster grid. A plot of Moran's I across lag distances is called a "correlogram."

What did I actually do? A brief description of steps I followed to complete the analysis

1. Imported my raster files, corrected the spatial scale, and re-projected the rasters to fall somewhere over western Oregon.

I am playing with hypothetical landscapes (with the characteristics of real-world landscapes), so the spatial scale (extent, resolution) is relevant but the geographic placement is somewhat arbitrary. I looked at two landscapes: one where management classes are clustered (“clustered” landscape), and one where management classes are randomly distributed (“random”). For each landscape, I used two rasters: probability of infection (continuous values from 0 to 1) and non-tree/tree (binary, 0s and 1s).

2. Masked non-tree cells

Since not all cells in my raster grid contain trees, I set all non-tree cells to NA for my analysis in order to avoid comparing the probability of infection between trees and non-trees. I used the tree rasters to create a mask.
library(raster) # provides mask()
c.tree[ c.tree < 1 ] <- NA # Set all non-tree cells in the clustered landscape's tree raster to NA
c.pi.tree <- mask(c.pi, c.tree) # Keep probability of infection only where c.tree is not NA
# Repeat with the randomly distributed management landscape
r.tree[ r.tree < 1 ] <- NA # Set all non-tree cells in the tree raster to NA
r.pi.tree <- mask(r.pi, r.tree) # Keep probability of infection only where r.tree is not NA

Fig 2. Filled and hollow weights matrices.

3. Calculated Global Moran’s I for multiple values of lag distance.

For each lag distance, I created a weights matrix so the Moran function in the raster package would know how to weight each neighbor pixel at a given distance. Then, I let it run, calculating Moran’s I for each lag to create the data points for a correlogram.

I produced two correlograms. The first used a "filled" weights matrix, in which all cells within a given distance (lag) were given a weight of 1. The second used a "hollow" weights matrix, in which only cells at that distance were given a weight of 1 (Fig. 2).
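The two kinds of weights matrices can be sketched as follows. This hypothetical Python helper builds square matrices with (2 × lag + 1) rows and columns and sets the focal cell to 0. Treating "distance" as square rings of cells (Chebyshev distance) is an assumption; a circular neighborhood would be a reasonable alternative. Translated into R matrices, these are the kind of matrix passed as the w argument of raster's Moran function.

```python
def filled_weights(lag):
    """Weight 1 for every cell within `lag` cells of the focal cell; 0 at the center."""
    size = 2 * lag + 1
    w = [[1] * size for _ in range(size)]
    w[lag][lag] = 0  # a cell is not its own neighbor
    return w

def hollow_weights(lag):
    """Weight 1 only for cells exactly `lag` cells away (a square ring); lag >= 1."""
    size = 2 * lag + 1
    return [[1 if max(abs(r - lag), abs(c - lag)) == lag else 0 for c in range(size)]
            for r in range(size)]
```

For a lag of 1 the two matrices are identical (the 8 surrounding cells); from lag 2 on, the hollow matrix keeps only the outer ring.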

4. Plotted the global Moran's I for each landscape and compared them.







What did I find? Brief description of results I obtained.

The correlograms show that similar values become less clustered at greater distances, approaching a random distribution by a distance of about 50 cells. In other words, cells are more similar to nearby cells than to more distant cells. The many peaks and troughs in the correlograms are present because of the gaps between trees that result from their regular spacing in plantation management.

In general, the highest values of Moran's I were similar for the landscape with clustered management classes and the landscape with randomly distributed management classes. However, Moran's I decreased faster with increasing lag distance in the random landscape than in the clustered landscape. In other words, similar infection probabilities formed larger clusters when forest management classes were clustered. In the clustered landscape, there was even spatial autocorrelation at lag distances of 100 to 150, likely because of the clusters of higher infection probability in the "old growth" management cluster.

Correlogram for the clustered and random landscapes showing Moran's I as a function of lag distance ("filled" weights matrix).

Correlogram for the clustered and random landscapes showing Moran's I as a function of lag distance ("hollow" weights matrix).














Critique of the method – what was useful, what was not?

My biggest issue initially was finding a package to perform a hotspot analysis on raster data in R. I found some packages with detailed tutorials (e.g., hotspotr), but some had not been updated recently enough to work in the latest version of R. I could have done this analysis in ArcMap, but I am trying to use open-source software and free applications and improve my programming abilities in R.

The Moran function I eventually used in the raster package worked quickly and effectively, but it does not provide statistics (e.g., p-values) to interpret the significance of the Moran’s I values produced. I also had to make the correlogram by hand with the raster package. Other packages do include additional statistics but are either more complex to use or designed for point data. There are also built-in correlogram functions in packages like spdep or ncf, but they were very slow, potentially taking hours on a 300 x 300 cell raster. That said, it may just be my inexperience that made a clear path difficult to find.


Glen, S. 2016. Moran’s I: Definition, Examples. https://www.statisticshowto.datasciencecentral.com/morans-i/.

Hijmans, R. J. 2019. raster: Geographic Data Analysis and Modeling. R package version 2.8-19. https://CRAN.R-project.org/package=raster


Exercise 1: Ventenata spatial clustering

Question Asked

I am interested in understanding the invasion potential of the recently introduced annual grass ventenata (Ventenata dubia) across eastern Oregon. Here I ask, what is the spatial pattern of the ventenata invasion across the Blue Mountains Ecoregion of eastern Oregon?

Tools and Approaches Used

To address this question, I (1) tested for spatial correlation at various distances using Moran's I spatial autocorrelation coefficients plotted with a correlogram, and (2) performed hot spot analysis (Getis-Ord Gi*) to identify statistically significant clusters of areas with high and low ventenata cover.

Description of Analysis Steps

1a) Moran’s I: To compute Moran’s I spatial autocorrelation coefficient for all of my sample units, I used the “ape” package in R version 3.5.1. The first step to this analysis was to convert the ventenata data and associated coordinates into a distance matrix. Once the distance matrix was created, the Moran.I function computed the observed and expected spatial autocorrelation coefficients for the variable of interest (ventenata abundance). The function produces a test statistic that tests the null hypothesis of no correlation. See Gittleman and Kot (1990) for details on how the Moran.I function calculates Moran’s I statistics.
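Moran.I takes a matrix of weights rather than raw distances. Inverse distances, with zeros on the diagonal, are a common way to derive such a matrix from sample coordinates. The Python sketch below is a hypothetical illustration under that assumption. It computes the observed I, the expected value -1/(n - 1), and a z-score using the variance under the normality assumption. It is not ape's implementation, which may standardize the weights or use a different variance formula.

```python
import math

def inverse_distance_weights(coords):
    """w_ij = 1 / d_ij for i != j and 0 on the diagonal (points must be distinct)."""
    n = len(coords)
    return [[0.0 if i == j else 1.0 / math.dist(coords[i], coords[j]) for j in range(n)]
            for i in range(n)]

def moran_normal_test(values, w):
    """Observed and expected Moran's I and a z-score under the normality assumption."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(map(sum, w))
    num = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    observed = (n / s0) * num / sum(d * d for d in dev)
    expected = -1.0 / (n - 1)
    # S1 and S2 terms from the standard variance formula
    s1 = 0.5 * sum((w[i][j] + w[j][i]) ** 2 for i in range(n) for j in range(n))
    s2 = sum((sum(w[i]) + sum(w[j][i] for j in range(n))) ** 2 for i in range(n))
    var = (n * n * s1 - n * s2 + 3 * s0 * s0) / (s0 * s0 * (n * n - 1)) - expected ** 2
    return observed, expected, (observed - expected) / math.sqrt(var)
```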

1b) Correlogram: I plotted a correlogram of Moran's I coefficients at increasing distances (lags) to examine patterns of spatial autocorrelation in my data. I used the correlog function in the ncf package in R with lag intervals of 10,000 m. The function can randomly resample the data at each distance increment to assess statistical significance, testing the null hypothesis of no autocorrelation. I ran the function with resamp = 100. Black points on the correlogram indicate Moran's I values significantly larger or smaller than expected under the null hypothesis.
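Conceptually, a Moran's I correlogram recomputes the statistic separately for the pairs of points that fall in successive distance classes. The Python sketch below assumes planar coordinates in meters, binary weights within each class, and half-open classes [k × increment, (k + 1) × increment). These conventions are assumptions and may differ from those of the correlog function. The resampling test is omitted.

```python
import math

def moran_correlogram(coords, values, increment, n_classes):
    """Moran's I for each distance class; classes with no point pairs give None.

    Returns a list of (class midpoint, Moran's I) tuples.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    den = sum(d * d for d in dev)
    result = []
    for k in range(n_classes):
        lo, hi = k * increment, (k + 1) * increment
        s0, num = 0, 0.0
        for i in range(n):
            for j in range(n):
                # Binary weight of 1 for ordered pairs whose distance falls in this class
                if i != j and lo <= math.dist(coords[i], coords[j]) < hi:
                    s0 += 1
                    num += dev[i] * dev[j]
        result.append((lo + increment / 2, (n / s0) * num / den if s0 else None))
    return result
```

Each class uses a different set of point pairs, so classes at the largest distances rest on few pairs and tend to produce extreme values.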

2) Hot Spot Analysis: I used the Hot Spot Analysis (Getis-Ord Gi*) tool in ArcGIS to identify statistically significant clusters of areas with high and low ventenata cover across my study area. The tool produces z-scores and p-values that test the null hypothesis that high and low values are randomly distributed rather than clustered. High z-scores indicate clusters of high values, and low z-scores indicate clusters of low values. Low p-values indicate that these clusters are more pronounced than would be expected by chance.


1a) Moran's I: The Moran's I spatial autocorrelation coefficient for all points across the entire sample area was 0.3 ± 0.05 (p < 0.3). This value is not particularly informative: it indicates only that the data are positively spatially autocorrelated, without describing the spatial pattern. I therefore followed up with a correlogram to uncover the spatial pattern driving the autocorrelation.

1b) Correlogram: The Moran's I correlogram shows a general trend of decreasing autocorrelation from 0 to about 70,000 m, where sudden jumps in Moran's I of up to ~0.3 occur. Following this jump, the correlation decreases to between -0.5 and -0.2 from 120,000 to 152,000 m. It then increases to ~0.3 at 170,000 m, decreases to almost -1.0 just after 200,000 m, and finally increases to almost 1 at 220,000 m. The general trend appears to be a decrease from 0.2 to -0.9 at 220,000 m, with some high peaks interspersed. These high and low peaks indicate distinct ventenata patches distributed throughout the study area, suggesting a clustered spatial pattern of the invasion. The extreme high and low values at distances over 200,000 m are likely a result of the few sample units being compared at these distances, so they are not very informative about the overall spatial pattern.

2) Hot Spot Analysis: Hot spot analysis in ArcGIS depicted clusters ranging from high ventenata cover (large red circles) to low ventenata cover (small blue circles) across my study area (Fig. 2) using the calculated z-scores and p-values for each sample unit. The resulting map shows distinct clusters of high, low, and moderate ventenata cover distributed across seven sampled burn perimeters (displayed in light orange). The highest cover clusters are all located within the Ochoco and Aldrich Mountains in the center of the study region. The fires on the perimeters of the region exhibited clusters of low to no ventenata cover.

Critique of Methods Used

When run on all of the data across the entire region, Moran's I did not produce a useful statistic: it indicated only whether the data were spatially autocorrelated, without describing a spatial pattern. However, when visualized with a correlogram at varying distances, the correlation coefficients suddenly told a story of spatial clustering. The results from the hot spot analysis reinforce the findings from the correlogram by clearly depicting clusters on a map of the study area. The hot spot analysis further explores these results by mapping the clusters of high and low ventenata cover onto each sample unit. This provides a useful visualization of exactly where the clusters of high and low cover fall across the region.


Gittleman, J. L., and M. Kot. 1990. Adaptation: statistics and a null model for estimating phylogenetic effects. Systematic Zoology 39: 227–241.


Exercise 1: The Spatial Patterns of Natural Resource Governance Perceptions


What are the spatial patterns of natural resource governance perceptions in the Puget Sound?

Tools and Approaches

  1. Moran's I (with correlograms) and semivariograms in RStudio
  2. Kriging and IDW in ArcGIS Pro
  3. Hotspot Analysis in ArcGIS Pro


Analysis Steps

  1. To compute Moran's I, I used the Moran.I() function in the "ape" library in R. This function takes the variable in question (governance perceptions) and a distance matrix. It computes the observed and expected values of Moran's I, as well as the standard deviation and a p-value. I also subset my data to examine spatial autocorrelation by demographics, including area (urban, suburban, rural), political ideology, life satisfaction, income, and cluster. The cluster groups were created by running a cluster analysis on the seven variables that make up the governance index. I created correlograms for the significant subsets (urban, conservative, and liberal) using the correlog() function in the "ncf" library. These figures give a better picture of spatial autocorrelation at various distances. To create semivariograms, I used the variogram() function in the "gstat" library, along with the "lattice" library. This function takes the variable of interest along with latitude and longitude locations, and the resulting object can then be plotted. For this analysis I used the same subsets as in the Moran's I analysis.
  2. To perform interpolation, I loaded my point data into ArcGIS Pro. I then used the Spatial Analyst toolbox to perform kriging and IDW and compare the outputs of the two techniques. I used my indexed variable of governance perceptions, whose values range from 1 to 7. I also added a shapefile bounding the sample area, as well as a shoreline shapefile, to better delineate my study area.
  3. To run a hotspot analysis, I used my previously loaded point data in ArcGIS Pro and ran the Hot Spot Analysis tool from the Spatial Statistics toolbox. I used my indexed variable of governance perceptions, with values from 1 to 7, and the shoreline shapefile to better delineate my study area.
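The empirical semivariogram behind these plots can be sketched with the classical (Matheron) estimator. For each distance bin, γ(h) = Σ (z_i − z_j)² / (2N(h)), where the sum runs over the N(h) point pairs whose separation falls in that bin. This Python version is a hypothetical illustration that assumes projected (planar) coordinates and simple fixed-width bins. The defaults of gstat's variogram() (cutoff, bin width, handling of latitude/longitude) are not reproduced here.

```python
import math

def empirical_semivariogram(coords, values, bin_width, n_bins):
    """Classical estimator gamma(h) = sum((z_i - z_j)^2) / (2 * N(h)) per distance bin.

    Returns a list of (bin midpoint, gamma or None if the bin is empty, pair count).
    """
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    n = len(values)
    for i in range(n):
        for j in range(i + 1, n):  # each unordered pair counted once
            k = int(math.dist(coords[i], coords[j]) // bin_width)
            if k < n_bins:  # pairs beyond the last bin are ignored (a cutoff)
                sums[k] += (values[i] - values[j]) ** 2
                counts[k] += 1
    return [((k + 0.5) * bin_width, sums[k] / (2 * counts[k]) if counts[k] else None, counts[k])
            for k in range(n_bins)]
```

A semivariogram that rises with distance and then levels off suggests spatial structure. A flat semivariogram at a small magnitude is consistent with weak or no autocorrelation.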


  1. The Moran’s I calculation was insignificant for rural, suburban, cluster groups, life satisfaction, and income, suggesting no spatial autocorrelation of governance perceptions by these subsets.


The Moran’s I calculation was significant for urban:

Observed value: -0.014

P-value: 0.0002


The Moran’s I calculation was also significant for ideology:


Observed value: -0.006

P-value: 0.002


Observed value: -0.002

P-value: 0.05


This suggests that in these subsets there is spatial autocorrelation between individual governance perceptions.

The semivariograms for the subsets that are significantly spatially autocorrelated are presented below.


None of these plots suggest high degrees of spatial autocorrelation. The urban plot does so more than the ideology plots, but the y axis scale is still very small.









The correlograms (top: urban; bottom left: liberal; bottom right: conservative) help confirm the findings above. Moran's I fluctuates around zero without much variation, and the large spikes that the graphs do show occur only at non-significant points. Significant points are filled in, whereas non-significant points are open circles.

2. Interpolation

The kriging with individual points (bottom left) and the IDW (bottom right) do not look very different in terms of general trends. The kriging with shoreline (top) gives possibly the most interesting view of spatial patterns. In general, perceptions are better (more green) in the center, where there is more shoreline. There are also two sections that appear much more negative. To examine these locations further, I performed a hotspot analysis.
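IDW estimates each unsampled location as a weighted average of the sampled values, with weights 1/d^p that shrink with distance d. Below is a minimal Python sketch. It assumes a power of p = 2 and uses all sample points; ArcGIS Pro's search-neighborhood options are not modeled.

```python
import math

def idw(sample_points, sample_values, target, power=2.0):
    """Inverse distance weighted estimate at `target` from all sample points."""
    num = den = 0.0
    for point, value in zip(sample_points, sample_values):
        d = math.dist(point, target)
        if d == 0:
            return value  # exact interpolator: a sampled location keeps its observed value
        weight = 1.0 / d ** power
        num += weight * value
        den += weight
    return num / den
```

Unlike kriging, IDW does not fit a semivariogram model. This is one reason the two surfaces can differ in detail while sharing broad trends.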

3.  Hotspot Analysis

This image confirms that the two bright red spots from the interpolation are "cold spots," where perceptions are significantly lower than the average (neutral) perception, at 99% confidence. The orange dots indicate 95% confidence. The green corridor appears to hold in the southern part of the Sound and is confirmed as a "hotspot," where perceptions are significantly higher than in the surrounding areas, at 99% confidence.

The three main areas of red or orange correspond to the cities of Shelton (bottom), Port Angeles (west), and Everett with a little of south Whidbey Island (east).

Critique

I believe all of the methods are useful, but some are redundant. It would probably be sufficient to use only one method of each type (spatial autocorrelation and interpolation), but it is interesting, and more convincing, to see the same type of analysis done in different ways. The p-values from Moran's I appear to agree with the shapes of the curves in the semivariograms: smaller p-values correspond to more defined shapes. The same goes for the interpolation methods: while they are interesting to see side by side, they essentially tell the same story. In this case, I think the hotspot analysis gives the most interesting interpretation of the data because it identifies areas of potential concern.