Wetlands provide numerous ecosystem services; one of the most valuable to people is their natural ability to filter water. I want to understand how strongly wetlands are connected to water quality.

5/14/2014 UPDATE: After much deliberation, I’m taking a different approach to this project and class, and hopefully thesis!

New Research Questions
− Using Finley as a case study, can a wetland refuge improve water quality? To what extent?
a. how can it improve water quality? nutrients – there is evidence that wetlands can act as a nitrogen sink, which matters in an agriculture-heavy area like Corvallis
− How similar or different are invertebrate and plant communities (diversity and abundance as an indication of how “healthy” the ecosystems are) within Finley compared to surrounding streams and mitigated wetlands?
a. R: statistically significant correlations?
b. ArcGIS: spatial relationships (maybe cluster analysis)? (would need to account for spatial autocorrelation of habitat type, etc)
− Historically, how has land use affected the Finley refuge wetlands? How has Finley’s history impacted this study? How may land use affect the wetlands today?

Hypothesis
− One ecosystem service of wetlands is improvement in water quality: I hope to show that streams leaving wetlands carry less nitrogen than streams entering them (see the R sketch below this list)
a. may need a literature review to see what degree of improvement is typical, to have a baseline
– I can also compare against other surrounding stream quality data – maybe a proximity analysis correlated with change in quality? I expect, again, better quality in streams leaving wetlands than in streams that don’t interact with the wetlands at all.

– I would expect similar-size nearby mitigated wetlands to show similar results; if they differ, maybe plant and invertebrate communities can indicate differences in ecosystem health between the refuge and mitigated wetlands.

– I would expect past land use, as well as current surrounding land use, to impact the health of the wetlands and streams – maybe correlate land use with plant communities and the nitrogen content of streams?
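To make the first hypothesis concrete, here is a minimal R sketch of the inlet-versus-outlet nitrogen comparison. The data frame and its values are hypothetical placeholders (not real Finley measurements), and a paired test is only one reasonable starting point.

```r
# Hypothetical paired inlet/outlet nitrogen data: one row per wetland site
wetland_n <- data.frame(
  site         = c("A", "B", "C", "D", "E"),
  nitrogen_in  = c(2.4, 1.8, 3.1, 2.7, 2.0),  # mg/L entering the wetland
  nitrogen_out = c(1.9, 1.6, 2.2, 2.5, 1.4)   # mg/L leaving the wetland
)

# Paired comparison: is nitrogen lower leaving the wetland than entering?
t.test(wetland_n$nitrogen_out, wetland_n$nitrogen_in,
       paired = TRUE, alternative = "less")

# Non-parametric alternative if the differences look non-normal
wilcox.test(wetland_n$nitrogen_out, wetland_n$nitrogen_in,
            paired = TRUE, alternative = "less")
```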

Also, does anyone know of a good land use shapefile that covers the Finley area? Or any other shapefiles for Finley and the surrounding area? I’d hate to work from aerial photography if I don’t have to (I’m not familiar with it, so if I do need to go that route, any links to tutorials would be great!). Any input is appreciated. This is a seed of an idea that could go a lot of ways.

Will update with a map next week!

————————————–

A National Wetlands Inventory shapefile of Oregon’s wetlands was clipped to the nine counties that make up the Willamette Valley (county data provided by BLM): Multnomah, Washington, Yamhill, Clackamas, Marion, Polk, Linn, Benton, and Lane. The wetlands were mapped between 1994 and 1996 by The Nature Conservancy of Oregon (funded by the Willamette Basin Geographic Initiative Program and the Environmental Protection Agency (EPA)); the resulting dataset inventoried, classified, and mapped native wetland and riparian plant communities and their threatened biota in the Willamette Valley. I have also clipped 2004-2006 stream and lake water quality data from the DEQ to these counties. Furthermore, I have mitigation bank data compiled by ODSL and ODOT and developed by The Nature Conservancy, showing mitigated wetland locations in the Willamette Valley.

Wetlands Map

I am interested in looking at the connectivity of streams to wetlands and the relationship of water quality to wetland location. For connectivity, I may compare streams connected to wetlands versus those that are not. Additionally, I am interested in seeing whether water quality differs around mitigated wetlands versus natural wetlands. I also have stream and lake water quality data from other years, so I could measure statistically significant change over time as well.

I am interested in receiving comments regarding potential statistical analyses to examine connectivity; to compare water quality around mitigated versus natural wetlands; and comparing water quality data over time.
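As one possible starting point for the mitigated-versus-natural comparison, here is a hedged R sketch. It assumes each DEQ station has already been tagged with the type of its nearest wetland (for example via a spatial join in Arc); the file and column names are placeholders, not part of the actual dataset.

```r
# Hypothetical table: each DEQ station tagged with the type of its nearest
# wetland (e.g., from a spatial join) and a nitrogen concentration
wq <- read.csv("deq_stations_tagged.csv")   # columns: station, wetland_type, nitrogen

# Compare water quality near mitigated vs. natural wetlands
boxplot(nitrogen ~ wetland_type, data = wq)
wilcox.test(nitrogen ~ wetland_type, data = wq)

# Rough check of change over time, if multiple survey years are stacked
# in the same table (extra column: year)
# summary(lm(nitrogen ~ year + wetland_type, data = wq))
```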

 

So far I have identified over 70 local farms providing food to the Corvallis Farmer’s Market.  While many of the farms are far-flung, there is a definite clustering effect around the city of Corvallis.  This map shows the whole study area; keep in mind I am still collecting tiles, because there are farm locations outside this area that I cannot place yet.  The purple ellipse comes from the Directional Distribution tool: it shows the area within one standard deviation, containing about 68% of the local farms.  I traced the city limits of Corvallis, Albany, and surrounding cities in light blue using a city limits shapefile.  Farms that sell at the local Farmer’s Market are represented by gold stars.  Note that the ellipse skews to the right (east) of Corvallis and is elongated from north to south.  Essentially, the ellipse follows the contour of the Willamette Valley, which we would expect.

wv7

The purple cross is the mean center of distribution of local farms, which is also the center of the ellipse.  But the orange triangle is the median center of farm distribution.  The median center moves quite a bit towards Corvallis, implying that remote geographic outliers influence the mean, and that perhaps farms cluster more strongly around the city of Corvallis than the distribution ellipse and mean center suggest.
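For anyone curious how the two centers relate, here is a minimal R sketch of the calculation, assuming a data frame of projected farm coordinates (the values below are made up for illustration). Note that ArcGIS's Median Center tool iteratively minimizes total distance, so the coordinate-wise median here is only a rough stand-in.

```r
# Hypothetical projected farm coordinates (e.g., meters in a state plane CRS)
farms <- data.frame(
  x = c(475200, 481900, 468300, 492100, 471800),
  y = c(980400, 975100, 991600, 969900, 983700)
)

# Mean center (same idea as ArcGIS Mean Center)
mean_center <- c(mean(farms$x), mean(farms$y))

# Coordinate-wise median as a rough stand-in for ArcGIS's Median Center
# (the Arc tool iteratively minimizes total Euclidean distance instead)
median_center <- c(median(farms$x), median(farms$y))

mean_center
median_center
```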

wv8

There remain more than two dozen farms on the Corvallis Farmer’s Market list that I have yet to add to the dataset.  After that, I would like to know the approximate acreage of each farm, which would allow me to do a hotspot analysis around a specific question.  I have a theory that farms near Corvallis are more likely to be small, and that being near an urban development makes it more feasible to grow high-quality produce on small acreage as a business model.  To put it another way, Corvallis acts as a market driver that spurs local sustainable development nearby.  I could test this with a hotspot analysis if I had an acreage estimate for each farm.
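If acreage estimates do materialize, the hot spot test could look something like the following R sketch using the spdep package. The farm values, the choice of four nearest neighbors, and the use of the Getis-Ord Gi* statistic are all assumptions on my part, not settled methods.

```r
library(spdep)

# Hypothetical farm points: projected coordinates plus an acreage estimate
farms <- data.frame(
  x       = c(475200, 481900, 468300, 492100, 471800, 488600),
  y       = c(980400, 975100, 991600, 969900, 983700, 977300),
  acreage = c(12, 8, 150, 320, 5, 45)
)

coords <- as.matrix(farms[, c("x", "y")])

# Neighborhood: each farm plus its 4 nearest neighbors (an arbitrary choice)
nb <- knn2nb(knearneigh(coords, k = 4))
lw <- nb2listw(include.self(nb), style = "B")   # include self for Gi*

# Getis-Ord Gi*: strongly negative z-scores would flag clusters of small farms
farms$gi_z <- as.numeric(localG(farms$acreage, lw))
farms
```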

Identifying the location for each farm remains tedious, but each farm has a phone number associated with it, and many have an email address.  An initial email survey followed by phone calls could yield information about the size of each farm, how long it has been selling locally, and how much of its produce goes to local markets.

There is definitely error in the accuracy of the farm locations.  Google Maps does a poor job of locating farm addresses, so I am sure some stars are nearby, but not on the right farm.  It is also hard to determine ownership boundaries by visual assessment, so a polygon shapefile estimating farm areas would be even less accurate.  While I can use tax lot information for Benton County to determine farm area, Linn County data is much more difficult to access, and the farms are spread across many counties, so the process is time-consuming.  A combination of ground-truthing and surveying would be necessary to improve accuracy to a publishable level.  I have also not addressed farms selling to local groceries like the First Alternative, to local restaurants through wholesale distributors, or through CSAs, all of which are significant contributors to the local food system.

LiDAR point information is usually available as a set of ASCII or LAS data files:

LASdirectory

ArcGIS only supports LAS data files; to use ASCII LiDAR data with Arc, you’ll need to use an external tool to convert to LAS.

LAS files cannot be added directly; they must be combined into an LAS dataset that sets a consistent symbology and spatial reference for the entire collection.  To create a LAS dataset, go to ArcCatalog, right-click the folder you want to store the dataset in, and select New > LAS Dataset:

LASDcreation

Note that you will need 3D Analyst or Spatial Analyst activated to do this.  I recommend checking all the extensions to be sure your tools run the first time.

Right-click the dataset in ArcCatalog and choose Properties.

Ensure your dataset has an appropriate coordinate system in the XY Coordinate System tab, which you’ll need to get from the metadata for the LiDAR.  Next, activate the LAS Files tab and click Add Files or Add Folders.   Once you are done adding files, activate the Statistics tab and press the Calculate button.

calculated

At this point, you can import your data into either ArcMap or ArcScene; there are pros and cons to both.  As far as I’ve been able to determine, it is impossible to plot 3D point clouds in ArcMap over a DEM or map base.  That is possible in ArcScene, which can also color points by intensity in 3D view, but unlike ArcMap it offers no way to adjust point size and very limited ability to adjust point colors, at least as far as I’ve been able to determine over the last few days.

Some LAS datasets will include RGB information in addition to intensity, which allows 3D true-color visualizations.

midLookout

This image shows the Middle Lookout Creek site in the HJ Andrews Experimental Forest as a point cloud colored by intensity in ArcScene.  The creek and some log jams are visible on the right, and a road is visible on the left.

To convert LiDAR points to a DEM, you’ll need to convert the dataset to a multipoint feature class first.  Open ArcToolbox and go to 3D Analyst Tools > Conversion > From File > LAS to Multipoint.  Select the specific LAS files you want.  It’s likely that you’ll want to use the ground points only; the usual class code for ground points is 2, but you’ll want to check this by coloring the points by class.

Once you’ve created the multipoint feature, you need to interpolate the values between the points.  There are several ways to do this, each with advantages and disadvantages.  Popular methods are Spline and Kriging.  Spline generates a smooth surface but can create nonexistent features, and it doesn’t handle large variations over shorter-than-average distances very well.  Kriging is hard to get right without experience in what the parameters do, and it can take a long time to achieve the best results, but it attempts to take spatial autocorrelation into account.  In general, Kriging is better for relatively flat areas, and Spline is better for sloped areas.  Inverse Distance Weighting is popular, but it produces results similar to a very poorly configured (for terrain, anyway) Kriging interpolation.  I find that a safe but time-consuming bet is Empirical Bayesian Kriging, which can be found in the toolbox under Geostatistical Analyst Tools > Interpolation, along with a few other advanced interpolation methods that I am not as experienced with.  If anyone else is familiar with these, I’d welcome a post explaining how best to use them.
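For completeness, here is an alternative route outside Arc that I am assuming would work but have not verified against this workflow: the lidR package in R can filter ground returns and interpolate a DEM directly. The file path and the 1 m resolution are placeholders.

```r
library(lidR)

# Placeholder path to one LAS tile
las <- readLAS("middle_lookout.las")

# Keep only points classified as ground (class code 2, as noted above;
# worth confirming first by coloring the points by class)
ground <- filter_poi(las, Classification == 2L)

# Interpolate a 1 m DEM from the ground returns; tin() is a Delaunay
# triangulation, and knnidw() or kriging() could be swapped in instead
dem <- rasterize_terrain(ground, res = 1, algorithm = tin())

plot(dem)
```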

 

 

I am working on a habitat model to predict changes in available stream habitat for acid- and thermally-sensitive aquatic species. The main goal is to combine stream temperature model results with existing acidity (acid-neutralizing capacity, ANC) results for the southern Appalachian Mountain region to evaluate the spatial extent of a habitat “squeeze” on these species. The relationship between air and water temperature will also be explored to generate future scenarios of habitat availability under changes in air temperature. I have mostly been working with non-spatial regression modeling techniques, and I would like to explore the use of spatial statistical models to account for spatial autocorrelation among observations.
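One simple way to fold spatial autocorrelation into the air-water temperature regression, sketched below in R, is a generalized least squares model with a spatial correlation structure on the residuals (nlme). The file and column names are placeholders, and this is only one of several possible spatial modeling approaches.

```r
library(nlme)

# Hypothetical site-level table: water and air temperature plus projected coordinates
dat <- read.csv("site_temperatures.csv")   # columns: water_temp, air_temp, x, y

# Ordinary (non-spatial) regression, as a baseline
ols <- gls(water_temp ~ air_temp, data = dat)

# Same model with an exponential spatial correlation structure on the residuals
sp <- gls(water_temp ~ air_temp, data = dat,
          correlation = corExp(form = ~ x + y, nugget = TRUE))

AIC(ols, sp)   # does accounting for spatial autocorrelation improve the fit?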

Here is my full study region:

StudyRegion

Non-spatial regression model results for ANC and water temperature:

Pisgah_RD_nolocator

 

Here is a potential study area with relatively high ANC data density:

StudyArea1

For the purposes of this class I am going to attempt to construct habitat suitability models characterizing the pelagic habitat of an invertebrate species, California market squid (Doryteuthis opalescens) (Fig. 1; thanks, Wikipedia), an important prey species for multiple predators (e.g., spiny dogfish sharks and seabirds) that is also commonly captured in high abundance in the survey region.

800px-Opalescent_inshore_squid

The dataset I am working with consists of pelagic fish and invertebrate abundance data collected by NOAA over a 14-year period (1998-2011) in the Northern California Current off the Oregon and Washington coasts. Pelagic fish and invertebrates were collected at up to ~50 stations along eight transect lines off the Washington and Oregon coasts in both June and September of each year (Fig. 2). Species were collected using a 30 m (wide) x 20 m (high) x 100 m (long) Nordic 264 pelagic rope trawl (NET Systems Inc.) with a cod-end liner of 0.8 cm stretch mesh. For each sample, the trawl was towed over the upper 20 m of the water column at a speed of ~6 km h-1 for 30 min (Brodeur, Barceló et al. In Press, MEPS).

bpasampling

In addition to species abundance data, survey personnel also collect in situ environmental data at each fish sampling station during each survey, including water column depth, salinity, temperature, and chlorophyll a, as well as oxygen and turbidity when instruments were available.  One of my goals for this class is to supplement this in situ environmental dataset with remotely sensed temperature, primary productivity, and turbidity data from the MODIS-Aqua and SeaWiFS platforms in order to obtain a broader environmental context.

For my habitat suitability modeling approach I will use R to fit Generalized Additive Mixed Effects Models (GAMMs) relating the environmental covariates to both presence/absence and abundance (catch per unit effort) data. Additionally, I will experiment with Maxent and other available habitat suitability modeling techniques to compare their output to the GAMM results.
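As a rough illustration of the GAMM idea, here is a hedged R sketch using mgcv with a spatial smooth and a station-level random effect standing in for the mixed part of the model. The file, column names, and covariate choices are placeholders rather than the final model structure.

```r
library(mgcv)

# Hypothetical haul-level table: presence (0/1), CPUE, environmental covariates,
# position, and a station identifier
squid <- read.csv("squid_hauls.csv")
squid$station <- factor(squid$station)   # random-effect term must be a factor

# Presence/absence model with a spatial smooth and a station random effect
pa_mod <- gam(presence ~ s(temperature) + s(chlorophyll) + s(depth) +
                s(lon, lat) + s(station, bs = "re"),
              family = binomial, data = squid, method = "REML")

# Abundance (CPUE) could be modeled similarly, e.g., with a Tweedie family:
# cpue_mod <- gam(cpue ~ s(temperature) + s(chlorophyll) + s(depth) +
#                   s(lon, lat) + s(station, bs = "re"),
#                 family = tw(), data = squid, method = "REML")

summary(pa_mod)
```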

Some of the spatial and temporal hurdles I face with this dataset include:

Unequal spacing between sampling locations: This may pose a challenge when attempting to spatially interpolate.

Scope of inference: The habitat modeling that I’ll attempt for this species is likely applicable only in the Northern California current or in a slightly extended region.

Scale of environmental data: I will be using environmental data from two different sources, in situ point measurements (localized) and remotely sensed satellite rasters (500 m-1 km grain), which will affect the resolution of my interpretations of habitat for this species.

Spatial autocorrelation among stations: Abundances and/or presence/absence of market squid may be spatially correlated among nearby stations due to autocorrelation in environmental covariates that define their habitat.

Temporal autocorrelation for each station: As the data I am using is a bi-annual survey, it is possible that the abundance and spatial structure of market squid within our sampling area is correlated between the two seasons of sampling. It is also possible that the temporal autocorrelation of an individual station with itself though time is not too big of a problem given the fluid medium in which sampling occurs and the highly variable inter-seasonal winds and currents in this region.

I have 218 benthic sediment grabs from the continental shelf ranging from 20 to 130 meters deep. These samples were taken from eight sites spread from Northern California to Southern Washington. Within each site, samples were randomized along depth gradients.

Each sample consists of species counts plus depth, latitude, and sediment characteristics such as grain size (i.e., sand versus silt), organic carbon, and nitrogen concentrations. Bayesian Belief Networks were used to calculate species-habitat associations, and the established relationships were used to make regional predictive maps. The final map products depict the spatial distribution of suitable habitat, where a high probability indicates a high likelihood of finding a species given a location and its combination of environmental factors. While sampling points were not taken on a consistent grid, suitability maps were scaled to 250-meter resolution.

As in any habitat modeling process, the “best” model was chosen by looking at model performance and the amount of error, or misclassification, between what was observed and what was predicted. Errors of commission occurred when probability scores were high for a location where the species was actually observed to be absent. Errors of omission occurred when probability scores were low for a location where the species was actually observed to be present.

I am interested in two questions: first, whether there is a spatial pattern to the observed error, and second, at what scale that error is significant. Error may be caused by variation in the environment that occurs at a finer scale than my modeling structure captures.

To explore these two questions, I intend to conduct a spatial autocorrelation analysis on the error at each local site to determine whether there is a spatial pattern and, if so, whether there is an associated environmental pattern to the error (e.g., do most of the errors occur in shallower or deeper water?). I am also interested in creating high-resolution local maps of sediment characteristics (grain size, organic carbon, and nitrogen) through spatial interpolation of the sediment grab data. For these local sites, I will then recreate the predictive maps and compare them to the 250-meter predictive maps.
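A minimal R sketch of the autocorrelation test on the error at one site, using Moran's I with a k-nearest-neighbor weights matrix from the spdep package, might look like the following. The file and column names are placeholders, and k = 8 is an arbitrary choice worth varying.

```r
library(spdep)

# Hypothetical table for one local site: grab coordinates, depth, and the
# observed-minus-predicted error from the habitat model
grabs <- read.csv("site1_errors.csv")   # columns: x, y, depth, error

coords <- as.matrix(grabs[, c("x", "y")])

# Neighborhood of the 8 nearest grabs
nb <- knn2nb(knearneigh(coords, k = 8))
lw <- nb2listw(nb, style = "W")

# Global Moran's I: is the model error spatially clustered at this site?
moran.test(grabs$error, lw)

# Quick check of whether error tracks depth
# cor.test(grabs$error, grabs$depth, method = "spearman")
```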

samples Etenuis et_errorNetNet_Key

 

My spatial problem deals mainly with determining what scale best fits both birth registry point data in Texas (4.7 million births from 1996-2002) and the available spatial resolution of MODIS aerosol optical thickness data. A smaller cell size will more accurately capture ambient particulate matter exposure levels, but may leave too many cells with zero births. Increasing the cell size will give better coverage of the state, but may weaken the spatial statistical relationships with low birth weight (LBW) rates and reduce the accuracy of ambient air pollution exposure estimates. A model will need to be created that combines ground-based air monitor exposure levels and satellite data to accurately determine rural particulate matter exposures.

A temporal problem deals with creating a model that will determine ambient air pollution exposure levels in each cell during different known susceptibility windows during all 4.7 million pregnancies. An analysis will need to be done to determine the variability of particulate matter levels on multiple time scales and incorporate the best fit with pregnancy susceptibility windows.

A combination of the spatial and temporal analyses will incorporate both time lags and spatial clustering. This aspect of the project should be relatively straightforward. This section will aim to determine 1) whether LBW births are clustered in space and time, and 2) whether individual emitters (using the EPA TRI data set) are spatially and temporally correlated with LBW.
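A hedged R sketch of the gridding step behind these maps is shown below: snap birth points to a regular grid, compute an LBW rate per cell, and then hand the non-empty cells to a hot spot statistic. The file and column names are assumptions, not the actual registry fields.

```r
# Hypothetical birth records: longitude, latitude, and a low-birth-weight flag
births <- read.csv("births.csv")   # columns: lon, lat, lbw (0/1)

cell <- 0.1   # grid size in degrees; rerun with 1.0 to compare scales

# Snap each birth to the lower-left corner of its grid cell
births$cx <- floor(births$lon / cell) * cell
births$cy <- floor(births$lat / cell) * cell

# LBW rate and birth count per cell
rates  <- aggregate(lbw ~ cx + cy, data = births, FUN = mean)
counts <- aggregate(lbw ~ cx + cy, data = births, FUN = length)
names(rates)[3]  <- "lbw_rate"
names(counts)[3] <- "n_births"
grid <- merge(rates, counts)

# Cells with very few births give unstable rates; dropping them (or enlarging
# the cell size) is one option before running a Getis-Ord Gi* hot spot test
# (e.g., spdep::localG) on the remaining cell centers
head(grid[order(-grid$lbw_rate), ])
```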

Below are some examples of different cell size and temporal scales.

1. 2008-2009 LBW hot spot analysis based on Texas census tracts

0809-hotspotanalysiscensustract

2. A hot spot analysis using 0.1 x 0.1 degree (roughly 10 km x 10 km) grids of 2008-2009 LBW rates

0809-hotspotanalysispoint1degreesquare

3. A hot spot analysis using 1×1 degree (roughly 100km x 100km) grids of 1996-2009 LBW rates.

allbirths-hotspotanalysis1degreesquare

 

Basic methods:

This experiment was designed to quantify the distribution and abundance of invasive lionfish and then determine whether the distribution and abundance of various native prey species is correlated with lionfish distribution and abundance. Eight reefs were selected and assigned low or high lionfish density. Discrete plots and/or transects were established on the eight reefs, and the local distribution of invasive lionfish and select prey species was monitored for ten weeks.  Automated video cameras were also deployed on the reefs to capture the movement of fish across the reef. Behavioral observations were made on each of the high lionfish density reefs at dawn, midday, and dusk to record lionfish movement and behavior on the reef. Habitat structure of the reef was measured, along with rugosity, in all the plots and transects.

Here is an example of one of the high lionfish reefs. The star represents the potential hotspot for lionfish presence.  Right now the plots and transects are only representations of the actual spatial distribution.

Reef

These are examples of lionfish location on different surveys.

PTVO1

PTVO2

TO DO NEXT?:

For each parameter, list an explicit prediction from the hypothesis that lionfish live and forage mostly within hotspots on reefs.  The way to envision a prediction is:  “If the hypothesis is true, then…”.  Example predictions:

(1) There should be a significant correlation between the rankings of plots within a reef based on distance from the hotspot (1-9) vs. based on mean distance to lionfish through time.

(2a) Lionfish should spend more time in plots close to the hotspot than in plots further away.

(2b) Lionfish paths of movement should be close to the hotspot.

 

Specific tasks for each example prediction:

(1) Run a Kendall’s tau rank correlation analysis of plots within a reef based on distance to the hotspot vs. mean distance to lionfish through time (sketched below, after the task list).

(2a) Calculate lionfish time per plot from time budgets.

(2b) Map lionfish paths of movement from time budgets (eventually analyze distance of path from hotspot).
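Task (1) is straightforward in R. Below is a minimal sketch for a single reef; the plot rankings and distances are made-up placeholder values.

```r
# Hypothetical values for one reef: the nine plots ranked by distance from the
# hotspot, and each plot's mean distance to lionfish across the ten weeks
plots <- data.frame(
  hotspot_rank    = 1:9,                                              # 1 = closest to hotspot
  mean_dist_to_lf = c(1.2, 2.0, 1.8, 3.5, 2.9, 4.1, 5.0, 4.6, 6.2)    # meters
)

# Kendall's tau rank correlation (prediction 1): a positive tau supports the
# hypothesis that plots nearer the hotspot are also nearer to lionfish on average
cor.test(plots$hotspot_rank, plots$mean_dist_to_lf, method = "kendall")
```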

 

I don’t think I need to/can use Arc to get this done. I have GPS points for the center of each reef, as well as measurements for the reefs (length, width, surface area, circumference etc).

I need to be able to calculate distances, create paths of movement, and then calculate distances traveled. It only needs to work at the within-reef scale. I am unfamiliar with programs that can do this sort of thing, but I don’t think it would be too hard once I figure out which one to use.
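For what it's worth, plain R (with the geosphere package) can handle the distance and path-length calculations from GPS points; a minimal sketch is below. The file, column names, and hotspot coordinates are hypothetical.

```r
library(geosphere)

# Hypothetical lionfish GPS fixes for one individual, ordered in time
fixes <- read.csv("lionfish_fixes.csv")   # columns: time, lon, lat

pts <- as.matrix(fixes[, c("lon", "lat")])

# Distance (in meters) of each step between consecutive fixes
steps <- distGeo(pts[-nrow(pts), ], pts[-1, ])

total_path_length <- sum(steps)   # total distance traveled
total_path_length

# Distance of every fix from the hotspot (hypothetical hotspot coordinates)
hotspot <- c(-77.3200, 26.5400)
dist_to_hotspot <- distGeo(pts, hotspot)
summary(dist_to_hotspot)
```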

I spent two full years of my life tromping through wilderness, sacrificing life and limb for the most complete data sets humanly possible. We covered miles of remote beaches on foot to install geodetic control monuments and take beach profiles, separated by as little as 10-meter spacing. We breathlessly operated our pricey new boat in less than one meter of water to collect just one more line of multibeam sonar bathymetric data, or to get the right angle to see a dock at the end of an inlet with our mobile LiDAR. One of the most trying, and perhaps most dangerous, tasks undertaken by our four-person team was the installation of large plywood targets before LiDAR scans. Boat-based LiDAR is not yet a commonly employed data collection method, and our team had been executing foot-based GPS surveys for years. We were dead set on ground truthing our new “high-accuracy” toys before we decided to trust them entirely.

A co-worker created large plywood targets of varying configurations: black and white crosses, X’s, circles, targets, and checkerboards. We tested them all and determined that the checkerboard showed up best after processing the intensity of the returns from a dry dock scan. For the next 12 months, we hiked dozens of these 60-centimeter-square plywood nightmares all over the Olympic Peninsula for every scan, placing them at the edge of 100-meter cliffs, then hiking to the bottom to be sure we had even spacing at all elevations. After placing each target (using levels and sledges), we took multiple GPS points of its center to compare with the spatial data obtained by LiDAR. We collected so much data that other research groups were worried about our sanity.

Then, we finally sat down to look for these targets in the miles and miles of bluff and beach topography collected. Perhaps you already know what’s coming? The targets were completely impossible to find; generously, we could see about one of every ten targets placed. Imagine our devastation (or that of the co-worker who had done most of the hiking and target building).

So the spatial question is rather basic: where are my targets?

I hope to answer the question with a few different LiDAR data sets currently at my disposal. The first is a full LiDAR scan of Wing Point on Bainbridge Island, WA. It’s one of the smaller scans, covering only a few miles of shoreline. Deeper water near the shoreline allowed the boat to come closer to shore, and the data density is expected to be high. We hope to find a few targets, and have GPS data corresponding to their locations. Currently, the file is about 5 times the size recommended by Arc for processing in ArcMap. On first attempts, it will not open in the program. While dividing the file would be easy with the proprietary software used with the LiDAR, I’d like to figure out how to do that with our tools. This will be one of the first mountains to climb.

The second data set is a more recent target test scan. Since my departure, and after determining the frustrating reality of the plywood targets, the group has found some retired Department of Transportation (DOT) signs. They have used gorilla tape and spray paint to create target patterns similar to those tested with the original batch. I’ve been given one line of a scan of these new target hopefuls. My goal here is to ascertain the abilities of ArcMap for processing target data and aligning it with GPS points, without the added trial of trying to find the darn targets. Of course, I’m already hitting blocks with this process as well. Primarily, finding the targets requires intensity analysis. Intensities should be included in the .LAS file I’m opening in ArcMap, but they are not currently revealing themselves. My expectation is that this is related to my inexperience with LiDAR in ArcMap, but that remains to be seen.
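As a sanity check outside ArcMap, I am assuming the lidR package in R could confirm whether intensity values actually made it into the .LAS file; a short sketch is below, with a placeholder file name.

```r
library(lidR)

# Placeholder path to the target-test scan line
las <- readLAS("target_test_line.las")

# Quick check that intensity values are present in the file
summary(las$Intensity)

# Plot the point cloud colored by intensity; bright checkerboard panels
# should stand out if the returns captured them
plot(las, color = "Intensity")
```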

PGB_Target_Test_pano

Writing this post, I’m realizing that my link to spatial statistics currently seems far in the future. Just viewing the data is going to be a challenge, since the whole process is so new to me. The processing will hopefully result in an error analysis of the resulting target positions, when compared to the confidence of ground collected points. Furthermore, the Wing Point data was taken for FEMA flood control maps, and that sort of hazard map could be constructed once rasters or DEMs are created.

A large part of me is horrified by how much I’ve taken on, deciding to figure out how to use ArcMap for LiDAR processing when my experience with the program is already rather primitive. However, I’m excited to be learning something new and somewhat innovative, not to mention helpful to the group for whom I spent so many hours placing targets.