Eating locally grown food can reduce the carbon footprint of that food by reducing fuel miles.  Spending locally on food also has important supporting effects on the local economy.  Currently the dominance of the trucking and freight shipping industries allows large agricultural regions to centralize food production and deliver it to a wide network of markets.  Local food networks have declined precipitously, to the point where a supermarket is more likely to carry tropical fruit out of season than local produce in season.

Local food networks have shifted in shape and character as they persist in an increasingly global marketplace.  Farmers markets allow small local farms to sell their produce retail and direct to consumers.  Grocers' co-ops and high-end grocery stores continue to stock local produce.  Organic wholesale distributors sell it to restaurants.  CSAs (community supported agriculture programs) also meet growing demand by allowing consumers to buy shares of a farm's produce, delivered direct to the consumer on a weekly basis.

There is a lack of scholarly data and research on the character and function of local food distribution networks, and the most challenging part of the analysis is gathering the data.  This project examines the spatial distribution of farms serving the Corvallis Farmer's Market.  For each vendor on the market's list of vendors, I searched the farm name and address and used Google Maps to estimate the farm's location.  Google Maps does not identify rural addresses precisely enough to associate them with specific plots of farmland, so estimates of farm size come from survey data.

Survey Depth vs. Breadth

I wanted covariate information on these farms that a web search would not provide for every farm, so I knew early in the analysis that I would need a survey.  To get a decent response rate, I wanted a full month to gather responses so that I could push out a few reminder waves, which meant the list of farms to survey had to be complete within a month.  The result is a small dataset, but one with sufficient covariate data to fit a generalized linear model in R.

There are several good options for improving the breadth of the survey: examining vendor lists for the Salem Saturday Market and the Portland PSU Farmers Market, contacting local grocers for lists of their local vendors, and contacting organic wholesale distributors for lists of local vendors.  Among the farms already in the dataset, many more have phone numbers than email addresses, so a phone survey could also improve the number of responses from the current list.

Farms Surveyed vs. Responding Farms

geo599_farmssurveyed  geo599_nb_fit5

Key Variables

For each farm, I gathered data on the following variables:

  • Miles – distance from the Corvallis Farmer's Market, in miles
  • Acres – farm size, in acres
  • Age – years under current management
  • Age2 – squared value of the Age term
  • Local – proportion of sales going to local markets

Miles – Distance from Corvallis Farmer’s Market

The location of each farm is an estimate based on the address placement marker in Google Maps.  In rural neighborhoods, the marker frequently falls in the middle of the road and does not clearly correspond to a specific plot of land.  Some farms also use an urban business address at a location where the food is clearly not produced on site.

I estimated distance from the farms to the Corvallis Farmer’s Market in ArcGIS using the Analysis > Proximity > Point Distance tool in the Toolbox.
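For reference, the same straight-line calculation can be done directly in R.  This is only a sketch: the file name, the market coordinates, and the column name below are placeholders, not the exact values from my project.

library(maptools)
library(sp)

farms <- readShapeSpatial("farms.shp")       # vendor locations geocoded from Google Maps (placeholder file name)
market <- c(-123.26, 44.56)                  # approximate longitude/latitude of downtown Corvallis

# Great-circle distance from each farm to the market, in kilometers, converted to miles.
dist_km <- spDistsN1(coordinates(farms), market, longlat = TRUE)
farms$Miles <- dist_km * 0.621371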

In retrospect I am not convinced that distance "as the crow flies" is a good representation of spatial autocorrelation between producers and markets.  Fish do not swim in a straight line from spawning grounds to the ocean; they travel along stream networks.  Likewise, traffic follows street and highway networks.  Estimating farm-to-market distance along likely traffic routes may be a better way to measure autocorrelation across space.

Surveying Farms

I used an email survey to gather covariate information on the farms, including farm age, size in acres, and percent of local sales.  Of the 47 active email addresses on the list of Corvallis Farmer's Market vendors, 30 farms responded to the survey, a 64% response rate.  An additional 63 farms have telephone contact information but no email address, so the responses cover only 27% of total vendors, and a round of phone surveys would help the sample better represent the target population.

The Corvallis Farmer's Market listed 128 vendors.  The Salem Saturday Market lists 284 vendors, and the Portland PSU Farmers Market lists 163.  In addition, the Natural Choice Directory of the Willamette Valley lists 12 wholesale distributors likely to carry produce from local farms.  At similar response rates, surveying these farms would produce a sizeable dataset for analysis and would be a fruitful subject for future study.  Surveying the additional farmers' markets alone would yield roughly 120 responses at the same response rate, not counting the wholesale distributors, although some farms likely appear on more than one vendor list.

The variable for the proportion of food going to local markets was problematic.  Defining which markets qualify as local is difficult, because small differences in distance can be significant to smaller farms, while a larger farm may consider all in-state sales local.  There is no single threshold for local sales appropriate to the variety of farms in the survey, so responses employed different definitions of local and are not comparable to one another.  I therefore removed this variable from the model formula and do not use it in the analysis that follows.

Analyzing Spatial Data in R

There are multiple reasons to prefer analyzing spatial data in R.  Perhaps the researcher is more familiar with R, as I am.  Perhaps the researcher needs a generalized linear model (GLM, GAM, or GAMM), or a mixed effects model, for data that are not normally distributed.  While ArcGIS has several useful tools for spatial statistics, its statistical tools cannot match the variety and flexibility of R and its growing collection of user-developed packages.  My particular interest in handling GIS data in R is to use R's social network statistical packages to analyze food networks.  This dataset on farms does not include network data, but this project brings me one step closer to that kind of analysis, and future surveys could expand the current farm dataset to include the network elements important to my research.

The following diagram shows a generalized workflow for transferring spatial data between ArcGIS and R:

geo599_workflow

Several R packages allow the user to manipulate spatial data; one of the most popular is 'maptools'.  I recommend using this package to open the spatial data file in R, which imports the file in a format similar to a dataframe.  The spatial data object differs from a dataframe in several important ways, while most statistical operations are designed to work specifically on dataframes.  I therefore recommend converting the imported object to a dataframe using the R function as.data.frame().  Perform all statistical analyses on the new dataframe, then save significant results back to the initial spatial data object, and export the appended spatial data object to a new shapefile using the maptools command writeSpatialShape().
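Here is a minimal sketch of that workflow, assuming a point shapefile of farms; the file names and the model formula are placeholders rather than my exact analysis.

library(maptools)   # readShapeSpatial(), writeSpatialShape()

# 1. Import the shapefile; the result behaves much like a dataframe with geometry attached.
farms_sp <- readShapeSpatial("farms.shp")

# 2. Convert to a plain dataframe for statistical work.
farms_df <- as.data.frame(farms_sp)

# 3. Fit models and run diagnostics on the dataframe.
fit <- lm(Miles ~ Acres + Age + Age2, data = farms_df)   # illustrative formula only

# 4. Save the results you want to map back onto the spatial object ...
farms_sp$resid  <- residuals(fit)
farms_sp$fitted <- fitted(fit)

# 5. ... and export a new shapefile for ArcGIS.
writeSpatialShape(farms_sp, "farms_results")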

For farms serving the Corvallis Farmer's Market, I suspected there was a relationship between key variables in the survey and distance from the farmers' market, but I did not have a specific causal theory, so I relied on basic data analysis to inform my model.  The relationship between the key variables was uncertain, as shown in this diagram:

geo599_modelvars

First I fit three different models, each using a different variable as the dependent variable.  Then I examined the fit of each model to determine whether one best explained the data.  The following three charts test the residual errors of each model for normality.  With Acres or Age as the dependent variable, the fits showed heavy tails on the Q-Q plots, whereas Miles as the dependent variable produced a relatively normal distribution of errors.

geo599_qqplots

The plots of residuals vs. fitted values showed a possibly curvilinear relationship with Acres as the dependent variable, with residual error seeming to increase as acreage increases and then decrease as acreage increases further.  This curvilinear relationship was even more pronounced with Age as the dependent variable, meaning that model clearly violates the assumption that errors are independent of the values of the dependent variable.  With Miles as the dependent variable, the residual errors appear to satisfy the assumption of independence: they are distributed around zero regardless of how far the farm is from the market.

geo599_rvfplots
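The comparison itself takes only a few lines of R.  This is a sketch: I use ordinary lm() fits here for the diagnostic plots, and the right-hand sides are illustrative rather than the exact formulas from my analysis.

# Three candidate models, each with a different dependent variable.
m_miles <- lm(Miles ~ Acres + Age + Age2, data = farms_df)
m_acres <- lm(Acres ~ Miles + Age + Age2, data = farms_df)
m_age   <- lm(Age   ~ Miles + Acres,      data = farms_df)

# Normal Q-Q plots of the residuals for each candidate model.
par(mfrow = c(1, 3))
for (m in list(m_miles, m_acres, m_age)) {
  qqnorm(residuals(m)); qqline(residuals(m))
}

# Residuals vs. fitted values, to check that errors are independent of the response.
par(mfrow = c(1, 3))
for (m in list(m_miles, m_acres, m_age)) {
  plot(fitted(m), residuals(m)); abline(h = 0, lty = 2)
}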

In retrospect, these results make sense.  This analysis assumes there is spatial autocorrelation between farms: farms nearer the Corvallis Farmer's Market are going to be more similar to each other than to farms farther from the market.  Conceptually, the best way to build this autocorrelation into a model is to use Miles from the farmers' market as the outcome variable.  This is sufficient a priori reason to select this model, but the model with Miles as the dependent variable also best meets the assumptions of independence and normality of errors.  Since I dropped Local from the variable list, in the selected model Miles depends on farm size in acres, age in years under current management, and the squared age term.

One can save the residuals and fitted values for each observation directly to a new shapefile, but the coefficient results for the variable terms require a new file.  To save a new table in a format that imports cleanly into ArcGIS, use the 'foreign' package in R and call the write.dbf() function, which saves a dataframe in DBF format.  The following table shows the coefficient results for the farms in the dataset, using a negative binomial regression:

geo599_coefs
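A sketch of that export step, using the negative binomial fit: here I use MASS::glm.nb() and round the distances to whole miles so they can be treated as counts, which is an assumption on my part rather than a documented detail of the original model.

library(MASS)     # glm.nb() for negative binomial regression
library(foreign)  # write.dbf() for an ArcGIS-readable table

# Negative binomial fit with Miles as the outcome (rounding to whole miles is an assumption).
nb_fit <- glm.nb(round(Miles) ~ Acres + Age + Age2, data = farms_df)

# Residuals and fitted values can go straight back onto the spatial object ...
farms_sp$nb_resid <- residuals(nb_fit)
farms_sp$nb_fit   <- fitted(nb_fit)

# ... while the coefficient table goes into a separate DBF for ArcGIS.
coef_table <- data.frame(term     = names(coef(nb_fit)),
                         estimate = coef(nb_fit),
                         se       = sqrt(diag(vcov(nb_fit))))
write.dbf(coef_table, "nb_coefs.dbf")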

After performing the analysis in R, I used the maptools function writeSpatialShape() to create a new shapefile containing the results of the analysis in its data fields.  This shapefile does not contain any significant results or insights that I am aware of.  The color of the diamonds signifies which farms are closer to or farther from the market than the covariate data would suggest, but because the sample size is so small, the only outliers in the set are either very close to or very far from the market relative to the other samples.  This is an indication of data deficiency.  Here is the final map of results:

geo599_nb_fit5

However, in spite of the failure to obtain significant initial results, I believe this research has real potential to shed light on the nature of local food networks in the Willamette Valley.  The farm data gathered for this research are original data, not extant data from a government survey.  This line of inquiry can shed light on a part of the food system that is poorly studied and data deficient.  Now that I have a data management framework for moving data from ArcGIS to R for analysis and back, I can tackle more complicated analyses, and spending additional time collecting farm observations for the dataset becomes a worthwhile endeavor.

If I sample data from additional farmers' markets in Salem and Portland, I can determine whether the spatial distribution of farms differs significantly between cities, or whether the farms serving each city share common attributes.  More importantly, sampling additional farmers' markets and other distribution channels like grocers and wholesale distributors means that some farms in the dataset will serve multiple markets: several farmers' markets, or possibly grocers and restaurants as well.  The number of different markets a farm serves could be a significant variable in the regression analysis.  Connectivity to additional local food markets is network data, so I could use network analysis to determine whether connectivity is a significant factor in relation to farm size or age.

There is a natural tension, as I gather data, between questions that are useful for analysis and questions that are realistic to answer within the scope of a survey.  I achieved a 64% response rate by limiting the survey to only three questions.  The fewer and less invasive the questions, the higher the response rate; if I had asked for nearly as much information as I wanted, my response rate would likely have dropped below 1%.  Every variable added to the survey reduces the odds of a response, so developing this dataset requires careful consideration of each additional variable, weighing whether its benefit to the analysis justifies the imposition on the survey participant.  While I feel there remains room for additional variables in the analysis, I am still looking for candidate variables that justify the increased transaction costs of their inclusion.

This project has been an enjoyable analysis of a topic in which I am very interested.  I hope I can get the chance to continue pursuing this research.


Recap:

• Using Bayesian networks to model species habitat suitability of benthic infauna
• Creating predictive maps of the likelihood of suitable habitat
• Explanatory variables: Depth, Grain Size, Organic Carbon, Nitrogen, Percent Silt/Sand, Latitude
Bayes Net Model
sfossor_net  sfossor_s2n4legend

I was interested in two questions:
1. Is there spatial pattern to the observed error?
2. Does the scale of prediction increase or decrease error?

I noticed that some of the points with prediction error contained samples of gravel. The focus of my study was soft-bottom sediment, and the regional maps I am using for prediction have areas of rock and gravel masked out. Only eight of the 218 samples contained gravel. Since gravel skews grain size measurements to the right, I decided these eight samples were outliers and removed them from the analysis.
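The screen itself is trivial once the samples are in R; a hypothetical sketch, with an assumed file name and column name for the gravel fraction:

samples <- read.csv("benthic_samples.csv")   # 218 grab samples (placeholder file name)

# Drop the eight records that contained gravel, which skews grain size to the right.
samples_clean <- subset(samples, PercentGravel == 0)
nrow(samples_clean)                          # 210 samples remain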

The species I decided to focus on is a marine worm that had a strong response to higher values of grain size and was found at depths greater than 55 meters. Because of its strong preference for particular grain sizes and depths, I started creating plots to determine whether the error was occurring within a certain range of these two variables.

DepthvsGS_error

 

Most of the error was occurring at depth around a grain size of 4. Coloring the plots by sampling site provided some useful feedback.

DepthvsGS_site

 

This graph depicts records of species absence on the left and species presence on the right. While the species preferred deep water and high grain size, the records within this range where the species was absent came predominantly from one site, NSAF, which also happened to be the southernmost site in the region. The shallowest location where the species was found present was the next site just north of NSAF, off the coast of Eureka, CA. So I wanted to look at the data again, isolating these two southernmost sites from the rest.

DepthvsGS_latitude

 

In this graph, the two southernmost sites are on the bottom and the remaining sites are on the top. The species' response to grain size appears shifted toward higher values, and there appear to be more records of absence at depth. A different response to physical features in the southern range could be due to a number of factors, such as a narrower shelf (less opportunity for larval dispersal) and different characteristics of the silt and sand particles, since the rivers along the southern region drain the Southern Cascade Mountains rather than the Coast Range farther north.

Pooling all of this information together, I made several changes to the model. I simplified it by removing the explanatory variables Organic Carbon, Nitrogen, and Percent Silt/Sand. I added a latitude variable categorized into north and south. Because of the simplification, I was able to add more categories to both the depth and grain size variables to capture more of the relationship between them and the species response.
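The re-binning can be prepared in R before the data go back into the Bayes net. A hedged sketch, with a placeholder break point and bin counts rather than the ones actually used in the model:

# North/south split on latitude (the break point here is a placeholder).
samples_clean$Region <- cut(samples_clean$Latitude,
                            breaks = c(-Inf, 41.5, Inf),
                            labels = c("South", "North"))

# Finer bins for depth and grain size, to capture more of the species response.
samples_clean$DepthBin <- cut(samples_clean$Depth,     breaks = 10)
samples_clean$GrainBin <- cut(samples_clean$GrainSize, breaks = 10)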

The final model.

e10_model

 

Model performance improved.

Original Model: 11.0% error

s2n4_North  s2n4_Middle  n2s2_South

Revised Model: 6.9% error

e10_north  e10_Middle  n2s2_South

 

Original Model on Left, Revised Model on Right:

sfossor_s2n4 e10

I also calculated a difference between the two maps. On the left is a map depicting the change in habitat suitability. On the right is a map of regions where there was a shift in prediction (i.e. absent to present).

sf_diff_resize  ChangePred_resize
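If the two prediction maps are exported as rasters, both comparison maps can be generated with simple raster arithmetic in R; the file names and the 0.5 presence threshold below are assumptions:

library(raster)

p_orig <- raster("sfossor_s2n4.tif")   # original model: probability of suitable habitat
p_rev  <- raster("e10.tif")            # revised model

# Change in habitat suitability (left map).
suit_change <- p_rev - p_orig

# Cells where the categorical prediction flipped, e.g. absent to present (right map).
pred_shift <- (p_rev >= 0.5) != (p_orig >= 0.5)

plot(suit_change)
plot(pred_shift)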

Next step: Does the scale of prediction increase or decrease error?

• Current predictive maps limited by resolution of regional dataset (250 meter)
• Local sampling sites:
  – Surveyed with high resolution multi-beam
  – Higher density of Grain Size samples than the region
• Goal:
  – Create a high resolution, spatially interpolated Grain Size surface
  – Combine with high resolution bathymetry
  – Make a prediction for this new high resolution surface
  – Compare error rates with the original and revised maps
mgs Depth

I really should have titled my project the “Multi-scale exploration of California Market Squid” to begin with, but c’est la vie! Throughout the quarter I’ve worked mostly in R, both to make interpolated spatial maps of my data as well as to run spatial analyses. Specifically, I’ve been interested in determining which oceanographic covariates are correlated with the abundances of squid at the fine scale as well as at a broader regional scale.

Fine scale exploration:

In order to explore this data, I started this quarter by plotting the distributions of market squid based on the fine scale spatial abundance data sampled each June (see yearly plots below). For this class I chose to work only with positive abundance data, since absences may be pseudo-absences due to catchability issues and therefore potentially not good indications of poor quality habitat for the species; I may re-introduce the zero-catch data in future analyses with this caveat in mind.

In the interpolated surface maps, warmer colors indicate regions of higher squid abundance and colder colors indicate regions of lower abundance on a yearly basis. To create these maps, I wrote a script in R that builds an IDW interpolation surface of squid abundance for each year, with overlaid contours, using R's gstat and maptools libraries.
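A trimmed-down version of that script looks roughly like this; the file name, column names, and grid resolution are placeholders:

library(sp)
library(gstat)

squid <- read.csv("squid_june.csv")              # one year of tow data: lon, lat, cpue
coordinates(squid) <- ~lon + lat

# Prediction grid over the bounding box of the sampled stations.
grd <- expand.grid(lon = seq(bbox(squid)[1, 1], bbox(squid)[1, 2], length.out = 100),
                   lat = seq(bbox(squid)[2, 1], bbox(squid)[2, 2], length.out = 100))
coordinates(grd) <- ~lon + lat
gridded(grd) <- TRUE

# Inverse-distance-weighted surface of squid abundance, plotted with contours.
idw_surf <- idw(cpue ~ 1, squid, newdata = grd, idp = 2)
spplot(idw_surf["var1.pred"], contour = TRUE)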

California market squid (catch per unit effort: #/km2 towed):

cms_JUNE

Additionally, I interpolated and contoured both the temperature and salinity data (in situ data) collected at each station in each year so as to have a better idea of the environmental context that this species was experiencing in each year.

Temperature @ 3m depth:

CTD_temp_3m_JUNE

Salinity @ 3m depth:

CTD_sal_3m_JUNE

While my initial (and future) plan is to look at the satellite data corresponding to each cruise (June of each year), I realized I was biting off more than I could chew this quarter by setting that as a goal: remotely sensed oceanographic data (daily images for 14 years!) are remarkably difficult to access and process, due to the size of the individual files as well as the multiple frequencies (wavelengths) that correspond to distinct oceanographic processes (SST, chlorophyll, turbidity, etc.). Before tackling the satellite data in a brute-force, kitchen-sink fashion, I wanted to identify which in-situ environmental variables with a remotely sensed analog (SST from MODIS-Aqua, for example, is an analog to surface temperature measured with a CTD from the ship) showed strong relationships with abundance.

Below are simple scatterplots of the log abundances of California market squid, with the points colored by the year in which they were sampled. The top row shows market squid abundance in relation to temperature, salinity, and chlorophyll (all variables with a remotely sensed equivalent, or nearly so); the bottom row shows abundance in relation to density (a function of temperature and salinity), oxygen, and water column depth (all variables with no remotely sensed analog).

cms_enviros

Unfortunately, there were no strong linear or non-linear correlations of abundance with any of the in-situ covariates (R2<0.01).
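The check itself is a one-liner over the covariates; a sketch with assumed column names in a plain dataframe of the tow data:

covars <- c("temperature", "salinity", "chlorophyll", "density", "oxygen", "depth")
r2 <- sapply(covars, function(v) {
  summary(lm(reformulate(v, response = "log_cpue"), data = squid_df))$r.squared
})
round(r2, 3)   # all effectively zero in this dataset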

Broad scale exploration:

Next, I approached this dataset from a broader perspective (zooming out, if you will). I was interested in tracking the centers of gravity and isotropy (a measure of the non-uniformity of a spatial dataset along multiple axes) of the sampled population through time. The center of gravity statistic is estimated from the data through discrete summations over sample locations. Practically, from sample values zi at locations xi, with areas of influence si, the center of gravity is:

CG = Σi si zi xi / Σi si zi

The inertia indicates how dispersed the population is around its center of gravity, and is given as follows:

I = Σi si zi (xi − CG)² / Σi si zi

A more in-depth discussion of this and other spatial indicators can be found in Woillez, M., Rivoirard, J., and Petitgas, P. (2009). Notes on survey-based spatial indicators for monitoring fish populations. Aquat. Living Resour. 22, 155–164.

I used both of these calculations to determine the center of gravity and isotropy for market squid captured in June (and September, not shown on the map) of each year. I then created a map with each year's center of gravity of the market squid population in the sampled region (caveat: the centers of gravity and isotropy are not for the entire squid population but specifically for the data sampled off the coasts of OR and WA). There is a remarkably strong northward shift in the centers of gravity over time for both the June and the September data. I haven't yet had a chance to correlate this northward shift with summarized environmental variables, but that is the plan.

cg_iso_cms
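A sketch of those calculations for one survey month, following the formulas above (column names are assumptions; the inertia here is the total inertia, without the decomposition along principal axes that gives isotropy):

# s = area of influence, z = abundance, x/y = sample coordinates.
cg_inertia <- function(x, y, z, s) {
  w   <- s * z                               # weights s_i * z_i
  cgx <- sum(w * x) / sum(w)
  cgy <- sum(w * y) / sum(w)
  inert <- sum(w * ((x - cgx)^2 + (y - cgy)^2)) / sum(w)
  c(cg_x = cgx, cg_y = cgy, inertia = inert)
}

# Applied per year to the June samples.
by_year <- lapply(split(squid_df, squid_df$year),
                  function(d) cg_inertia(d$lon, d$lat, d$cpue, d$area))
do.call(rbind, by_year)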

 

We’re going to look at some beach morphology data from Benson Beach, just north of the North Jetty on the Columbia River. The beach is accessible from Cape Disappointment State Park near Ilwaco, WA.

In 2010 and 2011, the USGS, OSU, WA State Department of Ecology, and the Army Corps of Engineers did a joint study of the impacts of beach nourishment. The Army Corps traditionally dumps dredging materials offshore, outside of the reach of littoral sediment transport. In the summer of 2010, they deposited these materials on high-energy, often erosive Benson Beach.

Today, we'll do a difference analysis of beach elevations before and after the beach nourishment. The data were collected with GPS carried on foot and on a Polaris side-by-side, and with jetskis equipped with GPS and sonar.

 

Start by copying the file S:\Geo599\Data\SEXTON_Difference_Tutorial into your own folder.

We’re going to start with the point data. It’s been cleaned, but otherwise has not been processed. I had some trouble with opening it, so please let me know if you have any issues – I’ve got a back-up plan.

The S01 data is the “before nourishment” baseline survey, performed on July 11, 2010. The S06 data is the “after nourishment” survey, performed September 22, 2010.

 

Open ArcMap.

In the catalog, navigate to where you saved the folder. Find the S01 folder and right click on the file: S01_20100711_topo.

Create a feature class from XY table.

Choose Field 1 for X, Field 2 for Y

Leave Z as “none” (we have Z data, but it got real mad when I told it here).

Click the Coordinate System of Input Coordinates button:

Select “Projected Coordinate Systems,” then “State Plane,” then “NAD 1983 (Meters).”

Navigate to and select the coordinate system called:

NAD_1983_StatePlane_Washington_South_FIPS_4602

Choose where to save your output feature class. I called mine: XYS01_20100711_topo.shp

Hit OK and cross your fingers – this is where your program might crash!

1

Once it’s done thinking (and maybe not responding for a bit – don’t panic if that happens, it kept working for me), navigate to the new shape file in your folder and drag it into your working area.

This should be pretty exciting, except that all the symbols are one color. To better visualize, right click on the layer, and find properties. Click the “Symbology” tab and then “Quantities” on the left. Change the “Value” to “Field3” and select a color ramp. Field three is your Z values, and it will color by elevation!

PS: when you tell it to use Field3, it gives you an angry message. I just hit OK, and it moved on and looked fine. If anyone has any suggestions, I’m all ears.

2

 

While this was super cool, I thought we’d short cut pulling in the rest of the XY data, and I’ve provided you with shape files of the rest.

In the S01 folder, there is a help folder. In there, you'll find my version of the S01 topo shapefile (which you likely just completed) and an S01 bathy shapefile, called XYS01_20100711_bathy. Drag that into the workspace. You can change the colors to see the elevation, but it's not necessary for our next steps.

 

You’ll also need to drag in the S06 shape files. I still gave you the XYZ text files in the S06 folder, but again provided a “help” folder with shape files of each point set. Drag these files into your work space:

XYS06_20100922_topo.shp

XYS06_20100922_bathy.shp

You should now have four layers: two bathy, two topo; one from each survey.

Now we’re going to make some surfaces.

In the toolbox, open 3DAnalyst, then Data Management, then TIN, and find the “Create TIN” tool. (or search for it…)

Name your first TIN S01

You’ll need to choose the projection again: NAD_1983_StatePlane_Washington_South_FIPS_4602

Then choose the two S01 feature classes, bathy and topo.

When they appear in the list of input features, you’ll need to change the Height Field to Field3!

Hit OK and it will make a TIN.

3

Hopefully, it looks like this:

4

Repeat the process for the S06 data set. Don’t forget to specify the Height is in Field3.

5

Lastly, we’re going to perform a difference analysis. This will compare the baseline beach survey with the “after nourishment” beach survey and find where sand was deposited, and where erosion happened after the deposition (since the survey wasn’t directly after).

 

In the 3DAnalyst toolbox, expand the “Triangulated Surface” options, then open the “Surface Difference” tool.

Set the Input Surface to "S06"

Set the Reference Surface to "S01"

Name your output feature class; I called mine "Beach_Change"

Expand the Raster Options and name an output raster; I also called this one "beach_change"

Leave cell size as default (10)

Click OK! This might take a minute.

6

The resulting raster can be recolored to more readily show differences.

7

The generated feature class currently shows only whether sand was deposited (blue), eroded (green), or unchanged.  While this might be helpful, I wanted elevation changes in a feature class.

If you'd rather have a shapefile with elevation changes:

Open the attribute table for surface_dif1 feature class

Add a field, call it “ele_dif” and make it a “float”

Open field calculator and calculate the elevation change by multiplying volume and the code (negative means loss of sediment, positive means gain), and dividing by “SArea.” Then click okay.

8
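If you want to sanity-check the field calculation outside of Arc, the same arithmetic can be run on the shapefile's attribute table in R with the foreign package; the field names below are my best guess at the Surface Difference output and may differ in capitalization in your table:

library(foreign)

att <- read.dbf("Beach_Change.dbf")               # attribute table of the difference feature class
att$ele_dif <- att$Volume * att$Code / att$SArea  # negative = sediment loss, positive = gain
summary(att$ele_dif)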

Close the attribute table and open Properties. Choose the “Symbology” tab, then “Quantities.”

Choose the new “ele_dif” as the value, and increase the number of classes (I chose 9). When you hit okay, the feature class will show the elevation changes!

9

 

The map below displays my study area and the concentrations of naphthalene at public schools in the city of Eugene. These concentrations represent census tract centroid estimates derived from the US EPA's National Air Toxics Assessment (NATA) for the year 2005. Also depicted on the map are sources of naphthalene emissions, such as major and minor arterial roads as well as federally regulated industrial emissions sites. I used these estimates to develop a land use regression (LUR) model, fit with ordinary least squares, to predict naphthalene air concentrations at two air monitoring locations operated by the Lane Regional Air Protection Agency (LRAPA; depicted on the map as purple triangles). Originally, the predicted concentrations from my LUR model accounted for neither temporal variation in concentrations nor spatial autocorrelation of air pollutants. My goal in this class was therefore to figure out a method that could account for temporal and spatial dependency in my LUR model. The following text summarizes my progress toward developing a spatiotemporal LUR model.

GEO580_StudyArea


The graph below clearly portrays the wide variation in naphthalene air concentrations by month, as measured by LRAPA at the Bethel and Amazon Park air monitoring sites. The flat lines represent the naphthalene air concentrations averaged over the year at the two sites. As one approach to accounting for temporality in this project, I calculated the average concentration between the two sites and then calculated the ratio of a given month's concentration to the annualized average concentration for each site. This ratio was then applied to the NATA estimates for the schools to derive month-specific concentrations, which were then annualized by averaging over all months. These new annualized concentrations are displayed in the second graph below.

seasons


As you can see, the temporal adjustment shifted all of the annual concentrations at the school sites upward. These new "seasonalized" estimates were then used as the data inputs to develop what I denote as the "Temporal" LUR.

seas.adj

 

I next ran the LUR model separately in R using the temporal data and the non-temporal data. I then developed a spatial weights matrix in R, by Eugene zip codes, to use as the spatial constraint in a conditional autoregressive (CAR) model. I denote the model using the non-temporal data as the "Spatial Only" LUR and the model using the temporal data as the "Spatiotemporal" LUR. Each model was run separately to predict naphthalene concentrations at the LRAPA air monitoring locations. The color-coded table below compares the models by the percent difference between their predicted concentrations and the LRAPA observed concentrations. In terms of percent difference on an individual-site basis, the Temporal Only model improved on the Non-Spatial/Non-Temporal model, and the Spatial Only model also improved on the Non-Spatial/Non-Temporal model. The Spatiotemporal model improved on the Spatial Only model at both sites, and it improved on the Temporal Only model at the Bethel site but not at the Amazon site.

mod.compare
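For reference, a minimal sketch of the CAR step, using spdep for the zip-code weights and spautolm() (now in the spatialreg package) for the model; the polygon layer, predictors, and data structure shown here are placeholders rather than my actual model specification:

library(spdep)
library(spatialreg)   # spautolm()

nb <- poly2nb(zipcodes)             # neighbor list from the Eugene zip-code polygons
lw <- nb2listw(nb, style = "W")     # row-standardized spatial weights matrix

car_fit <- spautolm(naph_conc ~ road_length + point_sources,   # illustrative LUR terms
                    data = zip_data, listw = lw, family = "CAR")
summary(car_fit)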


I next averaged the percent differences over the two sites for each model to derive an aggregate percent difference measure. The graph below clearly depicts the superiority of the Spatiotemporal and Temporal models over the other two. On average, the Spatiotemporal model performed best. While all models tended to under-predict concentrations on average, these results suggest that factoring in more than just GIS variables in an ordinary least squares regression is desirable when developing a LUR model to predict air concentrations.

avg.compare


The above results are intriguing and provide guidance for next steps. I decided to move forward and fit the temporal data with a GAM in R. The graph below displays the model fit, with standard errors around the GAM regression line. Not surprisingly, the seasonal variation in naphthalene concentrations is characterized by a non-linear relationship. This GAM could be used to develop a more sophisticated spatiotemporal LUR model capable of predicting monthly naphthalene air concentrations. Predicting monthly concentrations would be useful in environmental health epidemiology studies that aim to examine temporal relationships between high air pollution and health effects.

gam
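A minimal version of that fit in R, using mgcv with a cyclic smooth over month (the smoother choice and variable names are my assumptions):

library(mgcv)

gam_fit <- gam(naph_conc ~ s(month, bs = "cc"), data = lrapa_monthly)
summary(gam_fit)

# Fitted seasonal curve with standard-error bands, as in the graph above.
plot(gam_fit, shade = TRUE)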

 

I started the term with some lofty ideas of learning to process LiDAR data in Arc. Throughout the term, I’ve had many trials and tribulations, many as trivial as my inability to open the data I was interested in processing. While the term has felt difficult and trying, as I have written this blog (3 times now, due to wordpress…), I’ve concluded that I’ve actually accomplished a lot.

Realistically, I started this course with very little knowledge of data processing in Arc (like none at all). I was pretty good at making things look pretty and making informative maps of the places we would next be mapping. When it came to actually pulling in raw data and making sense of it, I probably wasn’t at an “advanced spatial statistics” level. So what I took from the course was a whole bunch of rudimentary skills I hadn’t known I didn’t know.

I pulled data from the group I worked with for two years: the Coastal Monitoring and Analysis Program at the WA Dept of Ecology. They're really good at foot-borne GPS data collection, walking profiles, and using jetskis and sonar. While I was there, we acquired a laser to perform LiDAR scans, a multibeam sonar to map bathymetry, and an inertial measurement unit to tie it all together on a constantly moving boat. Many of the dead ends I encountered relate to the never-ending battle to synthesize these very complex systems. We're still working on it; you'll see what I mean.

So I started with a simple scan. One pass of the laser on a beach where the team had put out retired Dept of Transportation signs as possible targets.

1

Circled in yellow are the plywood targets made while I was working there, and used in many scans. Turns out they’re not very reflective. In red are two stop signs, the one on the left is lightly painted with white paint and then made into a checker board with some tar paper. The one on the right was untouched.

The goal here is to view intensity data. When the laser receives a return from one of its laser pulses, it records the intensity of the light being returned. White or bright colors reflect more light back, whereas dark colors absorb some of the light. This means intensity values can almost function like a photo. We can use those to pick out the center of the targets in the scan and ground truth with GPS positions taken from on land. Then we’ll feel more confident with our boat’s scanning accuracy. Right now these targets are just serving to make us 100% sure that our boat data is wrong.

Below is a profile-ish view of the target scan, showing intensity. The same things are circled as above. You’ll note the old targets are really hard to distinguish.  Also notice the black, low intensity returns at the edge of the stop signs with the high intensity white in the center – this is what we were going for!

2

Getting this view was really exciting. But as I played with the 3D views in Scene, I discovered something awful about the DOT signs (it was actually really exciting, but awful for those who wanted these targets to work). While the ring of low-intensity returns falls at the edge of the stop sign, the high-intensity center returns are not showing at the face of the sign, where they should be. Instead, they are scattered behind the sign. You can see this in the plan-ish view below:

3

With some help from classmates, we discovered that this is due to the stuff they use to get the high reflectivity. When light hits these types of signs from any angle, it scatters so that anyone looking at the front of the sign (even without a light source) can see them. This means that the laser’s assumption that the light was hitting the sign and coming straight back was not valid; in fact, the light was coming back to the laser at angles and taking a longer time than it should have. What is still unexplained is why there appears to be no returns actually on the face of the sign.

Ultimately, the unpainted versions of DOT signs are not going to work for our purposes. But I thought I’d do a last bit of educating myself on what I would do if I had good data to work with. I imported the GPS points taken at the center of each target into Scene, where they could display in 3D. It was easy to see that, beyond the bad target reflectivity, we also have a problem with our system’s calibration. The two points from the plywood targets are circled in purple. Despite the challenge in picking out the center of these targets, it’s obvious the points do not agree.

4

My ultimate goal with other scans would be to quantify sediment movement over time by subtracting two surfaces. Although I don’t need to monitor this beach, the scan was one of my smaller ones, so I used it to learn to make a TIN.

The TIN method, shown in my tutorial, is not intended to account for large data shadows that result from performing horizontal LiDAR (as opposed to airborne LiDAR). This means that it doesn’t like to leave spots uninterpolated and instead creates a triangle over areas of no data. Thus, if we drove by with the boat, scanning, and got a few returns off of a tree or house 500m behind the beach of interest, Arc’s TIN interpolation would create a surface connecting the beach data to the far off tree/house data, and most of that surface would be silly. My method for dealing with this issue was to delete any of this far-off data, since our region of interest (ROI) is the beach.

Not surprisingly, this was a challenge too. You cannot delete points in an LASD file in Arc. After many trials, converting it to Multipoint was the best option. This process worked for this small scan, but not for larger data sets. After it was converted to multipoint, I could click areas and delete sections of data. I could not delete individual points, but for my learning purposes, I decided that didn’t matter. I removed the backdune data and as many of the pilings as possible. I used the “Create TIN” tool and came up with this surface.

5

Once again, it served to highlight where we had some error in our collection system. The discontinuities on the beach helped us pinpoint where our problems are coming from. Each of the discontinuities occurs where the laser made more than a single pass over the landscape – it either scanned back over where it had already been, or sat and bounced around (this happens; we're on a boat). If our boresight angle (the angle we tell the laser it is looking, relative to the GPS equipment) were correct, the passes should line up when we scan from different angles.

Throughout the term, I came back to a scan we did at Wing Point, on Bainbridge Island, WA. I had mild success with processing the scan: I was able to open it in both ArcMap and Scene. However, most other attempts failed. Creating a TIN caused the system to crash, and it also would have connected many unrelated points (such as points separated by water, or shorelines with no data in between). After my success with multipoint on the small scan, I tried that route, but the scan appears to be too large. I am hopeful that if I have the team trim the data down in another program (the laser's proprietary software), I could do a difference analysis once the laser's communication is perfected and the scans line up properly. Below is a shot of the Wing Point data!

6

I learned so much throughout this course. I reprocessed a bunch of data from Benson Beach, WA in preparation for my tutorial. Since this post is lengthy already, I am posting the tutorial separately. Obviously, I ran into a bunch of data difficulties, but I feel infinitely more confident working with Arc programs than I did in March. I know that I have tools that could be used to process this type of data, if the data provided were properly cleaned and referenced. While Arc may not be the optimal tool for processing horizontally collected LiDAR, the tools I learned can be applied to many of the sediment transport and beach change topics I'm interested in.

So this is the third time I am writing this, as the WordPress site continues to delete my work even after saving a draft!

As you may know, I have been working with Envision outputs, trying to compare the resulting datasets in terms of property values and expected flooding and erosion impacts over the next 90 years in Rockaway Beach, Oregon. My original dataset was split into Individual Decision Units (IDUs) of varying sizes and values.

11a

As we saw in my tutorial, small-scale interpolated data can be upscaled to larger parcels, in this case tax lots. Using the Areal Interpolation tool, I was able to create the following surface from my known points. It follows the general trends of the property values fairly well, even if the data only moderately fit the model.

2

After hearing from the class that my data would be better transformed, I set my values per square foot to a log scale. The resulting surface was clearer; however, there are discrepancies between the known property values and their interpolated values. I have yet to figure out why this occurs, and I'm hesitant to use the newly created tax lot polygon values until I do.

3

In the meantime, I decided to compare flooding and erosion results from several time steps: 2010, 2040, 2060, and 2100. To make these comparisons, I read up on and chose the Feature Compare tool, which compares shapefiles based on an attribute or geometry. The tool needs pared-down data, so I created new shapefiles of properties affected solely by flooding or erosion through a series of definition queries and exports of new shapefiles. Once in the tool, though, I started running into problems: it recognized differences in the attribute table (i.e., the number of rows) but did nothing to note the differences in coverage on the map. I've been looking elsewhere for a tool that fits my needs but have yet to come across one.

Instead, I moved on to comparing hot spot locations with these areas of flooding and erosion. The Hot Spot Analysis tool is pretty straightforward; it creates a new shapefile of GiZScores, which rank the property values by their standard deviations away from the mean. The hot colors denote greater positive standard deviations.

4 4a

Again, I used the Intersect tool to find the areas of greatest property value and greatest extent of damage. I first intersected the erosion and flooding data across the four time steps to find areas that are continually under "attack".

Over the course of 90 years, only one subdivided tax lot was repeatedly impacted by erosion. According to the assessed values for those properties, the three tax lots were worth approximately $97,000. This kind of analysis could help determine whether protection structures, such as revetments or beach nourishment, would be economically feasible. Similarly, flooding impacts a total of 44 tax lots over the course of 90 years. The total assessed value of the impacted properties is much higher, at about $3.1 million.

5 5a

This is mostly due to the occurrence of flooding in the "hotspot" of Rockaway Beach. If only these "hotspot" areas (greater than one standard deviation away from the mean) are considered, just 16 properties are affected, with a total loss of about $2.3 million, or almost 75% of the $3.1 million total.

6 6a

Overall, I'm not sure ArcGIS 10.1 has the tools I need to perform the analysis described above. As I become more comfortable with R, I hope to pull these datasets into that program and perform more statistical analysis. In the meantime, I will continue trying different ways to present the data and glean new insights.
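As a possible starting point for that move to R, the hot spot step has a close analogue in spdep's local Getis-Ord statistic; everything below (the centroid layer, the 500 m neighborhood, the value field) is a placeholder:

library(spdep)

# taxlots_pts: a SpatialPointsDataFrame of tax lot centroids with assessed values.
nb <- dnearneigh(coordinates(taxlots_pts), 0, 500)   # neighbors within 500 m
nb <- include.self(nb)                               # Gi* includes the focal lot itself
lw <- nb2listw(nb, style = "B")

gi_star <- localG(taxlots_pts$AssessedValue, lw)     # z-scores comparable to ArcGIS GiZScores
taxlots_pts$GiZ <- as.numeric(gi_star)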

Here is a simple tutorial in ArcGIS on using spatial statistics tools. Not much new here but I found it useful and interesting because it uses the tools to help inform analysis with other tools in the toolset.