Eating locally grown food can reduce the carbon footprint of that food by cutting the miles it travels from farm to table.  Spending locally on food also has important supporting effects on the local economy.  Currently the dominance of the trucking and freight shipping industries allows large agricultural regions to centralize food production and deliver it to a wide network of markets.  Local food networks have declined precipitously, to the point where a supermarket is more likely to carry tropical fruit out of season than local produce in season.

Local food networks have shifted in shape and character as they persist in an increasingly global marketplace.  Farmers markets allow small local farms to sell their produce retail and direct to consumer.  Grocers' co-ops and high-end grocery stores continue to stock local produce.  Organic wholesale distributors sell it to restaurants.  Community-supported agriculture programs (CSAs) also meet growing demand by allowing consumers to buy shares of a farm's produce, delivered direct to consumer on a weekly basis.

There is a lack of scholarly data and research on the character and function of local food distribution networks.  The most challenging part of the analysis is gathering the data.  This project examines the spatial distribution of farms serving the Corvallis Farmer’s Market.  For each vendor in the Corvallis Farmer’s Market list of vendors, I searched the farm name and address and used Google Maps to estimate the location of each farm.  Google Maps does not identify rural addresses precisely enough to associate with specific plots of farmland, so estimates of farm size come from survey data.

Survey Depth vs. Breadth

I wanted covariate information on these farms that a web search would not provide for every farm, so I knew early in the analysis that I would need a survey.  To get a decent response rate, I wanted a full month to gather responses, leaving time to push out a few reminder waves.  To meet that deadline, the list of farms to survey had to be complete within a month.  The result is a dataset that is small in size, but contains sufficient covariate data to fit a generalized linear model in R.

There are several good options for improving the breadth of the survey: examining the vendor lists for the Salem Saturday Market and the Portland PSU Farmer's Market, contacting local groceries for lists of their local vendors, and contacting organic wholesale providers for a list of local vendors.  Among the farms already in the dataset, many more have phone numbers than email addresses, so a phone survey could also improve the number of responses from the current list.

Farms Surveyed vs. Responding Farms

geo599_farmssurveyed geo599_nb_fit5

Key Variables

For each farm, I gathered data on the following variables:

  • Miles – Distance from the Corvallis Farmer's Market, in miles
  • Acres – Size of the farm, in acres
  • Age – Years under current management
  • Age2 – Squared value of the Age term
  • Local – Proportion of sales going to local markets

Miles – Distance from Corvallis Farmer’s Market

The location of each farm is an estimate based on the address placement marker in Google Maps.  In rural neighborhoods, the marker frequently falls in the middle of the road and does not clearly correspond to a specific plot of land.  Some farms also use an urban business address at a location that clearly is not where the food is produced.

I estimated distance from the farms to the Corvallis Farmer’s Market in ArcGIS using the Analysis > Proximity > Point Distance tool in the Toolbox.
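For reference, here is a minimal sketch of the same straight-line distance calculation done entirely in R with spDists() from the sp package.  The market coordinate and the field name Miles are placeholders of my own, so treat this as illustrative rather than the exact workflow I used.

library(sp)
library(maptools)

# Farm points estimated from Google Maps (longitude/latitude assumed)
farms <- readShapePoints("Local_Farms.shp")

# Approximate coordinate for the Corvallis Farmer's Market (placeholder)
market <- matrix(c(-123.262, 44.564), ncol = 2)

# Great-circle distances in kilometers, converted to miles
km <- spDists(coordinates(farms), market, longlat = TRUE)
farms$Miles <- as.numeric(km) * 0.621371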

In retrospect I am not convinced that distance "as the bird flies" is a good representation of spatial auto-correlation between producers and markets.  Fish do not swim as the bird flies from spawning grounds to the ocean, but rather travel along stream networks.  Likewise, traffic follows street and highway networks.  Estimating the distance from farm to market along likely traffic routes may be a better way to measure auto-correlation across spatial distances.

Surveying Farms

I used an email survey to gather covariate information on farms, including farm age, size in acres, and percent of local sales.  Out of 47 active email addresses on the list of Corvallis Farmer's Market vendors, 30 farms responded to the survey, a 64% response rate.  An additional 63 farms have telephone contact information but no email address, so the responses cover only 27% of vendors with contact information, and a round of phone surveys would help the sample better represent the target population.

The Corvallis Farmer's Market listed 128 vendors.  Salem Saturday Market lists 284 vendors, and the Portland PSU Farmer's Market lists 163 vendors.  In addition, the Natural Choice Directory of the Willamette Valley lists 12 wholesale distributors likely to carry produce from local farms.  At similar response rates, surveying these farms would produce a sizeable dataset for analysis and would be a fruitful subject for future study.  Surveying the additional Farmer's Markets would yield roughly 120 more responses at the same response rate, not counting the wholesale distributors, though some farms would likely appear on more than one vendor list.

The variable for the proportion of food going to local markets was problematic.  Defining which markets qualify as local is difficult, because small differences in distance can be significant to smaller farms, while a larger farm may consider all in-state sales to be local.  There is no single threshold for local sales appropriate to the variety of farms in the survey, so responses employed different definitions of local and are not comparable to one another.  I therefore removed this variable from the model formula and do not use it in the analysis that follows.

Analyzing Spatial Data in R

There are multiple reasons to prefer analyzing spatial data in R.  Perhaps the researcher is more familiar with R, like myself.  Perhaps the researcher needs a generalized linear model (GLM), a generalized additive model (GAM or GAMM), or a mixed-effects model for data that are not normally distributed.  While ArcGIS has several useful tools for spatial statistics, its statistical tools cannot match the variety and flexibility of R and its growing collection of user-developed packages.  My particular interest in handling GIS data in R is to use R's social network statistical packages to analyze food networks.  This dataset on farms does not include network data, but this project brings me one step closer to that kind of analysis, and future surveys could expand the current farm dataset to include network elements important to my research.

The following diagram shows a generalized workflow for transferring spatial data between ArcGIS and R:

geo599_workflow

Several R packages allow the user to manipulate spatial data; the most popular is 'maptools'.  I recommend using this package to open the spatial data file in R, which imports the file in a format similar to a dataframe.  The spatial data object differs from a dataframe in several important ways, however, and most statistical operations are designed to work on dataframes in particular.  Therefore, I recommend converting the imported object to a dataframe using the R function as.data.frame().  Perform all statistical analyses on the new dataframe, then save significant results back to the initial spatial data object, and export the appended spatial data object to a new shapefile using the maptools command writeSpatialShape().
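Here is a minimal sketch of that round trip, assuming the shapefile and field names from this project (Local_Farms.shp, Miles, Acres, Age, Age2); the exact model call is illustrative.

library(maptools)   # readShapePoints(), writeSpatialShape()
library(MASS)       # glm.nb()

# Import the shapefile; the result looks like a dataframe but is a spatial object
farms <- readShapePoints("Local_Farms.shp")

# Convert to a plain dataframe for statistical work
farm_df <- as.data.frame(farms)

# Fit the model on the dataframe (negative binomial GLM, as used below)
fit <- glm.nb(Miles ~ Acres + Age + Age2, data = farm_df)

# Save results of interest back to the spatial object and export a new shapefile
farms$resid  <- residuals(fit)
farms$fitted <- fitted(fit)
writeSpatialShape(farms, "Local_Farms_results")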

For farms serving the Corvallis Farmer's Market, I suspected there was a relationship between key variables in the survey and distance from the farmer's market, but I did not have a specific causal theory, so I relied on basic data analysis to inform my model.  The relationship between the key variables was uncertain, as shown in this diagram:

geo599_modelvars

First I fit three different models, each using a different variable as the dependent variable.  Then I examined the fit of each model to determine whether one best explained the data.  The following three charts show the tests for normality of the residual errors for each model.  With Acres or Age as the dependent variable, the fits showed heavy tails on the Q-Q plots, whereas the model with Miles as the dependent variable had a relatively normal distribution of errors.

geo599_qqplots
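A sketch of that comparison, continuing from the dataframe above; I am assuming all three candidates were fit as negative binomial GLMs, and the exact right-hand sides are illustrative.

library(MASS)

# Three candidate models, each with a different response variable
fit_acres <- glm.nb(Acres ~ Miles + Age + Age2, data = farm_df)
fit_age   <- glm.nb(Age   ~ Miles + Acres,      data = farm_df)
fit_miles <- glm.nb(Miles ~ Acres + Age + Age2, data = farm_df)

# Q-Q plots of the residuals, side by side
par(mfrow = c(1, 3))
for (m in list(fit_acres, fit_age, fit_miles)) {
  qqnorm(residuals(m)); qqline(residuals(m))
}
par(mfrow = c(1, 1))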

The plots of residuals vs. fitted values showed a possibly curvilinear relationship with Acres as the dependent variable, with residual error appearing to increase as acreage increases and then decrease as acreage increases further.  This curvilinear pattern was even more pronounced with Age as the dependent variable, meaning that model clearly violates the assumption that the errors are independent of the values of the dependent variable.  With Miles as the dependent variable, the residual errors appear to satisfy the assumption of independence, as they are distributed around zero regardless of how far the farm is from the market.

geo599_rvfplots

In retrospect, these results make sense.  This analysis assumes there is spatial auto-correlation between farms, that farms nearer the Corvallis Farmer's Market are going to be more similar to each other than to farms further from the market.  Conceptually, the best way to capture this auto-correlation in a model is to use Miles from the Farmer's Market as the outcome variable.  This is sufficient a priori reason to select this model, but the model with Miles as the dependent variable also best meets the model assumptions of independence and normality of errors.  Since I dropped Local from the variable list, the selected model regresses Miles on farm size in acres, years under current management, and the squared age term.

One can save the residuals and fitted values for each observation directly to a new shapefile, but the coefficient results for the variable terms require a separate file.  To save such a table in a format that imports cleanly into ArcGIS, use the 'foreign' package in R and call the write.dbf() function, which saves a dataframe in DBF format.  The following table shows the coefficient results for the farms in the dataset, using a negative binomial regression:

geo599_coefs
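A sketch of how such a table can be written out with write.dbf(), continuing from the Miles model above; the file and column names here are my own.

library(foreign)

# Pull the coefficient matrix out of the model summary into a plain dataframe
cf <- summary(fit_miles)$coefficients
coef_tab <- data.frame(term     = rownames(cf),
                       estimate = cf[, 1],
                       std_err  = cf[, 2],
                       p_value  = cf[, 4])

# DBF tables open cleanly as standalone tables in ArcGIS
write.dbf(coef_tab, "nb_coefficients.dbf")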

After performing the analysis in R, I used the maptools function writeSpatialShape() to create a new shapefile containing the results of the analysis in its data fields.  This shapefile does not contain any significant results or insights that I am aware of.  The colors of the diamonds indicate which farms the covariate data predict to be closer to or farther from the market, but because the sample size is so small, the only outliers in the set are farms that are either very close to or very far from the market relative to the other samples.  This is an indication of data deficiency.  Here is the final map of results:

geo599_nb_fit5

However, despite the failure to obtain significant initial results, I believe this research has real potential to shed light on the nature of local food networks in the Willamette Valley.  The farm data gathered for this research is original data, not extant data from a government survey.  This line of inquiry can illuminate a part of the food system that is poorly studied and data deficient.  Now that I have a data management framework for moving data from ArcGIS to R for analysis and back, I can tackle more complicated analyses, and spending additional time collecting farm observations for the dataset becomes a worthwhile endeavor.

If I sample data from additional Farmer's Markets in Salem and Portland, I can determine whether the spatial distribution of farms differs significantly between cities, or whether the farms serving each city share common attributes.  More importantly, sampling additional Farmer's Markets and other distribution channels like grocers and wholesale distributors means that some farms in the dataset will serve multiple markets: several Farmer's Markets, or possibly grocers and restaurants as well.  The number of different markets a farm serves could be a significant variable in regression analysis.  Connectivity to additional local food markets is network data, so I could use network analysis to determine whether connectivity is a significant factor in relation to farm size or age.

There is a natural tension as I gather data between questions that are useful for analysis and questions that are realistic to answer within the scope of a survey.  I achieved a 64% response rate by limiting the number of questions I asked to only three.  The fewer and less invasive the questions, the higher the response rate.  Had I asked for all the information I wanted, the response rate would have collapsed.  Every variable I add to the survey reduces the odds of response.  So developing this dataset requires careful consideration of each additional variable, determining whether the benefit to the analysis justifies the imposition upon the survey participant.  While I feel there remains room for additional variables in the analysis, I am still looking for candidate variables that justify the increased transaction costs of their inclusion.

This project has been an enjoyable analysis of a topic in which I am very interested.  I hope I can get the chance to continue pursuing this research.

I would like to show you how to use a script tool within the ArcGIS toolbox to call R remotely, have it perform some function on your data, and return the results into a new shapefile projected onto your workspace in ArcMap.  I won’t be doing that.

The scripting tool works.  I have used it myself.  But it does not work in the computer lab at school.  The problem has to do with how R handles packages.  When installing a package at school, R complains that students do not have permission to modify its main library, and automatically places new packages into a user-specific library.  R has no trouble finding this user library when running on its own, but when R is called inside ArcMap it inexplicably forgets where the user library is located on the S: drive.  Since nearly all spatial statistics in R depend on user-installed packages like maptools, there is no point in calling R and asking it to do anything if it cannot find its own library.  So instead I am going to give a plain vanilla tutorial on how to open a shapefile in R, lay some statistics smackdown on it, and save the results in a new shapefile and some tables you can open in ArcMap.  You might want to grab a cup of coffee; this is going to be boring.
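If you run into the same problem, one thing that might be worth trying (I have not verified it in the lab) is telling R explicitly where the user library lives at the top of the script; the path below is a placeholder.

# Point R at the user-specific package library before loading anything
# (placeholder path: substitute the location of your own library on the S: drive)
.libPaths(c("S:/your_username/R/win-library", .libPaths()))
library(maptools)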

If, for some reason, you actually want to know how to call R remotely using a python script tool within ArcGIS, I will leave a trail of breadcrumbs for you at the end of the tutorial.  It is not that hard.

First, locate the folder “demo_geo599” within Geo599/Data/ and copy the folder to your student folder.

Open the program R, press Ctrl + O and navigate to your copy of the demo_geo599 folder.  Open the file “demo_nb_GLM.r”.

Install the packages required for this demo using the code at the top of the file.  Once the packages are installed, you still need to load them.  Note that you can load packages with either library() or require(); the practical difference is that library() throws an error when a package is missing, while require() just returns FALSE.
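The install-and-load step looks roughly like this; I am listing the packages this demo uses, so adjust if your copy of the script differs.

# Install once per machine (or per user library)
install.packages(c("maptools", "MASS", "foreign", "GGally", "texreg"))

# Load every session; library() and require() both work here
library(maptools)
library(MASS)
library(foreign)
library(GGally)
library(texreg)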

The next step to getting this demo working requires you to change a line of code.  Where the file says "Set Working Directory", below it you will find the command "setwd()".  Inside the parentheses you need to replace the file path with the path to the demo folder in your student folder on the S: drive.  R interprets the file path as a string, so be sure to surround it with single or double quotation marks, which is how you declare a string in R.  By running this command, you direct all output to this file path, and you can refer to file names within this path without specifying the full path each time.
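For example (the path is a placeholder for your own student folder):

# Direct all file output to your copy of the demo folder
setwd("S:/your_username/demo_geo599")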

The next line of code uses the package maptools to import the shapefile with the command "readShapePoints()".  This line of code will run so long as your file path specifies the demo folder.  The shapefile "Local_Farms.shp" contains covariate data on some of the farms serving the Corvallis Farmer's Market, the ones that had responded to my survey at the time I made this demo.  Examine the file contents using the "head()" command, and note that the shapefile data resembles an R data.frame.  Although it includes a few additional properties, in many ways you can manipulate the shapefile data the same way you would a data.frame, which is how we will add the results of our regression to the data and output a new shapefile.
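The import and inspection step, in sketch form:

# Read the point shapefile; the result behaves much like a data.frame
farms <- readShapePoints("Local_Farms.shp")

# Peek at the attribute table
head(farms@data)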

I adapted this file from the script that Python calls, as I mentioned above, so the next bit of code is a legacy of that adaptation.  There are easier ways to specify a formula.  In this case, I specify the names of the variables as strings and then paste them together into a formula.  Formulas in R follow the syntax: y ~ x1 + x2 + x3.

Before we fit the model, we perform some initial diagnostics.  The package GGally includes a function called "ggpairs()".  This function makes a scatterplot of every variable in a data.frame against every other variable and presents the results in the lower triangle of a matrix; in the upper triangle it prints the correlation between the two variables.  If you run the command alone, the results appear in the graphics window.  The line above it is the command that prints the graph as a portable network graphic (.png), and the command "dev.off()" tells R that you are done specifying the contents of the .png file.  While you likely have no interest in keeping a pretty picture of the output, this format is essential when you are calling R remotely from ArcGIS, because the R script runs in full and exits within the Python script, so you never see any of the graphs.  After I run the file remotely in ArcGIS, I open the working directory and examine the graphs.  From the perspective of diagnostics, it is better to work in R directly: if you need to change the scale of the response variable or, you know, respond to the diagnostics in any way, you can't do that when running the script remotely in ArcGIS.

nb_cov_test
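Here is a sketch of those two steps, the pasted-together formula and the ggpairs() diagnostic written to a .png.  The variable names are my recollection of the ones in the demo, so double-check them against the script.

library(GGally)

# Build the model formula from strings
dependentVar    <- "Acreage"
independentVars <- c("Miles", "Age", "Local")
form <- as.formula(paste(dependentVar, "~", paste(independentVars, collapse = " + ")))

# Keep only the variables of interest, then write the pairwise plot to a .png
var_tab <- farms@data[, c(dependentVar, independentVars)]
png("nb_cov_test.png", width = 800, height = 800)
print(ggpairs(var_tab))
dev.off()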

To make sure that we only compare variables of interest, I preface the ggpairs() call with a few lines of code that build a data.frame containing only the variables we want to compare.  The line "#var_tab$Acreage <- as.integer(var_tab$Acreage)" begins with a #, which is how you specify comments in R, so this line is "commented out".  We will be running a negative binomial regression, and R assumes you will provide count data as the dependent variable.  The regression still runs with non-integer data, but R will issue warnings, which you can safely ignore.  If you do not want to see the warnings, simply drop the decimals by uncommenting this line and running it (note that as.integer() truncates rather than rounds).

Note that I have commented out another line in the script: #ct <- qt(.95, df=(nrow(var_tab) - length(independentVars))).  I didn't end up using it because the package I used to report coefficient results did not let me specify confidence intervals from critical values, which is too bad: with so few farms I should really be using a t distribution, with the appropriate degrees of freedom, to determine the confidence intervals.

You may be wondering: why negative binomial regression?  Well, "number of acres" is essentially count data.  I have no reason to assume it is normally distributed around any particular mean acreage.  In fact, I expect smaller farms to be more common than larger ones, so it is not unreasonable to hypothesize that the distribution of farm sizes would match a Poisson or negative binomial distribution.  It turns out the Poisson regression was over-dispersed, so the negative binomial is the appropriate distribution in this case.  It seems to me that the Poisson is always over-dispersed, and I am getting into the habit of skipping straight to the negative binomial.  For an interesting analysis of the suitability of different distributions for analyzing ecological data, I recommend a paper by Walter Stroup called "Rethinking the Analysis of Non-Normal Data in Plant and Soil Science".
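A sketch of that check, continuing from the formula above; a ratio of residual deviance to residual degrees of freedom well above 1 is the usual red flag for over-dispersion.

library(MASS)

# Poisson fit first (R will warn about non-integer counts; that is expected here)
pois_fit <- glm(form, family = poisson, data = var_tab)

# Rough over-dispersion check: should be near 1 for a well-behaved Poisson model
pois_fit$deviance / pois_fit$df.residual

# The negative binomial GLM absorbs the extra variance through its dispersion parameter
nb_fit <- glm.nb(form, data = var_tab)
summary(nb_fit)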

Go ahead and run the regression and examine the results using the command "summary()".  Note that the spatial component of my analysis, "miles", which represents the distance of the farm from the Corvallis Farmer's Market in miles, is not significant.  This is a shame, because I hypothesized that farms would become smaller the closer they were to the market.  In particular, I believe there is a new business model emerging for farms, based on small acreages producing high quality food intended for direct-to-consumer local markets.  These results show that older farms tend to have larger acreage, and that the higher the proportion of production going to local markets, the smaller the acreage tends to be.

nb_coefs

The package texreg formats model results for the LaTeX language, which many academics use for publishing because it specializes in formatting model formulas and symbols.  The function "plotreg()" makes an attractive graphic of the coefficient results.  The next few lines of code simply print a Q-Q plot and a plot of the residuals vs. fitted values, so we can check the model for normality and independence of errors.  The errors are reasonably normally distributed.  You can see in the residuals vs. fitted plot that most of the farms are very small in size, but the errors seem independent of the farm size.  Hard to tell with so few data points!

nb_norm_test nb_indep_test
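In sketch form, with the diagnostic plots written to .png files so they survive a non-interactive run (the file names mirror the figures above):

library(texreg)

# Coefficient plot; appears in the graphics window when run interactively
plotreg(nb_fit)

# Normality check: Q-Q plot of the residuals
png("nb_norm_test.png")
qqnorm(residuals(nb_fit)); qqline(residuals(nb_fit))
dev.off()

# Independence check: residuals vs. fitted values
png("nb_indep_test.png")
plot(fitted(nb_fit), residuals(nb_fit), xlab = "Fitted values", ylab = "Residuals")
abline(h = 0)
dev.off()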

Next we append to the shapefile data the residuals and the number of standard errors each residual deviates from the mean.  This is the same syntax used to add data to a data.frame.  The function writeSpatialShape() resides in the maptools package and does exactly what it sounds like, depositing the new shapefile in the working directory.  Then, since we cannot see R's output when calling the script remotely in ArcGIS, the code also prints the results to a .dbf file.  The function write.dbf() is in the package foreign.
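A sketch of that export step; resid_std below uses R's rstandard() as my stand-in for "how many standard errors each residual deviates", and the output file names are placeholders.

# Append results to the spatial object with the same syntax as a data.frame
farms$resid     <- residuals(nb_fit)
farms$resid_std <- rstandard(nb_fit)   # residuals scaled by their standard errors

# New shapefile in the working directory (maptools)
writeSpatialShape(farms, "Local_Farms_nb_fit")

# Coefficient table as a .dbf for viewing in ArcGIS (foreign)
write.dbf(data.frame(term = names(coef(nb_fit)), estimate = coef(nb_fit)),
          "nb_coefs.dbf")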

Information criteria are all the rage now, so the final part of the code builds a .dbf table that fits every possible combination of the independent variables from the original model against the dependent variable and records the resulting AIC and BIC.  Unless you have an abiding fascination with for and if loops, you can safely ignore the algorithm I use to step through every unique combination of independent variables.  It stores all the combinations as vectors of numbers in a list.  The code after it connects the variable names to the numbers and then pastes them together into formulas.  This part of the code is only interesting if you work with list-type data and want to know how to extract and manipulate data contained in lists.  Hint: use the command "unlist()".
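For comparison, here is a more compact way to build the same table using combn() instead of nested for and if loops; it is a sketch under the variable names used above, not the code in the demo file.

# Fit every non-empty subset of the independent variables and record AIC/BIC
rows <- list()
for (k in seq_along(independentVars)) {
  for (vars in combn(independentVars, k, simplify = FALSE)) {
    f   <- as.formula(paste(dependentVar, "~", paste(vars, collapse = " + ")))
    fit <- glm.nb(f, data = var_tab)
    rows[[length(rows) + 1]] <- data.frame(formula = deparse(f),
                                           AIC = AIC(fit), BIC = BIC(fit))
  }
}
diag_tab <- do.call(rbind, rows)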

Use write.dbf() again to produce a table you can load in ArcGIS, and feel free to run the line containing just the name of the table, "diag_tab", to view the diagnostics table in R.

diag_tab

This concludes a poor tutorial in R.

If you are interested in how ArcGIS and R can talk to each other, here are a few tips to get you started.

I found this video by Mark Janikas describing a Python script that calls R to perform a logit regression and return the results to a shapefile in ArcGIS.  The video is not super helpful in explaining how the process works; its intended audience is a conference full of pro programmers who are hip to hacking lingo, which is not me.  At the end, however, he refers interested viewers to a GitHub site that hosts the Python and R scripts.

One important note: certain ArcGIS directories have to be added to the search path for Python.  You can add these directories manually by modifying the PATH variable in your environment variables, according to the instructions in the folder demo_geo599/R_Tools/Doc.  However, I did this and it did not work, so I used a workaround: I add the paths using the code in the Python file "pathlib.py", which I include in the demo folder.  This file and the file "arcpyWithR.py" need to be somewhere in the Python search path for my example file to work.  And yes, it will work on your home machine, just not in the lab.  I have added a Python file and an R file titled "GLMwithR" to the folder R_Tools/Scripts.

If you open the Python file, you will note the first line reads "import pathlib", which adds the necessary directories to the search path.  Scrolling down, you will see the function "ARCPY.GetParameterAsText()" called six times.  Each of these calls receives user input through the ArcGIS GUI.  You can add the toolbox inside the R_Tools folder to your ArcGIS toolbox by right-clicking on the directory in the toolbox, selecting "Add Toolbox…" and navigating to the "R Tools" toolbox in the R_Tools folder.  I added my script to this toolbox by right-clicking on the box and selecting "Add -> Script…".  I would refer you to the help file on creating script tools.  The hardest part is configuring the parameters.  If you right-click on the "Negative Binomial GLM" tool and select "Properties", you can view the parameters by selecting the "Parameters" tab.

parameters

Note that there are six parameters.  These parameters need to be specified in the same order as they appear in the Python script, and each parameter needs to be configured properly.  In the Property field, you have to select the proper values for Type, MultiValue, Obtained from, etc., or the script will fail to work.  It's pretty annoying.

I don't understand all the Python code myself, but the middle portion passes all the inputs to R in the list called "args".  Feel free to open the R script "GLMwithR.r" in the folder R_Tools/Scripts and note the differences between it and the tutorial file we just worked through.

Python uses the functions "ARCPY.gp.GetParameterInfo()" and "ARCPY.SetParameterAsText()" to pick up the new shapefile and the two .dbf tables, respectively, that come from the R script.  The whole process of passing information between Python and R is still mysterious to me, so I can't really shed any light on how or why the Python script works.

That concludes my trail of breadcrumbs.  Good luck!

Having finally collected a full list of farms serving the Corvallis Farmer's Market, I have a fork in the road ahead as to what kind of data I shall collect and what type of spatial problem I shall analyze.  On Saturday I sent out a survey to the email accounts I had for the farms on my list, about 50 of the 114 farms.  As of today, 18 have responded.  That is a high rate of response for one email.  One trend I have noticed is that over the past decade a new business model has been emerging: small acreage, high quality, sustainably grown produce intended for local markets.  I could continue down this path by surveying the remaining farms over the phone and with additional rounds of emails.  My question would be whether farms nearer to Corvallis tend to be newer in ownership and smaller in size at a statistically significant level, indicating that Corvallis as an urban center is supporting and driving this new business model.  I would continue to collect information on local farms through the First Alternative Co-op and through local wholesale distributors.

An alternative strategy is to consider the Farmer's Markets of other cities in the Willamette Valley.  Salem and Portland both have very detailed information on their vendors, and Eugene lists somewhat fewer, about 50 farms total.  I could then ask whether the size of a city, measured by the area of its city-limits polygon, predicts the number of farms serving its Farmer's Market, or whether some cities support more Farmer's Market vendors than others after accounting for the relative size of the urban centers.

The Extent of the Problem

To those of you who struggled to help me this last week with a "bug" in my points layer, I am happy to announce a fix for the problem.  The "extent" setting in my Geoprocessing Environment was set, why I don't know, to an area much smaller than the file on which I was working.  I changed the extent to cover the full area of all layers in the file, and suddenly the excluded points in my point layer started responding to field calculations and tool actions.  What a pain!  Glad to have it over with.

Python R U Talking?

Some of you also know I have been wrestling with a Python script intended to take GIS info, call R to perform some function on it, and then import the results back into ArcGIS.  Well, I finally got it to work, but it wasn't easy.  The R Toolbox that Mark Janikas made had some bugs in both the Python script and the R script, at least the way it worked with my version of R, and debugging was trial and error.  But I am sure you will hear about it in my tutorial later.  Peace!

So far I have identified over 70 local farms providing food to the Corvallis Farmer's Market.  While many of the farms are far-flung, there is a definite clustering effect around the city of Corvallis.  This map shows the full study area; keep in mind I am still collecting map tiles, because there are farm locations outside this area that I cannot place yet.  The purple ellipse comes from the Directional Distribution tool, and it shows the area containing 68% of the local farms, or one standard deviation of the distribution.  I traced the city limits for Corvallis, Albany and surrounding cities with a city limits shapefile in light blue.  Farms that sell at the local Farmer's Market are represented by gold stars.  Note that the ellipse skews to the right of Corvallis and is longer from north to south.  Essentially, the ellipse follows the contour of the Willamette Valley, which we would expect.

wv7

The purple cross is the mean center of the distribution of local farms, which is also the center of the ellipse.  The orange triangle is the median center of the farm distribution.  The median center sits noticeably closer to Corvallis, implying that remote geographic outliers pull on the mean, and that farms may cluster more strongly around the city of Corvallis than the distribution ellipse and mean center suggest.

wv8

There remain more than two dozen farms on the Corvallis Farmer's Market list that I have yet to add to the dataset.  After that, I would like to know the approximate acreage of each farm.  This would allow me to build a hotspot analysis around a specific question.  I have a theory that farms near Corvallis are likelier to be smaller, and that being near an urban center makes it more feasible to grow high quality produce on small acreage as a business model.  To put it another way, Corvallis acts like a market driver that spurs and supports local sustainable development nearby.  I could test this with a hotspot analysis if I had an acreage estimate for each farm.

Identifying the location for each farm remains tedious, but each farm has a phone number associated with it, and many have an email address.  An initial email survey followed by phone calls could yield information about the size of each farm, how long it has been selling locally, and how much of its produce goes to local markets.

There is definitely error in the accuracy of the farm locations.  Google Maps does a poor job of identifying the location of farm addresses, so I am sure some stars are nearby, but not on the right farm.  It is also hard to determine boundaries of ownership by visual assessment, so a polygon shapefile estimating farm areas would be even less accurate.  While I can use tax lot information for Benton County to determine farm area, Linn County is much more difficult to access, and the farms are spread across many counties, so the process is time-consuming.  A combination of ground-truthing and surveying would be necessary to improve accuracy to a publishable level.  I have also not addressed farms selling to local groceries like the First Alternative, to local restaurants through wholesale distributors, or through CSAs, all significant contributors to the local food system.

My initial goal was to explore local food production over time near Corvallis, but I am getting ready to change topics because I cannot find enough information on farms in the area to discriminate crop types, either by visual assessment or by ownership.  The federal data I could find on crop types did not list information more granular than the county level.  Land cover data categorizes farmland as "herbaceous" or "barren" and is not much help.  So I attempted visual assessment of orthoimagery.  Here is the Willamette Valley around Corvallis:

wv1

If I zoom in on a parcel, this is the level of detail:

wv2

Clearly agricultural, but I couldn't tell you what.  That was 2011; here is the same land in 2005:

wv3

Is that supposed to be grass?  What degree of certainty do I have?  Not enough for analysis.

Here is the adjacent parcel:

wv4

Clearly two different crop types, but is one hay and the other grass seed?  Don’t ask a city slicker like me.

The second strategy I tried was to determine ownership.  Certain farms produce specific types of crops, and other farms have a reputation for selling their food locally.  But I could not find the equivalent of the White and Yellow Pages for GIS, or even a shapefile with polygons divided by tax lots.  Instead, I tried looking at water rights.  Water rights data identifies the owner in a set of point data, and also displays a polygon associated with each right, showing the area of farmland using that right.  I selected only water rights related to agriculture, so municipal and industrial water rights would not show up in the layer.  Here is a closeup of the water rights data layered on top of the orthoimagery:

wv5

The water right for the parcel in the center on the right belongs to Donald G Hector for the purpose of irrigation.  An article in the Gazette-Times records the passing of Donald’s wife in 2004 from Alzheimer’s after being married to Donald for 53 years.  Businessprofiles.com lists the farm as currently inactive.  Other than that, I could not find much about Mr. Hector or his farm.

There is a more significant problem with using water rights data to determine farm ownership, which you might intuit from the picture above.  There are many parcels of land that are not associated with water rights.  In fact, only around 15% of Oregon’s crops are irrigated crops.  Once I zoom out, this becomes obvious:

wv6

The large blue area at the bottom left is the Greenberry Irrigation District, meaning a utility maintains the local irrigation infrastructure, and taxes farmers individually.

When I was interning at the WSDA, they had enough data to construct a map with the kind of information I want, but they could not release it because of privacy concerns, and I think that is the problem I am running into here.  I need some NSA-style access.

Or a new spatial problem!