I came into this quarter not yet having taken Dr. Jones’ spatial statistics course (I plan to enroll next fall!) and without any data yet collected for my thesis. Still, it sounded like a great class concept, and I thought I’d use the opportunity to explore an interesting dataset: a legacy database of soil pedon measurements from the NCSS.
The data wasn’t in a form I could run in ArcMap; it was stuck in an Access database rather than being a raster or shapefile. But, no worries, right? I figured by about week 3 I could extract something from the dataset worth playing around with (I was also working on this dataset for GEO599, Big Data and Spatial Modeling). Well, you know how things take three times as long as you think they will? It took me until week 9 to finally create a set of mapped pedons of total organic carbon (OC) to 1 meter depth. Just producing a usable national dataset became the quarter’s big project. I learned a great deal about SQL and Python coding, but not much about spatial stats.
Okay, that’s my “GEO599 sob story.” I don’t want to get off too easy here since my project didn’t come through, so for the rest of my final blog post I thought I’d present you all with something more exciting to read about: last May’s geostatistics conference in Newport.
The following are some notes and observations of mine from the conference. Oh, one more thing: just for kicks, I decided to play catch-up and run Hot Spot Analysis on my (finally) created OC data. We already covered most everything about that tool, but I still wanted to do SOMETHING geospatial this term.
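(For anyone curious, the ArcMap tool has an R cousin. The sketch below shows roughly how the same Getis-Ord Gi* statistic could be computed with the spdep package; it isn’t what I actually ran, and `pedons`, the column names, and the 50 km neighbor band are all placeholders.)

```r
# Rough R equivalent of ArcMap's Hot Spot Analysis (Getis-Ord Gi*), using spdep.
# Assumes a data frame `pedons` with coordinate columns x, y and a column
# oc_1m holding total organic carbon to 1 m depth. Names and distances are made up.
library(sp)
library(spdep)

coordinates(pedons) <- ~x + y                      # promote to SpatialPointsDataFrame

nb <- dnearneigh(coordinates(pedons), 0, 50000)    # neighbors within 50 km (assumed band)
nb <- include.self(nb)                             # Gi* includes the focal point itself
lw <- nb2listw(nb, style = "B", zero.policy = TRUE)

# High positive z-scores flag organic-carbon hot spots, strong negatives cold spots
pedons$gi_star <- as.numeric(localG(pedons$oc_1m, lw, zero.policy = TRUE))

spplot(pedons, "gi_star")                          # quick look at the hot/cold spots
```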
One final thing I don’t want to get buried in my report (I’ll repeat it again later): the Central Coast GIS Users Group (CCGISUG) is free to join, so if you’re looking to build your professional network, stop reading right now, head to the following link, and sign up!
http://www.orurisa.org/Default.aspx?pageId=459070
Central Oregon Coast Geospatial Statistics Symposium Report
On May 30th the Central Coast GIS Users Group (CCGISUG) held a conference on geospatial statistics. As a geographer currently pursuing a minor in statistics, I can say without embellishment that this was the most interesting conference I’d ever attended. Kriging, R code, maps everywhere! I was rapt and attentive for every talk that day, and, if we’re all being honest, when has that ever happened for you over a full conference day?
I arrived a few minutes late and missed the opening overview speech, but walked in just in time to grab a coffee and settle down for EPA wetland researcher Melanie Frazier’s talk on using R for spatial interpolation. Setting a pro-R tone that would persist through the day, she praised R’s surge in geospatial capability and wowed us by kriging a map of wetland sand distribution in three lines of code. Some recommended R packages for you all to look into:
- sp – base package for vector data work in R
- rgdal – facilitates reading and writing of spatial objects (like shapefiles) in R
- raster – as the name says, tools for raster work
- gstat – geostatistical modeling and interpolation (variograms, kriging)
- plotKML – export R spatial layers to KML for display in Google Earth
- colorspace – access to a world of color options
- mgcv – regression modeling package (generalized additive models)
Did you know it’s ridiculously easy to create variograms in R? Ms. Frazier assured us three lines of code could do it. Here’s an online example I found, and she’s pretty spot-on about the “three line” rule: most any single task can be done in about three lines of R code. R also supports cross-validation and other goodies. With the variogram in hand, one can run regression kriging (in gstat? mgcv? sorry, I don’t know exactly how she did it) and, boom, raster interpolation maps made easy.
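To give you a flavor, here’s roughly what an ordinary kriging workflow looks like in gstat. This is my own sketch, not Ms. Frazier’s code: `wetland`, `sand`, and `grd` are placeholder names for a points layer, the variable of interest, and a prediction grid.

```r
# Minimal kriging sketch with gstat; all object names are placeholders.
library(sp)
library(gstat)

v    <- variogram(sand ~ 1, wetland)                          # empirical variogram
vfit <- fit.variogram(v, vgm(psill = 1, "Sph", range = 300, nugget = 0.1))
pred <- krige(sand ~ 1, wetland, newdata = grd, model = vfit) # ordinary kriging

spplot(pred["var1.pred"])                                     # interpolated surface
```

The heart of it really is three calls: build the variogram, fit a model to it, krige.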
One other note from that talk: Ms. Frazier cannot yet release her code to the public, but if you want to hunt for R code examples, try GitHub.
Next, a USDA modeller, Lee McCoy, showed us a spatial modelling project looking at eelgrass distribution in wetlands. Not much to report from this one, but I did observe that the USDA was not as concerned about model drivers or mechanistic explanation of their modeled maps. The goal was accurate maps. Period. And if that’s your research goal, great, but I felt it called for a slightly different approach than a master’s student, a person trying to understand why certain spatial patterns form, might take. I can’t yet explain exactly how the approaches would differ, but, for example, in research modelling one often throws out one of a pair of highly correlated variables (e.g., temperature and precipitation). Mr. McCoy had highly correlated variables in his model and it didn’t bother him in the least.
Next up was Robert Suryan from our very own Oregon State University, doing his home department of Fisheries and Wildlife proud. The main research question Mr. Suryan dealt with was modelling and predicting seabird habitat along the coast of the Pacific Northwest. The crux of his study was attempting to develop better predictor layers in support of modelling. For seabirds, chlorophyll is an important indicator variable, as it’s a measurement of algal abundance. Those algae feed the creatures on the lower trophic levels of the food web that ultimately supports seabird populations. Remote sensing can pick up chlorophyll presence, and typically researchers use the base mean chlorophyll raster as a predictor layer. But … can we do better?
Persistence is an emerging hot topic in spatial modelling. Average chlorophyll is one thing, but food webs only truly get a chance to build on algal food bases if the algae persist long enough for complex food webs to develop. Using some fancy math techniques that I barely understood at the time, and certainly can’t do justice to in explanation, Mr. Suryan was able to construct a data layer of algal persistence from a time series of mean chlorophyll. The resultant dataset exhibited a different spatial pattern than mean chlorophyll, so it was spatially distinct (though related) from its parent data. And, lo and behold, models with the persistence layer outperformed the other models! Takeaway message: get creative about your base layers and consider whether they can be manipulated into new data layers representative of the processes you are trying to capture in your model.
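I won’t pretend to reproduce his method, but just to show the flavor of the idea, here’s a crude sketch of one way to build a “persistence” layer with the raster package: take a stack of chlorophyll rasters over time and ask how often each pixel stays above some threshold. The folder name and threshold value are invented.

```r
# Crude persistence proxy: fraction of time steps each cell exceeds a threshold.
# Folder, file pattern, and threshold are all assumptions for illustration.
library(raster)

chl    <- stack(list.files("chl_monthly", pattern = "\\.tif$", full.names = TRUE))
thresh <- 1.0   # mg/m^3, placeholder value

persistence <- calc(chl, fun = function(x) mean(x > thresh, na.rm = TRUE))

plot(persistence)   # spatial pattern will differ from plain mean chlorophyll
```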
Did you know the University of Oregon has an infographics lab? Megen Brittell, a University of Oregon graduate student, works in the lab and demonstrated how R can be used to make custom infographics. There’s a moderate degree of coding knowledge and time investment needed for this work, but the payoff can be immense in terms of the ability to convey information to your audience. This would also be a good time to mention OSU’s own Cartography and Geovisualization Group and the upcoming fall course: Algorithms for Geovisualization.
Whew. It was time for lunch at the conference. Big thanks to CCGISUG for giving away extra sandwiches to poor, starving grad attendees like myself. Free food in hand, it was time to go network. No, I didn’t find my new dream job, but it was fascinating to hear more about the lives of working geographers. Their main complaint about the profession: not much time spent in the field. On the other hand, there are always interesting new problems to tackle. If you’d like to meet fellow GIS users, why not sign yourself up for the free-to-join Central Coast GIS Users Group (CCGISUG)?
Pat Clinton of the EPA followed up lunch as an emergency fill-in for a no-show presenter. He gave a talk about modelling eelgrass distribution in wetlands that mostly served to reiterate themes presented in Ms. Frazier’s and Mr. McCoy’s talks. Once again, a model’s predictive capability was valued most, with less emphasis placed on mechanistic processes. I should take a moment to mention that these researchers don’t disregard the “why?” of which parameters work best. It’s more that once you get a map you feel confident with, especially after consultation with the experts, you’re finished. Don’t worry so much about a perfect AIC or which variables did exactly what. After all, all those predictor layers are in there for a reason – they have something to do with the phenomenon you’re modelling, right? Not much else to report here except that I learned that a thalweg is the line tracing the deepest part of a stream channel. It can be digitized, and one variable Mr. Clinton derived for his model was distance from the thalweg – another great example of getting creative in generating your own predictor layers by processing existing data sources (in this case, the thalweg line itself).
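If you’re wondering how you might build a layer like that yourself, here’s one simple way with the raster package. This is my own sketch, not Mr. Clinton’s workflow; `thalweg` and `r` are placeholders for a digitized thalweg line (SpatialLines) and a raster template covering the study area.

```r
# Distance-from-thalweg predictor layer; `thalweg` and `r` are placeholder objects.
library(raster)

thalweg_cells <- rasterize(thalweg, r)     # burn the thalweg line into the grid
dist_thalweg  <- distance(thalweg_cells)   # distance from each cell to the nearest thalweg cell

plot(dist_thalweg)                         # ready to use as a model predictor
```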
Rebecca Flitcroft was next up with a talk about stream models. This had the potential to be a specialized talk of little general interest, but instead Ms. Flitcroft wowed us all with a broad-based discussion of the very basics of what it means to perform spatial data analysis and how streams, which are networks, are in many ways quite different systems from the open, Cartesian grid areas we’re all most familiar with. This is a tough one to discuss without visuals, but here’s one example of why streams need to be treated uniquely: in a habitat patch, an animal can enter or exit from any direction, while in a stream a fish can only go upriver or downriver, and those directions are quite different in that one is against the flow and one is with the flow. Traditional geostats like nearest neighbor analysis are stuck in Euclidean distance and won’t work on streams, where distance is all about how far along the network (which can wind any which way) one must travel. There was much discussion of the development of geostats for streams, and we learned that there is currently a dramatic lack of quality software to handle stream analysis (the FLOWS add-on to ArcMap is still the standard, but it only works in ArcMap 9 and wasn’t ported to ArcMap 10).
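To make the distance point concrete, here’s a toy example of my own (nothing from the talk) that treats a stream as a weighted graph with the igraph package: distance is accumulated along reaches, not measured as the crow flies.

```r
# Toy stream network: junctions as nodes, reaches as weighted edges (lengths in km, made up).
library(igraph)

edges <- data.frame(from   = c("A", "B", "B"),
                    to     = c("B", "C", "D"),
                    weight = c(4.0, 2.5, 3.0))
streams <- graph_from_data_frame(edges, directed = FALSE)

# Along-network distance between the two downstream tips C and D:
distances(streams, v = "C", to = "D")   # 5.5 km via junction B, however close C and D sit in space
```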
Ms. Flitcroft also directly stated something that was an underlying theme of the whole conference: “The ecology should drive analysis, not the available statistics.” I found this quote incredibly relevant to our work in GEO599 exploring the potential of the ArcMap Spatial Statistics tools.
Betsy Breyer joined Ms. Brittell as one of the day’s two student presenters. Hailing from Portland State University, Ms. Breyer gave a talk that could have fit right in with GEO599 – she discussed geographically weighted regression (GWR) from the ArcMap Spatial Statistics toolbox! She took the tool to task, with the main message that it is good for data exploration but bad for any kind of test of certainty. GWR uses a moving window to create regressions that weight closer points more heavily than faraway points. The theory is that this will better tease out local variability in the data. Some weaknesses of the method are that it can’t formally distinguish stationarity from nonstationarity and that it is susceptible to local multicollinearity problems (even when multicollinearity isn’t a problem in the global model). It is recommended to play with the “distance band,” essentially the size of that moving window, for best results. Takeaway: large (high-density?) datasets are required for GWR to really play to its strength, which is picking up local differences in your data set. It’s especially worth exploring this tool if your data are highly non-stationary.
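For anyone who would rather experiment outside ArcMap, GWR is also available in R through the spgwr package. The sketch below is my own, not Ms. Breyer’s analysis; `dat`, `y`, and `x` are placeholders for a SpatialPointsDataFrame, a response, and a predictor.

```r
# Bare-bones GWR sketch with spgwr; all object and variable names are placeholders.
library(spgwr)

bw  <- gwr.sel(y ~ x, data = dat)                               # cross-validated bandwidth ("distance band")
fit <- gwr(y ~ x, data = dat, bandwidth = bw, hatmatrix = TRUE)

summary(fit$SDF$x)   # how the local coefficient on x varies across the study area
```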
OSU Geography closed the conference out with a presentation by faculty member Jim Graham. If you haven’t yet, it’s worth knocking on Dr. Graham’s very open office door to hear his well-informed thoughts on habitat modeling, which come from a coding / data manipulation background. Do it quickly, though, because he’s leaving to accept a tenure position at Humboldt State University this summer. For the geospatial conference, Dr. Graham discussed the ever-present problem of uncertainty in spatial analysis. He gave his always popular “Why does GBIF species data think there’s a polar bear in the Indian Ocean?” example as an opener for why spatial modeling must be ever watchful for error. He then discussed emerging work on how to quantify levels of uncertainty in maps. One option is running Monte Carlo simulations of one’s model and seeing how much individual pixels deviate across the different model results. Another way to go is “jiggling” your points around spatially. It’s possible there’s spatial uncertainty in the points to begin with, and if Tobler’s Law holds (things near each other are more alike than things farther apart), then a robust model shouldn’t deviate much, right? Well, in a first attempt at this method, Dr. Graham found that his model of red snapper habitat in the Gulf of Mexico displayed a large amount of uncertainty along the habitat borders. This prompts some interesting questions about what habitat boundaries really mean biologically. Food for thought.
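The point-jiggling idea is easy enough to prototype yourself. Here’s a loose sketch, and definitely not Dr. Graham’s code: `pts` and `fit_model()` are hypothetical stand-ins for your own point data (with x and y columns) and your model-fitting routine, and the jitter size and number of runs are arbitrary.

```r
# Monte Carlo "jiggle the points" uncertainty sketch; pts and fit_model() are hypothetical.
library(raster)

runs <- lapply(1:100, function(i) {
  jiggled   <- pts
  jiggled$x <- jiggled$x + rnorm(nrow(jiggled), sd = 250)   # ~250 m random offset (assumed)
  jiggled$y <- jiggled$y + rnorm(nrow(jiggled), sd = 250)
  fit_model(jiggled)                # assumed to return a prediction RasterLayer
})

uncertainty <- calc(stack(runs), sd)   # per-pixel spread across the 100 runs
plot(uncertainty)                      # expect the messiest pixels along habitat edges
```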
With that, the conference came to a close. We stacked chairs, cleaned up, and headed to Rogue Brewery for delicious craft beer and many jokes about optimizing table spacing for the seating of 20+ thirsty geographers. Some final thoughts:
1) Go learn R.
2) No, seriously, learn R. It’s emerging as a powerful spatial statistics package at a time when ArcMap is focusing more on cloud computing and other non-data-analysis features (the Spatial Statistics toolbox notwithstanding).
3) Spatial models should ultimately tell a story. Use the most pertinent data layers you find, and get creative in processing existing layers to make new, even better indicators of your variable of interest. Then get a model that looks right, even if it isn’t scoring the absolute best on AIC or other model measures.
4) There’s a rather broad community of spatial statisticians out there. It’s a field you might not hear much about in daily life, but there are over a hundred geographers just in Central Oregon. If you’re looking for some work support once you leave GEO599, a.k.a. Researchers Anonymous, well, there are plenty of folks out there also cursing at rasters and vectors.
– Max Taylor, OSU master’s student of pedometrics
BONUS MAP: Many weeks late, here’s my map of hot spots for my project data – organic carbon, underlain by a model of organic carbon using SSURGO data
Hi Max,
Thanks SO much for putting up a summary from the CCGISUG meeting! I am very intrigued and excited by your report on Robert Suryan’s presentation. I have read a number of papers trying to understand the concept of persistence and, in fact, may be looking at it myself in a project I am working on this summer. I may even try to contact him and follow up on this idea. And your post has continued to confirm a general theme I am seeing in science: the use of R for doing statistics. I guess I know what one of my many projects this summer will involve! Thanks for providing a list of recommended R packages. Was there any mention of useful learning packages or books that would help a beginner learn R?
Thanks,
Dori
Hi Dori, there was one book mentioned at the conference. I don’t have my notes on me right now but will try to send you some intro R links on Monday (shoot me an e-mail if I forget). Hate to say it, but right now there is no one “go-to” tutorial website out there for learning R. It’s a huge need right now.
One idea I had was to push the GEO department for a 599 course very similar to the one we just completed, except exploring the R package sp instead of the ArcMap Spatial Statistics toolbox.
Hey Max,
Thanks for the great summary of the symposium. I found it very informative as well. Like you, I was intrigued by Rob Suryan’s talk and plan to look into his methods more and see if they might be applicable to my project.
Also, thanks for the links. I agree, R seems to be the go-to tool and the plethora of packages is incredible.