Radiotelemetry is a common tool for studying animal movement. It consists of attaching a radio tag to an individual and tracking it either remotely with a GPS system or at short distances using hand-held antennas.
Until recently, hummingbird movement patterns could not be studied because no transmitters were small enough to be carried by such a light animal. The development of miniaturized radio-telemetry devices changed this. Using them, Hadley & Betts were able to conduct a translocation experiment in 2008, while I, in 2012, gathered information on natural (within the home range) movement patterns. This information consisted of time-stamped location points, which seemed perfect for the application of spatio-temporal statistics. In particular, I wanted to see whether certain characteristics of the observed movement paths (speed, turning angle) could be used to assess behavioral changes associated with characteristics of the terrain (presence of forest).
As my previous posts show, I was unsuccessful in doing this, as none of the tools or analyses I tried showed evidence of pattern, either in space or in time. It is possible that, due to the very high flight speed of these birds (average = 30 mi [48 km] per hour), we weren't able to keep up with them while tracking them, leading to unrealistic speed estimates (1.2 km/hour).
Even though my data don't have enough precision to answer point-level questions, the information on the general movement rates of the individuals can still give insight into how behavior is affected by the context in which the animals are moving. The advantage of the lack of temporal and spatial correlation between the data points is that I will be able to use traditional statistics to run these analyses.
I also explored a new tool for analyzing telemetry data points: the dynamic Brownian Bridge Movement Model (dBBMM). This model predicts the areas probably used by the individuals based on their overall movement paths, taking into account not only the location of the points but also the sequence in which they were recorded, thereby incorporating temporal autocorrelation into the calculations. The dBBMM also estimates a parameter (the "Brownian motion variance") that can be used to evaluate the existence of heterogeneous behavior along the tracks. High values of Brownian motion variance would indicate more complex paths (and consequently more active behavior), while low values would indicate less variation in the way the individual is moving. I tried to apply this model to my data, but wasn't able to find a way of estimating the Brownian motion variance. What I was able to do, though, was generate rasters showing the probability distribution of individuals in space, that is, the areas where the birds were likely to be present.

[Figure 1]

Figure 1. Map showing the probability of observing a particular individual in space, based on data points obtained through radio-telemetry. This particular bird seems to have two centers of activity, joined by a transit area.

The code to run this model (in R) can be found at http://www.computational-ecology.com/main-move.html.
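
For anyone who wants to try this with their own tracks, here is a minimal sketch of what the workflow might look like with the move package (the package documented at the link above). The file name, column names, projection, and tuning parameters are placeholders I made up for illustration, and I have not verified this against my own data; getMotionVariance() may be the piece I was missing for extracting the Brownian motion variance.

    # Sketch: dynamic Brownian bridge movement model with the 'move' package.
    # File name, column names, projection, and tuning parameters are hypothetical.
    library(sp)
    library(move)

    trk <- read.csv("hummingbird_track.csv")   # assumed columns: x, y, timestamp (projected coords)
    m <- move(x = trk$x, y = trk$y,
              time = as.POSIXct(trk$timestamp, tz = "UTC"),
              proj = CRS("+proj=utm +zone=10 +datum=WGS84"),
              animal = "bird01")

    # Fit the dBBMM; location.error, margin, and window.size must be tuned to the data.
    dbb <- brownian.bridge.dyn(m, dimSize = 150, location.error = 20,
                               margin = 11, window.size = 31)

    plot(dbb)                          # utilization distribution raster, as in Figure 1
    bmv <- getMotionVariance(dbb)      # Brownian motion variance along the track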

I came into this quarter without having taken Dr. Jones' spatial statistics course (I plan to enroll next fall!) and without any data yet collected for my thesis. Still, it sounded like a great class concept, and I thought I'd use the opportunity to explore an interesting set of data: a legacy database of soil pedon measurements from the NCSS (National Cooperative Soil Survey).

The data wasn't in a form I could use in ArcMap; it was stuck in an Access database rather than being a raster or shapefile. But, no worries, right? I figured by about week 3 I could extract something from the dataset worth playing around with (I was also working on this data set for GEO599, Big Data and Spatial Modeling). Well, you know how things take three times as long as you think they will? It took me until week 9 to finally create just a set of mapped pedons of total organic carbon (OC) to 1 meter depth. Just producing a usable national dataset became the quarter's big project. I learned a great deal about SQL and Python coding, but not much about spatial stats.
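
For the curious, the depth aggregation itself is conceptually simple once the horizon table is out of Access. Here is a rough sketch in R of the kind of calculation involved; the file and column names are hypothetical stand-ins for the actual NCSS table structure, which is considerably messier.

    # Sketch: depth-weighted total organic carbon to 1 m per pedon.
    # 'ncss_horizons.csv' and its column names are hypothetical.
    horizons <- read.csv("ncss_horizons.csv")
    # assumed columns: pedon_id, hzn_top, hzn_bot (cm), oc_pct, bulk_density (g/cm^3)

    # Clip each horizon to the 0-100 cm window.
    top     <- pmax(horizons$hzn_top, 0)
    bot     <- pmin(horizons$hzn_bot, 100)
    thick_m <- pmax(bot - top, 0) / 100                     # clipped thickness in meters

    # OC stock per horizon in kg C per m^2, then summed by pedon.
    oc_kg_m2 <- thick_m * horizons$bulk_density * 1000 * (horizons$oc_pct / 100)
    oc_total <- aggregate(oc_kg_m2, by = list(pedon_id = horizons$pedon_id), FUN = sum)
    names(oc_total)[2] <- "oc_kg_m2_0_100cm"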

Okay, that's my "GEO599 sob story". I don't want to get off too easy here since my project didn't come through, so for the rest of my final blog post I thought I'd present you all with something more exciting to read about: last May's geostatistics conference in Newport.

The following are some notes/observations of mine from the conference. Oh, one more thing. Just for kicks, I decided to play catch-up and run Hot Spot Analysis on my (finally) created OC data. We already covered most everything about that tool, but I still wanted to do SOMETHING geospatial this term.
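
I used the ArcMap Hot Spot Analysis (Getis-Ord Gi*) tool for the map at the end of this post, but since R comes up so often below, here is an unverified sketch of roughly the same statistic using the spdep package; the file name, column names, and neighbor settings are made up for illustration.

    # Sketch: Getis-Ord Gi* "hot spot" z-scores for pedon organic carbon in R.
    # File name, column names, and k are hypothetical.
    library(spdep)

    pedons <- read.csv("pedon_oc.csv")            # assumed columns: x, y, oc_kg_m2 (projected coords)
    coords <- cbind(pedons$x, pedons$y)

    # Neighborhood: 8 nearest pedons plus the point itself (Gi* includes self), binary weights.
    nb <- include.self(knn2nb(knearneigh(coords, k = 8)))
    lw <- nb2listw(nb, style = "B")

    # Local Gi* z-scores: large positive = hot spot, large negative = cold spot.
    pedons$gi_z <- as.numeric(localG(pedons$oc_kg_m2, lw))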

One final thing that I don't want to get buried in my report (I'll repeat it again later): the Central Coast GIS Users Group (CCGISUG) is free to join, so if you're looking to build your professional network, stop reading right now, head to the following link, and sign up!

http://www.orurisa.org/Default.aspx?pageId=459070

Central Oregon Coast Geospatial Statistics Symposium Report

On May 30th the Central Coast GIS Users Group (CCGISUG) held a conference on geospatial statistics. As a geographer currently pursuing a minor in statistics, I can say without embellishment that this was the most interesting conference I'd ever attended. Kriging, R code, maps everywhere! I was rapt and attentive for every talk that day, and if we're all being honest, when has that ever happened for you over a full conference day?

I arrived a few minutes late and missed the opening overview speech, but walked in just in time to grab a coffee and settle down for EPA wetland researcher Melanie Frazier's talk on using R for spatial interpolation. Setting a pro-R tone that would persist through the day, she praised R's rapidly growing geospatial capability and wowed us by kriging a map of wetland sand distribution in three lines of code. Some recommended R packages for you all to look into:

  1. sp – base package for vector data work in R
  2. rgdal – facilitates reading and writing of spatial objects (like shapefiles) in R
  3. raster – as the name says, tools for raster work
  4. gstat – interpolation modeling package
  5. plotKML – export R layers to kml for display in Google Earth
  6. colorspace – access to a world of color options
  7. mgcv – regression modeling package

Did you know it's ridiculously easy to create variograms in R? Ms. Frazier assured us three lines of code could do it. Here's an online example I found, and she's pretty spot on about the "three line" rule: most any single task can be done in three lines of R code. R also supports cross-validation and other goodies. With the variogram in hand, one can run regression kriging code (in gstat? mgcv? sorry, I don't know how she did it) and boom, raster interpolation maps done easy.
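
To give a flavor of the "three line" rule, here is a small sketch of a gstat workflow (empirical variogram, model fit, ordinary kriging). This is not Ms. Frazier's code; the data frame and column names are hypothetical.

    # Sketch: variogram and ordinary kriging with gstat. File and column names are hypothetical.
    library(sp)
    library(gstat)

    dat <- read.csv("sand_samples.csv")              # assumed columns: x, y, sand
    coordinates(dat) <- ~ x + y

    v  <- variogram(sand ~ 1, dat)                   # empirical variogram
    vm <- fit.variogram(v, vgm("Sph"))               # fit a spherical model
    plot(v, vm)

    # Build a simple prediction grid over the study area and krige onto it.
    bb  <- bbox(dat)
    grd <- expand.grid(x = seq(bb[1, 1], bb[1, 2], length.out = 100),
                       y = seq(bb[2, 1], bb[2, 2], length.out = 100))
    coordinates(grd) <- ~ x + y
    gridded(grd) <- TRUE

    kr <- krige(sand ~ 1, dat, newdata = grd, model = vm)
    spplot(kr["var1.pred"])                          # interpolated sand map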

One other note from that talk: Ms. Frazier cannot yet release her code to the public, but if you want to hunt for R code examples, try GitHub.

Next, a USDA modeller, Lee McCoy, showed us all a spatial modelling project looking at eelgrass distribution in wetlands. Not much to report from this one, but I did observe that the USDA was not as concerned about model drivers or mechanistic explanation of their modeled maps. The goal was accurate maps. Period. And if that's your research goal, great, but I felt it called for a slightly different approach than a master's student, a person trying to understand why certain spatial patterns formed, might take. I can't yet explain exactly how the approaches would differ, but, for example, in research modelling one often throws out one of a pair of highly correlated variables (e.g., temperature and precipitation). Mr. McCoy had highly correlated variables in his model and it didn't bother him in the least.

Next up was Robert Suryan from our very own Oregon State University, doing his home department of Fisheries and Wildlife proud. The main research question Mr. Suryan dealt with was modelling and predicting seabird habitat along the coast of the Pacific Northwest. The crux of his study was attempting to develop better predictor layers in support of modelling. For seabirds, chlorophyll is an important indicator variable, as it's a measurement of algal population. Those algae feed creatures on the lower trophic levels of the food web that ultimately support seabird populations. Remote sensing can pick up chlorophyll presence, and typically researchers use the mean chlorophyll raster as a predictor layer. But ... can we do better?

Persistence is an emerging hot topic in spatial modelling. Average chlorophyll is one thing, but food webs only truly get a chance to build on algal food bases if the algae persist long enough for complex food webs to develop. Using some fancy math techniques that I barely understood at the time, and certainly can't do justice to here, Mr. Suryan was able to construct a data layer of algal persistence from a time series of mean chlorophyll. The resultant dataset exhibited a different spatial pattern than mean chlorophyll, so it was spatially distinct (though related) from its parent data. And, lo and behold, models with the persistence data layer outperformed the other models! Takeaway message: get creative about your base layers and consider whether they can be manipulated into new layers representative of the processes you are trying to capture in your model.
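
I can't reproduce Mr. Suryan's actual persistence math, but to make the idea concrete, here is one crude way a persistence layer could be derived from a chlorophyll time series in R; the file pattern and threshold are invented for the sketch, and this is not his method.

    # Sketch: a simple "persistence" layer from a stack of monthly chlorophyll rasters.
    # Directory, file pattern, and threshold are hypothetical.
    library(raster)

    files <- list.files("chl_monthly", pattern = "\\.tif$", full.names = TRUE)
    chl <- stack(files)                               # one layer per month

    threshold <- 1.0                                  # mg/m^3, arbitrary for this sketch
    # Fraction of months each pixel stays above the threshold.
    persistence <- calc(chl, fun = function(x) mean(x > threshold, na.rm = TRUE))

    mean_chl <- mean(chl, na.rm = TRUE)               # the usual mean-chlorophyll layer
    plot(stack(mean_chl, persistence))                # compare the two spatial patterns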

Did you know the University of Oregon has an infographics lab? Megen Brittell, a University of Oregon graduate student, works in the lab and demonstrated how R can be used to make custom infographics. There's a moderate degree of coding knowledge and time investment one needs for this work, but the payoff can be immense in terms of the ability to convey information to your audience. This would also be a good time to mention OSU's own Cartography and Geovisualization Group and the upcoming fall course: Algorithms for Geovisualization.

Whew. It was time for lunch at the conference. Big thanks to CCGISUG for giving away extra sandwiches to poor, starving grad attendees like myself. Free food in hand, it was time to go network. No, I didn't find my new dream job, but it was fascinating to hear more about the life of working geographers. Their main complaint about the profession: not much time spent in the field. On the other hand, there are always interesting new problems to tackle. If you'd like to meet fellow GIS users, why not sign yourself up for the free-to-join Central Coast GIS Users Group (CCGISUG)?

Pat Clinton of the EPA followed up lunch as an emergency fill-in for a no-show presenter. He gave a talk about modelling eelgrass distribution in wetlands that mostly served to reiterate themes presented in Ms. Frazier's and Mr. McCoy's talks. Once again, a model's predictive capability was valued most, with less emphasis placed on mechanistic processes. I should take a moment to mention that these researchers don't disregard the "why?" of which parameters work best. It's more that once you get a map you feel confident with, especially after consultation with the experts, you're finished; don't worry so much about a perfect AIC or which variables did exactly what. After all, all those predictor layers are in there for a reason – they have something to do with the phenomenon you're modelling, right? Not much else to report here, except that I learned that a thalweg is the centerline of a stream. It can be digitized, and one variable Mr. Clinton derived for his model was distance from the thalweg – another great example of getting creative in generating your own predictor layers by processing existing data sources (in this case, the thalweg line itself).
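
As a small illustration of that kind of derived predictor, here is a sketch of how a distance-to-thalweg layer could be generated in R from a digitized centerline; the shapefile name and cell size are hypothetical, and this is not necessarily how Mr. Clinton did it.

    # Sketch: derive a distance-to-thalweg raster from a digitized stream centerline.
    # Shapefile name and resolution are hypothetical.
    library(raster)
    library(rgdal)

    thalweg  <- readOGR("thalweg.shp")                      # digitized stream centerline (lines)
    template <- raster(extent(thalweg), res = 10)           # 10 m cells, for illustration
    crs(template) <- crs(thalweg)

    thalweg_r        <- rasterize(thalweg, template)        # cells touched by the line
    dist_to_thalweg  <- distance(thalweg_r)                 # distance (map units) from every cell

    plot(dist_to_thalweg); lines(thalweg)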

Rebecca Flitcroft was next up with a talk about stream models. This had the potential to be a specialized talk of little general interest, but instead Ms. Flitcroft wowed us all with a broad-based discussion of the very basics of what it means to perform spatial data analysis and how streams, which are networks, are in many ways quite different systems from the open, Cartesian grid area that we're all most familiar with. This is a tough one to discuss without visuals, but here's one example of how streams need to be treated uniquely: in a habitat patch, an animal can enter or exit from any direction, while in a stream a fish can only go upriver or downriver, and those directions are quite different in that one is against the flow and one is with the flow. Traditional geostats like nearest neighbor analysis are stuck in Euclidean distance and won't work on streams, where distance is all about how far along the network (which can wind any which way) one must travel. There was much discussion of the development of geostats for streams, and we learned that there is currently a dramatic lack of quality software to handle stream analysis (the FLOWS add-on to ArcMap is still the standard, but it only works in ArcMap 9 and wasn't ported to ArcMap 10).

Ms. Flitcroft also directly mentioned something that was an underlying theme of the whole conference: "The ecology should drive analysis, not the available statistics." I found this quote incredibly relevant to our work in GEO599 exploring the potential of the ArcMap Spatial Statistics tools.

Betsy Breyer joined Ms. Brittell as one of the only two student presenters of the day. Hailing from Portland State University, Ms. Breyer gave a talk that could have fit right in with GEO599 – she discussed geographically weighted regression (GWR) from the ArcMap Spatial Statistics toolbox! She took the tool to task, with the main message that it is good for data exploration but bad for any type of test of certainty. GWR uses a moving window to create regressions that weigh closer points more heavily than far-away points. The theory is that this will better tease out local variability in the data. Some weaknesses of this method are that it can't distinguish stationarity from non-stationarity and is susceptible to local multicollinearity problems (which aren't an issue in global models). It is recommended to play with the "distance band", essentially the size of that moving window, for best results. Takeaway – large (high-density?) datasets are required for GWR to really play to its strength, which is picking up local differences in your data set. It's especially worth exploring this tool if your data is highly non-stationary.
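
For those who would rather experiment outside ArcMap, here is a sketch of an analogous GWR fit using the spgwr package in R; the data set and variable names are invented, and the cross-validated bandwidth plays roughly the role of ArcMap's distance band.

    # Sketch: geographically weighted regression with the spgwr package.
    # File and variable names are hypothetical.
    library(sp)
    library(spgwr)

    dat <- read.csv("neighborhood_data.csv")      # assumed columns: x, y, price, canopy
    coordinates(dat) <- ~ x + y

    # Cross-validated bandwidth: the size of the moving window ("distance band").
    bw <- gwr.sel(price ~ canopy, data = dat)

    fit <- gwr(price ~ canopy, data = dat, bandwidth = bw, hatmatrix = TRUE)
    summary(fit$SDF$canopy)                       # how the local coefficient varies across space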

OSU Geography closed the conference out with a presentation by faculty member Jim Graham. If you haven't yet, it's worth knocking on Jim's very open office door to hear his well-informed thoughts on habitat modeling, coming from a coding / data manipulation background. Do it quickly though, because Jim is leaving to accept a tenure position at Humboldt State University this summer. For the geospatial conference, Dr. Graham discussed the ever-present problem of uncertainty in spatial analysis. He gave his always popular "Why does GBIF species data think there's a polar bear in the Indian Ocean?" example as an opener for why spatial modeling must be ever watchful for error. Moreover, Dr. Graham discussed emergent work looking at how to quantify levels of uncertainty in maps. One option is running Monte Carlo simulations of one's model and seeing how much individual pixels deviate across the different model results. Another way to go is "jiggling" your points around spatially. There may be spatial uncertainty in the points to begin with, and if Tobler's Law holds (things near each other are more alike than things farther away), then a robust model shouldn't deviate much, right? Well, in a first attempt at this method, Dr. Graham found that his model of red snapper habitat in the Gulf of Mexico displayed a large amount of uncertainty along the habitat borders. This prompts some interesting questions about what habitat boundaries really mean biologically. Food for thought.
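
Here is a generic sketch of what the point-jiggling idea could look like in R. This is my own illustration, not Dr. Graham's code: fit_and_predict() is a hypothetical placeholder for whatever habitat-modelling workflow you already have, and the perturbation size is arbitrary.

    # Sketch: "jiggling" occurrence points to probe spatial uncertainty in a habitat model.
    # fit_and_predict() is a hypothetical placeholder that takes points and returns a
    # prediction RasterLayer; the file name and jiggle_sd are also made up.
    library(raster)

    occurrences <- read.csv("occurrences.csv")      # assumed columns: x, y (projected, in meters)
    n_runs    <- 100
    jiggle_sd <- 500                                # meters; how much to perturb each point

    runs <- vector("list", n_runs)
    for (i in seq_len(n_runs)) {
      jiggled   <- occurrences
      jiggled$x <- jiggled$x + rnorm(nrow(jiggled), sd = jiggle_sd)
      jiggled$y <- jiggled$y + rnorm(nrow(jiggled), sd = jiggle_sd)
      runs[[i]] <- fit_and_predict(jiggled)         # your own modelling workflow goes here
    }

    # Per-pixel standard deviation across runs: high values flag uncertain areas,
    # which in the red snapper example concentrated along habitat edges.
    uncertainty <- calc(stack(runs), fun = function(v) sd(v, na.rm = TRUE))
    plot(uncertainty)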

With that, the conference came to a close. We stacked chairs, cleaned up, and headed to Rogue Brewery for delicious craft beer and many jokes about optimizing table spacing for the seating of 20+ thirsty geographers. Some final thoughts:

1) Go learn R.

2) No, seriously, learn R. It's emerging as a powerful spatial statistics package at a time when ArcMap is focusing more on cloud computing and other non-data-analysis features (the Spatial Statistics Toolbox notwithstanding).

3) Spatial models should ultimately tell a story. Use the most pertinent data layers you can find, and get creative in processing existing layers to make new, even better indicators of your variable of interest. Then get a model that looks right, even if it isn't scoring the absolute best on AIC or other model measures.

4) There's a rather large community of spatial statisticians out there. It's a field you might not hear much about in daily life, but there are over a hundred geographers just in Central Oregon. If you're looking for some work support once you leave GEO599, a.k.a. Researchers Anonymous, well, there are plenty of folks out there who are also cursing at rasters and vectors.

– Max Taylor, OSU master’s student of pedometrics

 

BONUS MAP: Many weeks late, here’s my map of hot spots for my project data – organic carbon, underlain by a model of organic carbon using SSURGO data

[Map: hot spot analysis of organic carbon pedons over SSURGO-modeled organic carbon]

Over the last couple of weeks I have been working to better define my research focus:

A geographical approach to understanding how the local spatial structure of urban green space shapes the way in which communities evolve. I hope to inform the Environmental Justice, Resilience Theory, and Adaptation literature as well. (I anticipate adding to this and/or changing it entirely.)

Below is a diagram of the Land Use and Society Model, which represents the dynamic feedback process whereby a particular land use activity in the human/cultural circle may be modified by a new set of resource management signals issued from the legal/political circle in response to new awareness of the impacts of existing practices on the physical world. I will use a version of the Land Use and Society Model to help sort out my thoughts and ideas about my research. For example, the process of urbanization, the removal of native vegetation, and the implementation of impervious surfaces have created environmental impacts on the microclimate within urban areas (i.e., the heat island effect). Let's say that, to mitigate this impact, the state and local sectors enforce the implementation or modification of recreation areas/parks. It is the enforcement of certain resource management regulations, and how they affect the social and economic components of this model, that interests me most.

[Figure: Land Use and Society Model diagram]

Below is an adapted model that I created, which focuses on the cultural, social, and economic interactions as they relate to urban green space.

[Figure: adapted Land Use and Society Model centered on urban green space]

I want to detect spatial changes in the social/economic composition and environmental benefits of communities over time. I will then quantify the change in the spatial distribution of urban green space and relate this back to access, in order to understand who has access and how that access has changed spatially and temporally.

I anticipate a number of scenarios/hypotheses to arise:

1. If ∆ in urban green space access > 0, then ∆ in social/economic composition and environmental benefit > 0

[Figure: Hypothesis 1]

If there is a positive change in urban green space access, then there will be a positive change in the social/economic composition and environmental benefit of the community as well.

2. If ∆ in urban green space access < 0, then ∆ in social/economic composition and environmental benefit < 0

[Figure: Hypothesis 2]

If there is a negative change in urban green space access, then there will be a negative change in the social/economic composition and environmental benefit of the community.

3. Alternative Hypothesis – If ∆ in urban green space access > 0, then ∆ in social/economic composition and environmental benefit < 0

[Figure: Alternative hypothesis]

If there is a positive change in urban green space access, then there will be a negative change in the social/economic composition and environmental benefit of the community.

Limitations:

– How will green space be formally defined?

I anticipate using a number of classifications for green space (park type, canopy coverage, greenness via NDVI), thus I wonder how this will be further quantified. Can I use an index?

– Measurement of Access

Proximity ≠ access

– Determining Migration

The data does not tell me where people go when they leave…

Can I detect the concept of “horizontal gentrification?”

In the North Pacific, humpback whales feed during summer in various locations along the Pacific Rim, including the US, Canada, Russia, and eastern Asia. In winter, they migrate south to mate and calve along Pacific coasts as well as the offshore islands of Mexico, Hawaii, and Japan (including the Ogasawara and Ryukyu Islands). Fidelity to feeding areas is high and is thought to be maternally directed; mothers take their calves to their specific feeding ground, and these offspring subsequently return to this region each year after independence.

This maternally directed fidelity is reflected in studies of maternally inherited mitochondrial DNA (mtDNA). In an ocean-wide survey of genetic diversity and subsequent analysis of population structure in North Pacific humpback whales (Structure of Populations, Levels of Abundance, and Status of Humpbacks; SPLASH), sequencing of the mtDNA control region resolved 28 unique mtDNA haplotypes showing marked frequency differences among breeding grounds (overall FST=0.106, p<0.001, n=825) and among feeding regions (overall FST=0.179, p<0.001, n=1031; Baker et al. 2008).

Despite genetic evidence of regional population structure in the North Pacific (i.e. separation of humpback whales into various stocks), there have been few studies to investigate the possibility of finer-scale structure within a single North Pacific feeding ground. For example, it is unclear whether maternally directed site fidelity at smaller scales within southeastern Alaska results in discernible differences in haplotype and sex frequencies.

For my final investigation in this course, I decided to look at fine-scale population structure of humpback whales in southeastern Alaska by exploring spatial patterns in haplotype and sex distribution. Specifically, I wanted to answer the following questions:

  • Are haplotypes (A+, A-, E2) differentially distributed by latitude?
  • Are males and females differentially distributed by latitude?
  • Are certain maternal lineages more spatially clustered than others?
  • Are males or females more spatially clustered?
Methods and Results
First, I isolated haplotype and sex layers by using the "split layer by attribute" tool in XToolsPro. I then went into Excel and produced latitude bins throughout southeastern Alaska (54.1-54.5, 54.6-55, 55.1-55.5, 55.6-56, 56.1-56.5, 56.6-57, 57.1-57.5, 57.6-58, 58.1-58.5, 58.6-59, 59.1-59.5). Next, I totaled the number of encounters of each class variable in each bin and calculated the percentage of each class variable falling in each bin.
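
The same binning can be done in a few lines of R rather than Excel; here is a sketch, with a hypothetical encounter file and column names standing in for my actual data.

    # Sketch: latitude binning and class percentages in R. File and column names are hypothetical.
    enc <- read.csv("encounters.csv")            # assumed columns: lat, haplotype, sex

    breaks <- seq(54.0, 59.5, by = 0.5)
    enc$lat_bin <- cut(enc$lat, breaks = breaks, include.lowest = TRUE)

    # For each haplotype, the percentage of its encounters falling in each latitude bin.
    hap_counts <- table(enc$haplotype, enc$lat_bin)
    hap_pct    <- prop.table(hap_counts, margin = 1) * 100

    # Same for sex.
    sex_counts <- table(enc$sex, enc$lat_bin)
    sex_pct    <- prop.table(sex_counts, margin = 1) * 100
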
Haplotype Distribution:
[Figure: haplotype distribution by latitude bin]
Sex Distribution:
[Figure: sex distribution by latitude bin]

It appears as though there is a peak in the percentage of sex and haplotype observations between 56.6 and 58.5 degrees. After looking closer at this, I realized that this peak is a function of my bin selection. After visualizing the distribution of encounters within each bin, it is clear that most of my encounters occurred between 56.6 and 58.5 degrees. However, there are some patterns of differential class variable percentages. For example, more A+ haplotypes are found near 58 degrees than A- and E2 haplotypes. Also, the E2 haplotype seems to be better represented at lower latitudes than the A+ and A- haplotypes. Males and females seem to be fairly similar in their latitudinal distribution.

Nearest Neighbor Analysis:

[Table: nearest neighbor analysis results by haplotype and sex]
All haplotype classes are significantly clustered. The E2 haplotype has the highest z-score and is therefore the least clustered. The A- haplotype appears to be the most clustered, with the lowest z-score. Based on the z-scores, males appear to be more spatially clustered than females, although both are significantly clustered. A nearest neighbor ratio of 1 indicates that the observed mean distance is equal to the expected mean distance based on a random distribution. Smaller nearest neighbor ratios indicate a larger deviation from 1 and, therefore, a more clustered class variable. It should be noted that the study area varies for each class variable. In my analysis, I was not able to standardize the study area to make these comparisons more meaningful. I am curious to know how these values would vary across a standardized study area and with equal sample sizes.
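
For reference, the nearest neighbor ratio can also be computed outside ArcMap. Below is a sketch using the spatstat package for a single class; the file name, column names, and study window are hypothetical, and, as noted above, the result is sensitive to how that window is defined.

    # Sketch: nearest neighbor ratio for one class (here the A- haplotype) with spatstat.
    # File and column names are hypothetical; assumes projected (planar) coordinates.
    library(spatstat)

    enc <- read.csv("encounters.csv")            # assumed columns: x, y, haplotype, sex
    am  <- subset(enc, haplotype == "A-")

    win <- owin(xrange = range(am$x), yrange = range(am$y))   # bounding box as the study area
    pts <- ppp(am$x, am$y, window = win)

    obs_mean <- mean(nndist(pts))                              # observed mean NN distance
    exp_mean <- 0.5 / sqrt(npoints(pts) / area.owin(win))      # expected under complete randomness
    nn_ratio <- obs_mean / exp_mean                            # < 1 clustered, > 1 dispersed

    clarkevans.test(pts)                         # the same ratio, with a significance test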