Tag Archives: Ecology

Final project: Washing out the (black) stain and wringing out the details

BACKGROUND

In order to explain my project, especially my hypotheses, some background information about this disease is necessary. Black stain root disease of Douglas-fir is caused by the fungus Leptographium wageneri. It infects the roots of its Douglas-fir host, growing in the xylem and cutting the tree off from water. It spreads between adjacent trees via growth through root contacts and grafts and long-distance via insects (vectors) that feed and breed in roots and stumps and carry fungal spores to new hosts.

Forest management practices influence the spread of this disease because of their influence on (i) the distance between trees (determined by natural or planted tree densities); (ii) the adjacency of susceptible species (as in single-species Douglas-fir plantations); (iii) road, thinning, and harvest disturbance, which creates suitable habitat for insect vectors (stumps, dead trees) and stresses remaining live trees, attracting insect vectors; and (iv) forest age distributions, because rotation lengths determine the age structure of managed forest landscapes and younger trees (<30-40 years old) are thought to be more susceptible to infection and mortality from the disease.

RESEARCH QUESTION

How do (B) spatial patterns of forest management practices relate to (A) spatial patterns of black stain root disease (BSRD) infection probabilities at the stand and landscape scale via (C) the spatial configuration and connectivity of susceptible stands to infection?

In order to address my research question, I built a spatial model to simulate BSRD spread in forest landscapes using the agent-based modeling software NetLogo (Wilensky 1999). I used Exercises 1-3 to focus on the spatial patterns of forest management classes. Landscapes were equivalent in terms of the proportion of each management class and the number of stands, varying only in the spatial pattern of management classes. In the exercises, I evaluated the relationship between management and disease by simulating disease spread in landscapes with two distinct spatial patterns of management:

  • Clustered landscape: The landscape was evenly divided into three blocks, one for each management class. Each block was evenly divided into stands.
  • Random landscape: The landscape was evenly divided into stands, and forest management classes were randomly assigned to each stand.

MY DATA

I analyzed outputs of my spatial model. The raster files contain the states of cells in forest landscapes at a given time step during one model run. States tracked include management class, stand ID number, presence/absence of trees, tree age, probability of infection, and infection status (infected/not infected). Management class and stand ID did not change during the model run. I analyzed tree states from the last step of the model run and did not analyze change over time.

Extent: ~20 hectares (much smaller than my full model runs will be)

Spatial resolution: ~1.524 x 1.524 m cells (maximum 1 tree per cell)

Three contrasting and realistic forest management classes for the Pacific Northwest were present in the landscapes analyzed:

  • Intensive – Active management: 15-foot spacing, no thinning, harvest at 37 years.
  • Extensive – Active management: 10-foot spacing, one pre-commercial and two commercial thinnings, harvest at 80 years.
  • Set-aside/old-growth (OG) – No active management: Forest with Douglas-fir in Pacific Northwest old-growth densities and age distributions and uneven spacing with no thinning or harvest.

HYPOTHESES: PREDICTIONS OF PATTERNS AND PROCESSES I LOOKED FOR

Because forest management practices influence the spread of disease as described in the “Background” section above, I hypothesized that the spatial patterns of forest management practices would influence the spatial pattern of disease. Having changed my experimental design and learned about spatial statistics and analysis methods throughout the course, I hypothesize that…

  • The “clustered” landscape will have (i) higher absolute values of infection probabilities, (ii) higher spatial autocorrelation in infection probabilities, and (iii) larger infection centers (“hotspots” of infection probabilities) than the “random” landscape because clustering of similarly managed forest stands creates continuous, connected areas of forest managed in a manner that creates suitable vector and pathogen habitat and facilitates the spread of disease (higher planting densities, lower age, frequent thinning and harvest disturbance in the intensive and extensive management). I therefore predict that:
    • Intensive and extensive stands will have the highest infection probabilities with large infection centers (“hotspots”) that extend beyond stand boundaries.
      • Spatial autocorrelation will therefore be higher and exhibit a lower rate of decrease with increasing distance because there will be larger clusters of high and low infection probabilities when stands with similar management are clustered.
    • Set-aside (old-growth, OG) stands will have the lowest infection probabilities, with small infection centers that may or may not extend beyond stand boundaries.
      • Where old-growth stands are in contact with intensive or extensive stands, neighborhood effects (and edge effects) will increase infection probabilities in those OG stands.
    • In contrast, the “random” landscape will have (i) lower absolute values of infection probabilities, (ii) less spatial autocorrelation in infection probabilities, and (iii) smaller infection centers than the “clustered” landscape. This is because the random landscape will have less continuity and connectivity between similarly managed forest stands; stands with management that facilitates disease spread will be less connected and stands with management that does not facilitate the spread of disease will also be less connected. I would predict that:
      • Intensive and extensive stands will still have the highest infection probabilities, but that the spread of infection will be limited at the boundaries with low-susceptibility old-growth stands.
        • Because of the boundaries created by the spatial arrangement of low-susceptibility old-growth stands, clusters of similar infection probabilities will be smaller and values of spatial autocorrelation will be lower and decrease more rapidly with increasing lag distance. At the same time, old-growth stands may have higher infection probabilities in the random landscape than in the clustered landscape because they would be more likely to be in contact with high-susceptibility intensive and extensive stands.
      • I also hypothesize that each stand’s neighborhood and spatial position relative to stands of similar or different management will influence that stand’s infection probabilities because of the difference in spread rates between management classes and the level of connectivity to high- and low-susceptibility stands based on the spatial distribution of management classes.
        • Stands with a large proportion of high-susceptibility neighboring stands (e.g., extensive or intensive management) will have higher infection probabilities than similarly managed stands with a small proportion of high-susceptibility neighboring stands.
        • High infection probabilities will be concentrated in intensive and extensive stands that have high levels of connectivity within their management class networks because high connectivity will allow for the rapid spread of the disease to those stands. In other words, the more connected you are to high-susceptibility stands, the higher your probability of infection.

APPROACHES: ANALYSIS APPROACHES I USED

Ex. 1: Correlogram, Global Moran’s I statistic

In order to test whether similar infection probability values were spatially clustered, I used the raster package in R (Hijmans 2019) to calculate the global Moran’s I statistic at multiple lag distances for both of the landscape patterns. I then plotted global Moran’s I vs. distance to create a correlogram and compared my results between landscapes.
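
For reference, a minimal sketch of this calculation in R might look like the following (the raster file name and lag range are placeholders, not my actual model outputs):

```r
# Sketch: global Moran's I at several lag distances with the raster package,
# plotted as a correlogram. File name and lag range are placeholders.
library(raster)

infection <- raster("infection_probability.tif")  # hypothetical model output

# Ring-shaped weights matrix so that only cells ~lag cells away contribute
ring_weights <- function(lag) {
  size <- 2 * lag + 1
  center <- lag + 1
  w <- matrix(0, size, size)
  for (i in 1:size) {
    for (j in 1:size) {
      if (round(sqrt((i - center)^2 + (j - center)^2)) == lag) w[i, j] <- 1
    }
  }
  w
}

lags <- 1:20  # in cell units (~1.5 m per cell)
moran_i <- sapply(lags, function(l) Moran(infection, w = ring_weights(l)))

plot(lags * res(infection)[1], moran_i, type = "b",
     xlab = "Lag distance (m)", ylab = "Global Moran's I")
```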

Ex. 2: Hotspot analyses (ArcMap), Neighborhood analyses (ArcMap)

First, I performed a non-spatial analysis comparing infection probabilities between (i) landscape patterns (ii) management classes, and (iii) management classes in each of the landscapes. Then, I used the Hotspot Analysis (Getis-Ord Gi*) tool in ArcMap to identify statistically significant hot- and cold-spots of high and low infection probabilities, respectively. I selected points within hot and cold spots and used the Multiple Ring Buffer tool in ArcMap to create distance rings, which I intersected with the management classes to perform a neighborhood analysis. This neighborhood analysis revealed how the proportion of each management class changed with increasing distance from hotspots in order to test whether the management “neighborhood” of trees influenced their probability of infection.
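
Although I ran the hotspot analysis in ArcMap, a comparable Getis-Ord Gi* calculation can be sketched in R with the spdep package; the file and column names below are assumptions rather than my actual outputs:

```r
# Sketch: Getis-Ord Gi* hot/cold spot detection on tree-level infection
# probabilities, analogous to ArcMap's Hotspot Analysis tool.
library(spdep)

trees  <- read.csv("tree_states.csv")          # hypothetical: x, y, infection_prob
coords <- as.matrix(trees[, c("x", "y")])

# Neighbors within 30 m of each tree, including the tree itself (Gi* form)
nb <- include.self(dnearneigh(coords, d1 = 0, d2 = 30))
lw <- nb2listw(nb, style = "B", zero.policy = TRUE)

gi_star <- localG(trees$infection_prob, lw)    # z-scores
trees$spot <- ifelse(gi_star > 1.96, "hot",
              ifelse(gi_star < -1.96, "cold", "ns"))
table(trees$spot)
```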

Ex. 3: Network and landscape connectivity analyses (Conefor)

I divided my landscape into three separate stand networks based on their management class. Then, I used the free landscape connectivity software Conefor (Saura and Torné 2009) to analyze the connectivity of each stand based on its position within and role in connecting the network, using the Integral Index of Connectivity (Saura and Rubio 2010). I then assessed the relationship between the connectivity of each stand and the infection probabilities of trees within that stand using various summary statistics (e.g., mean, median) to test whether connectivity was related to infection probability.
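
A rough sketch of how Conefor's stand-level connectivity output could be related to infection probabilities in R follows; the file and column names are assumptions about how I would format the inputs:

```r
# Sketch: join Conefor node importance values to stand-level mean infection
# probability and test for a monotonic relationship.
library(raster)

conefor  <- read.table("node_importances.txt", header = TRUE)  # assumed: stand_id, dIIC
stand_id <- raster("stand_id.tif")
infect   <- raster("infection_probability.tif")

# Mean infection probability per stand (zone = stand ID)
stand_means <- as.data.frame(zonal(infect, stand_id, fun = "mean"))
colnames(stand_means) <- c("stand_id", "mean_infection")

merged <- merge(stand_means, conefor, by = "stand_id")
cor.test(merged$dIIC, merged$mean_infection, method = "spearman")
```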

RESULTS: WHAT DID I PRODUCE?

As my model had not been parameterized by the beginning of this term, I analyzed “dummy” data, where infection spread probabilities were calculated as a decreasing linear function of distance from infected trees. However, the results I produced still provided insights as to the general functioning of the model and factors that will likely influence my results in the full, parameterized model.

I produced both maps and numerical/statistical relationships that describe the patterns of “A” (infection probabilities), the relationship between “A” and “B” (forest management classes), and how/whether “A” and “B” are related via “C” (landscape connectivity and stand networks).

In Exercise 1, I found evidence to support my hypothesis of spatial autocorrelation at small scales in both landscapes and higher autocorrelation and slower decay with distance in the clustered landscape than the random landscape. This was expected because the design of the model calculated probability of infection for each tree as a function of distance from infected trees.

In Exercises 2 and 3, I found little to no evidence to support the hypothesis that either connectivity or neighboring stand management had significant influence on infection probabilities. Because the model that produced the “dummy” data limited infection to ~35 meters from infected trees and harvest and thinning attraction had not been integrated into infection calculations, this result was not surprising. In my full model where spread via insect vectors could span >1,000 m, I expect to see a larger influence of connectivity and neighborhood on infection probabilities.

A critical component of model testing is exploring the “parameter space”, including a range of possible values for each parameter. This is especially important for agent-based models, where complex interactions between many individuals produce larger-scale patterns that may be emergent and not fully predictable from the simple sum of the parts. In my model, the disease parameters of interest are the factors influencing probability of infection (Fig. 1). In order to understand how the model reacts to changes in those parameters, I will perform a sensitivity analysis, systematically adjusting parameter values one by one and comparing the results of each series of model runs under each set of parameter values.
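
As a sketch of what that one-at-a-time design could look like (the parameter names and ranges below are placeholders, not the values I will actually use):

```r
# Sketch: build a one-at-a-time (OAT) set of parameter combinations, holding
# all parameters at baseline while varying one at a time.
baseline <- list(insect_dispersal_m = 500, root_contact_decay = 0.10,
                 vector_attraction = 1.0)
test_values <- list(insect_dispersal_m = c(250, 500, 1000, 2000),
                    root_contact_decay = c(0.05, 0.10, 0.20),
                    vector_attraction  = c(0.5, 1.0, 2.0))

runs <- list()
for (p in names(test_values)) {
  for (v in test_values[[p]]) {
    set <- baseline
    set[[p]] <- v
    runs[[length(runs) + 1]] <- set
  }
}
length(runs)  # number of parameter sets (before replicate runs per set)
```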

Fig.1. Two of the model parameters that will be systematically adjusted during sensitivity analysis. Tree susceptibility to infection as a function of age (left) and probability of root contact as a function of distance (right) will both likely influence model behavior and the relative levels of infection probability between the three management classes.

This is especially relevant given that in Exercises 1 through 3, I found that the extensively managed plantations had the highest values of infection probability and most of the infection hotspots, likely because this management class has the highest [initial] density of trees. For the complete model, I hypothesize that the intensive plantations will have the highest infection probabilities because of the high frequency of insect-attracting harvest and short rotations that maintain the trees in an age class highly susceptible to infection. In the full model, the extensive plantations will have higher initial density than the intensive plantations but will undergo multiple thinnings, decreasing tree density but attracting vectors, and will be harvested at age 80, thus allowing trees to grow into a less susceptible age class. In this final model, thinning, rotation length, and vector attraction will factor into the calculation of infection probabilities.

My analysis made it clear that even a 1.5-meter difference in spacing resulted in a statistically significant difference in disease transmission, with much higher disease spread in the denser forest. Because the model is highly sensitive to tree spacing, likely because the distance-related parameters of my model drop off in sigmoidal or exponential decay patterns, I hypothesize that changes in the values of parameters that influence the spatial spread of disease (i.e., insect dispersal distance, probability of root contact with distance) and the magnitude of vector attraction after harvest and thinning will determine whether the “extensive” or “intensive” forest management class ultimately has the highest levels of infection probability. In addition, the rate of decay of root contact and insect dispersal probabilities will determine whether management and infection within stands influence infection in neighboring stands, as well as the distance and strength of those neighborhood effects. I would like to test this by performing such analyses on the outputs from my sensitivity analyses.

SIGNIFICANCE: WHAT DID I LEARN FROM MY RESULTS? HOW ARE THESE RESULTS IMPORTANT TO SCIENCE? TO RESOURCE MANAGERS?

Ultimately, the significance of this research is to understand the potential threat of black stain root disease in the Pacific Northwest and to inform management by identifying evidence-based, landscape-scale strategies that could mitigate BSRD issues. While the results of Exercises 1-3 were interesting, they were produced using a model that had not been fully parameterized and thus are not representative of the likely actual model outcomes. Therefore, I was not able to test my hypotheses. That said, this course allowed me to design and develop an analysis to test my hypotheses, and the exercises I completed have provided a deeper understanding of how my model works. Through this process, I have begun to generate additional testable hypotheses regarding model sensitivity to parameters and the relative spread rates of infection in each of the forest management classes. Another key takeaway is the importance of producing many runs with the same landscape configuration and parameter settings to account for stochastic processes in the model. If only one run is analyzed for each scenario, there is a chance that the results are not representative of the average behavior of the system or of the full range of behaviors possible for those scenarios. For example, with the random landscape configuration, one generated landscape can be highly connected and the next highly fragmented with respect to intensive plantations, and only a series of runs under the same conditions would provide reliable results for interpretation.

WHAT I LEARNED ABOUT… SOFTWARE

(a, b) Arc-Info, Modelbuilder and/or GIS programming in Python

This was my first opportunity to perform statistical analysis in ArcGIS, and I used multiple new tools, including hotspot analysis, multiple ring buffers, and using extensions. Though I did not use Python or Modelbuilder, I realized that doing so will be critical for automating my analyses given the large number of model runs I will be analyzing. While I learned how to program in Python using arcpy in GEOG 562, I used this course to choose the appropriate tools and analyses for my questions and hypotheses rather than automating procedures I may not use again. I would now like to implement my procedures for neighborhood analysis in Python in order to automate and increase the efficiency of my workflow.

(c) Spatial analysis in R

During this course, I learned most about spatial data manipulation in R, since I had limited experience using R with spatial data beforehand. I used R for spatial statistics, data cleaning and management, and conversion between vector and raster data. I also learned about the limitations of R (and my personal limitations) in terms of the challenge of learning how to use packages and their functions when documentation is variable in quality and a wide variety of user-generated packages are available with little reference as to their quality and reliability. For example, for Exercise 2, I had trouble finding an up-to-date and high-quality package for hotspot analysis in R, with raster data or otherwise. However, this method was straightforward in ArcMap once the data were converted from raster to points. For Exercise 1, the only Moran’s I calculation that I was able to perform with my raster data was the “moran” function in the raster package, which does not provide z- or p-values to evaluate the statistical significance of the calculated Moran’s I and requires you to generate your own weights matrices, which is a pain. Using the spdep or ncf packages for this analysis was incredibly slow (though I am not sure why), and the learning curve for spatstat was too steep for me to overcome by the Exercise 1 deadline (but I hope to return to this package in the future).

Reading, manipulating, and converting data: With some trial and error and research into the packages available for working with spatial data in R (especially raster, sp/spdep, and sf), I learned how to quickly and easily convert data between raster and shapefile formats, which was very useful in automating the cleaning and preparation for multiple datasets and creating the inputs for the analyses I want to perform.
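
A condensed sketch of that conversion workflow (file names are placeholders):

```r
# Sketch: convert a model output raster to points/polygons and write shapefiles.
library(raster)
library(sf)

r <- raster("infection_probability.tif")

pts   <- st_as_sf(rasterToPoints(r, spatial = TRUE))       # raster -> points
polys <- st_as_sf(rasterToPolygons(r, dissolve = FALSE))   # raster -> polygons

st_write(pts, "infection_points.shp", delete_layer = TRUE)
st_write(polys, "infection_polygons.shp", delete_layer = TRUE)
```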

(d) Landscape connectivity analyses: I learned that there are a wide variety of metrics available through Fragstats (and the landscapemetrics and landscapetools packages in R); however, I was not able to perform my desired stand-scale analysis of connectivity because I could not determine whether it is possible to analyze contiguous stands with the same management class as separate patches (Fragstats considered all contiguous cells in the raster with the same class to be part of the same patch). Instead, I used Conefor, which has an ArcMap extension for generating node and connection files from a polygon shapefile, to calculate relatively few but robust and ecologically meaningful connectivity metrics for the stands in my landscape.

WHAT I LEARNED ABOUT… SPATIAL STATISTICS

Correlograms and Moran’s I: For this statistical method, I learned the importance of choosing meaningful lag distances based on the data being analyzed and the process being examined. For example, my correlogram contains a lot of “noise”, with many peaks and troughs due to the empty cells between trees, but I also captured data at the relevant distances. Failure to choose appropriate lag distances means that some autocorrelation could be missed, but analyzing large rasters at a high resolution of lag distances results in very slow processing. In addition, I wanted to compare local vs. global Moran’s I to determine whether infections were confined to certain portions of the landscape or spread throughout the entire landscape, but the function for local Moran’s I returned values far outside the -1 to 1 range of the global Moran’s I. As a result, I did not understand how to interpret or compare these values. Finally, global Moran’s I did not tell me where spatial autocorrelation was occurring, but the fact that there was spatial autocorrelation led me to perform a…

Hotspot analysis (Getis-Ord Gi*): It became clear that a deep conceptual understanding of the hypothesized spatial relationships and processes in the data, along with a clear hypothesis, is critical for hotspot analysis. I performed multiple analyses with different distance weightings to compare the results, and there was large variation between the weighting and distance methods in both the number of points included in hot and cold spots and the landscape area covered by those spots. I ended up choosing inverse squared distance weighting based on my understanding of root transmission and vector dispersal probabilities and because this weighting method was the most conservative (it produced the smallest hotspots). The confidence level chosen also resulted in large variation in the size of hotspots. After confirming that there was spatial autocorrelation in infection probabilities, this method helped me to understand where in the landscape these patterns were occurring and thus how they related to management practices.

Neighborhood analysis: I did not find that this method provided much insight in my case, not because of the method itself but because of my data (it simply confirmed the landscape pattern that I had designed, clustered vs. random) and my approach (one hotspot and one coldspot point non-randomly selected in each landscape). I also found this method tedious in ArcMap, though I would like to automate it, and I later learned about the Zonal Statistics tool, which can make this analysis more efficient. In general, it is not clear what statistics I could have used to confirm whether results were significantly different between landscapes, but perhaps this is an issue caused by my own ignorance.

Network/landscape connectivity analyses: I learned that there are a wide variety of tools, programs, and metrics available for these types of analyses. I found the Integral Index of Connectivity (implemented in Conefor) particularly interesting because of the way it characterizes habitat patches based on multiple attributes in addition to their spatial and topological positions in the landscape. The documentation for this metric is thorough, its ecological significance has been supported in peer-reviewed publications (Saura and Rubio 2010), and it is relatively easy to interpret. In contrast, I found the number of metrics available in Fragstats overwhelming, especially during the data exploration phase.

REFERENCES

Hijmans, R.J. (2019). raster: Geographic Data Analysis and Modeling. R package version 2.8-19. https://CRAN.R-project.org/package=raster

Saura, S. & Torné, J. (2009). Conefor Sensinode 2.2: a software package for quantifying the importance of habitat patches for landscape connectivity. Environmental Modelling & Software 24: 135-139.

Saura, S. & Rubio, L. (2010). A common currency for the different ways in which patches and links can contribute to habitat availability and connectivity in the landscape. Ecography 33: 523-537.

Wilensky, U. (1999). NetLogo. Center for Connected Learning and Computer-Based Modeling, Northwestern University, Evanston, IL. http://ccl.northwestern.edu/netlogo/

Final Project: San Diego Bottlenose Dolphin Sighting Distributions

The Research Question:

Originally, I asked the question: do common bottlenose dolphin sighting distances from shore change over time?

However, throughout the research and analysis process, I refined this question for several reasons. For example, I had planned on using all of my dolphin sightings from my six different survey locations along the California coastline. Because the bulk of my sightings are from the San Diego survey site, I chose this data set for completeness and feasibility; additionally, this data set used the most standardized survey methods. Rather than simply looking at distance from shore, which would be at a very fine scale, seeing as all of my sightings are within two kilometers of shore, I chose to try to identify changes in latitude. Furthermore, I wanted to see if changes in latitude (if present) were somehow related to El Niño Southern Oscillation (ENSO) cycles and, in turn, to distances to lagoons. This data set also has the largest span of sightings by both year and month. When you see my hypotheses, you will notice that my original research question morphed into much more specific hypotheses.

Data Description:

My dolphin sighting data span 1981-2015 with a few absent years; sightings cover all months, but not in all years sampled. The same transects were performed in a small boat with approximately a two-kilometer sighting span (one kilometer surveyed 90 degrees to starboard and to port of the bow). These data points therefore have a resolution of approximately two kilometers. Much of the other data has a coarser spatial resolution, which is why it was important to use such a robust sighting data set. The ENSO data I used take a broad-brush approach to ENSO indices: rather than using the exact ENSO index, which is at a fine scale, I used the NOAA database that classifies month-years into positive, neutral, and negative indices (1, 0, and -1, respectively). These data were at a month-year temporal resolution, which I matched to the month-year information of my sighting data. Lagoon data were sourced from the mid-to-late 2000s, so I treated lagoon locations and distances as static.
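
A minimal sketch of how the monthly ENSO categories could be joined to the sighting records in R (file and column names are assumptions):

```r
# Sketch: attach the monthly ENSO category (-1, 0, 1) to each sighting by
# matching on month-year.
sightings <- read.csv("san_diego_sightings.csv")       # assumed Date column
enso      <- read.csv("enso_monthly_categories.csv")   # assumed month_year, enso_index

sightings$month_year <- format(as.Date(sightings$Date), "%Y-%m")
sightings <- merge(sightings, enso, by = "month_year", all.x = TRUE)

table(sightings$enso_index, useNA = "ifany")            # check the join
```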

Hypotheses:

H1: I predicted that bottlenose dolphin sightings at the pod-scale (usually, one to ten individuals) along the San Diego transect throughout the years 1981-2015 would exhibit clustered distribution patterns as a result of the patchy distributions of both the species’ preferred habitats and prey, as well as the social nature of this species.

H2: I predicted there would be higher densities of bottlenose dolphin sightings at the pod-scale (usually, one to ten individuals) at higher latitudes spanning 1981-2015, due to prey distributions shifting northward and less human activity in the northern sections of the transect. I also predicted that during warm (positive) ENSO months, dolphin sightings in San Diego would be distributed more northerly, predominantly because prey aggregations have historically shifted northward into cooler waters as sea surface temperatures increase. I expected the spatial gradient of sightings to shift north and south in relation to the ENSO state (warm, neutral, or cold).

H3: I predicted that along the San Diego coastline, bottlenose dolphin sightings at the pod-scale (usually, one to ten individuals) would be clustered around the six major lagoons within about two kilometers, with no specific preference for any lagoon, because the murky, nutrient-rich waters in the estuarine environments are ideal for prey protection and known for their higher densities of schooling fishes.

Map with bottlenose dolphin sightings on the one-kilometer buffered transect line and the six major lagoons in San Diego.

Approaches:

I utilized multiple approaches with different software platforms including ArcMap, qGIS, GoogleEarth, and R Studio (with some Excel data cleaning).

  • Buffers in ArcMap
  • Calculations in an attribute table
  • ANOVA with Tukey HSD
  • Nearest Neighbor averages
  • Cluster analyses
  • Histograms and Bar plots

Results: 

I produced several maps and found statistical relationships between sightings and distribution patterns, between ENSO indices and dolphin sighting latitudes, and between sightings and distances to lagoons.

H1: I predicted that bottlenose dolphin sightings at the pod-scale (usually, one to ten individuals) along the San Diego transect throughout the years 1981-2015 would exhibit clustered distribution patterns as a result of the patchy distributions of both the species’ preferred habitats and prey, as well as the social nature of this species.

True: The results of the average nearest neighbor spatial analysis in ArcMap 10.6 produced a z-score of -127.16 with a p-value of < 0.000001, which translates into there being less than a 1% likelihood that this clustered pattern could be the result of random chance. Although I could not look directly at prey distributions because of data availability, it is well-known that schooling fishes exist in clustered distributions that could be related to these dolphin sightings also being clustered. In addition, bottlenose dolphins are highly social and although pods change in composition of individuals, the dolphins do usually transit, feed, and socialize in small groups. Also see Exercise 2 for other, relevant preliminary results, including a histogram of the distribution in differences of sighting latitudes.
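
The test above was run with ArcMap's Average Nearest Neighbor tool; a roughly equivalent check in R could use the Clark-Evans test in spatstat (the file name and coordinate columns below are assumptions, and projected coordinates in meters are required):

```r
# Sketch: nearest-neighbor clustering test on sighting locations.
library(spatstat)

sightings <- read.csv("san_diego_sightings_utm.csv")    # assumed projected x, y (m)
win <- owin(range(sightings$x), range(sightings$y))
pp  <- ppp(sightings$x, sightings$y, window = win)

clarkevans.test(pp, alternative = "clustered")  # R < 1 indicates clustering
mean(nndist(pp))                                # mean nearest-neighbor distance (m)
```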

Summary from the Average Nearest Neighbor calculation in ArcMap 10.6 displaying that bottlenose dolphin sightings in San Diego are highly clustered.

H2: I predicted there would be higher densities of bottlenose dolphin sightings at the pod-scale (usually, one to ten individuals) at higher latitudes spanning 1981-2015, due to prey distributions shifting northward and less human activity in the northern sections of the transect. With this, I predicted that during warm (positive) ENSO months, dolphin sightings in San Diego would be distributed more northerly, predominantly because prey aggregations have historically shifted northward into cooler waters as sea surface temperatures increase. I expected the spatial gradient of sightings to shift north and south in relation to the ENSO state (warm, neutral, or cold).

False: the sightings are more clumped towards the lower latitudes overall (p < 2e-16), possibly due to habitat preference. The sightings are closer to beaches with higher human densities and more human-related activities near Mission Bay, CA. It should be noted that just north of the San Diego transect is the Camp Pendleton Marine Corps Base, which conducts frequent military exercises that could deter animals.

I used an ANOVA and found a significant difference in sighting latitude distributions between monthly ENSO indices. A Tukey HSD test was performed to determine which differences between treatments were significant. All pairwise differences (neutral vs. negative, positive vs. negative, and positive vs. neutral ENSO indices) were significant with p < 0.005.
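
A minimal sketch of that ANOVA and Tukey HSD in R (the merged data frame and column names are assumptions):

```r
# Sketch: test whether sighting latitude differs among monthly ENSO categories.
sightings <- read.csv("san_diego_sightings_enso.csv")   # assumed latitude, enso_index

fit <- aov(latitude ~ factor(enso_index), data = sightings)
summary(fit)     # overall test across ENSO categories
TukeyHSD(fit)    # pairwise comparisons (negative, neutral, positive)
```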

H3: I predicted that along the San Diego coastline, bottlenose dolphin sightings at the pod-scale (usually, one to ten individuals) would be clustered around the six major lagoons within about two kilometers, with no specific preference for any lagoon, because the murky, nutrient-rich waters in the estuarine environments are ideal for prey protection and known for their higher densities of schooling fishes. See my Exercise 3 results.

Using a histogram, I was able to visualize how sighting distances to the nearest lagoon differed by lagoon. For example, dolphin sightings nearest to Lagoon 6, the San Dieguito Lagoon, are always within 0.03 decimal degrees, whereas sightings nearest to Lagoon 5, Los Penasquitos Lagoon, are spread across a range of distances, with most sightings at greater distances.

Bar plot displaying the different distances from dolphin sighting location to the nearest lagoon in San Diego in decimal degrees. Note: Lagoon 4 is south of the study site and therefore was never the nearest lagoon.

After running an ANOVA in R Studio, I found that there was a significant difference in distance to the nearest lagoon between ENSO index categories (p < 2.55e-9), with a Tukey HSD confirming that the difference was significant between neutral and negative months and between positive and neutral months. Therefore, I gather there must be something happening in neutral months that changes the distance to the nearest lagoon; potentially prey are more static or more dynamic in those months compared to the positive and negative months. Using a violin plot, it appears that Lagoon 5, Los Penasquitos Lagoon, has the widest span of sighting distances when it is the nearest lagoon across all ENSO index months. In neutral months, Lagoon 0, the Buena Vista Lagoon, has more than a single sighting (there were none in negative months and only one in positive months). The Buena Vista Lagoon is the most northerly lagoon, which may indicate that in neutral ENSO months, dolphin pods are distributed more northerly.

Takeaways to science and management: 

Bottlenose dolphins have a clustered distribution that seems to be related to monthly ENSO indices, with certain periods showing more of a difference in distribution, and likely to their sociality at a larger scale. Neutral ENSO months seem to have a distinct character that affects where sightings are distributed along the San Diego coastline; more research is needed to determine what is different about neutral months and how this may impact this dolphin population. On a finer scale, the six lagoons in San Diego appear to have a spatial relationship with dolphin pod sighting distributions. These lagoons may provide critical habitat for the bottlenose dolphins' preferred prey species, or preferred habitat for the dolphins themselves, either for cover or for hunting, and different lagoons may have different spans of influence at different distances, either by creating larger nutrient plumes or because of static geographic and geologic features. This could mean that specific areas should receive new or continued protection. For example, the Batiquitos and San Dieguito Lagoons have some Marine Conservation Areas with No-Take Zones. It is interesting to see the relationship to different lagoons, which may provide nutrient outflows and protection for key bottlenose dolphin prey species. The city of San Diego and the state of California need ways to assess the coastlines and how protecting the marine, estuarine, and terrestrial environments near and encompassing the coastlines impacts the greater ecosystem. Other than the Marine Mammal Protection Act and small protected zones, there are no safeguards for these dolphins.

My Learning: about software (a) Arc-Info and b) R

  1. Arc-Info: buffer creation, creating graphs, and nearest neighbor analyses; how to deal with transects, data with mismatching information, and conflicting shapefiles.
  2. R: I did not know much beyond the basics in R. I learned how to conduct ANOVAs and how to interpret the results. Mainly, I learned how to visualize my results and use new packages.

My Learning: about statistics

Throughout this project I learned that spatial statistics requires clear hypothesis testing in order to step cleanly through a spatial process. Most specifically, I learned about spatial analyses in ArcMap and how I could use nearest neighbor calculations to assess distribution patterns. Furthermore, I now have a better understanding of spatial distribution patterns and how they are assessed, such as clustered versus random versus evenly dispersed distributions. For data analysis and cleaning, I also learned how to apply my novice understanding of ANOVAs and then display results relating to spatial relationships (distances) using histograms and other graphical displays in R Studio.

________________________________________________________________________

Contact information: this post was written by Alexa Kownacki, Wildlife Science Ph.D. Student at Oregon State University. Twitter: @lexaKownacki

Manipulating salinity to create a better fit habitat suitability model for Olympia oysters

Follow-up from Exercise 2
In Exercise 2, I compared Olympia oyster location data to the model of predicted suitable habitat that I developed in Exercise 1. Results from that analysis showed that 13 of the 18 observations within or near areas of high-quality habitat (type 4) indicated the presence of Olympia oysters (72%) versus 5 locations where oysters were not found (28%). No field survey locations fell within areas of majority lowest quality habitat (type 1). Seven observations were found within the second lowest quality habitat type (2), with 2 of those observations indicating presence (29%) and 5 indicating absence (71%).

Habitat suitability   4           3          2          1
Presence              13 [0.72]   4 [0.4]    2 [0.29]   0 [0]
Absence               5 [0.28]    6 [0.6]    5 [0.71]   0 [0]
Total (n = 35*)       18 [1.0]    10 [1.0]   7 [1.0]    0 [0]

*3 data points removed from analysis due to inconclusive search results.

To expand on this analysis, I used a confusion matrix to further examine the ‘errors’ in the data, i.e., the observations that did not agree with my model of predicted suitable habitat. For ease of interpretation, I removed habitat suitability type 1, since there were no observations in this category, and type 3, since it fell between high- and low-quality habitat.

Habitat suitability   4 (high)   2 (low)
Presence              0.37       0.06
Absence               0.14       0.14

Decimals reported indicate the proportion of total observations (n = 35) that fell within this category. The habitat suitability model predicted that oysters would be present within the highest quality habitat type and absent in low-quality habitat. The confusion matrix shows that the model was successful in predicting that 37% of the total observations where oysters were present were found within habitat type 4 (high), and 14% of the observations where oysters were absent were found in habitat type 2 (low).

In the type 4 habitat, 14% of the total observations found that oysters were absent, which goes against the model prediction. I suspect this is partly due to the patchy nature of substrate availability in Yaquina Bay and the low-resolution quality of the substrate raster layer used for analysis. For the 6% of observations that show oyster presence within habitat type 2, it’s possible that these points were juvenile oysters that were able to settle in year-1, but are less likely to survive into adulthood. Both of these errors could also indicate issues with the weights assigned in the model back in Exercise 1.
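
For reference, this kind of confusion-matrix summary can be produced with a simple cross-tabulation in R (the file and column names below are assumptions about how the observations are stored):

```r
# Sketch: cross-tabulate observed presence/absence against majority habitat type
# and express the counts as proportions of all observations.
obs <- read.csv("oyster_observations.csv")   # assumed: presence (0/1), habitat_type (1-4)

conf <- table(Observed = ifelse(obs$presence == 1, "Presence", "Absence"),
              Habitat  = obs$habitat_type)
conf
round(prop.table(conf), 2)   # proportions of total n, as reported above
```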

Question asked
For exercise 3, I wanted to expand on the habitat suitability analysis to see if I could more accurately predict oyster locations and account for the errors found in exercise 2. Here I asked:

Can the spatial pattern of Olympia oyster location data be more accurately described by manipulating the spatial pattern of one of the parameters of suitable habitat (salinity)?

I decided to modify the rank values of one of the model parameters: salinity. Based on my experience collecting oyster location data in the field, it seemed that salinity was the biggest influence in where oysters would be found. It was also the easiest parameter to change since it had the fewest rank categories. The excerpt below comes from the ranking value table I established for the habitat parameters in Exercise 1. Changes to rank value for salinity are indicated in the right-most column.

Habitat parameter                Subcategories        Subcategory variable range   Olympia oyster tolerance      Rank value applied
Mean wet-season salinity (psu)   Upper estuary        < 16 psu                     somewhat, but not long-term   1 → 2
                                 Upper mid estuary    16.1 – 23 psu                yes                           4 → 3
                                 Lower mid estuary    23.1 – 27 psu                yes                           3 → 4
                                 Lower estuary        > 27 psu                     somewhat                      2 → 1

Name of tool or approach
I combined my approach from exercise 1 and exercise 2 to create a different model output based on the new rank values applied to the salinity parameter. The analysis was completed in ArcGIS Pro and the table of values generated was reviewed in Excel.

Brief description of steps to complete the analysis

  1. After assigning new rank values to the salinity parameter, I applied a new ‘weighted overlay’ to the salinity raster layer in ArcGIS. As I did in exercise 1, I used the ‘weighted overlay’ tool again to combine the weighted substrate and bathymetry layers with the updated salinity layer. A new map of suitable habitat was created based on these new ranking values.
  2. Then, I added the field observation data of oyster presence/absence to the map and created a new map of all the data points overlaid on habitat suitability.
  3. I then created buffers around each of the points using the ‘buffer’ tool. In the previous analysis I used the ‘multiple ring buffer’ but was only able to analyze the largest buffer (300 m); this time, I created a single buffer around each point.
  4. Using the ‘Zonal Statistics’ tool, I overlaid the newly created buffers on the updated raster of habitat suitability and viewed the results. I again chose ‘majority’ as the represented statistic, which categorizes each buffer based on the habitat suitability type occupying the largest area (a comparable R sketch follows this list).
  5. I also created a results table using the ‘Zonal Statistics as Table’ tool, then copied it over to Excel for additional analysis.
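
Steps 3 and 4 were done in ArcGIS Pro; a comparable buffer-and-majority extraction could be sketched in R as follows (the file names, 300 m buffer distance, and column names are assumptions):

```r
# Sketch: buffer each observation point and extract the majority (modal)
# habitat suitability class within each buffer.
library(raster)
library(sf)

habitat <- raster("habitat_suitability_adjusted.tif")   # reclassified 1-4 raster
obs     <- st_read("oyster_observations.shp")

buffers <- st_buffer(obs, dist = 300)                    # assumed 300 m buffers

obs$majority_habitat <- extract(habitat, as_Spatial(buffers),
                                fun = modal, na.rm = TRUE)
table(obs$majority_habitat, obs$presence)                # assumed presence column
```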

Results
An updated table based on the manipulated salinity rank values was generated to compare to the table created in exercise 2 and displayed at the top of this blog post. Results from this analysis showed that only 2 of the 35 total observations fell within or near areas of high-quality habitat (type 4); one indicated presence and the other absence. The adjustments to the salinity rank values allowed habitat type 3 to dominate the map, with 31 of the 35 total observations falling in this category. Of the 31 points, 18 showed presence data (58%) and 13 were absence data (42%). Again, no field survey locations fell within areas of majority lowest-quality habitat (type 1). Two observations were found within the second lowest-quality habitat type (2), both indicating absence (100%).

Habitat suitability   4          3           2         1
Presence              1 [0.5]    18 [0.58]   0 [0]     0 [0]
Absence               1 [0.5]    13 [0.42]   2 [1.0]   0 [0]
Total (n = 35)        2 [1.0]    31 [1.0]    2 [1.0]   0 [0]

Again, I used a confusion matrix to further examine the ‘errors’ in the data, i.e., the observations that did not agree with my model of predicted suitable habitat. I removed habitat suitability type 1, since there were no observations in this category.

Habitat suitability   4 (high)   3      2 (low)
Presence              0.03       0.51   0
Absence               0.03       0.37   0.06

 

Decimals reported indicate the proportion of total observations (n = 35) that fell within this category. The confusion matrix shows that the model fit nearly all observations (31) into the type 3 habitat category, with a near even split between presence (18) and absence (13). In reference to the confusion matrix from exercise 2 at the top of this blog, it is difficult to make a direct comparison of the errors since most of the observations fell into type 3.

Critique of the method
I was surprised to see how drastically the map of suitable habitat changed by manipulating only one of the habitat parameters. The adjustment of the rank values for salinity resulted in a vast reduction in area attributed to the highest quality habitat (type 4). The results indicate that choosing the salinity parameter to manipulate did not result in a better fit model and that changes to salinity rank values were too drastic. Since the salinity parameter contains only 4 subcategories, or 4 different weighted salinity values, the impacts to the habitat suitability map were greater than if the parameter had had more nuance. For example, the bathymetry parameter has 10 subcategories and a reworking of the ranking values within could have made more subtle changes to the habitat suitability map.

The next steps would be to examine another parameter, either substrate or bathymetry, to see if adjustments to ranking values result in a more appropriate illustration of suitable habitat. Additionally, the collection of more oyster location data points will help in creating a better fit model and understanding the nuances of suitable habitat in Yaquina Bay.

 

Exercise 3: Lagoons, ENSO Indices, and Dolphin Sightings

Exercise 3: Are bottlenose dolphin sighting distances to the nearest lagoon related to ENSO indices in the San Diego, CA survey site?

1. Question that you asked

I was looking for a pattern at more than one scale, specifically the relationship between ENSO and sighting distributions off of San Diego. I asked the question: do bottlenose dolphin sighting distributions change latitudinally with ENSO in relation to distance from the nearest lagoon? The greater San Diego area has six major lagoons that contribute the major estuarine habitat along the San Diego coastline and are all recognized as separate estuaries. All of these lagoons/estuaries sit at the mouths of broad river valleys along the 18 miles of coastline between Torrey Pines to the south and Oceanside to the north. The small boat survey transects cover this entire stretch with near-exact overlap from start to finish. These habitats are known to be highly dynamic, to experience variable environmental conditions, and to support a wide range of native vegetation and wildlife species.

Distribution of common bottlenose dolphin sightings in the San Diego study area along boat-based transects with the six major lagoons.

 

FID NAME
0 Buena Vista Lagoon
1 Agua Hedionda Lagoon
2 Batiquitos Lagoon
3 San Elijo Lagoon
4 Tijuana Estuary
5 Los Penasquitos Lagoon
6 San Dieguito Lagoon

2. Name of the tool or approach that you used.

I utilized the “Near” tool in ArcMap 10.6 that calculated the distance from points to polygons and associated the point with FID of that nearest polygon. I also used R Studio for basic analysis, graphic displays, and ANOVA with Tukey HSD.

3. Brief description of steps you followed to complete the analysis.

  1. I researched the San Diego GIS database for the layer that would be most helpful and found the lagoon shapefile.
  2. Imported the shapefile into ArcMap where I already had my sightings, transect line, and 1000m buffered transect polygon.
  3. I used the “Near” tool in the Analysis toolbox, part of the “Proximity” toolset. I chose the point-to-polygon option with my dolphin sightings as the point layer and the lagoon polygons as the polygon layer (an equivalent calculation in R is sketched after this list).
  4. I opened the attribute table for my dolphin sightings and there was now a NEAR_FID and NEAR_DIST which represented the identification (number) related to the nearest lagoon and the distance in kilometers to the nearest lagoon, respectively.
  5. I exported the attribute table to Excel using the “conversion” tool and then imported it into R Studio for further analyses (an ANOVA of the differences in sighting distances to lagoons among ENSO indices).
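
For comparison, the same nearest-lagoon calculation could be sketched in R with the sf package (the file names and NAME column are assumptions, and a projected CRS in meters is assumed for meaningful distances):

```r
# Sketch: distance from each sighting to every lagoon polygon, then keep the
# nearest lagoon and its distance, analogous to ArcMap's "Near" tool.
library(sf)

sightings <- st_read("dolphin_sightings.shp")   # assumed projected CRS (meters)
lagoons   <- st_read("san_diego_lagoons.shp")

d <- st_distance(sightings, lagoons)                    # sightings x lagoons matrix
d <- matrix(as.numeric(d), nrow = nrow(sightings))      # drop units for indexing

sightings$near_lagoon <- lagoons$NAME[apply(d, 1, which.min)]
sightings$near_dist_m <- apply(d, 1, min)
```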

4. Brief description of results you obtained

After a quick histogram in ArcMap, it was visually clear that the distribution of distances to the nearest lagoon appeared clustered, skewed, or possibly bimodal, without considering ENSO. Then, after importing into R Studio, I created a box plot of the distance to the nearest lagoon compared to the ENSO index (-1, 0, or 1). I ran an ANOVA, which returned a very small p-value of 2.55e-9. Further analysis using a Tukey HSD found that the differences between the neutral (0) and -1 states and between the neutral and 1 states were significant, but not between 1 and -1. These results are interesting because they mean the sighting distances differ most during neutral ENSO periods, which could indicate that certain lagoons are preferred during ENSO extremes compared to neutral periods. Therefore, yes, there is a difference in dolphin sighting distances to lagoons during different ENSO phases, specifically during neutral periods.

Histogram comparing the distance from the dolphin sighting to nearest lagoon in San Diego during the three major indices of El Niño Southern Oscillation (ENSO): -1, 0, and 1.

 

Violin plot showing the breakdown of distributions of dolphin sighting distances to lagoons (numbered 0-6) during the three different ENSO indices.

5. Critique of the method – what was useful, what was not?

This method was incredibly helpful and also the easiest to apply once I got started, in comparison to my previous steps. It allowed me to both visualize and quantify interesting results. I also learned some tricks for how to better graph my data and symbolize my data in ArcMap.


Contact information: this post was written by Alexa Kownacki, Wildlife Science Ph.D. Student at Oregon State University. Twitter: @lexaKownacki

Exercise 1: Preparing for Point Pattern Analysis

Exercise 1

The Question in Context

In order to answer my question (are the dolphin sighting data points clustered along the transect surveys, or do they have an even distribution pattern?), I need to use point pattern analysis. I am trying to visualize where in space dolphins were sighted along the coast of California, specifically within my San Diego sighting area. In this exercise, the variable of interest is dolphin sightings. These are x,y coordinates (point data) indicating the presence of common bottlenose dolphins along a transect. However, the transect lines themselves were not recorded, and I needed to recreate them to the best of my ability. This process is more challenging than anticipated, but will prove useful in the short term for this class and project and in the long term for management ramifications.

The Tools

As part of this exercise, I used ArcMap 10.6, GoogleEarth, qGIS, and Excel. Although I had intended only to import my Excel data, saved as a .csv file, into ArcMap, that was not working, so other tools were necessary. The final goal of this exercise was to complete point pattern analyses comparing distance along recreated transects to sightings. From there, the sightings would be broken down by year, season, or environmental factor (El Niño versus La Niña years) to look for distribution patterns, specifically whether the points were clustered or evenly distributed at different points in time.

Steps/Outputs/Review of Methods and Analysis

My first step was to clean up my sightings data enough that it could be exported as a .csv and imported as x-y data into ArcMap. However, ArcMap, no matter the transformation equation, did not seem to recognize the projected or geographic coordinate systems. After many attempts, where my data ended up along the east coast of Africa or in the Gulf of Mexico, I tried a workaround: I imported the .csv file into qGIS with the help of a classmate and then exported that file as a shapefile. I was then able to import that shapefile into ArcMap and select the correct geographic and projected coordinate systems. The points finally appeared off the coast of California.
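
In hindsight, one way to avoid the qGIS detour would be to build the spatial points directly in R and write a shapefile with an explicit coordinate system; a sketch follows, with the column names and the NAD83 datum flagged as assumptions:

```r
# Sketch: read the sightings .csv, assign a geographic CRS, project, and export.
library(sf)

sightings <- read.csv("san_diego_sightings.csv")   # assumed Longitude/Latitude columns

pts <- st_as_sf(sightings, coords = c("Longitude", "Latitude"),
                crs = 4269)                        # NAD83 geographic (assumed datum)
pts_utm <- st_transform(pts, 26911)                # NAD83 / UTM zone 11N (San Diego)

st_write(pts_utm, "sightings_utm.shp", delete_layer = TRUE)
```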

I then found a shapefile of North America with a more accurate coastline to add to the base map. This step will be important later, when I add in track lines and examine how the distributions of points along those track lines relate to bathymetry. The bathymetric lines will need to be rasterized and later interpolated.

The next step was the track line recreation. I chose to focus on the San Diego study site. This site has the most data, and the data were collected most consistently and with the most standardized methods. The surveys always left the same port in Mission Bay, San Diego, CA, traveled north at 5-10 km/hr to a specific beach (landmark), and then turned around. The sighting data note whether the track line was surveyed in both directions (south to north and north to south) or in one direction only (south to north). Because some data were collected prior to the commercial availability of GPS, I had to recreate these track lines. I started by trying to draw the lines in ArcMap but had difficulty. Luckily, after many attempts, it was suggested that I use Google Earth. There I found a tool to create a survey line, marking the edges along the coastline at an approximate distance from shore, and then export that file. It took a while to realize that the file needed to be exported as a .kml and not a .kmz.

Once exported as a .kml, I was able to convert the .kml file to a layer file and then to a shapefile in ArcMap. The next step is getting all points within one kilometer of the track line (my spatial scale for this part of the project) to associate with that track line. One idea was snapping the points to the line; however, this did not work. I am still stuck here: this is the major step before I can associate my point data with the line and then begin a point pattern analysis in ArcMap and/or R Studio.

Results

Although I do not yet have full results for this exercise, I can say for certain that it has not been for lack of trying, nor am I stopping. I have been brainstorming and gathering resources from classmates and teaching assistants about how to associate the sighting data points with the track line in order to then do this cluster analysis. Hopefully, the result can then be exported to R Studio, where I can examine distributions along the transect. I may be able to do a density-based analysis, which would show whether different sections along the transect (which I would need to designate and potentially rasterize first) have different densities of points. I would expect the sections to differ seasonally.

Critiques

Although I add in my opinions on usefulness and ease above, I do believe this will be very helpful in analyzing distribution patterns. Right now, it is largely unknown if there are differences in distribution patterns for this population because they move rapidly and at great distances. But, by investigating data from only the San Diego site, I can determine if there are differences in distributions along the transects temporally and spatially. In addition, the total counts of sightings in each location per unit effort will be useful to see the influx to that entire survey area over time.


Contact information: this post was written by Alexa Kownacki, Wildlife Science Ph.D. Student at Oregon State University. Twitter: @lexaKownacki

The Biogeography of Coastal Bottlenose Dolphins off of California, USA between 1981-2016

Background/Description:

Common bottlenose dolphins (Tursiops truncatus), hereafter referred to as bottlenose dolphins, are long-lived marine mammals that inhabit the coastal and offshore waters of the California Current Ecosystem. Because of their geographic diversity, bottlenose dolphins are divided into many different species and subspecies (Hoelzel, Potter, and Best 1998). Bottlenose dolphins exist in two distinct ecotypes off the west coast of the United States: a coastal (inshore) ecotype and an offshore (island) ecotype. The coastal ecotype inhabits nearshore waters, generally less than 1 km from shore, between Ensenada, Baja California, Mexico and San Francisco, California, USA (Bearzi 2005; Defran and Weller 1999). Less is known about the range of the offshore ecotype, which is broadly defined as occurring more than 2 km offshore along the entire west coast of the USA (Carretta et al. 2016). Current population abundance estimates are 453 coastal individuals and 1,924 offshore individuals (Carretta et al. 2017). The offshore and coastal bottlenose dolphins off of California are genetically distinct (Wells and Scott 1990).

Both ecotypes breed in summer and calve the following summer, which may be a thermoregulatory adaptation (Hanson and Defran 1993). These dolphins are crepuscular feeders that predominantly hunt prey in the early morning and late afternoon (Hanson and Defran 1993), which correlates with the movement patterns of their fish prey. Out of 25 prey fish species, surf perches and croakers make up nearly 25% of the coastal T. truncatus diet (Hanson and Defran 1993). These fish, unlike T. truncatus, are not federally protected, and neither are their habitats. Therefore, major threats to the dolphins and their prey species include habitat degradation, overfishing, and harmful algal blooms (McCabe et al. 2010).

This project aims to better understand the distribution of coastal bottlenose dolphins in the waters off of California, specifically in relation to distance from shore, and how that distance has changed over time.

Data:

This part of the overarching project focuses on understanding the biogeography of coastal bottlenose dolphins. Later stages in the project will require the addition of offshore bottlenose sightings to compare population habitats.

Beginning in 1981, georeferenced sighting data of coastal bottlenose dolphins off the California, USA coast were collected by R.H. Defran and team. The data were provided in the NAD 1983 datum. Small boats less than 10 meters in length were used to collect the majority of the field data, including GPS points, photographs, and biopsy samples. These surveys followed similar tracklines with a specific start and end location, which will be used to calculate sightings per unit effort. Over the next four decades, varying amounts of data were collected in six different regions (Fig. 1). Coastal T. truncatus sightings from 1981-2015 parallel much of the California land mass, concentrating in specific areas (Fig. 2). Many of the sightings are clustered near larger cities because of the logistics of port locations. The greater number of coastal dolphin sightings is due to the bias in effort toward proximity to shore and the longer study period. All samples were collected under a NOAA-NMFS permit. Additional data required will likely be sourced from publicly available, long-term data collections, such as ERDDAP or MODIS.

Distance from shore will be calculated in ArcGIS or an R package. These data will be used later in the project for comparison with additional static, dynamic, and long-term environmental drivers. These factors will be tested as candidate layers for mapping and, ultimately, estimating the population distribution patterns of the dolphins.
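
As a rough illustration of that distance-from-shore step, here is a minimal Python sketch using GeoPandas; the file names ("sightings.csv", "california_coastline.shp") and column names ("lat", "lon", "date") are hypothetical placeholders, not the project's actual data.

```python
# Minimal sketch: distance from shore for each sighting using GeoPandas.
# File and column names are hypothetical placeholders.
import geopandas as gpd
import pandas as pd

# Sightings recorded in NAD 1983 geographic coordinates (EPSG:4269)
sightings = pd.read_csv("sightings.csv")  # hypothetical columns: lat, lon, date
pts = gpd.GeoDataFrame(
    sightings,
    geometry=gpd.points_from_xy(sightings["lon"], sightings["lat"]),
    crs="EPSG:4269",
)
coast = gpd.read_file("california_coastline.shp")  # hypothetical coastline layer

# Re-project both layers to a projected CRS in meters covering California
pts_m = pts.to_crs("EPSG:3310")    # NAD83 / California Albers
coast_m = coast.to_crs("EPSG:3310")

# Distance (m) from each sighting to the nearest point on the merged coastline
pts_m["dist_to_shore_m"] = pts_m.geometry.distance(coast_m.unary_union)
print(pts_m[["date", "dist_to_shore_m"]].head())
```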

Figure 1. Breakdown of coastal bottlenose dolphin sightings by decade. Image source: Alexa Kownacki.

Hypotheses:

I predict that coastal bottlenose dolphin sightings will be associated with particular bathymetry patterns and will cluster by depth profile, via mechanisms such as prey distribution and abundance, nutrient plumes, and predator avoidance.

Approaches:

My objective is to first find a bathymetric layer that covers the coast of the entirety of California, USA to import into ArcMap 10.6. Then I need to interpolate the data to create a smooth surface. Then, I can add my dolphin sighting points and create a way to associate each point with a depth. These depth and point data would be exported to R for further analysis. Once I have extracted these data, I can run a KS-test to compare the shape of distribution based on two different factors, such as points from El Niño years versus La Niña years to see if there is a difference in average sighting depth or more common sighting depths based on the climatic patterns. I am also interested in using the spatial statistic analysis tool, Moran’s I, to see if the sightings are clustered. If so, I would run a cluster analysis to see if the sightings are clustered by depth. If not, then maybe there are other drivers that I can test, such as distance from shore, upwelling index values, or sea surface temperature. Additionally, these patterns would be analyzed over different time scales, such as monthly, seasonally, or decadally.
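
A minimal sketch of those two tests is below, assuming the sighting points have already been exported with a depth value and an ENSO label attached; the column names ("depth_m", "enso") and the choice of k = 8 neighbors for the spatial weights are placeholders, not decisions made for this project.

```python
# Minimal sketch of the KS test and Moran's I described above.
# Assumes `pts_m` is a projected GeoDataFrame of sightings with hypothetical
# columns "depth_m" (seafloor depth) and "enso" ("nino" or "nina").
from scipy.stats import ks_2samp
from libpysal.weights import KNN
from esda.moran import Moran

# 1. Two-sample Kolmogorov-Smirnov test: do sighting depths differ in
#    distribution between El Nino and La Nina years?
nino_depths = pts_m.loc[pts_m["enso"] == "nino", "depth_m"]
nina_depths = pts_m.loc[pts_m["enso"] == "nina", "depth_m"]
ks_stat, p_value = ks_2samp(nino_depths, nina_depths)
print(f"KS statistic = {ks_stat:.3f}, p = {p_value:.4f}")

# 2. Global Moran's I: are depth values at the sightings spatially clustered?
#    Spatial weights here are k-nearest neighbors (k = 8, an arbitrary choice).
w = KNN.from_dataframe(pts_m, k=8)
moran = Moran(pts_m["depth_m"].values, w)
print(f"Moran's I = {moran.I:.3f}, permutation p = {moran.p_sim:.4f}")
```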

Expected Outcome:

Ideally, I would produce multiple maps from ArcGIS representing different spatial scales at defined increments, such as by month (all Januaries, all Februaries, etc.), by year or binned time increment (e.g., 1981-1989, 1990-1999), and potentially grouped by El Niño or La Niña year. Different symbologies would represent coastal dolphin sighting distances from shore. The maps would display seafloor depth as a color ramp binned at 10-meter intervals. Because depth profiles vary along the California coast, if my hypothesis holds I would expect clusters of sightings at different distances from shore but at similar depths. Each data point (dolphin sighting) would also carry its quantified seafloor depth for further analysis in R.
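
A small sketch of that grouping step follows, assuming the same hypothetical sighting table as above with a "date" column; the decade and 10-meter depth bins are illustrative choices only.

```python
# Minimal sketch of the temporal and depth binning described above.
# Assumes `pts_m` has hypothetical columns "date", "depth_m", "dist_to_shore_m".
import pandas as pd

pts_m["date"] = pd.to_datetime(pts_m["date"])
pts_m["month"] = pts_m["date"].dt.month                  # all Januaries, all Februaries, ...
pts_m["decade"] = (pts_m["date"].dt.year // 10) * 10     # e.g., 1981-1989 fall in the 1980s bin
pts_m["depth_bin_10m"] = (pts_m["depth_m"] // 10) * 10   # 10 m depth classes for map symbology

# Example summaries that could feed the per-increment maps
print(pts_m.groupby("decade")["dist_to_shore_m"].describe())
print(pts_m.groupby("month")["depth_m"].median())
```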

Significance:

This project draws upon decades of rich spatiotemporal and biological information on two neighboring, long-lived cetacean populations that inhabit contrasting coastal and offshore waters of the California Bight. The coastal ecotype is strongly associated with proximity to shore, usually being sighted within five kilometers of the coast, and is therefore in frequent contact with human activities. However, how distances to shore have changed over the decades, whether related to habitat type, prey species distribution, or long-term environmental drivers, remains largely unknown. By better understanding the distribution and biogeography of these marine mammals, managers can better mitigate potential human effects on the dolphins and identify where and when animals may be at higher risk of disturbance.

Preparation:

I have a moderate amount of experience in ArcMap from past coursework (GEOG 560 and 561), as well as practical applications and map-making. I have very little experience with ModelBuilder and Python-based GIS programming. I am becoming more familiar with R after two statistics courses and after analyzing some of my own preliminary data. I am experienced in image processing with ACDSee, Photoshop, ImageJ, and other tools, mainly applied to marine vertebrate data through NOAA Fisheries.

Literature Cited:

Bearzi, Maddalena. 2005. “Aspects of the Ecology and Behaviour of Bottlenose Dolphins (Tursiops truncatus) in Santa Monica Bay, California.” Journal of Cetacean Research and Management 7 (1): 75–83.

Carretta, James V., Kerri Danil, Susan J. Chivers, David W. Weller, David S. Janiger, Michelle Berman-Kowalewski, Keith M. Hernandez, et al. 2016. “Recovery Rates of Bottlenose Dolphin (Tursiops Truncatus) Carcasses Estimated from Stranding and Survival Rate Data.” Marine Mammal Science 32 (1): 349–62. https://doi.org/10.1111/mms.12264.

Carretta, James V., Karin A. Forney, Erin M. Oleson, David W. Weller, Aimee R. Lang, Jason Baker, Marcia M. Muto, et al. 2017. “U.S. Pacific Marine Mammal Stock Assessments: 2016.” NOAA Technical Memorandum NMFS. https://doi.org/10.7289/V5/TM-SWFSC-5.

Defran, R. H., and David W. Weller. 1999. “Occurrence, Distribution, Site Fidelity, and School Size of Bottlenose Dolphins (Tursiops truncatus) off San Diego, California.” Marine Mammal Science 15 (April): 366–80.

Hanson, Mark T., and R. H. Defran. 1993. “The Behavior and Feeding Ecology of the Pacific Coast Bottlenose Dolphin, Tursiops truncatus.” Aquatic Mammals 19 (3): 127–42.

Hoelzel, A. R., C. W. Potter, and P. B. Best. 1998. “Genetic Differentiation between Parapatric ‘nearshore’ and ‘Offshore’ Populations of the Bottlenose Dolphin.” Proceedings of the Royal Society B: Biological Sciences 265 (1402): 1177–83. https://doi.org/10.1098/rspb.1998.0416.

McCabe, Elizabeth J. Berens, Damon P. Gannon, Nélio B. Barros, and Randall S. Wells. 2010. “Prey Selection by Resident Common Bottlenose Dolphins (Tursiops truncatus) in Sarasota Bay, Florida.” Marine Biology 157 (5): 931–42. https://doi.org/10.1007/s00227-009-1371-2.

Wells, Randall S., and Michael D. Scott. 1990. “Estimating Bottlenose Dolphin Population Parameters From Individual Identification and Capture-Release Techniques.” Report of the International Whaling Commission, Special Issue 12.

——-

Contact information: this post was written by Alexa Kownacki, Wildlife Science Ph.D. Student at Oregon State University. Twitter: @lexaKownacki


 

Predicting spatial distribution of Olympia oysters in the Yaquina estuary, OR

My spatial problem

  • A description of the research question that you are exploring.

My research aims to determine the current abundance and spatial distribution of native Olympia oysters in Yaquina Bay, Oregon. This oyster species has experienced massive decline in population due to overharvest during European settlement of the western United States. Yet its value to the ecosystem, its cultural importance, and its tastiness have made the Olympia oyster a current priority for population enhancement. For my research, I will be focusing on a local population of Olympia oysters in the Yaquina estuary. The goal of my project is to gather baseline information about their current abundance and spatial distribution, then develop a repeatable biological monitoring protocol for assessing this population in the future. Using spatial technology, I will first assess whether the distribution of Olympia oysters can be predicted using three habitat parameters: salinity, substrate availability, and elevation. In collaboration with the Oregon Department of Fish and Wildlife (ODFW), I will use the results of this spatial analysis and field surveys to determine ‘index sites’, which are specific locations within the estuary that are indicative of the larger population. These index sites will be revisited in the future by ODFW’s Shellfish and Estuarine Assessment of Coastal Oregon (SEACOR) team to assess changes in population size and spread over time. If predictions of Olympia oyster distribution are accurate based on the habitat parameters I’ve identified, then I’d also like to analyze potential species distribution under future environmental conditions and under different management scenarios, including habitat restoration and population enhancement.

For this course, I will be exploring this main research question:

How is the spatial pattern of Olympia oysters in the Yaquina estuary [A] related to (caused by) the spatial pattern of three habitat parameters (salinity, substrate, elevation) [B]?

 

  • A description of the dataset you will be analyzing, including the spatial and temporal resolution and extent.

I will be using three spatial datasets, one for each habitat parameter, overlaid on one another to rank locations from most to least likely for Olympia oyster presence. The salinity dataset is based on historical measurements (1960-2006) and represents a gradient from highest salinity (~32 psu) at the mouth of the estuary to fresher water upstream (<16 psu). Elevation is represented through a bathymetric dataset from 2001-2002, sourced from the Environmental Protection Agency office in Newport, OR. The substrate data come from the 2014 Oregon ShoreZone mapping effort, which is managed and updated by the Oregon Coastal Management Program. These data can be used in a couple of different ways: either as a substrate layer that characterizes substrate type broadly (low resolution) or as vector data with associated tables that describe the substrate within a tidal zone along the shoreline (higher resolution, but limited spatial extent).

The images here show the three habitat parameter spatial datasets:

Yaquina Bay bathymetry derived from subtidal soundings in 1953, 1998, 1999, and 2000 by U.S. Army Corps of Engineers.
Data from EPA.

Salinity figure digitized from Lewis et al. (2019) based on Oregon’s wet-season salinity measurements (average salinity November-April).
Lewis, N. S., E. W. Fox, and T. H. DeWitt. 2019. Estimating the distribution of harvested estuarine bivalves with natural history-based habitat suitability models. Estuarine, Coastal and Shelf Science 219: 453-472.

Substrate component classes of Yaquina Bay based on data classifications from Coastal and Marine Ecological Classification Standard (CMECS) ‘Estuarine Substrate Component’ layer.
Data from Oregon Coastal Management Program.
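
As a rough illustration of how the three layers could be combined into a single ranking surface, the sketch below assumes the layers have already been rasterized to a common grid; all file names, class codes, and thresholds are hypothetical placeholders rather than values the project has chosen.

```python
# Minimal sketch: overlay three aligned habitat rasters into one suitability rank.
# File names, thresholds, and substrate class codes are hypothetical placeholders.
import numpy as np
import rasterio

with rasterio.open("salinity.tif") as src:
    salinity = src.read(1)
    profile = src.profile
with rasterio.open("elevation.tif") as src:
    elevation = src.read(1)
with rasterio.open("substrate_class.tif") as src:
    substrate = src.read(1)

# Score each parameter 0 (unsuitable) or 1 (suitable); example thresholds only
sal_ok = ((salinity >= 16) & (salinity <= 32)).astype(np.uint8)
elev_ok = ((elevation >= -2.0) & (elevation <= 1.0)).astype(np.uint8)  # roughly intertidal, meters
subs_ok = np.isin(substrate, [1, 2]).astype(np.uint8)                  # hypothetical hard-substrate codes

# Sum of scores: 0 = least likely, 3 = most likely oyster habitat
suitability = sal_ok + elev_ok + subs_ok

profile.update(dtype=rasterio.uint8, count=1)
with rasterio.open("oyster_suitability_rank.tif", "w", **profile) as dst:
    dst.write(suitability, 1)
```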

  • Predict the kinds of patterns you expect to see in your data, and the processes that produce or respond to these patterns.

I hypothesize that the distribution of Olympia oysters in Yaquina Bay is controlled by the availability of suitable habitat: the oysters should be found where all three parameters fall within appropriate ranges. However, I expect to find that the parameters do not influence oyster distribution equally. For example, Olympia oysters have been observed to tolerate a broad salinity range, but they are simply not present without suitable substrate. I also expect the influence of a particular habitat parameter to change depending on where the oysters are located within the estuary. I am curious to see, if possible, which parameter is most important at which life stage and what may drive population changes at specific sites in the estuary.

I do expect to be able to predict where the oysters will be located based on the habitat parameters, though I am uncertain whether the resolution of the spatial data is fine enough to capture nuances in distribution. For example, Olympia oysters are known to be opportunistic in finding suitable substrate and will settle on a wide variety of hard surfaces, including derelict boating equipment, discarded shopping carts, and pilings.

 

  • Describe the kinds of analyses you ideally would like to undertake and learn about this term, using your data.

I want to be able to produce a model that can predict where the oysters are located based not just on the three habitat parameters of interest, but under various environmental conditions and different management scenarios. For example, where might the oysters settle in a given year if rainfall is substantially higher, or if adult oysters spawn earlier, or if oyster growers create Olympia oyster beds for harvest, or if a new habitat restoration site is established, etc.

I am not especially handy at statistical analysis, so I would like to gain a better understanding of statistics through spatial data. I know that I will need statistics to determine how well the predicted Olympia oyster distribution aligns with actual observations in the field, but I am currently unsure how to do that. A recent study in the Yaquina estuary used a similar approach to predict the distribution of five other bivalve species. That study used R to fit a logistic regression model estimating the probability of each species’ presence within a given area. I would like to do something similar for my analysis, but will need some help.
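
For a sense of what such a presence/absence model could look like, here is a minimal sketch; the cited study worked in R, and this version uses Python’s statsmodels instead, with a hypothetical survey file and column names ("presence", "salinity", "elevation", "substrate") standing in for real data.

```python
# Minimal sketch: logistic regression of oyster presence on the three habitat
# parameters. File and column names are hypothetical placeholders.
import pandas as pd
import statsmodels.formula.api as smf

# Each row: a surveyed location with habitat values and presence (1) / absence (0)
obs = pd.read_csv("oyster_survey_points.csv")

# Substrate is categorical, so it is wrapped in C(); salinity and elevation are continuous
model = smf.logit("presence ~ salinity + elevation + C(substrate)", data=obs).fit()
print(model.summary())

# Predicted probability of presence at each surveyed (or gridded) location
obs["p_presence"] = model.predict(obs)
```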

 

  • What do you want to produce — maps? statistical relationships?

The desired products of this research are habitat suitability maps of the current and future (pending the success of the initial effort) distribution of native Olympia oysters for use by ODFW. As a part of this effort, I will create a map of index site locations to be used in future species monitoring. I would also like to generate a predictive model that can estimate the distribution of oysters based on annual changes in the local environment (El Niño conditions, heavy rainfall, restoration efforts, introduction of invasive species, etc.). While salinity, substrate, and elevation appear to be the main factors influencing oyster distribution, a number of other factors can have effects, including temperature, proximity to the mouth of the estuary, and tidal retention.

 

  • How is your spatial problem important to science? To resource managers?

ODFW currently does not have reliable baseline information on the distribution of Olympia oysters in Oregon. As an ecosystem engineer, the species provides a number of important benefits to the ecosystem, including water filtration and habitat for other marine creatures. It is also culturally significant to local tribes, including the Confederated Tribes of Siletz Indians. The species is not currently listed as threatened or endangered, but if it were to be listed one day, that designation would trigger mitigation and conservation measures that would be difficult and expensive for agencies and private landowners. Additionally, there has been some exploration of growing and harvesting this species as a specialty food product if the population becomes robust again. Given the current slow food movement and interest in local products, Olympia oysters could fit well in this niche.

 

  • How much experience do you have with:

(a) Arc-Info – Little experience, used a bit with older versions of ArcMap.

(b) Modelbuilder and/or GIS programming in Python – I am comfortable with ModelBuilder, but have no experience with Python.

(c) R – Some experience; I took Stats 511 where we used R heavily in a series of lab exercises. I have not applied my own data in R.

(d) Image processing – I have used a variety of Adobe products for graphic design, including Photoshop and InDesign.