I set out this term to delve into a vegetation dataset. It consists of nearly 2000 vegetation surveys in seven salt marshes across the Pacific Northwest. My goal is to be able to predict how vegetation will change with shifts in climate and increasing sea level. Given the diversity of wildlife that use estuaries at various stages of their life cycles, understanding how habitat will respond is critical to developing conservation plans. To achieve this, I broke the problem into three stages:
1: Identify vegetation communities from field data
2: Create a habitat suitability model for each community under current conditions
3: Use the habitat suitability models to project community response to changes in climate and sea level
Due to the large number of species identified (42), I first needed to reduce the dataset to only the most common species. I chose to use only the species found in more than 5% of all the survey plots. This left me with 16 species. I then explored how best to combine these species into communities. By modeling communities rather than species, I am assuming each species within a community will respond the same way. Given that salt marsh vegetation is generally stratified by elevation, this is a reasonable assumption to begin with, but one that I will need to revisit in the future. To determine the communities, I used canonical correspondence analysis (CCA), which can be thought of as a Principal Component Analysis for categorical data. I defined the niche of the communities using 5 environmental variables: elevation (standardized for tidal range), mean daily flooding frequency, distance to channel and bay, distance to bay, and channel density. The resulting CCA graph:
I then used a script in R to determine the optimum number of clusters given the CCA results by minimizing the within-cluster sum of squares. Using the following graph, and my own interpretation of the CCA results, I settled on using 5 communities.
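To illustrate the within-cluster sum of squares criterion mentioned above, here is a minimal Python sketch. It is not the R script I used. The function names, the farthest-point initialization, and the toy points standing in for CCA site scores are all assumptions made for the example.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_point_init(points, k):
    """Deterministic starting centers (an assumed choice, not from the R script):
    start at the first point, then repeatedly add the point farthest from the
    centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers

def kmeans_wss(points, k, n_iter=25):
    """Run Lloyd's k-means and return the within-cluster sum of squares."""
    centers = farthest_point_init(points, k)
    for _ in range(n_iter):
        # Assign every point to its nearest center
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        # Move each center to the mean of its members (keep it if the cluster is empty)
        centers = [
            tuple(sum(col) / len(members) for col in zip(*members)) if members else centers[j]
            for j, members in enumerate(clusters)
        ]
    return sum(min(sq_dist(p, c) for c in centers) for p in points)

# Toy stand-in for CCA site scores: three tight groups of four points each
scores = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
    (10.0, 0.0), (10.0, 1.0), (11.0, 0.0), (11.0, 1.0),
    (5.0, 10.0), (5.0, 11.0), (6.0, 10.0), (6.0, 11.0),
]
wss = {k: kmeans_wss(scores, k) for k in range(1, 6)}
for k in sorted(wss):
    print(k, round(wss[k], 2))
```

In this toy case the sum of squares drops sharply up to k = 3 and then flattens, which is the kind of "elbow" I looked for when choosing the number of communities.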
This figure shows the survey plot locations, coded by community. Notice the differences in complexity across the sites (Bandon has many while Grays Harbor and Nisqually have fewer).
To create a continuous prediction of communities and develop a model to project climate responses, I chose to use the MaxEnt habitat suitability modeling tool. Essentially, MaxEnt compares where a species (or community) occurs against the environment (background). It creates response curves by extracting patterns while maximizing entropy (randomness). MaxEnt can take continuous and categorical data as input, and the number of model parameters (fewer parameters = smoother response curves) can be controlled through the regularization value (1 is the default). You can also control which ‘features’ are used to create the response curves (linear, quadratic, product, hinge, threshold). In an attempt to create a parsimonious model, I only used linear and hinge features, but left regularization set to 1. Results from MaxEnt are logistically scaled (0 to 1). Because I am modeling multiple communities in the same area, I needed a method for determining which community is predicted at each location. The simplest is to choose the community with the highest predicted value. This hasn’t been done in the literature, due to issues with how presence data are usually collected. But because this dataset comes from standardized field surveys, and I’m using the same predictor layers for all communities, I’m presuming that using the maximum value is legitimate. In addition to the 5 physical predictor layers from the CCA, I added 3 climatic layers to the model: annual precip, max temp in August, and min temp in July. Each is a 30-year average from the PRISM dataset. Here are the predicted communities from MaxEnt:
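The maximum-value rule described above can be sketched in a few lines of Python. This is only an illustration: the function name, the community labels, and the toy suitability values are made up, and real MaxEnt output would be raster grids rather than short lists.

```python
def classify_cells(suitability):
    """Assign each cell to the community with the highest predicted value.

    suitability: dict mapping a community label to a list of logistic
    outputs (0 to 1), one value per grid cell, in the same cell order.
    """
    names = list(suitability)
    n_cells = len(suitability[names[0]])
    return [max(names, key=lambda name: suitability[name][i]) for i in range(n_cells)]

# Toy values for three cells and three placeholder communities
toy = {
    "Gp1": [0.80, 0.20, 0.40],
    "Gp2": [0.30, 0.60, 0.45],
    "Gp3": [0.10, 0.50, 0.10],
}
predicted = classify_cells(toy)
print(predicted)
```

Note that the third cell is assigned to Gp2 even though no community scores above 0.5 there, which is why the classification needs a separate check on how reliable the maximum is.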
I used two methods to determine the potential error in using the maximum predicted value for the community classification. First, I found the number of communities in each location with a predicted value greater than 50%. In the figure below, yellow indicates areas where no community has a predicted value over 50%, while green represents areas with one community over 50%. The areas with higher community richness (2 or 3) are relatively small, so I have more confidence in this method.
Second, I determined the number of communities within 25% of the maximum predicted value [max value - (max value * 0.25)]. This gives an indication of the separation in predicted values across communities. Here, yellow indicates areas where a single community is clearly separated from the other predicted communities. Green indicates areas where 2 communities have close predictions. The large proportion of yellow and green again gives me confidence in using the maximum predicted value for community classification.
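Both checks reduce to simple per-cell counts. Here is a rough Python sketch with made-up function names and toy values. Whether the comparisons should be strict or inclusive is my assumption; the count for the second check includes the top community itself, so a value of 1 means it is well separated.

```python
def count_above(values, threshold=0.5):
    """Number of communities whose predicted value exceeds the threshold in one cell."""
    return sum(v > threshold for v in values)

def count_near_max(values, tolerance=0.25):
    """Number of communities at or above max - (max * tolerance) in one cell.

    The top community is always counted, so 1 means a clear winner.
    """
    cutoff = max(values) * (1 - tolerance)
    return sum(v >= cutoff for v in values)

# Toy predictions: one row per cell, one column per community
cells = [
    [0.80, 0.30, 0.10],   # one clear winner
    [0.20, 0.60, 0.50],   # two close predictions
    [0.40, 0.45, 0.10],   # nothing above 0.5
]
richness = [count_above(c) for c in cells]
closeness = [count_near_max(c) for c in cells]
print(richness, closeness)
```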
Here are the ROC curves with their AUC values. AUC is a measure of model fit, with 1 being perfect and 0.5 random. All models except Gp2 show relatively good model fit (over 0.75 is usually deemed a worthwhile model). The species within Gp2 are the most common generalists, and I would not have expected MaxEnt to be able to model this community very well. As I pursue this further, I will likely split up Gp2 in an effort to produce better community classifications.
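For readers unfamiliar with AUC, it can be read as the probability that a randomly chosen presence location gets a higher predicted value than a randomly chosen background location. The sketch below computes it that way in Python. The function name and the scores are invented for illustration, and this is not how MaxEnt itself reports the number.

```python
def auc(presence_scores, background_scores):
    """Fraction of presence/background pairs where the presence score is higher.

    Ties count as half a win, so a model with no discrimination scores about 0.5.
    """
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))

# Made-up predicted values at presence and background locations
presence = [0.9, 0.8, 0.4]
background = [0.7, 0.3, 0.2, 0.4]
toy_auc = auc(presence, background)
print(toy_auc)
```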
I have several ‘next steps’ to continue developing this model. First, I would like to include vegetation data from 7 California salt marshes in order to better capture the environmental variation along the coast. Developing elevation response models for each site is necessary in order to project this model under climate change and sea-level rise scenarios. I would also like to explore additional environmental layers, such as soil type and distance to ocean mouth (salinity proxy) to further refine the defined niche.