I set out this term to delve into a vegetation dataset. It consists of nearly 2000 vegetation surveys in seven salt marshes across the Pacific Northwest. My goal is to be able to predict how vegetation will change with shifts in climate and rising sea level. Given the diversity of wildlife that use estuaries at various stages of their life cycles, understanding how habitat will respond is critical to developing conservation plans. To achieve this, I broke the problem into three stages:

1: Identify vegetation communities from field data

2: Create a habitat suitability model for each community under current conditions

3: Use the habitat suitability models to project changes in climate and sea level and estimate community response

 

Due to the large number of species identified (42), I first needed to reduce the dataset to only the most common species. I chose to use only the species found in more than 5% of all the survey plots. This left me with 16 species. I then explored how best to combine these species into communities. By modeling communities rather than species, I am assuming each species within a community will respond the same way. Given that salt marsh vegetation is generally stratified by elevation, this is a reasonable assumption to begin with, but one that I will need to revisit in the future. To determine the communities, I used canonical correspondence analysis (CCA), which can be thought of as a Principal Component Analysis for categorical data. I defined the niche of the communities using 5 environmental variables: elevation (standardized for tidal range), mean daily flooding frequency, distance to channel, distance to bay, and channel density. The resulting CCA graph:

cca_results
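The 5% prevalence filter described above can be sketched in a few lines of Python. This is only an illustration: the function name `common_species`, the species codes, and the toy plots are made up, and it is not the code used for the analysis.

```python
def common_species(plots, min_fraction=0.05):
    """Return species found in more than min_fraction of plots, sorted by name."""
    counts = {}
    for species_in_plot in plots:
        # Use a set so a species is counted once per plot
        for sp in set(species_in_plot):
            counts[sp] = counts.get(sp, 0) + 1
    threshold = min_fraction * len(plots)
    return sorted(sp for sp, n in counts.items() if n > threshold)

# Hypothetical toy data: 40 plots with made-up species codes
plots = (
    [["SAVI", "DISP"] for _ in range(30)]
    + [["SAVI"] for _ in range(8)]
    + [["RARE", "SAVI"], ["DISP"]]
)
kept = common_species(plots)
print(kept)
```

In the toy data, "RARE" occurs in 1 of 40 plots (2.5%), so it falls below the 5% cutoff and is dropped.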

I then used a script in R to determine the optimum number of clusters given the CCA results, by comparing the within-cluster sum of squares across candidate numbers of clusters. Using the following graph, and my own interpretation of the CCA results, I settled on using 5 communities.

k_means_cluster_graph
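For readers unfamiliar with this kind of graph, the idea can be sketched as follows: run k-means for several values of k, record the within-cluster sum of squares (WCSS) for each, and look for the point where adding clusters stops paying off. The sketch below is in Python rather than R, uses a bare-bones k-means, and runs on made-up 2-D scores, so it only illustrates the technique and is not the script used here.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(pts):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(pts) for c in zip(*pts))

def kmeans_wcss(points, k, seed=0, iterations=50):
    """Run a basic k-means and return the within-cluster sum of squares."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k randomly chosen points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        # Move each center to its cluster mean; keep it if the cluster is empty
        centers = [centroid(c) if c else centers[i] for i, c in enumerate(clusters)]
    return sum(min(sq_dist(p, c) for c in centers) for p in points)

# Hypothetical ordination scores forming two tight groups
scores = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
wcss = {k: kmeans_wcss(scores, k) for k in range(1, 5)}
for k, value in wcss.items():
    print(k, round(value, 2))
```

WCSS generally keeps falling as k grows, so the choice rests on where the curve flattens (the "elbow") rather than on a strict minimum.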

This figure shows the survey plot locations, coded by community. Notice the differences in complexity across the sites (Bandon has many communities, while Grays Harbor and Nisqually have fewer).

MaxEnt Predictors_Community_Locations

 

To create a continuous prediction of communities and develop a model to project climate responses, I chose to use the MaxEnt habitat suitability modeling tool. Essentially, MaxEnt compares where a species (or community) occurs against the environment (background). It creates response curves by extracting patterns while maximizing entropy (randomness). MaxEnt can take continuous and categorical data as input, and the number of model parameters (fewer parameters = smoother response curves) can be controlled through the regularization value (1 is the default). You can also control which ‘features’ are used to create the response curves (linear, quadratic, product, hinge, threshold). In an attempt to create a parsimonious model, I only used linear and hinge features, but left regularization set to 1. Results from MaxEnt are logistically scaled (0 to 1).

Because I am modeling multiple communities in the same area, I needed a method for determining which community is predicted. The simplest is to choose the community with the highest predicted value. This hasn’t been done in the literature, due to issues with how presence data are usually collected. But because this dataset comes from standardized field surveys, and I’m using the same predictor layers for all communities, I’m presuming that using the maximum value is legitimate. In addition to the 5 physical predictor layers from the CCA, I added 3 climatic layers to the model: annual precipitation, maximum temperature in August, and minimum temperature in July. Each is a 30-year average from the PRISM dataset. Here are the predicted communities from MaxEnt:

MaxEnt_maximum_classification_2
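The maximum-value rule amounts to taking, for each map cell, the community whose MaxEnt output is largest. A minimal sketch in Python, assuming each cell's outputs are held in a dictionary; the community names other than Gp2 and all of the values are made up for illustration:

```python
def classify_by_max(predictions):
    """Return the community with the highest predicted value for one cell."""
    return max(predictions, key=predictions.get)

# Hypothetical logistic outputs (0 to 1) for a single cell
cell = {"Gp1": 0.20, "Gp2": 0.55, "Gp3": 0.70, "Gp4": 0.10, "Gp5": 0.05}
print(classify_by_max(cell))
```

Applied to every cell of the stacked MaxEnt rasters, this yields a single classified map like the one above.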

 

I used two methods to determine the potential error in using the maximum predicted value for the community classification. First, I found the number of communities in each location with a predicted value greater than 50%. In the figure below, yellow indicates areas where no community has a predicted value over 50%, while green represents areas with one community over 50%. The areas with higher community richness (2 or 3) are relatively small, so I have more confidence in this method.

MaxEnt_community_richness
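The richness count behind this figure is simple to express per cell: count how many communities exceed the 0.5 threshold. A minimal Python sketch, with a hypothetical function name and made-up values:

```python
def richness_above(predictions, threshold=0.5):
    """Count communities whose predicted value exceeds the threshold."""
    return sum(1 for value in predictions.values() if value > threshold)

# Hypothetical logistic outputs for two cells
clear_cell = {"Gp1": 0.20, "Gp2": 0.45, "Gp3": 0.80}
mixed_cell = {"Gp1": 0.60, "Gp2": 0.55, "Gp3": 0.10}
print(richness_above(clear_cell), richness_above(mixed_cell))
```

A count of 0 corresponds to the yellow areas, 1 to the green areas, and 2 or more to the cells where the maximum-value rule is most open to question.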

Second, I determined the number of communities within 25% of the maximum predicted value [max value – (max value * 0.25)]. This gives an indication of separation in the predicted values across communities. Here, yellow indicates areas where a single community is separated from the other predicted communities. Green indicates areas where 2 communities have close predictions. Given the large proportion of yellow and green, I am again given confidence in using the maximum predicted value for community classification.

MaxEnt_community_richness_prevalence
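The separation check can be sketched per cell using the cutoff given in brackets above. This Python sketch assumes the top community is itself included in the count (so 1 means a clearly separated winner); the function name and values are made up:

```python
def close_to_max(predictions, fraction=0.25):
    """Count communities at or above max - max * fraction, including the top one."""
    top = max(predictions.values())
    cutoff = top - top * fraction
    return sum(1 for value in predictions.values() if value >= cutoff)

# Hypothetical logistic outputs for two cells
separated = {"Gp1": 0.80, "Gp2": 0.30, "Gp3": 0.10}  # cutoff is about 0.6
contested = {"Gp1": 0.80, "Gp2": 0.65, "Gp3": 0.30}
print(close_to_max(separated), close_to_max(contested))
```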

Here are the ROC curves and their AUC values. AUC is a measure of model fit, with 1 being perfect and 0.5 random. All models except Gp2 show relatively good model fit (over 0.75 is usually deemed a worthwhile model). The species within Gp2 are the most common generalists, and I would not have expected MaxEnt to be able to model this community very well. As I pursue this further, I will likely split up Gp2 further in an effort to produce better community classifications.

AUC curves
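One way to read AUC is as the probability that a randomly chosen presence location scores higher than a randomly chosen background location, with ties counted as half. The Python sketch below computes it that way on made-up scores; it illustrates the statistic and is not how MaxEnt computes or reports it internally.

```python
def auc(presence_scores, background_scores):
    """Fraction of presence/background pairs where the presence scores higher."""
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5  # ties count as half a win
    return wins / (len(presence_scores) * len(background_scores))

# Hypothetical model outputs at presence and background locations
presence = [0.9, 0.8, 0.4]
background = [0.7, 0.3, 0.2, 0.1]
print(round(auc(presence, background), 3))
```

Here 11 of the 12 pairs favor the presence location, giving an AUC of about 0.917; identical score distributions would give 0.5.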

I have several ‘next steps’ to continue developing this model.  First, I would like to include vegetation data from 7 California salt marshes in order to better capture the environmental variation along the coast.  Developing elevation response models for each site is necessary in order to project this model under climate change and sea-level rise scenarios.  I would also like to explore additional environmental layers, such as soil type and distance to ocean mouth (salinity proxy) to further refine the defined niche.

4 thoughts on “Pacific coast salt marsh vegetation communities”

  1. Kevin,

    Which tools or steps did you try that did not work out for your samples? I think you mentioned them during the presentation, and I was thinking to myself that they might help my project.

    Thanks!

  2. Peggy- I used incremental spatial analysis (distance-based Moran’s I) to determine how autocorrelated my data were. It didn’t turn out well since my data are in regular grids. I also explored Grouping Analysis. It takes continuous features and determines how to group them given their distribution in n-dimensional space. Doug has a nice post about it. Because you can’t use categorical variables, it ended up not being appropriate for my project.

  3. Great study Kevin! Have you taken Bruce McCune’s “Community Structure and Analysis” course? It looks like you’ve got a good handle on predicting community association and structure already, but his class would be perfect for the data you have and the questions you’re asking.

  4. Kevin,

    Very nice use of multivariate analysis in conjunction with GIS. Next steps: (1) what’s the overall uncertainty in your models? (2) what have you learned about climate change vulnerability? Are the same kinds of plant communities likely to be more inundated in all locations, or will the effects vary?

    I look forward to hearing more,

    Julia
