Habitat Suitability Modeling - California Market Squid - GEO599/GEO584-Advanced Spatial Statistics and GIS, 2013-2016

For this class I will attempt to construct habitat suitability models characterizing the pelagic habitat of an invertebrate species, the California market squid (Doryteuthis opalescens) (Fig. 1; thanks, Wikipedia). Market squid are an important prey species for multiple predators (e.g. spiny dogfish sharks and seabirds) and are commonly captured in high abundances in the survey region.

Fig. 1. California market squid (Doryteuthis opalescens), also known as the opalescent inshore squid. Image: Wikipedia.

The dataset I am working with consists of pelagic fish and invertebrate abundance data collected by NOAA over a 14-year period (1998-2011) in the Northern California Current off the Oregon and Washington coasts. Pelagic fish and invertebrates were collected at up to ~50 stations along eight transect lines in both June and September of each year (Fig. 2). Species were collected using a 30 m (wide) x 20 m (high) x 100 m (long) Nordic 264 pelagic rope trawl (NET Systems Inc.) with a cod-end liner of 0.8 cm stretch mesh. For each sample, the trawl was towed through the upper 20 m of the water column at a speed of ~6 km h^-1 for 30 min (Brodeur, Barceló et al. In Press MEPS).
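To give a sense of the sampling effort behind each tow, here is a rough back-of-the-envelope sketch in Python using only the nominal figures quoted above (30 m x 20 m mouth opening, ~6 km h^-1, 30 min). The function name is made up for illustration, and it assumes the net stays fully open at its nominal dimensions for the whole tow. Real effort would more likely be computed from recorded tow distances.

```python
# Nominal tow geometry taken from the survey description (not measured values).
MOUTH_WIDTH_M = 30.0      # trawl mouth width
MOUTH_HEIGHT_M = 20.0     # trawl mouth height
TOW_SPEED_KM_H = 6.0      # approximate tow speed
TOW_DURATION_MIN = 30.0   # tow duration


def nominal_tow_effort(width_m, height_m, speed_km_h, duration_min):
    """Return (distance_km, swept_area_km2, filtered_volume_m3) for one tow.

    Hypothetical helper: assumes the net mouth is fully open for the
    entire tow, which is an idealization.
    """
    distance_km = speed_km_h * duration_min / 60.0
    swept_area_km2 = (width_m / 1000.0) * distance_km
    filtered_volume_m3 = width_m * height_m * distance_km * 1000.0
    return distance_km, swept_area_km2, filtered_volume_m3


distance_km, area_km2, volume_m3 = nominal_tow_effort(
    MOUTH_WIDTH_M, MOUTH_HEIGHT_M, TOW_SPEED_KM_H, TOW_DURATION_MIN
)
print(f"tow distance:    {distance_km:.1f} km")
print(f"swept area:      {area_km2:.2f} km^2")
print(f"filtered volume: {volume_m3:.2e} m^3")
```

Under these assumptions each tow covers about 3 km, which is also a useful reference when thinking about the 500 m-1 km grain of the satellite rasters.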

In addition to species abundance data, survey personnel also collected in situ environmental data at each fish sampling station during each survey, including water column depth, salinity, temperature and chlorophyll a, as well as oxygen and turbidity when instruments were available. One of my goals for this class is to supplement this in situ environmental dataset with remotely sensed temperature, primary productivity and turbidity data from the MODIS-Aqua and SeaWiFS platforms in order to obtain a broader environmental context.

For my habitat suitability modeling approach I will use R to fit Generalized Additive Mixed Models (GAMMs) relating the environmental covariates to both presence/absence data and abundance (catch per unit effort) data. Additionally, I will experiment with Maxent and other available habitat suitability modeling techniques and compare their output to that of my GAMMs.
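As a minimal sketch of how the two response variables could be derived before any model fitting, the Python snippet below turns raw catch counts into a presence/absence indicator and a catch-per-unit-effort value. Everything here is an illustrative assumption rather than the survey's actual processing: the station records are invented, CPUE is taken as individuals per km towed, and the ln(CPUE + 1) transform is just one common way to tame skewed abundance data.

```python
import math

# Hypothetical station records: (station_id, squid_count, tow_distance_km).
# These values are invented for illustration only.
records = [
    ("A1", 0, 3.0),
    ("A2", 12, 3.1),
    ("A3", 250, 2.9),
]


def build_responses(rows):
    """Derive presence/absence and CPUE responses from raw catch records."""
    out = []
    for station_id, count, distance_km in rows:
        cpue = count / distance_km  # assumed unit: individuals per km towed
        out.append({
            "station": station_id,
            "presence": 1 if count > 0 else 0,   # binary response
            "cpue": cpue,                        # abundance response
            "log_cpue": math.log1p(cpue),        # ln(CPUE + 1), assumed transform
        })
    return out


responses = build_responses(records)
for row in responses:
    print(row)
```

The binary column would feed a presence/absence model and the CPUE column an abundance model, so the same covariates can be compared across both.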

Some of the spatial and temporal hurdles I face with this dataset include:

Unequal spacing between sampling locations: This may pose a challenge when attempting to spatially interpolate.

Scope of inference: The habitat models I attempt for this species are likely applicable only in the Northern California Current or in a slightly extended region.

Scale of environmental data: I will be using environmental data from two different sources: in situ data (localized point measurements) and remotely sensed data (raster satellite data with a 500 m-1 km grain). This mismatch will affect the resolution of my interpretations of habitat for this species.

Spatial autocorrelation among stations: Abundances and/or presence/absence of market squid may be spatially correlated among nearby stations due to autocorrelation in environmental covariates that define their habitat.

Temporal autocorrelation for each station: As the data I am using come from a bi-annual survey, it is possible that the abundance and spatial structure of market squid within our sampling area are correlated between the two seasons of sampling. It is also possible that the temporal autocorrelation of an individual station with itself through time is not much of a problem, given the fluid medium in which sampling occurs and the highly variable inter-seasonal winds and currents in this region.
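One simple way to check the spatial autocorrelation hurdle listed above is Moran's I. The sketch below is a bare-bones Python version with binary distance-band weights; the function name, the toy coordinates and the distance threshold are all assumptions made for illustration. It expects projected coordinates (e.g. km), not raw latitude/longitude, and it reports only the statistic, with no significance test.

```python
import math


def morans_i(coords, values, max_dist):
    """Moran's I with binary weights: w_ij = 1 if 0 < d_ij <= max_dist, else 0.

    coords: list of (x, y) in projected units (assumed km here)
    values: one observation per station (e.g. CPUE)
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]

    cross = 0.0   # sum of w_ij * dev_i * dev_j over neighbouring pairs
    w_sum = 0.0   # sum of all weights
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = math.hypot(coords[i][0] - coords[j][0],
                           coords[i][1] - coords[j][1])
            if d <= max_dist:
                cross += dev[i] * dev[j]
                w_sum += 1.0

    denom = sum(d * d for d in dev)
    if w_sum == 0 or denom == 0:
        raise ValueError("no neighbours within max_dist, or values are constant")
    return (n / w_sum) * (cross / denom)


# Toy transect: four stations 1 km apart (invented numbers).
stations = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
gradient = [1.0, 2.0, 3.0, 4.0]      # similar values sit next to each other
alternating = [1.0, 4.0, 1.0, 4.0]   # neighbours are dissimilar

i_gradient = morans_i(stations, gradient, max_dist=1.5)
i_alternating = morans_i(stations, alternating, max_dist=1.5)
print(i_gradient, i_alternating)
```

Positive values suggest that nearby stations have similar catches, while negative values suggest that neighbours differ. With unequally spaced stations, the choice of `max_dist` matters a lot and is worth varying.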

6 thoughts on “Habitat Suitability Modeling – California Market Squid”

Eric Coker said on April 14, 2014 at 1:47 pm:

This seems like an interesting and worthwhile project.

Have you thought about the issue of multicollinearity among your covariates? It may be worth checking. If you do find such issues, there are statistical techniques that let you still use all the covariates while accounting for multicollinearity, even within the context of a GAMM.

Please let me know if you encounter this issue, and I can go over the techniques available to you!

Erik Rose said on April 14, 2014 at 2:02 pm:

If the samples are from Washington, Oregon, and Northern California, why do you say your scope of inference is restricted to Northern California? If the reason is because of the other data you are bringing in, not the squid data, would you also need to restrict the squid data to samples from that region, and perhaps exclude those from Washington?

What are the two times of year for sample collection? How do they line up with the reproductive cycle of the squid? Is there a wrong time of year to sample squid? Would sampling at one period result in overestimating abundance throughout the year, whereas perhaps sampling in the dead of winter would underestimate?

Did the researchers make an effort to sample from the same specific location over time, or is every sample essentially a new location? Repeated measures would be helpful if you are trying to determine the likelihood of local extinction and colonization.

Once you have estimated the area of suitable habitat, wouldn’t it be relatively straightforward to estimate total population abundance? Is that something you are interested in?

prepperc said on April 15, 2014 at 12:05 pm:

Since some of your raster data is similar to your point data, are you hoping to be able to use that information to aid in the spatial interpolation once you’ve identified the variables with the most predictive power? Can you use it to make suitability predictions for months other than June and September?

Since your minimum point spacing appears to be more than 1 km, and you will presumably be running the models only at those points, I’m not sure the resolution of your rasters is a problem, unless I misunderstand how you will be using them. It should be a fairly simple matter to extract the raster values into your point data and run the model on the points.

yangxiu said on April 15, 2014 at 3:19 pm:

I am interested in your habitat suitability modeling, because my research is focused on forage suitability modeling and mapping. How do you define suitability? Is it based on species abundance data? Have you considered species tolerance?

Rebecca Sexton said on April 16, 2014 at 12:33 pm:

I’m curious how you plan to deal with the fact that you have (per my understanding) irregular in situ environmental data, since it sounds like the field team did not always have the proper equipment. Perhaps what in situ data you have could just be used to cross check the available raster data for a sort of ground truthing error analysis. It seems that a short baseline assessment of the raster data might make you confident enough to rely primarily on that layer. Perhaps I’m suggesting throwing out a lot of lovely data, but I’m obviously unsure of the volume of data you hold from in situ measurements.
Also, you mention 8 transects, but the figure shows ten well spaced transects. Are you not using a few from the middle, affecting the spacing, or from the ends?

Julia Jones said on April 23, 2014 at 12:51 pm:

Hi Caren,
I’m wondering about four issues you raise:
1) spacing of your sample, including spatial autocorrelation
2) GAMMs vs. Maxent
3) different spatial extent, resolution, and coverage of environmental covariates and observations, including point to area problem
4) temporal autocorrelation

1) (a) Spacing of your sample. Is this an opportunistic sample or was there a sampling design? You could visually assess (or quantify using point patterns) whether the sampling points are random or otherwise. (b) Spatial autocorrelation – you could test for spatial autocorrelation to see if high abundance values are closer to one another than expected by random chance.

2) Let’s discuss the differences between GAMMs and Maxent, since many students seem to be using both and their similarities and differences should be considered. Have you read the Elith et al 2011 paper in Diversity and Distributions about Maxent? And do you have some favored references about GAMMs?

3) (a) You could convert your points to a raster. Have you tried this? (b) Have you tried fitting models to the fine-scale data separately from the coarse-scale data? Your problem here seems very close to Andrea Havron’s.

4) Do you mean “bi-annual” (every two years) or semi-annual (twice a year)? If samples are taken every year at the same season, they mean something different than samples twice a year (in different seasons). Seems like we should discuss what these data might mean and what your question is – presumably it’s about some aspect of persistence?

Julia