Exploring Giant Gourami Distribution Models
The research question:
In the context of predicting the impacts of the rapidly expanding aquaculture industry and understanding how natural and human changes affect the landscape, one of the overarching research questions that I am interested in is: What is the natural distribution of the air-breathing giant gourami (Osphronemus goramy) in Southeast Asia, and how is it predicted to change over time with respect to 1) the biophysical properties of the landscape, 2) human impacts, and 3) climate projections into the future?
Specific to this class, the research question that I explored was: What is the distribution of the giant gourami in SE Asia in the year 2000, based on four environmental variables (NDVI, precipitation, surface temperature, and river flow accumulation) and human population density?
Background: Giant gourami inhabit fresh to brackish water in slow-moving areas like swamps, lakes, and large rivers. Given its unique ability to breathe air, this fish can survive in poorly oxygenated to anoxic water. Already popular in the aquarium trade and eaten in some regions, this fish is farmed in SE Asia. I expect that with climate change, increased urbanization, and a changing hydrologic profile due to potential dams, this fish may become more suitable than others because of its ability to live in ‘poorer’ environmental conditions.
The Dataset:
My dataset consists of fish presence/pseudo-absence points across SE Asia (image above), each characterized by the environmental variables of interest. The ‘presence’ portion of my dataset was pulled from fishbase.com and consists of 47 occurrence points for giant gourami, collected primarily between the 1980s and 1990s through an unknown sampling protocol. Clipping to my study region, SE Asia, left me with 36 presence points. Background data, or pseudo-absence points, were generated randomly along rivers and streams of SE Asia in ArcMap. Environmental data for all points were taken from the freely available satellite datasets listed below, accessed on Google Earth Engine, within a 1 km buffer around each point and filtered to the date range of Feb-Jun 2000 when possible (to keep the dataset consistent).
- NDVI: 32-day Landsat 7
- Precip: CHIRPS (Climate Hazards Group InfraRed Precipitation with Station data) at 0.05deg resolution.
- Flow: HydroSHEDS flow accumulation 15-arc seconds (Feb 11-22, 2000)
- Temp: NCEP/NCAR reanalysis 2.5deg resolution
- Population: WorldPop project, estimated mean population density 2010 & 2016 at a 100x100m resolution
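To make the 1 km buffer step concrete, here is a minimal sketch in plain Python of averaging gridded values around a point. It is an illustration only and not the Earth Engine code used for this project: the function name `buffer_mean`, the 500 m toy grid, and the coordinates are all hypothetical, and it assumes projected coordinates in meters.

```python
import math

def buffer_mean(point, cells, radius_m=1000.0):
    """Mean of the cell values whose centers fall within radius_m of point.

    point: (x, y) in meters; cells: list of (x, y, value) cell centers.
    Returns None when no cell center falls inside the buffer.
    """
    px, py = point
    inside = [v for x, y, v in cells if math.hypot(x - px, y - py) <= radius_m]
    if not inside:
        return None
    return sum(inside) / len(inside)

# Hypothetical 5 x 5 grid with 500 m spacing; the values are made up.
cells = [(col * 500.0, row * 500.0, float(row + col))
         for row in range(5) for col in range(5)]

site = (1000.0, 1000.0)          # hypothetical sample point at the grid center
site_mean = buffer_mean(site, cells)
far_mean = buffer_mean((10000.0, 10000.0), cells)  # no cells nearby

print(site_mean, far_mean)
```

Each point in the dataset ends up with one such buffer mean per variable, which is presumably what the `mean_` prefixes in the variable names refer to.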
Hypotheses:
Based on my point data and the variables used, my hypotheses are below (with the third untested):
- Giant gourami are distributed across a wide range of habitat types, including relatively warm surface temperatures, a range of flows, and areas with a range of precipitation.
- The distribution of giant gourami is not affected by human population density.
- Giant gourami distributions will not change much given predicted climate warming (untested).
Approaches:
This was an exploratory analysis that employed boosted regression trees (BRT), a species distribution modeling approach. BRT is an ensemble, tree-based method that iteratively grows small, simple regression trees, each fit to the residuals from all previous trees, and it handles non-parametric data. BRTs consist of two components, regression trees and boosting, and ultimately help to identify the variables in the dataset that best predict where the species is expected based on the presence/absence data. (You can view my boosted regression tree tutorial for more on this analysis approach.)
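The residual-fitting loop described above can be sketched in a few lines of plain Python. This is a simplified illustration, not the ‘gbm’ implementation used in the analysis: it uses one-split trees (stumps), a single predictor, and squared-error loss rather than the deviance-based loss typically used for presence/absence data. The function names and the toy data are hypothetical.

```python
def fit_stump(x, residuals):
    """Find the single split on x that minimizes squared error of the residuals."""
    best = None
    # Try every threshold except the largest value (which would leave one side empty).
    for threshold in sorted(set(x))[:-1]:
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]  # (threshold, left_mean, right_mean)

def predict_stump(stump, xi):
    threshold, left_mean, right_mean = stump
    return left_mean if xi <= threshold else right_mean

def boost(x, y, n_trees=100, learning_rate=0.1):
    """Grow stumps one at a time, each fit to the current residuals."""
    base = sum(y) / len(y)            # start from the mean response
    predictions = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, predictions)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        # Shrink each tree's contribution by the learning rate.
        predictions = [pi + learning_rate * predict_stump(stump, xi)
                       for xi, pi in zip(x, predictions)]
    return base, learning_rate, stumps

def predict(model, xi):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, xi) for s in stumps)

# Hypothetical toy data: presence (1) at low x, pseudo-absence (0) at high x.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1, 1, 1, 1, 0, 0, 0, 0]
model = boost(x, y)
low_x_prediction = predict(model, 2)
high_x_prediction = predict(model, 7)
print(round(low_x_prediction, 3), round(high_x_prediction, 3))
```

The learning rate and the size of each tree (tree complexity) are the two settings varied in the results; a smaller learning rate means each tree contributes less, so more trees are needed to reach the same fit.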
Results:
After building my dataset in Arc and through Google Earth Engine, I was able to produce BRT results in RStudio for several combinations of learning rates and tree complexities, with and without the population data as a variable. Preliminary analysis indicates that precipitation and temperature contribute the most in determining where giant gourami are expected, based on the data. Digging into my two hypotheses, I explored the contributions of temperature and population density to giant gourami distribution.
Precipitation was identified by the BRT models as the strongest contributor in determining the likelihood of finding a giant gourami in my study area.
- R output:
- > gourami.tc3.lr005$contributions
- var: rel.inf
- mean_Precip: 60.511405
- mean_temp: 25.182547
- pop_density: 10.984079
- NDVI_mean: 1.673674
- flow: 1.64829
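As a quick sanity check on how to read this output, the sketch below (plain Python, with the values copied from the R output above) confirms that the relative influences are scaled to sum to roughly 100, so each value can be read as a percentage share. The variable names in the code are my own.

```python
# Relative influence values copied from the R output above.
contributions = {
    "mean_Precip": 60.511405,
    "mean_temp": 25.182547,
    "pop_density": 10.984079,
    "NDVI_mean": 1.673674,
    "flow": 1.64829,
}

total = sum(contributions.values())
ranked = sorted(contributions.items(), key=lambda kv: kv[1], reverse=True)
top_two_share = ranked[0][1] + ranked[1][1]

for name, value in ranked:
    print(f"{name:12s} {value:6.2f}")
print(f"total = {total:.2f}, top two combined = {top_two_share:.2f}")
```

Read this way, precipitation and temperature together account for about 86% of the relative influence in this model, while NDVI and flow each contribute under 2%.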
Exploring the spatial relationship between precipitation trends and my presence/pseudo-absence point data, it is clear that the presence points are clustered in regions with lower rainfall. To explore this relationship further, it might be helpful to shrink the extent of my study area to a smaller region that is more representative of the conditions associated with the presence points.
As expected, population density did not appear to have an effect on model performance. The model output below displays results for models with population density as a variable in the left column and without it in the right column. Model deviance in cross-validation does not change when population density is removed as an explanatory variable.
Exploring this relationship visually on a map, this result makes sense as the point data are distributed across a wide range of population densities.
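For readers unfamiliar with deviance as a performance measure, here is a minimal sketch of the mean binomial deviance for presence/absence predictions, written in plain Python. The observed values and predicted probabilities are hypothetical and are not taken from my models; the point is only to show what is being compared when one model is evaluated against another.

```python
import math

def mean_binomial_deviance(observed, predicted, eps=1e-12):
    """Mean binomial deviance: -2 times the average log-likelihood.

    observed: 0/1 values; predicted: probabilities of presence.
    Lower values mean predictions closer to the observations.
    """
    total = 0.0
    for y, p in zip(observed, predicted):
        p = min(max(p, eps), 1.0 - eps)   # guard against log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -2.0 * total / len(observed)

# Hypothetical held-out data: two presences and four pseudo-absences.
observed = [1, 1, 0, 0, 0, 0]
uninformed = [0.5] * 6                             # a model that knows nothing
with_extra_variable = [0.8, 0.7, 0.2, 0.1, 0.3, 0.2]
without_extra_variable = [0.8, 0.7, 0.2, 0.1, 0.3, 0.2]

dev_null = mean_binomial_deviance(observed, uninformed)
dev_with = mean_binomial_deviance(observed, with_extra_variable)
dev_without = mean_binomial_deviance(observed, without_extra_variable)
print(round(dev_null, 3), round(dev_with, 3), round(dev_without, 3))
```

If dropping a variable leaves the held-out predictions, and therefore the deviance, essentially unchanged, that variable is adding little predictive information.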
Significance:
The results of this exploratory analysis revealed some interesting patterns in giant gourami distribution, but they are limited in a big way by the point data used. Points were only present in Malaysia due to availability, so this distribution pattern is possibly more a function of sampling effort than of actual distribution. If the analysis were limited to Malaysia only, it may provide a better representation of the data.
Understanding the spatio-temporal patterns that govern the range of species like the giant gourami allows resource managers to help meet increasing demands while mitigating environmental harm. Protein consumption is expected to increase to 45kg per capita by 2020, a 25% increase from 1997, and the fish consumption rate is no outlier. The growing aquaculture industry provides roughly half of the global fish supply (FAO, 2014). The giant gourami is one of ~400 air-breathing fish species, which are ideal candidates for aquaculture. This prospect presents an opportunity for increased protein production in a changing climate, but also increases the threat of invasion/outcompetition.
Lessons Learned:
In this analysis I learned about integrating Arc, Google Earth Engine, and R for building my dataset, as well as running a boosted regression tree model. Through the process, I tracked my progress, workflow, and challenges. This allowed me to identify areas where I could have been more efficient. For example, I generated my background/pseudo-absence points in Arc and then brought them into Earth Engine, when I could have done the whole process in Earth Engine.
A note on Earth Engine: I chose this platform for its ability to access large global datasets across wide-ranging time frames quickly, without needing to download anything to manipulate or extract data. Earth Engine is scripted in JavaScript, which is a hurdle to overcome for someone new to programming.
My data analysis was done in RStudio, where I learned to run a BRT model with the ‘gbm’ package, use some simple histogram plotting functions, and import/view raster images with the packages ‘raster’ and ‘rgdal’.
In terms of the statistics that I explored for this analysis, I have listed some of my take-home points below:
- Boosted Regression Tree strengths and limitations
  - Deals with non-parametric data well
  - Is able to deal with small sample sizes
  - Pseudo-absence points present issues in interpreting results since they are not true absence points
  - Sample bias is potentially a problem
  - Is not a spatially explicit analysis, so issues like spatial autocorrelation are not dealt with
- Hot Spot Analysis (conducted on a different dataset)
  - A quick and easy way to identify patterns at different scales for data points that are independent of the researcher and span a large temporal and spatial range
  - Could be an approach to hypothesis generation