Background

The relationship between microbiome composition and host health has recently attracted a great deal of attention and research. The importance of host-associated microbiomes is still poorly understood, although significant relationships between gut microbiome composition and host health have been described. Although the gut microbiome has received the most attention, each body site is home to its own distinct microbial community. The nasal microbiome has received relatively little attention, although a few studies suggest a relationship between nasal microbiome composition and the incidence of infections. Using a unique system of closely studied semi-wild African Buffalo, I propose to study the drivers of nasal microbiome composition in a social species.

Data Collection

Our study herd of semi-wild African Buffalo (Syncerus caffer) is kept in a 900-hectare predator-free enclosure in Kruger National Park, South Africa. Every 2-3 months, all 60-70 individuals are captured and biological samples are collected for diagnosis as part of a larger study that the Jolles lab is conducting on Foot-and-Mouth Disease. Age, sex, and body condition are recorded, along with a number of other physiological parameters. The degree of relatedness is known for each pair of individuals in the herd. Each animal is fitted with a GPS collar programmed to record its location every 30 minutes, and with a contact collar that records the identity and duration of contacts with other buffalo. The GPS data used in this exploratory analysis were collected between October and December 2015.

Overarching Hypotheses

My research is guided by the following two hypotheses:

  1. Conspecific contacts drive nasal microbiome similarity and disease transmission.
  2. Habitat overlap drives nasal microbiome similarity and disease transmission.

I also propose to examine the effects of the other parameters (age, sex, body condition, relatedness, etc.) on nasal microbiome composition, but for this class I focused on the spatial parameters.

Approaches

The first step toward addressing hypothesis 1 is to quantify conspecific contacts. I tested several approaches during this class. The first approach, shown in my first presentation, was to generate buffers in ArcGIS around each animal at a given time point and derive a measure of “crowdedness.” The second was to generate a distance matrix in R (see Figure 1) showing the distance between each pair of individuals at a given time point. Detailed methods and R code are given in my first blog post.


Figure 1: Sample distance matrix generated for a subset of 5 animals at a single time point. Distances range from 3 to 18 meters.

As an intermediate step toward addressing my second hypothesis, I will need to measure the percent habitat overlap between every pair of animals in the herd. I carried out a test analysis using a suite of tools: GME, ArcMap, and Excel. Detailed methods are outlined in my second blog post. In summary, I used GME to generate kernel densities and 95% isopleths for two individuals, then used the Identity tool in ArcMap to calculate the area of overlap. Figure 2 shows a sample of the output from the Identity tool.


Figure 2: Habitat overlap between two individuals in the study herd. The home ranges of animal 1 and animal 13 are colored yellow and purple, respectively, and their area of overlap is outlined in red. The overlap covers 80% of animal 1's home range and 90% of animal 13's home range.

Eventually, once I have built the contact-network and habitat-overlap matrices, I will test for correlations between microbiome similarity and both habitat overlap and average distance to conspecifics.

Significance

  1. Distance matrix: This analysis showed potential for research on herd behavior and structure, and I will likely return to it in future analyses. However, because my question concerns microbe transmission, which is likely associated only with close contacts, I plan to focus my current efforts on the contact collars described in the Data Collection section. Although I will lose some spatial information, this will simplify my analysis and increase temporal resolution.
  2. Habitat overlap: The method described here for measuring habitat overlap shows great promise for my research, especially if the process can be automated. I will explore iteration functions in GME and ArcGIS ModelBuilder to find the most efficient way to run this analysis across many pairs of individuals at multiple time points.

Potential Issues

A few problems that I will likely need to deal with as my analysis progresses:

  1. The possibility of a high correlation between conspecific contacts and habitat overlap. I will try to control for this by identifying animals with high spatial overlap but low contact rates, and vice versa.
  2. Non-matching pairwise percent overlap. For example, 90% of individual 13's home range overlapped with individual 1's, whereas only 80% of individual 1's home range overlapped with individual 13's. I can deal with this by using pairwise averages, unless the two percentages differ too much, in which case I will need to explore other options.
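The pairwise-averaging idea in point 2 can be sketched in R. This is a minimal illustration, not a finished method: the matrix holds the two percentages from my test pair (animals 1 and 13) plus a made-up third animal "A", and the 20-percentage-point tolerance for "too different" is an arbitrary placeholder.

```r
# Percent-overlap matrix: entry [i, j] is the percentage of animal i's home
# range that overlaps animal j's. Values for 1 and 13 come from the test pair;
# animal "A" and its values are made up for illustration.
ids <- c("1", "13", "A")
pct <- matrix(c(NA, 80, 60,
                90, NA, 70,
                30, 75, NA),
              nrow = 3, byrow = TRUE, dimnames = list(ids, ids))

# Pairwise average of the two directions: one symmetric value per pair
sym_avg <- (pct + t(pct)) / 2

# Flag pairs whose two percentages differ by more than a chosen tolerance
tolerance <- 20   # hypothetical cut-off, in percentage points
gap <- abs(pct - t(pct))
flagged <- which(gap > tolerance & upper.tri(gap), arr.ind = TRUE)
data.frame(animal_i = ids[flagged[, "row"]],
           animal_j = ids[flagged[, "col"]],
           gap      = gap[flagged])
```

Here the 1/13 pair averages to 85%, while the 1/A pair (60% vs. 30%) is flagged as a candidate for a different treatment.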

Lessons Learned

Software Packages:

Thanks to help from my classmates, I became much more comfortable and familiar with geospatial functions in R. I also used GME for the first time and found that it could be very useful for my future analyses. I became familiar with ModelBuilder in ArcGIS while attempting to iterate the buffering analysis. I still have a lot to learn about all these tools, but I feel much more confident than I did before this class.

Statistical methods:

Although I did not use any of the statistical methods outlined in the syllabus, due to the unique nature of my dataset, I learned a great deal from watching my classmates present. I expect that hotspot analysis, multivariate statistics, and different types of regression models will feature in my future work. In particular, I plan to use regression models and PCA to help analyze my data.

 

 

 

 

Introduction

In my last post, I discussed the importance of quantifying social contacts to predict microbe transmission between individuals in our study herd of African Buffalo. This post deals with the other half of the equation: environmental transmission. My overarching hypothesis is that spatial and social behaviors are important for transmission of host-associated microbes. The goal of this pilot project is to establish a method for quantifying habitat overlap between individuals in the herd so that I can (eventually) compare habitat overlap with microbe community similarity and infection risk.

Methods/Results

Workflow:

I used the following outline to determine pairwise habitat overlap between a “test” pair of buffalo:

  1. Project GPS points in ArcMap
  2. Clip GPS points to the enclosure boundary layer
  3. Use GME to generate kernel density estimates and 95% isopleths
  4. Back in ArcMap, use the Identity tool to calculate the area of overlap between the two individuals
  5. Compute percent overlap in Excel

Geospatial Modelling Environment (GME):

GME is free software that combines the computational power of R with the functionality of ESRI ArcGIS to drive geospatial analyses. Functions that are especially useful for analyzing animal movement include kernel density estimation, minimum convex polygons, movement parameters (step length, turn angle, etc.), conversion of locations to paths and paths to points, and generation of simulated random walks. For this analysis, I used GME to generate KDEs and estimate home ranges using 95% isopleths.

Kernel density estimates and isopleth lines

A kernel density can be conceptualized as a 3-dimensional surface whose height reflects the density of points. In GME, isopleth lines can be drawn from a raster dataset (such as a kernel density) and set to enclose a specified percentage of the surface's volume. For this analysis, I calculated 95% isopleths from the kernel densities of GPS points. In practical terms, the 95% isopleth outlines the area in which an animal is estimated to spend 95% of its time.
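For reference, the textbook fixed-bandwidth kernel estimator and its 95% volume contour can be written as below. This assumes a single bandwidth h and a bivariate kernel K (for example, a standard bivariate normal) applied to n GPS fixes x_i; GME's exact kernel and bandwidth options may differ.

```latex
\hat f(\mathbf{x}) = \frac{1}{n h^{2}} \sum_{i=1}^{n} K\!\left(\frac{\mathbf{x} - \mathbf{x}_i}{h}\right),
\qquad
HR_{95} = \{\mathbf{x} : \hat f(\mathbf{x}) \ge c_{95}\},
\quad \text{with } c_{95} \text{ chosen so that } \int_{HR_{95}} \hat f(\mathbf{x})\, d\mathbf{x} = 0.95
```

The isopleth is the boundary of HR_95: the smallest region that holds 95% of the surface's volume.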

Using the identity tool to compute overlap

After generating home range estimates for two individuals, I loaded the resulting shapefiles into ArcMap and used the Identity tool to overlay the home ranges. To use the Identity tool, you supply input features and identity features; ArcMap then computes a geometric intersection between the input features and identity features and combines their attributes in the output.

[Map: 95% isopleths for animals 1 and 13 and their area of overlap]

The map above shows 95% isopleths for animal 1 (yellow) and animal 13 (purple), and the area of overlap computed using the Identity tool (red line). I exported the output table to Excel, where I calculated the percent overlap for each animal.
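To avoid the round trip between GME, ArcMap, and Excel, the KDE, 95% isopleth, and overlap steps could be approximated on a shared raster grid in R. The sketch below is hedged: it uses MASS::kde2d with MASS's default normal-reference bandwidths (not necessarily what GME uses), simulated coordinates for two hypothetical animals, and a cell-counting approximation of area. It is not the workflow I actually ran.

```r
library(MASS)  # provides kde2d(); MASS is a recommended package bundled with R

# TRUE for grid cells inside the `level` volume contour: the highest-density
# cells that together hold `level` (e.g. 95%) of the density on the grid
volume_mask <- function(z, level = 0.95) {
  dens <- sort(as.vector(z), decreasing = TRUE)
  cum_share <- cumsum(dens) / sum(dens)
  threshold <- dens[which(cum_share >= level)[1]]
  z >= threshold
}

# Simulated projected coordinates in metres for two hypothetical animals
set.seed(1)
a <- cbind(x = rnorm(500, 0, 200),   y = rnorm(500, 0, 200))
b <- cbind(x = rnorm(500, 150, 200), y = rnorm(500, 0, 200))

# Both KDEs share one grid (same limits and resolution) so cells line up
pad  <- 300
lims <- c(range(a[, "x"], b[, "x"]) + c(-pad, pad),
          range(a[, "y"], b[, "y"]) + c(-pad, pad))
ka <- kde2d(a[, "x"], a[, "y"], n = 200, lims = lims)  # default bandwidths
kb <- kde2d(b[, "x"], b[, "y"], n = 200, lims = lims)

cell_area <- diff(ka$x[1:2]) * diff(ka$y[1:2])  # square metres per cell
in_a <- volume_mask(ka$z)
in_b <- volume_mask(kb$z)

area_a  <- sum(in_a) * cell_area          # 95% home range of animal a
area_b  <- sum(in_b) * cell_area          # 95% home range of animal b
overlap <- sum(in_a & in_b) * cell_area   # shared area

# Percent overlap is asymmetric: shared area over each animal's own range
pct_a <- 100 * overlap / area_a
pct_b <- 100 * overlap / area_b
```

With real data, a and b would be projected GPS fixes for two buffalo, and looping this over all pairs would be one way to automate the pairwise analysis.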

Conclusion

Overall, this method looks promising for estimating habitat overlap. A few things that I'm concerned about are:

(a) Habitat use may be so similar across animals that differences in overlap cannot be related to differences in microbe transmission.

(b) Habitat use may correlate so strongly with contacts that it will be difficult to control for the effects of contacts on microbe transmission.

(c) Percent overlap can differ between the two individuals in a pair. In my example, the shared area covered ~80% of #1's home range but 90% of #13's home range.

I just want to be aware of these potential issues and start thinking about how to deal with them as they arise. Any suggestions would be appreciated, as always!

Introduction 

My research goal is to determine the effects of social and spatial behavior on disease transmission and nasal microbiome similarity. The overarching hypothesis of this project is that the composition of microbial communities depends on host spatial and social behavior, because microbe dispersal is limited by host movement. My research group studies a GPS-collared herd of African Buffalo that lives in a 900-hectare predator-free enclosure in Kruger National Park. In addition to the GPS collars, each animal wears a contact collar that records the ID and contact duration of any animal that comes within ~1.5 meters. However, for this class I focused on the GPS collars, to test whether they are sufficient for inferring contact networks.

 

The purpose of this step in my research was to establish a method for creating distance matrices between animals at different time points using only GPS data. I wanted to determine whether this is a viable way to infer contact networks that could be used in lieu of the contact collars. If the method proved effective, I planned to iterate through multiple time points to determine average distances between animals over time.
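Averaging over time points is straightforward once each time point has its own distance matrix; the main wrinkle is missing fixes. A minimal sketch, assuming one symmetric matrix per rounded time point and NA wherever a collar missed a fix (animals and distances below are made up):

```r
# Hypothetical distance matrices (metres) for three made-up animals at three
# rounded time points; NA marks a pair with a missing fix at that time point
ids <- c("A", "B", "C")
d1 <- matrix(c(0, 5, 12,  5, 0, 9,   12, 9, 0),  nrow = 3, dimnames = list(ids, ids))
d2 <- matrix(c(0, 7, NA,  7, 0, NA,  NA, NA, 0), nrow = 3, dimnames = list(ids, ids))
d3 <- matrix(c(0, 3, 18,  3, 0, 15,  18, 15, 0), nrow = 3, dimnames = list(ids, ids))

# Stack into a 3 x 3 x T array, then average each pair over time, skipping NAs
dist_array <- simplify2array(list(d1, d2, d3))
mean_dist  <- apply(dist_array, c(1, 2), mean, na.rm = TRUE)

# Number of time points that contributed to each pair's average
n_obs <- apply(!is.na(dist_array), c(1, 2), sum)
```

Keeping n_obs alongside the means makes it easy to spot pairs whose average rests on only a few time points (a pair missing at every time point would come out as NaN).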

Results from this pilot study showed that the method is feasible; however, GPS collars would be less effective than contact collars for inferring a contact network because temporal resolution is sacrificed.

 

Data Manipulation in R

The raw data from the buffalo GPS collars are stored as individual text files, one file per individual per capture period. Step one was to generate a single CSV file that included all individuals for one capture period. This was done in R using the following code:

## Load necessary packages

library(ggplot2)

## Set working directory and list the per-animal GPS files

setwd("C:\\Users\\couchc\\Documents\\GPS_captures_1215\\GPS_capture_12_15")
SAVE_PATH <- getwd()
BASE_PATH <- getwd()

CSV.list <- list.files(BASE_PATH, pattern = glob2rx("*.csv"))

## Read every file (not just the first) and stack them into one data frame;
## this assumes all files share the same column names
all.csv <- do.call(rbind, lapply(CSV.list, read.csv, header = TRUE))

names(all.csv)

## Keep the columns of interest; X is longitude (east-west), Y is latitude (north-south)
buffalo <- data.frame(ID = all.csv$title, X = all.csv$LONGITUDE, Y = all.csv$LATITUDE,
                      Day = all.csv$MM.DD.YY, Hour = all.csv$HH.MM.SS,
                      s.Date = all.csv$s.date, s.Hour = all.csv$s.hour)

## Write the combined file for all individuals (the file name is a placeholder)
write.csv(buffalo, file.path(SAVE_PATH, "all_buffalo_capture.csv"), row.names = FALSE)

## Check the number of records and plot the GPS fixes for the first animal
nrow(buffalo)
ggplot(buffalo[buffalo$ID == buffalo$ID[1], ], aes(x = X, y = Y)) +
  geom_point()

 

Distance Matrix

Because the GPS collars are not synchronized, the 30-minute intervals can be offset considerably, and quite a few intervals are missing. To correct for this, I rounded times to the nearest 30 minutes in Excel. I then imported the new Excel file back into R to generate a “test” distance matrix for a single time point using the following code:

-Show code and output once I’m able to run it on a lab computer (My computer doesn’t have the processing power to do it in a reasonable time).
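As a stand-in for the missing code, here is a minimal sketch of the two steps described above done entirely in R rather than Excel: rounding fix times to the nearest 30 minutes, then building a distance matrix for one rounded time slot. The fixes are made up, X/Y are assumed to be already projected into metres (e.g., UTM), and the 1.5 m threshold simply mirrors the approximate range of the contact collars.

```r
# Made-up fixes for three hypothetical animals; X/Y are assumed to be
# projected coordinates in metres, not raw latitude/longitude
fixes <- data.frame(
  ID   = c("A", "B", "C", "A"),
  time = as.POSIXct(c("2015-10-01 06:14:00", "2015-10-01 06:02:00",
                      "2015-10-01 05:51:00", "2015-10-01 06:44:00"), tz = "UTC"),
  X    = c(0, 3, 10, 20),
  Y    = c(0, 4, 0, 30)
)

# Round each fix time to the nearest 30 minutes (1800 s)
fixes$slot <- as.POSIXct(round(as.numeric(fixes$time) / 1800) * 1800,
                         origin = "1970-01-01", tz = "UTC")

# Keep one rounded time slot and compute all pairwise Euclidean distances
slot_0600 <- as.POSIXct("2015-10-01 06:00:00", tz = "UTC")
at_slot <- fixes[fixes$slot == slot_0600, ]
D <- as.matrix(dist(at_slot[, c("X", "Y")]))
dimnames(D) <- list(at_slot$ID, at_slot$ID)

# Pairs within a threshold distance (1.5 m mirrors the contact collars)
close_pairs <- which(D <= 1.5 & upper.tri(D), arr.ind = TRUE)
```

Note that fixes taken at 05:51 and 06:14 land in the same slot even though they are 23 minutes apart, which is exactly the limitation discussed in the conclusion below.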

 

Conclusion

This method produced some interesting results: inter-individual distances can be calculated in R from GPS collar data, and if the process were iterated over multiple time points, average pairwise distances could be computed for every pair of individuals. A Mantel test could then be used to determine the correlation between the average pairwise distance matrix and microbiome similarity. The problem with rounding time points to the nearest 30 minutes is that it doesn't guarantee that the buffalo ever actually come within the distance given by the matrix. Since some collars are offset by up to 15 minutes, the animals could occupy the recorded locations at different times and never actually come into contact. The benefit of using GPS collars to infer social behavior is that they give actual distances between individuals, which adds an element of spatial resolution not provided by the contact collars: contact collars register only that another animal is within a specific distance, not how far away it is. However, since this pilot project is looking for the best way to predict contact transmission of microbes, we are not interested in distances beyond a certain threshold, and setting a threshold distance for microbe transmission is easily done with the contact collars. Building distance matrices precise enough to infer contact networks would also require much higher temporal resolution in the GPS data. So although the distance matrix could prove useful for other behavioral studies, for this question it provides less relevant information than the contact collars.
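A Mantel test compares two pairwise matrices by correlating their off-diagonal entries and assessing significance by shuffling animal labels. Below is a minimal base-R sketch of a one-sided permutation Mantel test; the 10 animals and the "microbiome dissimilarity" matrix are simulated, not real data. Dissimilarity is used so that the expected association with distance is positive. For real analyses, the vegan package's mantel() function is a tested alternative.

```r
# Permutation Mantel test: Pearson correlation between the lower triangles of
# two symmetric matrices; the null distribution comes from permuting rows and
# columns of one matrix together (i.e., shuffling animal labels)
mantel_test <- function(m1, m2, n_perm = 999) {
  lower <- lower.tri(m1)
  observed <- cor(m1[lower], m2[lower])
  perm_r <- replicate(n_perm, {
    idx <- sample(nrow(m2))
    cor(m1[lower], m2[idx, idx][lower])
  })
  # One-sided p-value for a positive association; the observed value counts
  # as one of the permutations
  p_value <- (sum(perm_r >= observed) + 1) / (n_perm + 1)
  list(r = observed, p = p_value)
}

# Made-up example with 10 hypothetical animals
set.seed(42)
coords <- matrix(runif(20, 0, 1000), ncol = 2)
mean_dist <- as.matrix(dist(coords))      # average pairwise distance (m)
noise <- matrix(rnorm(100, 0, 50), nrow = 10)
noise <- (noise + t(noise)) / 2           # keep the matrix symmetric
microbe_dissim <- mean_dist + noise       # hypothetical dissimilarity
diag(microbe_dissim) <- 0

res <- mantel_test(mean_dist, microbe_dissim)
```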

 

Overall, this pilot project was useful because it demonstrated some of the potential benefits and drawbacks of using GPS fixes to infer contact networks. Although I do not plan to use the method described in this post for my dissertation research, it may be beneficial in future studies of animal behavior or herd structure. The exercise also greatly improved my confidence in R, with lots of help from my classmates, of course!

 

1. Description of Research Question

Disease transmission is intrinsically tied to space use and behavior: Individuals are exposed to pathogens based on where and with whom they spend their time. I will explore how different spatial personalities may affect individual disease risk and herd disease dynamics in a social species. For this project, I will specifically examine individual realized aggregation (IRA), or the degree to which different individuals in my study system aggregate with others, and will relate IRA to risk of exposure to directly transmitted diseases. To explore this question, I will make use of a unique study system of GPS-collared semi-wild African Buffalo (Syncerus caffer) located in Kruger National Park (KNP), South Africa.


 

2. Description of Dataset

The dataset I will be analyzing consists of approximately eight months of GPS readings from each of the 70 buffalo in the herd, collected at ~30-minute intervals. Accuracy tests have yet to be performed, but the GPS collars should be accurate to within about 5-10 meters. The map below shows the 900-hectare enclosure that serves as the study area.

[Map of the 900-hectare study enclosure]

For a project in a previous class, I created tracking animations using a subset of data from a single individual over a 24-hour period. Output from the tracking tool is shown below. It shows that we can distinguish periods of high movement (tracks far apart) from periods of low movement (tracks close together) for each individual.

[Figure: tracking-tool output for one buffalo over a 24-hour period]
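The visual contrast between "tracks far apart" and "tracks close together" can also be quantified as step length, the straight-line distance between consecutive fixes. A minimal sketch, assuming projected coordinates in metres; the track is made up, and the 100 m cut-off for "high movement" is an arbitrary placeholder, not a validated threshold.

```r
# Hypothetical consecutive fixes for one animal, projected coordinates in metres
track <- data.frame(
  time = as.POSIXct("2015-10-01 00:00:00", tz = "UTC") + 1800 * 0:5,
  X    = c(0, 10, 15, 400, 820, 830),
  Y    = c(0, 0, 12, 300, 610, 605)
)

# Step length: straight-line distance between successive fixes
step_m <- sqrt(diff(track$X)^2 + diff(track$Y)^2)

# Elapsed minutes between fixes; missed fixes would make this larger than 30
dt_min <- as.numeric(diff(track$time), units = "mins")
speed_m_per_min <- step_m / dt_min

# Flag steps above a hypothetical cut-off as "high movement"
high_move <- step_m > 100
```

Dividing by the elapsed time matters for real collar data, where a missed fix would otherwise make one step look unusually long.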

3. Hypotheses

I hypothesize that individuals will have different spatial behavioral personalities, demonstrated by the maintenance of relatively stable differences in IRA. This hypothesis is based on previous field observations suggesting that individuals maintain stable positions within the herd over time. I further hypothesize that individuals with high IRA will be exposed to more directly transmitted diseases than those with low IRA.

4. Approaches

I expect that the approaches I take will evolve throughout the course of this project, but currently my plan is as follows:

My first step will be putting the buffalo movement data in the correct format. Currently, I have separate text files of GPS readings for each individual buffalo for each capture period. I will need to combine all individuals into a single spreadsheet for each capture period in order to look at the relative positions of individuals within the herd. I will then sample time points from across the capture period (controlling for weather conditions and time of day) and generate 5, 10, 15, 20, and 25 meter buffers around each buffalo. I will use the buffer zones to count the number of individuals within each radius and determine the degree of IRA for each individual. I will then compare IRA with disease exposure and infection data collected as part of a larger Foot-and-Mouth Disease Virus study to determine whether exposure or infection is related to IRA. Because this is such an extensive dataset, I hope to automate the process of generating buffers around each buffalo at each time point using programming in ArcGIS.
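Counting animals inside a circular buffer of radius r is equivalent to counting animals whose Euclidean distance from the focal animal is at most r, so the buffer step could also be done directly from a distance matrix. A minimal sketch for one time point, assuming projected coordinates in metres; the six animals and their positions are made up. Averaging these counts across sampled time points would be one possible way to summarize IRA, not a settled definition.

```r
# Hypothetical positions of 6 animals at one sampled time point (metres)
pos <- data.frame(
  ID = c("A", "B", "C", "D", "E", "F"),
  X  = c(0, 3, 8, 20, 21, 60),
  Y  = c(0, 4, 0, 0, 2, 60)
)

D <- as.matrix(dist(pos[, c("X", "Y")]))
dimnames(D) <- list(pos$ID, pos$ID)
diag(D) <- NA     # an animal is not its own neighbor

radii <- c(5, 10, 15, 20, 25)
# Rows = animals, columns = radius: number of other animals within each radius
neighbors <- sapply(radii, function(r) rowSums(D <= r, na.rm = TRUE))
colnames(neighbors) <- paste0(radii, "m")
```

Repeating this for each sampled time point and taking the row-wise mean would give the "average number of neighboring individuals per radius size" described under Expected Outcome.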

5. Expected Outcome

I want to statistically evaluate IRA for each individual buffalo and produce graphs of the average number of neighboring individuals per radius size for each individual. I also hope to statistically evaluate relationships between exposure to directly transmitted diseases and IRA.

6. Significance

Understanding disease dynamics in social mammals is of fundamental importance in the current context of accelerated infectious disease emergence. Owing to a uniquely tractable study system, this work will be the first to categorize individual variation in spatial behavior and link it to disease risks and transmission dynamics.

This work has implications for predicting and managing animal and human diseases. If key individuals for disease transmission can be identified based on spatial-behavioral traits, efficacy and efficiency of disease control could be optimized via targeted interventions.

7. Level of Preparation

I have moderate experience in GIS and statistical analysis in R. I have completed ST 511 and 512 (Methods of Data Analysis), and for a current side project I am using R to analyze blood chemistry parameters using linear mixed models. I have taken two GIS courses, Geo 565 (Intro GIS) and Geo 580 (Advanced GIS), and have used subsets of my data for projects in both of those courses. However, I am definitely not an expert in either GIS or R and will need help navigating both programs.