Introduction
My research goal is to determine the effects of social and spatial behavior on disease transmission and nasal microbiome similarity. The overarching hypothesis of this project is that the composition of microbial communities depends on host spatial and social behavior because microbe dispersal is limited by host movement. My research group studies a herd of African buffalo in Kruger National Park that is GPS collared and lives in a 900-hectare predator-free enclosure. In addition to the GPS collar, each animal wears a contact collar that records the contact duration and ID of any animal that comes within ~1.5 meters. For this class, however, I focused on the GPS collars to test whether they alone are sufficient for inferring contact networks.
The purpose of this step in my research was to establish a method for creating distance matrices between animals at different time points using only GPS data. I wanted to determine whether this is a viable option for inferring contact networks and could be used in lieu of the contact collars. If the method proved effective, my plan was to iterate over multiple time points to determine average distances between animals over time.
Results from this pilot study showed that the method is feasible; however, GPS collars would be less effective than contact collars for inferring a contact network because temporal resolution is sacrificed.
Data Manipulation in R
The raw data from the buffalo GPS collars are stored as individual text files, one per individual per time point. Step one was to generate a CSV file that included all individuals at one time point. This was done in R using the following code:
##Load necessary packages
library(reshape2)
library(ggplot2)
##Set working directory and list the CSV files to import
setwd("C:\\Users\\couchc\\Documents\\GPS_captures_1215\\GPS_capture_12_15")
SAVE_PATH <- getwd()
BASE_PATH <- getwd()
CSV.list <- list.files(BASE_PATH, pattern = glob2rx("*.csv"))
## Read the first CSV file in the list and check its column names
all.csv <- read.csv(CSV.list[1], header = TRUE)
names(all.csv)
## X is longitude (east-west) and Y is latitude (north-south)
buffalo <- data.frame(ID = all.csv$title, X = all.csv$LONGITUDE, Y = all.csv$LATITUDE, Day = all.csv$MM.DD.YY, Hour = all.csv$HH.MM.SS, s.Date = all.csv$s.date, s.Hour = all.csv$s.hour)
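## Now we want to melt the coordinates and cast them by time stamp and ID.
## Only X and Y are melted so that the value column stays numeric.
melted.all.csv <- melt(buffalo, id.vars = c("ID", "Day", "Hour"), measure.vars = c("X", "Y"))
names(melted.all.csv)
## mean() collapses any duplicate fixes for the same ID and time stamp
casted.all.csv <- dcast(melted.all.csv, Day + Hour ~ ID + variable, fun.aggregate = mean)
# Where our previous data was organized with ID as a column (one row per fix),
# each ID now has its own X and Y columns, with one row per time stamp,
# so we can see where every animal was at a given time.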
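## Number of GPS fixes in the data set
nrow(buffalo)
## Plot the locations of the first individual; geom_point() is used because
## raw GPS fixes do not fall on the regular grid that geom_raster() needs
ggplot(buffalo[buffalo$ID == buffalo$ID[1], ], aes(x = X, y = Y)) +
  geom_point()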
Distance Matrix
Because the GPS collars are not synchronized, their 30-minute recording intervals can be offset considerably, and quite a few intervals are missing. To correct for this, I rounded times to the nearest 30 minutes in Excel. I then imported the new Excel file back into R to generate a “test” distance matrix for a single time point using the following code:
-Show code and output once I’m able to run it on a lab computer (My computer doesn’t have the processing power to do it in a reasonable time).
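For illustration only (this is not the code that was run for this project), the rounding step and a single-time-point distance matrix could be sketched in base R roughly as follows. The data frame `fixes`, its column names, the coordinates and times, and the helper functions `haversine_m` and `distance_matrix` are all invented for the example. Using great-circle (haversine) distance to convert latitude and longitude into meters is also an assumption.

```r
# Hypothetical GPS fixes for three animals (invented coordinates and times).
fixes <- data.frame(
  ID   = c("B1", "B2", "B3", "B1", "B2"),
  lat  = c(-24.9950, -24.9955, -24.9990, -24.9960, -24.9962),
  lon  = c( 31.5900,  31.5905,  31.5950,  31.5910,  31.5911),
  time = as.POSIXct(c("2015-12-01 08:02:10", "2015-12-01 07:51:40",
                      "2015-12-01 08:11:05", "2015-12-01 08:31:00",
                      "2015-12-01 08:24:30"), tz = "UTC")
)

# Round each fix to the nearest 30 minutes (1800 seconds).
fixes$slot <- as.POSIXct(round(as.numeric(fixes$time) / 1800) * 1800,
                         origin = "1970-01-01", tz = "UTC")

# Great-circle distance in meters between lat/lon points (haversine formula).
haversine_m <- function(lat1, lon1, lat2, lon2, r = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Pairwise distance matrix for all animals that have a fix in one time slot.
distance_matrix <- function(df, slot) {
  now <- df[df$slot == slot, ]
  idx <- seq_len(nrow(now))
  d <- outer(idx, idx, function(i, j) {
    haversine_m(now$lat[i], now$lon[i], now$lat[j], now$lon[j])
  })
  dimnames(d) <- list(as.character(now$ID), as.character(now$ID))
  d
}

slot_0800 <- as.POSIXct("2015-12-01 08:00:00", tz = "UTC")
d_0800 <- distance_matrix(fixes, slot_0800)
print(round(d_0800))
```

In this toy data the three fixes assigned to the 08:00 slot were actually recorded as much as 20 minutes apart, which is exactly the weakness of rounding: the matrix looks simultaneous even though the underlying fixes are not.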
Conclusion
This pilot showed that inter-individual distances can be calculated in R from GPS collar data. If the process were iterated over multiple time points, average pairwise distances could be computed for every pair of individuals, and a Mantel test could be used to test for correlation between the resulting distance matrix and a second pairwise matrix. The problem with rounding time points to the nearest 30 minutes is that it does not guarantee that the buffalo ever actually come within the distance given by the matrix. Since some of the collars are offset by up to 15 minutes, two animals could occupy their recorded locations at different times and never come into contact with each other.
The benefit of using GPS collars to infer social behavior is that they give actual distances between individuals, which adds an element of spatial resolution not provided by the contact collars. Contact collars only register when another animal is within a specific distance; they do not record the distance itself. However, since we are looking for the best way to predict contact transmission of microbes in this pilot project, we are not interested in distances beyond a certain threshold, and setting such a threshold is easily done with the contact collars. So although the distance matrix could prove useful for other behavioral studies where actual distances matter, it provides less relevant information here than the contact collars do. We would also need much higher temporal resolution in our GPS data to build distance matrices precise enough to infer contact networks.
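The averaging and threshold ideas discussed above can be sketched in base R. This is a minimal illustration under stated assumptions, not a project result: the two distance matrices and the animal IDs are invented, and applying the 1.5 m contact collar range as a cutoff on GPS distances is an assumption made only for the example.

```r
# Two hypothetical distance matrices (meters) for the same three animals
# at consecutive time points; NA marks a pair with a missing fix.
ids <- c("B1", "B2", "B3")
d_t1 <- matrix(c(0,   1.2, 40,
                 1.2, 0,   38,
                 40,  38,  0), nrow = 3, dimnames = list(ids, ids))
d_t2 <- matrix(c(0,   3.0, NA,
                 3.0, 0,   20,
                 NA,  20,  0), nrow = 3, dimnames = list(ids, ids))

# Stack the matrices into a 3-D array (animal x animal x time point).
stack <- array(c(d_t1, d_t2), dim = c(3, 3, 2),
               dimnames = list(ids, ids, NULL))

# Average pairwise distance over time, skipping missing values.
mean_dist <- apply(stack, c(1, 2), mean, na.rm = TRUE)

# Proportion of time points at which each pair was within the threshold.
threshold_m <- 1.5  # assumed cutoff, borrowed from the contact collar range
within <- stack <= threshold_m
contact_rate <- apply(within, c(1, 2), mean, na.rm = TRUE)
diag(contact_rate) <- NA  # an animal's distance to itself is not a contact

print(mean_dist)
print(contact_rate)
```

With only one fix every 30 minutes, a contact rate computed this way would miss any close approach that happens between fixes, which is the temporal resolution limitation described above.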
Overall, this pilot project was useful because it demonstrated some of the potential benefits and drawbacks of using GPS intervals to infer contact networks. Although I do not plan to use the method described in this post for my dissertation research, it may be beneficial in future studies of animal behavior or herd structure. The exercise was very useful for improving my confidence in R — with lots of help from my classmates of course!