No, you didn’t miss Mapmaking: Part 1. Before getting interrupted by last-minute extra fieldwork with the Waitt Foundation (which was awesome!), I gave an intro to photo management in Lightroom. Today I’ll expand on that, beginning a series of posts explaining how I created this map. On the way, I’ll introduce a little bit of…
*shudder*
coding.
If you’ve been following my blog just to look at pretty beach pictures, I apologize. But I encourage you to keep reading. If any of the code makes you go cross-eyed, don’t worry; it does the same to me. I would love to field some questions in the comment section to make things clearer.
So. I have all of my photos keyworded to oblivion, and those keywords include sample IDs. How did I get them into my map? First, I needed to make sure I could link a given sample with its photos programmatically. I have a machine-readable metadata table that stores all our sample information, which we’ll be using later for data analysis. Metadata just refers to ‘extra’ information about the samples, and by machine-readable, I mean it’s stored in a format that is easy to parse with code. I used this table to build the map because it specifies GPS coordinates and provides things like the site name to fill in the pop-ups. But I didn’t have any photo filenames in this table, because it’s easier to organize the photos by tagging them with their sample IDs, like I explained last post. I simply needed to extract sample IDs from the photos’ keywords and add the corresponding photo filenames to my sample metadata table. And not by hand.
Excerpt from sample metadata table:

sample_name | reef_name | date | time | genus_species | latitude | longitude
---|---|---|---|---|---|---
E1.3.Por.loba.1.20140724 | Lagoon entrance | 20140724 | 11:23 | Porites lobata | -14.689414 | 145.468137
E1.19.Sym.sp.1.20140724 | Lagoon entrance | 20140724 | 11:26 | Symphyllia sp | -14.689414 | 145.468137
E1.6.Acr.sp.1.20140726 | Trawler | 20140726 | 10:35 | Acropora sp | -14.683931 | 145.466483
E1.15.Dip.heli.1.20140726 | Trawler | 20140726 | 10:38 | Diploastrea heliopora | -14.683931 | 145.466483
E1.3.Por.loba.1.20140726 | Trawler | 20140726 | 10:41 | Porites lobata | -14.683931 | 145.466483
To get started, I installed a Lightroom plugin called LR/Transporter. This plugin contains many functions for programmatically messing with photo metadata. Using it, I gave every photo a ‘title’ containing a sequential number, in the order the photos were taken. The first sample photo from the project was one that Katia took while I was working in Australia, and it’s now called ‘GCMP_sample_photo_1’. Katia and I also took 17 other photos that contained this same sample, incrementing up to ‘GCMP_sample_photo_18’. The last photo I have from the project is one from my last trip, to Mo’orea, and it now has the title ‘GCMP_sample_photo_3893’.
Then, I exported small versions of all my photos to a publicly accessible internet server that our lab uses for data. I did this with another Lightroom plugin called FTP Publisher, from the same company that made LR/Transporter. Each photo was uploaded to a specific folder and given a filename based on its new arbitrary title. Thus my first photo, GCMP_sample_photo_1, is now easily located at:
http://files.cgrb.oregonstate.edu/Thurber_Lab/GCMP/photos/sample_photos/processed/small/GCMP_sample_photo_1.jpg
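Because the filenames follow that single pattern, any photo’s URL can be built from nothing more than its title. Here’s a minimal sketch in R of the idea (base_url and photo_url are just illustrative names, not anything from my actual scripts):

    # Sketch: a photo's URL is the base path, plus its title, plus '.jpg'
    base_url  <- 'http://files.cgrb.oregonstate.edu/Thurber_Lab/GCMP/photos/sample_photos/processed/small/'
    photo_url <- paste0(base_url, 'GCMP_sample_photo_1', '.jpg')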
Next, I used LR/Transporter to export a machine-readable file where the first item in every line is the new title of the photo, and the second item is a comma-separated list of all the photo’s keywords, which include sample IDs.
Excerpt from Lightroom photo metadata table:

photo title | keywords
---|---
GCMP_sample_photo_1 | E1.3.Por.loba.1.20140724, Fieldwork, GCMP Sample, ID by Ryan McMinds, Lagoon Entrance, Pacific Ocean
GCMP_sample_photo_2 | E1.3.Por.loba.1.20140724, Fieldwork, GCMP Sample, ID by Ryan McMinds, Lagoon Entrance, Pacific Ocean, Ryan McMinds
GCMP_sample_photo_124 | 20140807, E1.5.Gal.astr.1.20140807, GCMP Sample, ID by Ryan McMinds, Pacific Ocean, Trawler Reef
GCMP_sample_photo_1051 | Al Fahal, E4.3.Por.lute.1.20150311, GCMP Sample, ID by Ryan McMinds, KAUST, Red Sea
GCMP_sample_photo_3893 | E13.Out.Mil.plat.1.20151111, GCMP Sample, Mo'orea
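For a sense of how a program sees this file: each line pairs one title with one keyword string, so checking whether a photo shows a particular sample just means splitting that string and looking for the sample ID. A quick sketch in R (not the exact code I ended up using):

    # Sketch: split one photo's comma-separated keyword list and test for a sample ID
    photo_data <- read.table('photo-metadata-file.txt', header=FALSE, sep='\t',
                             quote='"', stringsAsFactors=FALSE)
    keywords <- strsplit(photo_data[1, 2], ',\\s*')[[1]]   # "E1.3.Por.loba.1.20140724" "Fieldwork" ...
    'E1.3.Por.loba.1.20140724' %in% keywords               # TRUE: photo 1 shows that sample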
Now comes the fun part.
To associate each sample with a URL for one of its photos, I needed to search for its ID in the photo keywords and retrieve the corresponding photo titles, then paste one of these titles to the end of the server URL. The only way I know to do this automatically is by coding, or maybe in Excel if I were a wizard. I’ve learned how to code almost 100% through Google searches and trial-and-error, so when I write something, it’s a mashing-together of what I’ve learned so far, and it’s made for results, not beauty. The first programming language I learned that was good for parsing tables was AWK, because I do a lot of work in the shell on the Mac terminal. I thus tackled my problem with that language first, in an excellent example of an inefficient method to get results:
    while read -r line; do
        search=$(awk '{print $1}' <<< $line)
        awk -v search=$search '
            BEGIN {list=""}
            $0 ~ search && list != "" {list = list","$1}
            $0 ~ search && list == "" {list = $1}
            END {print search"\t"list}
        ' photo-metadata-file.txt
    done < sample-metadata-file.txt > output-file.txt
Ew.
I’ve been issuing my AWK commands from within the shell, which is a completely separate programming language. While I was writing this code, I couldn’t for the life of me remember how to use AWK to read two separate files simultaneously. I know I’ve done it before, but I couldn’t find any old scripts with examples, and rather than re-learn the efficient, correct way, I mashed together commands from two different languages. I then decided I needed to go back and do it the right way, so I rewrote the code entirely in AWK. That snippet isn’t very long, but it took a lot of re-learning to figure out. So it was about a week before I realized that since my map-making had to happen in yet another language (called R), it was ridiculous for me to be messing with AWK in the first place…
So I came to my senses and started over.
In R, I simply import the two tables, like so:
    samples <- read.table('sample-metadata-file.txt', header=T, sep='\t', fill=T, quote="\"")
    photo_data <- read.table('photo-metadata-file.txt', header=F, sep='\t', quote="\"")
Then I use a process similar to the AWK one to create a new column of photo titles in the sample metadata table (this time I simply add the first matching photo instead of the whole list):
    # For each sample ID, find the first photo whose keywords contain it, and record that photo's title
    samples$photo_title <- as.character(sapply(samples$sample_name, function(x) {
        photo_data[grep(x, photo_data[, 2])[1], 1]
    }))
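That covers the matching. The last step from my description above, pasting each title onto the end of the server URL, is one more line; here’s a sketch using the same paste0 pattern as before (photo_url is again just an illustrative name):

    # Sketch: build a full photo URL for each sample, for use later in the map pop-ups
    base_url <- 'http://files.cgrb.oregonstate.edu/Thurber_Lab/GCMP/photos/sample_photos/processed/small/'
    samples$photo_url <- paste0(base_url, samples$photo_title, '.jpg')
    # (samples that matched no photo will have NA in photo_title and would need special handling)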
And now, I have a single table that tells me the coordinates, metadata, and photo titles of each sample. With this, I can make the map, with one point drawn for each line in the table. I’ll continue explaining this process in another post.
Excerpt from sample metadata table:

sample_name | reef_name | date | time | genus_species | latitude | longitude | photo_title
---|---|---|---|---|---|---|---
E1.3.Por.loba.1.20140724 | Lagoon entrance | 20140724 | 11:23 | Porites lobata | -14.689414 | 145.468137 | GCMP_sample_photo_1
E1.19.Sym.sp.1.20140724 | Lagoon entrance | 20140724 | 11:26 | Symphyllia sp | -14.689414 | 145.468137 | GCMP_sample_photo_17
E1.6.Acr.sp.1.20140726 | Trawler | 20140726 | 10:35 | Acropora sp | -14.683931 | 145.466483 | GCMP_sample_photo_37
E1.15.Dip.heli.1.20140726 | Trawler | 20140726 | 10:38 | Diploastrea heliopora | -14.683931 | 145.466483 | GCMP_sample_photo_37
E1.3.Por.loba.1.20140726 | Trawler | 20140726 | 10:41 | Porites lobata | -14.683931 | 145.466483 | GCMP_sample_photo_40
By the way, I am working on translating my blog into Spanish and French, to make it more accessible and just to help myself learn. If you’d like to help me, you can find the active translation of this post and others on the Duolingo site. Thanks!