Mapmaking: Part 2

No, you didn’t miss Mapmaking: Part 1. Before getting interrupted by last-minute extra fieldwork with the Waitt Foundation (which was awesome!), I gave an intro to photo management in Lightroom. Today I’ll expand on that, beginning a series of posts explaining how I created this map. On the way, I’ll introduce a little bit of…



Some really ugly code that I once wrote.

If you’ve been following my blog just to look at pretty beach pictures, I apologize. But I encourage you to keep reading. If any of the code makes you go cross-eyed, don’t worry; it does the same to me. I would love to field some questions in the comment section to make things clearer.

So. I have all of my photos keyworded to oblivion, and those keywords include sample IDs. How did I get them into my map? First, I needed to make sure I could link a given sample with its photos programmatically. I have a machine-readable metadata table that stores all our sample information, which we’ll be using later for data analysis. Metadata just refers to ‘extra’ information about the samples, and by machine-readable, I mean it’s stored in a format that is easy to parse with code. I used this table to build the map because it specifies GPS coordinates and provides things like the site name to fill in the pop-ups. But I didn’t have any photo filenames in this table, because it’s easier to organize the photos by tagging them with their sample IDs, as I explained in my last post. I simply needed to extract sample IDs from the photos’ keywords and add their filenames to my sample metadata table. And not by hand.

Excerpt from sample metadata table
E1.3.Por.loba.1.20140724   Lagoon entrance  20140724  11:23  Porites lobata         -14.689414  145.468137
E1.19.Sym.sp.1.20140724    Lagoon entrance  20140724  11:26  Symphyllia sp          -14.689414  145.468137
E1.6.Acr.sp.1.20140726     Trawler          20140726  10:35  Acropora sp            -14.683931  145.466483
E1.15.Dip.heli.1.20140726  Trawler          20140726  10:38  Diploastrea heliopora  -14.683931  145.466483
E1.3.Por.loba.1.20140726   Trawler          20140726  10:41  Porites lobata         -14.683931  145.466483

A popup from the map on our webpage, displaying the sample ID, selected metadata information, and a photo.

To get started, I installed a Lightroom plugin called LR/Transporter. This plugin contains many functions for programmatically messing with photo metadata. Using it, I created a ‘title’ for all of my photos with a sequence of numbers in the order that they were taken. The first sample photo from the project was one that Katia took while I was working in Australia, and it’s now called ‘GCMP_sample_photo_1’. Katia and I also took 17 other photos that contained this same sample, incrementing up to ‘GCMP_sample_photo_18’. The last photo I have from the project is one from my last trip, to Mo’orea, and it now has the title ‘GCMP_sample_photo_3893’.

Then, I exported small versions of all my photos to a publicly accessible internet server that our lab uses for data. I did this with another Lightroom plugin called FTP Publisher, from the same company that made LR/Transporter. Each photo was uploaded to a specific folder and given a filename based on its new arbitrary title. Thus my first photo, GCMP_sample_photo_1, is now easily located at:

Next, I used LR/Transporter to export a machine-readable file where the first item in every line is the new title of the photo, and the second item is a comma-separated list of all the photo’s keywords, which include sample IDs.

Excerpt from Lightroom photo metadata table
GCMP_sample_photo_1     E1.3.Por.loba.1.20140724, Fieldwork, GCMP Sample, ID by Ryan McMinds, Lagoon Entrance, Pacific Ocean
GCMP_sample_photo_2     E1.3.Por.loba.1.20140724, Fieldwork, GCMP Sample, ID by Ryan McMinds, Lagoon Entrance, Pacific Ocean, Ryan McMinds
GCMP_sample_photo_124   20140807, E1.5.Gal.astr.1.20140807, GCMP Sample, ID by Ryan McMinds, Pacific Ocean, Trawler Reef
GCMP_sample_photo_1051  Al Fahal, E4.3.Por.lute.1.20150311, GCMP Sample, ID by Ryan McMinds, KAUST, Red Sea
GCMP_sample_photo_3893  E13.Out.Mil.plat.1.20151111, GCMP Sample, Mo'orea
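
For the curious, here is a minimal sketch of how a file shaped like this excerpt could be searched. It assumes the title and the keyword list are separated by a tab and that keywords are separated by a comma and a space. The function name photos_with_keyword is made up for illustration. It compares whole keywords, so the dots in a sample ID are never treated as wildcards.

```shell
# Print the title of every photo whose keyword list contains the given
# keyword as a whole entry (exact comparison, not substring or regex).
# Assumed layout: column 1 = photo title, column 2 = keywords joined by ", ".
photos_with_keyword() {
    local keyword="$1"
    local file="$2"
    awk -F '\t' -v kw="$keyword" '
        {
            # Break the keyword column into individual keywords.
            n = split($2, words, ", ")
            for (i = 1; i <= n; i++) {
                if (words[i] == kw) { print $1; break }
            }
        }' "$file"
}

# Usage: photos_with_keyword 'E1.3.Por.loba.1.20140724' photo-metadata-file.txt
```

Run against the excerpt above, the usage line would be expected to list the first two photo titles, since both carry that sample ID.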

Now comes the fun part.

To associate each sample with a URL for one of its photos, I needed to search for its ID in the photo keywords and retrieve the corresponding photo titles, then paste one of these titles to the end of the server URL. The only way I know to do this automatically is by coding, or maybe in Excel if I were a wizard. I’ve learned how to code almost 100% through Google searches and trial-and-error, so when I write something, it’s a mashing-together of what I’ve learned so far, and it’s made for results, not beauty. The first programming language I learned that was good for parsing tables was AWK, because I do a lot of work in the shell on the Mac terminal. I thus tackled my problem with that language first, in an excellent example of an inefficient method to get results:

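# For every sample, list the titles of all photos tagged with its ID.
while IFS= read -r line; do
    # The sample ID is the first whitespace-delimited field on the line.
    search=$(awk '{print $1}' <<< "$line")
    # index() matches the ID literally, so its dots are not regex wildcards.
    awk -v search="$search" 'BEGIN {list=""}
    index($0, search) && list != "" {list = list","$1}
    index($0, search) && list == "" {list = $1}
    END {print search"\t"list}' photo-metadata-file.txt
done < sample-metadata-file.txt > output-file.txt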


I’ve been issuing my AWK commands from within the shell, which is a completely separate programming language. For the life of me, I couldn’t remember how to use AWK to read two separate files simultaneously while I was writing this code. I know I’ve done it before, but I couldn’t find any old scripts with examples, and rather than re-learn the efficient, correct way, I mashed together commands from two different languages. I then decided I needed to go back and do it the right way, so I rewrote the code entirely in AWK. That code snippet isn’t very long, but it took a lot of re-learning for me to figure it out. So it was about a week or so before I realized that since my map-making had to occur in yet another language (called R), it was ridiculous for me to be messing with AWK in the first place…
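
To give a feel for the all-AWK approach, here is a minimal sketch of the usual idiom for reading two files in a single AWK process. It is an illustration, not the script described above. It assumes both files are tab-separated, that the sample table starts with a header row, and that the first file is not empty. The function name match_samples_to_photos is made up.

```shell
# One AWK process reads both files: sample IDs first, then photo metadata.
# Assumptions: tab-separated files; the sample table has a header row;
# the photo file has the title in column 1 and the keywords in column 2.
match_samples_to_photos() {
    local samples_file="$1"
    local photos_file="$2"
    awk -F '\t' '
        # NR == FNR is only true while the first file is being read.
        # Skip its header line and remember the sample IDs in order.
        NR == FNR { if (FNR > 1) ids[++n] = $1; next }
        # Second file: add this photo title to every sample found in its keywords.
        {
            for (i = 1; i <= n; i++) {
                if (index($2, ids[i])) {
                    list[i] = (list[i] == "" ? $1 : list[i] "," $1)
                }
            }
        }
        # Print one line per sample: ID, tab, comma-separated photo titles.
        END { for (i = 1; i <= n; i++) print ids[i] "\t" list[i] }
    ' "$samples_file" "$photos_file"
}

# Usage: match_samples_to_photos sample-metadata-file.txt photo-metadata-file.txt > output-file.txt
```

The photo file is only read once here, instead of once per sample as in the shell loop.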

So I came to my senses and started over.

In R, I simply import the two tables, like so:

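# Both files are tab-separated; only the sample table has a header row.
samples <- read.table('sample-metadata-file.txt',header=TRUE,sep='\t',fill=TRUE,quote="\"",stringsAsFactors=FALSE)
photo_data <- read.table('photo-metadata-file.txt',header=FALSE,sep='\t',quote="\"",stringsAsFactors=FALSE)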

Then I use a process similar to the AWK one to create a new column of photo titles in the sample metadata table (this time I simply add the first photo instead of the whole list):

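# First photo whose keywords contain the sample ID (matched literally); NA if there is none.
samples$photo_name <- as.character(sapply(as.character(samples$sample_name), function(x) { photo_data[grep(x, photo_data[,2], fixed=TRUE)[1], 1] }))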

And now, I have a single table that tells me the coordinates, metadata, and photo titles of each sample. With this, I can make the map, with one point drawn for each line in the table. I’ll continue explaining this process in another post.

Excerpt from sample metadata table
E1.3.Por.loba.1.20140724   Lagoon entrance  20140724  11:23  Porites lobata         -14.689414  145.468137  GCMP_sample_photo_1
E1.19.Sym.sp.1.20140724    Lagoon entrance  20140724  11:26  Symphyllia sp          -14.689414  145.468137  GCMP_sample_photo_17
E1.6.Acr.sp.1.20140726     Trawler          20140726  10:35  Acropora sp            -14.683931  145.466483  GCMP_sample_photo_37
E1.15.Dip.heli.1.20140726  Trawler          20140726  10:38  Diploastrea heliopora  -14.683931  145.466483  GCMP_sample_photo_37
E1.3.Por.loba.1.20140726   Trawler          20140726  10:41  Porites lobata         -14.683931  145.466483  GCMP_sample_photo_40
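
Since each photo's filename on the server is based on its title, turning the last column into a link is just a matter of pasting the title onto the folder address. Here is a minimal sketch of that step. The address in the usage line and the ".jpg" extension are placeholders I made up, not the real server details, and the function name add_photo_urls is hypothetical. It assumes a tab-separated table with the photo title in the last column.

```shell
# Append a photo URL column to a tab-separated table whose last column
# holds a photo title. The base URL and ".jpg" extension are placeholders.
add_photo_urls() {
    local base_url="$1"
    local table="$2"
    awk -F '\t' -v OFS='\t' -v base="$base_url" '
        {
            # Rows without a photo (empty or NA title) get an empty URL field.
            url = ($NF == "" || $NF == "NA") ? "" : (base $NF ".jpg")
            print $0, url
        }' "$table"
}

# Usage (hypothetical address):
#   add_photo_urls 'https://example.org/photos/' samples-with-photos.txt
```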

By the way, I am working on translating my blog into Spanish and French, to make it more accessible and just to help myself learn. Si quieres ayudarme, puedes encontrar la traducción activa de esta entrada y otras en el sitio Duolingo. ¡Gracias!

Photo management

First off, go play with this interactive map of our sampling locations on our project homepage, because I’ve been working on it for the last week and I’m very proud of it :).

Now, I have a confession to make.

Despite the singular focus of my prior blog posts, my work is not entirely composed of swimming around in the tropics. In fact, most months of the year, you can find me right here, bathing instead in the light of my computer screen.

I’ve been meaning to write more posts while stateside, but the subject matter is a bit more difficult to ‘spice up’. So I’ve put it off. Today, however, I think I’ve got an interesting topic that will begin a new theme of posts regarding the most interesting and time-consuming part of my job: computer work.

Since we returned from Reunion a couple of weeks ago, I’ve spent a considerable amount of time preparing the photos and data from our trips so that they are organized, useful, and publicly accessible. So far, the team has collected over 3,000 photos of more than 550 coral samples. Keeping these organized can become very difficult as we progress, so I’ve been working with a variety of tools to make it easier. When we’re in the field, we take tons of photos of each individual coral, from closeups that show small morphological details, to wide-angle photos that we can use later to determine the surroundings of the coral. We also take photos of the reef, photos of each other, and photos of that awesome creature that I’ve never seen before and it’s so close and so colorful and sooo cool and look at it feeding, it’s waving its antennae around and catching things and it’s so awesome!!

Seriously, this mantis shrimp was freaking cool

At the end of the day, I have hundreds of photos. Some are pretty, some need post-processing work to become pretty, some are definitely not pretty but can be used as data, and some might be usable as data with some post-processing of their own. Each photo might have one or multiple samples in it, or could be a great example of a particular disease, or maybe it just has one of us making a funny face. To be useful, I need a way to find these photos again, somewhere in the midst of the 47,000 other photos on my hard drive (seriously).

Ummm... data?

The primary tool I use to manage the mess is Adobe Lightroom. Lightroom enables me to process my photos in bulk and add keywords to the photos so I can easily search for them later. When I import all the photos from a particular dive, for instance, I have Lightroom automatically add the GPS coordinates for the dive and keywords for the site name, project, photographer, etc. Then I go through the photos and add keywords to each one that include sample identification codes and everything interesting in the picture, like fish, diseases, or divers. Now, there are two very neat aspects of Lightroom keywords that I take advantage of. The first is that you can establish keyword synonyms so that every time you tag a photo with one word, its synonyms will automatically also be attached. I can tag a photo with ‘lionfish’, and that’s all well and good. But later, I might be thinking all sciency and want to find all my photos with ‘Pterois radiata’ in them. If I have previously told Lightroom that the scientific name and common name are synonyms, my search will find exactly what I need.

But what if I want to find all photos of fish that belong to Scorpaeniformes (the group that includes both lionfish and stonefish)? The second handy aspect of Lightroom keywords comes in here: they can be placed in a hierarchy. I’ve placed the keyword ‘Pterois radiata’ within ‘Pterois’, within ‘Scorpaeniformes’, so every time I tag a photo with the simple term ‘lionfish’, it’s also tagged with its higher-level taxonomic groupings. For our samples, I even put the sample ID keyword within its corresponding species. In fact, I’ve set up an entire taxonomic tree of organism names within my keywords, so every time I tag a simple sample ID, the photo is made searchable with terms corresponding to all the different levels of the tree of life. It’s awwwesommmmeee.

Manual keywords (5): E10.17.Cyp.sera.1.20150628, North Bay, Octopus, Photo by Joe Pollock, GCMP Sample
Resulting keywords (29): Animal, Anthozoan, Australia, Cephalopoda, Cnidaria, Cnidarian, Cyphastrea, Cyphastrea serailia, E10.17.Cyp.sera.1.20150628, GCMP, GCMP Sample, Hard coral, Hexacorallian, Indo-Pacific, LH_282, Lord Howe Island, Merulinidae, Metazoan, Mollusc, North Bay, Octopus, Pacific Ocean, Photo by Joe Pollock, Protostome, Robust, Scleractinian, Stony Coral, XVII, AU
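
If you want to picture how a handful of manual keywords balloons into a long list, here is a minimal sketch of the idea. It is not how Lightroom stores or resolves keywords. It assumes a made-up two-column, tab-separated file in which each line gives a keyword and the keyword it sits within, and the function name expand_keyword is hypothetical.

```shell
# Print a keyword followed by every keyword above it in the hierarchy.
# Assumed (hypothetical) file layout: child <TAB> parent, one pair per line.
expand_keyword() {
    local keyword="$1"
    local hierarchy_file="$2"
    awk -F '\t' -v kw="$keyword" '
        # Remember the parent of each keyword.
        { parent[$1] = $2 }
        END {
            # Walk upward until a keyword has no recorded parent.
            # The "seen" check stops the loop if the file contains a cycle.
            while (kw != "" && !(kw in seen)) {
                print kw
                seen[kw] = 1
                kw = parent[kw]
            }
        }' "$hierarchy_file"
}

# Usage: expand_keyword 'Pterois radiata' keyword-hierarchy.txt
```

With a file that places ‘Pterois radiata’ within ‘Pterois’ and ‘Pterois’ within ‘Scorpaeniformes’, the usage line prints those three keywords, one per line, from most specific to most general.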

The next stage of photo management for me is post-processing. I am nowhere close to an expert photographer or image editor, but I’m learning. It’s still amazing to me how much a photo can be improved with a couple quick adjustments of exposure and levels. Most of the time, photos seem to come ‘off the camera’ with a washed-out and low-contrast look. Underwater photos always have their colors messed up. When we take photos of samples, we generally put a standard color card and CoralWatch Coral Health Chart in the frame so that we can make the right adjustments later. Fixing the color and exposure doesn’t just make the photos prettier, it can help us to understand the corals. It’s tough to spot patches of disease or the presence of bleaching when the whole photo is various dark shades of green. The best thing about Lightroom (at least compared to Photoshop and a number of other image editing programs)* is the ability to make adjustments in bulk. Often, a particular series of photos were all taken in very similar conditions. Say, all the photos from a single dive, where we were at 30 ft with a particular amount of visibility and cloud cover. I can play around with just one of the photos, getting the adjustments just right, then simply copy those adjustments and paste them to the rest of the photos from the dive. Voila! Hundreds of photos edited.

Before adjustments

After adjustments

Aaaand before

Aaaannd after

Once I’ve got the photos edited and organized, I can do fun things with them, like export them to Flickr for your browsing pleasure, or embed them in the map you explored at the beginning of the post. But explaining that is for another day…

*A note about software. The next-best photo software I’ve used is Google’s free (free!) Picasa. Picasa will also allow you to batch-edit photos, and had facial recognition long before Lightroom. iPhoto also has these features. But as far as I know, the keywording in Picasa and iPhoto doesn’t support hierarchies or synonyms.