Demystifying the algorithm

By Clara Bird, Masters Student, OSU Department of Fisheries and Wildlife, Geospatial Ecology of Marine Megafauna Lab

Hi everyone! My name is Clara Bird and I am the newest graduate student in the GEMM lab. For my master’s thesis I will be using drone footage of gray whales to study their foraging ecology. I promise to talk about how cool gray whales are in a future blog post, but for my first effort I am choosing to write about something that I have wanted to explain for a while: algorithms. As part of previous research projects, I developed a few semi-automated image analysis algorithms and I have always struggled with that jargon-filled phrase. I remember being intimidated by the term algorithm and thinking that I would never be able to develop one. So, for my first blog I thought that I would break down what goes into image analysis algorithms and demystify a term that is often thrown around but not well explained.

What is an algorithm?

The dictionary broadly defines an algorithm as “a step-by-step procedure for solving a problem or accomplishing some end” (Merriam-Webster). Imagine an algorithm as a flow chart (Fig. 1), where each step is some process that is applied to the input(s) to get the desired output. In image analysis the output is usually isolated sections of the image that represent a specific feature; for example, isolating and counting the number of penguins in an image. Algorithm development involves figuring out which processes to use in order to consistently get desired results. I have conducted image analysis previously and these processes typically involve figuring out how to find a certain cutoff value. But, before I go too far down that road, let’s break down an image and the characteristics that are important for image analysis.

Figure 1. An example of a basic algorithm flow chart. There are two inputs: variables A and B. The process is the calculation of the mean of the two variables.

What is an image?

Think of an image as a spreadsheet, where each cell is a pixel and each pixel is assigned a value (Fig. 2). Each value is associated with a color and when the sheet is zoomed out and viewed as a whole, the image comes together. In color imagery, which is also referred to as RGB, each pixel is associated with the values of the three color bands (red, green, and blue) that make up that color. In a thermal image, each pixel’s value is a temperature value. Thinking about an image as a grid of values is helpful to understand the challenge of translating the larger patterns we see into something the computer can interpret. In image analysis this process can involve using the values of the pixels themselves or the relationships between the values of neighboring pixels.

Figure 2. A diagram illustrating how pixels make up an image. Each pixel is a grid cell associated with certain values. Image Source: https://web.stanford.edu/class/cs101/image-1-introduction.html
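To make the spreadsheet analogy concrete, here is a minimal Python sketch of an image as a grid of values. The pixel values and the helper name `horizontal_differences` are made up for illustration; real images are much larger and are usually loaded with an imaging library.

```python
# A tiny 3x3 grayscale "image": each number is one pixel's brightness
# (0 = black, 255 = white). These values are invented for illustration.
gray_image = [
    [12, 15, 240],
    [10, 235, 238],
    [14, 11, 13],
]

# In a color (RGB) image, each pixel stores three values instead of one.
rgb_pixel = (255, 0, 0)  # pure red: full red band, no green, no blue

height = len(gray_image)
width = len(gray_image[0])
brightest = max(max(row) for row in gray_image)


def horizontal_differences(image):
    """Difference between each pixel and its right-hand neighbor.

    Large jumps hint at an edge between a dark and a light region.
    """
    return [[row[c + 1] - row[c] for c in range(len(row) - 1)] for row in image]


differences = horizontal_differences(gray_image)
print(height, width, brightest)
print(differences)
```

The first part uses the pixel values themselves (the brightest pixel), while `horizontal_differences` uses the relationship between neighboring pixels.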

Our brains take in the whole picture at once and we are good at identifying the objects and patterns in an image. Take Figure 3 for example: an astute human eye and brain can isolate and identify all the different markings and scars on the fluke. Yet, this process would be very time consuming. The trick to building an algorithm to conduct this work is figuring out what processes or tools are needed to get a computer to recognize what is marking and what is not. This iterative process is the algorithm development.

Figure 3. Photo ID image of a gray whale fluke.

Development

An image analysis algorithm will typically involve some sort of thresholding. Thresholds are used to classify an image into groups of pixels that represent different characteristics. A threshold could be applied to the image in Figure 3 to separate the white color of the markings on the fluke from the darker colors in the rest of the image. However, this is an oversimplification, because while it would be pretty simple to examine the pixel values of this image and pick a threshold by hand, this threshold would not be applicable to other images. If a whale in another image is a lighter color or the image is brighter, the pixel values would be different enough from those in the previous image for the threshold to inaccurately classify the image. This problem is why a lot of image analysis algorithm development involves creating parameterized processes that can calculate the appropriate threshold for each image.
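As a rough sketch of why a hand-picked threshold does not transfer between images, the Python example below applies the same cutoff to a small made-up "fluke" image and to a brighter copy of it. The pixel values, the cutoff of 128, and the function names are all assumptions chosen for illustration.

```python
def apply_threshold(image, threshold):
    """Return a mask with 1 where a pixel is brighter than the threshold, else 0."""
    return [[1 if value > threshold else 0 for value in row] for row in image]


def count_marked(mask):
    """Count how many pixels were classified as 'marking'."""
    return sum(sum(row) for row in mask)


# Invented grayscale values: dark whale skin (about 30) and three light markings (about 215).
fluke = [
    [30, 35, 210],
    [28, 220, 215],
    [32, 31, 29],
]

# The same scene photographed in brighter light: every pixel is 110 units brighter,
# capped at 255 (the maximum for 8-bit images).
brighter_fluke = [[min(value + 110, 255) for value in row] for row in fluke]

HAND_PICKED_THRESHOLD = 128  # chosen by eye for the first image only

marked_original = count_marked(apply_threshold(fluke, HAND_PICKED_THRESHOLD))
marked_brighter = count_marked(apply_threshold(brighter_fluke, HAND_PICKED_THRESHOLD))

print(marked_original)  # the three marking pixels
print(marked_brighter)  # every pixel, because the dark skin now exceeds the cutoff
```

In the brighter copy the whale's skin itself sits above the cutoff, so the whole image is classified as marking.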

One successful method used to determine thresholds in images is to first calculate the frequency of color in each image, and then apply the appropriate threshold. Fletcher et al. (2009) developed a semiautomated algorithm to detect scars in seagrass beds from aerial imagery by applying an equation to a histogram of the values in each image to calculate the threshold. A histogram is a plot of the frequency of values binned into groups (Fig. 4). Essentially, it shows how many times each value appears in an image. This information can be used to define breaks between groups of values. If the image of the fluke were transformed to a gray scale, then the values of the marking pixels would be grouped around the value for white and the other pixels would group closer to black, similar to what is shown in Figure 4. An equation can be written that takes this frequency information and calculates where the break is between the groups. Since this method calculates an individualized threshold for each image, it’s a more reliable method for image analysis. Other characteristics could also be used to further filter the image, such as shape or area.
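The sketch below shows one way to calculate a threshold from each image's histogram. It is not the equation from Fletcher et al. (2009); it uses Otsu's method, a widely used technique that picks the cutoff that best separates the two groups of values. The pixel values and function names are assumptions for illustration.

```python
def histogram(image, levels=256):
    """Count how many times each pixel value appears in the image."""
    counts = [0] * levels
    for row in image:
        for value in row:
            counts[value] += 1
    return counts


def otsu_threshold(image, levels=256):
    """Pick the cutoff that maximizes the between-group variance (Otsu's method).

    Pixels with value <= cutoff form the dark group; the rest form the light group.
    """
    counts = histogram(image, levels)
    total = sum(counts)
    sum_all = sum(value * count for value, count in enumerate(counts))

    best_cutoff, best_variance = 0, -1.0
    dark_count, dark_sum = 0, 0
    for cutoff in range(levels):
        dark_count += counts[cutoff]
        dark_sum += cutoff * counts[cutoff]
        if dark_count == 0:
            continue  # no dark pixels yet
        light_count = total - dark_count
        if light_count == 0:
            break  # every pixel is already in the dark group
        dark_mean = dark_sum / dark_count
        light_mean = (sum_all - dark_sum) / light_count
        variance = dark_count * light_count * (dark_mean - light_mean) ** 2
        if variance > best_variance:
            best_cutoff, best_variance = cutoff, variance
    return best_cutoff


def count_above(image, cutoff):
    """Count pixels brighter than the cutoff."""
    return sum(1 for row in image for value in row if value > cutoff)


# Invented grayscale values: dark skin with three light marking pixels.
fluke = [[30, 35, 210], [28, 220, 215], [32, 31, 29]]
# The same scene, 110 units brighter and capped at 255.
brighter_fluke = [[min(value + 110, 255) for value in row] for row in fluke]

for image in (fluke, brighter_fluke):
    cutoff = otsu_threshold(image)
    print(cutoff, count_above(image, cutoff))
```

Because the cutoff is recalculated from each image's own histogram, both versions of the image end up with the same three pixels classified as marking.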

However, that approach is not the only way to make an algorithm applicable to different images; semi-automation can also be helpful. Semi-automation involves some kind of user input. After uploading the image for analysis, the user could also provide the threshold, or the user could crop the image so that only the important components were maintained. Keeping with the fluke example, the user could crop the image so that it was only of the fluke. This would help reduce the variety of colors in the image and make it easier to distinguish between dark whale and light marking.
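Here is a minimal sketch of the cropping idea, assuming the user supplies the crop box by hand. The scene values, the crop coordinates, and the cutoff of 128 are invented for illustration.

```python
def crop(image, top, left, bottom, right):
    """Keep rows top..bottom-1 and columns left..right-1 (the user-chosen box)."""
    return [row[left:right] for row in image[top:bottom]]


def count_above(image, cutoff):
    """Count pixels brighter than the cutoff."""
    return sum(1 for row in image for value in row if value > cutoff)


# Invented scene: a dark fluke in the middle with two light markings (210, 220),
# plus bright sun glare on the water in two corners (250, 245).
scene = [
    [250, 40, 42, 41],
    [38, 30, 210, 33],
    [39, 28, 220, 31],
    [41, 37, 36, 245],
]

# User input: a box drawn around the fluke only.
fluke_only = crop(scene, top=1, left=1, bottom=3, right=4)

print(count_above(scene, 128))       # glare is counted along with the markings
print(count_above(fluke_only, 128))  # only the markings remain
```

Cropping removes the glare before any threshold is applied, so the remaining light pixels are much more likely to be markings.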

Figure 4. Example histogram of pixel values. Source: Moallem et al. 2012

Why algorithms are important

Algorithms are helpful because they make our lives easier. While it would be possible for an analyst to identify and digitize each individual marking from a picture of a gray whale, it would be extremely time consuming and tedious. Image analysis algorithms significantly reduce the time it takes to process imagery. A semi-automated algorithm that I developed to count penguins from still drone imagery can count all the penguins on a one km² island in about 30 minutes, while it took me 24 long hours to count them by hand (Bird et al. in prep). Furthermore, the process can be repeated with different imagery and analysts as part of a time series without bias because the algorithm eliminates human error introduced by different analysts.

Whether it’s a simple combination of a few processes or a complex series of equations, creating an algorithm requires breaking down a task to its most basic components. Development involves translating those components step by step into an automated process, which after many trials and errors, achieves the desired result. My first algorithm project took two years of revising, improving, and countless trials and errors. So, whether creating an algorithm or working to understand one, don’t let the jargon or the endless trials and errors stop you. Like most things in life, the key is to have patience and take it one step at a time.

References

Bird, C. N., Johnston, D.W., Dale, J. (in prep). Automated counting of Adelie penguins (Pygoscelis adeliae) on Avian and Torgersen Island off the Western Antarctic Peninsula using Thermal and Multispectral Imagery. Manuscript in preparation

Fletcher, R. S., Pulich, W., & Hardegree, B. (2009). A Semiautomated Approach for Monitoring Landscape Changes in Texas Seagrass Beds from Aerial Photography. https://doi.org/10.2112/07-0882.1

Moallem, Payman & Razmjooy, Navid. (2012). Optimal Threshold Computing in Automatic Image Thresholding using Adaptive Particle Swarm Optimization. Journal of Applied Research and Technology. 703.

Zooming in: A closer look at bottlenose dolphin distribution patterns off of San Diego, CA

By: Alexa Kownacki, Ph.D. Student, OSU Department of Fisheries and Wildlife, Geospatial Ecology of Marine Megafauna Lab

Data analysis is often about paring down data into manageable subsets. My project, which spans 34 years and six study sites along the California coast, requires significant data wrangling before full analysis. As part of a data analysis trial, I first refined my dataset to only the San Diego survey location. I chose this dataset for its standardization and large sample size; the bulk of my sightings, over 4,000 of the 6,136, are from the San Diego survey site where the transect methods were highly standardized. In the next step, I selected explanatory variable datasets that covered the sighting data at similar spatial and temporal resolutions. This small endeavor in analyzing my data was the first big leap into understanding what questions are feasible in terms of variable selection and analysis methods. I developed four major hypotheses for this San Diego site.

The study species: common bottlenose dolphin (Tursiops truncatus) seen along the California coastline in 2015. Image source: Alexa Kownacki.

Hypotheses:

H1: I predict that bottlenose dolphin sightings along the San Diego transect throughout the years 1981-2015 exhibit clustered distribution patterns as a result of both the patchy distribution of the species’ preferred habitats and the social nature of bottlenose dolphins.

H2: I predict there would be higher densities of bottlenose dolphins at higher latitudes spanning 1981-2015 due to prey distributions shifting northward and fewer human activities in the northerly sections of the transect.

H3: I predict that during warm (positive) El Niño Southern Oscillation (ENSO) months, the dolphin sightings in San Diego would be distributed more northerly, following prey aggregations that have historically shifted northward into cooler waters as sea surface temperatures increase.

H4: I predict that along the San Diego coastline, bottlenose dolphin sightings are clustered within two kilometers of the six major lagoons, with no specific preference for any lagoon, because the murky, nutrient-rich waters in the estuarine environments are ideal for prey protection and known for their higher densities of schooling fishes.

Data Description:

The common bottlenose dolphin (Tursiops truncatus) sighting data spans 1981-2015 with a few gap years. Sightings cover all months, but not in all years sampled. The same transect in San Diego was surveyed in a small, rigid-hulled inflatable boat with approximately a two-kilometer observation area (one kilometer surveyed 90 degrees to starboard and port of the bow).

I wanted to see if there were changes in dolphin distribution by latitude and, if so, whether those changes had a relationship to ENSO cycles and/or distances to lagoons. For ENSO data, I used the NOAA database that provides positive, neutral, and negative indices (1, 0, and -1, respectively) by each month of each year. I matched these ENSO data to my month-date information of dolphin sighting data. Distance from each lagoon was calculated for each sighting.
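The matching step described above can be sketched in Python as a lookup keyed on year and month. The sighting records, the ENSO index values, and the function name `attach_enso` are all placeholders invented for illustration; they are not values from the NOAA database or from the study.

```python
from datetime import date

# Placeholder monthly ENSO categories keyed by (year, month):
# 1 = positive (warm), 0 = neutral, -1 = negative (cold).
enso_by_month = {
    (1997, 11): 1,
    (1999, 1): -1,
    (2001, 6): 0,
}

# Placeholder sighting records with an ISO date and a latitude.
sightings = [
    {"date": "1997-11-03", "lat": 32.85},
    {"date": "1999-01-21", "lat": 33.05},
    {"date": "2001-06-14", "lat": 32.95},
    {"date": "2005-03-09", "lat": 32.90},  # no ENSO entry in the lookup above
]


def attach_enso(sightings, enso_by_month):
    """Return copies of the sightings with an 'enso' field matched on year and month.

    Sightings from months missing in the lookup get None so gaps stay visible.
    """
    matched = []
    for sighting in sightings:
        day = date.fromisoformat(sighting["date"])
        index = enso_by_month.get((day.year, day.month))
        matched.append({**sighting, "enso": index})
    return matched


for record in attach_enso(sightings, enso_by_month):
    print(record["date"], record["lat"], record["enso"])
```

Keeping unmatched months as `None` rather than dropping them makes it easier to spot gaps before running any statistics.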

Figure 1. Map representing the San Diego transect, represented with a light blue line inside of a one-kilometer buffered “sighting zone” in pale yellow. The dark pink shapes are dolphin sightings from 1981-2015, although some are stacked on each other and cannot be differentiated. The lagoons, ranging in size, are color-coded. The transect line runs from the breakwaters of Mission Bay, CA to Oceanside Harbor, CA.

Results: 

H1: True, dolphins are clustered and do not have a uniform distribution across this area. Spatial analysis indicated a less than 1% likelihood that this clustered pattern could be the result of random chance (Fig. 1, z-score = -127.16, p-value < 0.0001). It is well-known that schooling fishes have a patchy distribution, which could influence the clustered distribution of their dolphin predators. In addition, bottlenose dolphins are highly social and although pods change in composition of individuals, the dolphins do usually transit, feed, and socialize in small groups.

Figure 2. Summary from the Average Nearest Neighbor calculation in ArcMap 10.6 displaying that bottlenose dolphin sightings in San Diego are highly clustered. When the z-score, which corresponds to different colors on the graphic above, is strongly negative (< -2.58), in this case dark blue, it indicates clustering. Because the p-value is very small, in this case, much less than 0.01, these results of clustering are strongly significant.
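For readers curious about what sits behind that summary, below is a rough Python sketch of the standard average nearest neighbor statistic: the observed mean distance to each point's nearest neighbor is compared with the distance expected under a random pattern. The analysis above was run in ArcMap, not with this code, and the point coordinates, study area, and function name are invented for illustration.

```python
import math


def average_nearest_neighbor(points, area):
    """Return (ratio, z_score) for a set of (x, y) points in a study area.

    ratio < 1 suggests clustering and ratio > 1 suggests dispersion.
    A z-score below about -2.58 means clustering is unlikely to be due to
    random chance (p < 0.01).
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")

    # Observed: mean distance from each point to its closest other point.
    nearest = [
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]
    observed = sum(nearest) / n

    # Expected mean distance and standard error for a random pattern.
    expected = 0.5 / math.sqrt(n / area)
    std_error = 0.26136 / math.sqrt(n * n / area)

    return observed / expected, (observed - expected) / std_error


# Invented example: two tight groups of points in a 100 x 100 study area.
clustered_points = [
    (10, 10), (10, 11), (11, 10), (11, 11),
    (90, 90), (90, 91), (91, 90), (91, 91),
]

ratio, z_score = average_nearest_neighbor(clustered_points, area=100 * 100)
print(round(ratio, 3), round(z_score, 2))
```

The result depends heavily on the study area supplied, so the area should be chosen deliberately rather than left to a default.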

H2: False, dolphins do not occur at higher densities in the higher latitudes of the San Diego study site. The sightings are more clumped towards the lower latitudes overall (p < 2e-16), possibly due to habitat preference. The sightings are closer to beaches with higher human densities and human-related activities near Mission Bay, CA. It should be noted that just north of the San Diego transect is the Camp Pendleton Marine Base, which conducts frequent military exercises and could deter animals.

Figure 3. Histogram comparing the latitudes with the frequency of dolphin sightings in San Diego, CA. The x-axis represents the latitudinal difference from the most northern part of the transect to each dolphin sighting. Therefore, a small difference would translate to a sighting being in the northern transect areas whereas large differences would translate to sightings being more southerly. This could be read from left to right as most northern to most southern. The y-axis represents the frequency of which those differences are seen, that is, the number of sightings with that amount of latitudinal difference, or essentially location on the transect line. Therefore, you can see there is a peak in the number of sightings towards the southern part of the transect line.
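The x-axis variable in that histogram can be sketched in Python as follows. The northern end of the transect, the sighting latitudes, and the bin width are placeholder values invented for illustration.

```python
from collections import Counter

NORTHERN_END = 33.20  # placeholder latitude for the northern end of the transect
BIN_WIDTH = 0.1       # placeholder bin width in decimal degrees

# Placeholder sighting latitudes in decimal degrees.
sighting_latitudes = [33.15, 32.95, 32.85, 32.82, 32.78]

# Latitudinal difference: small values are northern sightings, large values are southern.
differences = [NORTHERN_END - lat for lat in sighting_latitudes]

# Histogram: bin 0 holds differences from 0 up to 0.1, bin 1 from 0.1 up to 0.2, and so on.
bin_counts = Counter(int(diff / BIN_WIDTH) for diff in differences)

for bin_index in sorted(bin_counts):
    low = bin_index * BIN_WIDTH
    print(f"{low:.1f}-{low + BIN_WIDTH:.1f} degrees south of the northern end: {bin_counts[bin_index]}")
```

Reading the printed bins from top to bottom corresponds to reading the histogram from left to right, north to south.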

H3: False, during warm (positive) El Niño Southern Oscillation (ENSO) months, the dolphin sightings in San Diego were more southerly. In colder (negative) ENSO months, the dolphins were more northerly. The difference in sighting latitude among ENSO index categories was significant (p<0.005). Post-hoc analysis indicates that the north-south distribution of dolphin sightings was different during each ENSO state.

Figure 4. Boxplot visualizing the distributions of dolphin sighting latitudinal differences by ENSO index, with -1, 0, and 1 representing cold, neutral, and warm years, respectively.

H4: True, dolphins are clustered around particular lagoons. Figure 5 illustrates how dolphin sightings nearest to Lagoon 6 (the San Dieguito Lagoon) are always within 0.03 decimal degrees. Because of how these data are formatted, decimal degrees is the easiest way to measure change in distance (in this case, the difference in latitude). In comparison, dolphin sightings nearest to Lagoon 5 (Los Penasquitos Lagoon) are distributed across a range of distances, with most sightings farther from the lagoon.

Figure 5. Bar plot displaying the different distances from dolphin sighting location to the nearest lagoon in San Diego in decimal degrees. Note: Lagoon 4 is south of the study site and therefore was never the nearest lagoon.
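As a rough sketch of the distance calculation, the Python example below finds the nearest lagoon to a sighting using the difference in latitude, in decimal degrees. The lagoon latitudes are rough placeholders for illustration only, not the coordinates used in the study, and the function name is hypothetical.

```python
# Rough placeholder latitudes (decimal degrees) for three of the lagoons.
lagoon_latitudes = {
    "Buena Vista": 33.17,
    "San Dieguito": 32.97,
    "Los Penasquitos": 32.93,
}


def nearest_lagoon(sighting_lat, lagoons):
    """Return (lagoon name, latitude difference in decimal degrees) for the closest lagoon."""
    name = min(lagoons, key=lambda lagoon: abs(sighting_lat - lagoons[lagoon]))
    return name, abs(sighting_lat - lagoons[name])


for lat in (32.96, 33.10):
    name, distance = nearest_lagoon(lat, lagoon_latitudes)
    print(lat, name, round(distance, 2))
```

Using only latitude works here because the transect runs roughly north to south along the coast; a study area with a different orientation would need a true two-dimensional distance.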

I found a significant difference in distance to the nearest lagoon among ENSO index categories (p < 2.55e-9): distances differed significantly between neutral and negative months and between positive and neutral months. Therefore, I hypothesize that prey distributions in neutral ENSO months differ from those in positive and negative ENSO months. This is one possible explanation for the significant difference in lagoon preference based on the monthly ENSO index. Using a violin plot (Fig. 6), it appears that Lagoon 5, Los Penasquitos Lagoon, has the widest variation of sighting distances in all ENSO index conditions. In neutral years, Lagoon 0, the Buena Vista Lagoon, has multiple sightings, whereas in positive and negative years it had either no sightings or a single sighting. The Buena Vista Lagoon is the most northerly lagoon, which may indicate that in neutral ENSO months, dolphin pods are more northerly in their distribution.

Figure 6. Violin plot illustrating the distance from lagoons of dolphin sightings under different ENSO conditions. There are three major groups based on ENSO index: “-1” representing cold years, “0” representing neutral years, and “1” representing warm years. On the x-axis are lagoon IDs and on the y-axis is the distance to the nearest lagoon in decimal degrees. The wider the shape, the more sightings at that distance; therefore, Lagoon 6 has many sightings within a very small distance compared to Lagoon 5, where sightings are widely dispersed at greater distances.

 

Bottlenose dolphins foraging in a small group along the California coast in 2015. Image source: Alexa Kownacki.

Takeaways for science and management:

Bottlenose dolphins have a clustered distribution, which seems to be related to ENSO monthly indices and, likely, their social structures. From these data, something different appears to be happening in neutral ENSO months, compared to positive and negative months, that is affecting the sighting distributions of bottlenose dolphins off the San Diego coastline. More research needs to be conducted to determine what is different about neutral months and how this may impact this dolphin population. On a finer scale, the six lagoons in San Diego appear to have a spatial relationship with dolphin sightings. These lagoons may provide critical habitat for bottlenose dolphins and/or for their preferred prey either by protecting the animals or by providing nutrients. Different lagoons may have different spans of impact; that is, some lagoons may have wider outflows that create larger nutrient plumes.

Other than the Marine Mammal Protection Act and small protected zones, there are no safeguards in place for these dolphins, whose population hovers around 500 individuals. Therefore, specific coastal areas surrounding lagoons that are more vulnerable to habitat loss, habitat degradation, and/or are more frequented by dolphins, may want greater protection added at a local, state, or federal level. For example, the Batiquitos and San Dieguito Lagoons already contain some Marine Conservation Areas with No-Take Zones within their reach. The city of San Diego and the state of California need better ways to assess the coastlines in their jurisdictions and how protecting the marine, estuarine, and terrestrial environments near and encompassing the coastlines impacts the greater ecosystem.

This dive into my data was an excellent lesson in spatial scaling with regard to paring down my data to a single study site and in matching my existing data sets to other data that could help answer my hypotheses. Originally, I underestimated the robustness of my data. At first, I hesitated when considering reducing the dolphin sighting data to only include San Diego because I was concerned that I would not be able to do the statistical analyses. However, these concerns were unfounded. My results are strongly significant and provide great insight into my questions about my data. Now, I can further apply these preliminary results and explore both finer and broader scale resolutions, such as using the more precise ENSO index values and finding ways to compare offshore bottlenose dolphin sighting distributions.

What REALLY is a Wildlife Biologist?

By Alexa Kownacki, Ph.D. Student, OSU Department of Fisheries and Wildlife, Geospatial Ecology of Marine Megafauna Lab

The first lecture slide. Source: Lecture1_Population Dynamics_Lou Botsford

This was the very first lecture slide in my population dynamics course at UC Davis. Population dynamics was infamous in our department for being an ultimate rite of passage due to its notoriously challenging curriculum. So, when Professor Lou Botsford pointed to his slide, all 120 of us Wildlife, Fish, and Conservation Biology majors didn’t know how to react. Finally, he announced, “This [pointing to the slide] is all of you”. The class laughed. Lou smirked. Lou knew.

Lou knew that there is more truth to this meme than words could express. I can’t tell you how many times friends and acquaintances have asked me if I was going to be a park ranger. Incredibly, not all (or even most) wildlife biologists are park rangers. I’m sure that at one point, my parents had hoped I’d be holding a tiger cub as part of a conservation project; that has never happened. Society may think that all wildlife biologists want to walk in the footsteps of the famous Steve Irwin and say things like “Crikey!”, but I can’t remember the last time I uttered that exclamation, with the exception of doing a Steve Irwin impression. Hollywood may think we hug trees (and, don’t get me wrong, I love a good tie-dyed shirt), but most of us believe in the principles of conservation and wise use, a.k.a. we know that some trees must be cut down to support our needs. Helicoptering into a remote location to dart and take samples from wild bear populations…HA. Good one. I tell myself this is what I do sometimes, and then the chopper crashes and I wake up from my dream. But, actually, a scientist staring at a computer with stacks of papers spread across every surface is me and almost every wildlife biologist that I know.

The “dry lab” on the R/V Nathaniel B. Palmer en route to Antarctica. This room full of technology is where the majority of the science takes place. Drake Passage, International Waters in August 2015. Source: Alexa Kownacki

There is an illusion that wildlife biologists are constantly in the field doing all the cool, science-y, outdoors-y things while being followed by a National Geographic photojournalist. Well, let me break it to you, we’re not. Yes, we do have some incredible opportunities. For example, I happen to know that one lab member (eh-hem, Todd) has gotten up close and personal with wild polar bear cubs in the Arctic, and that all of us have taken part in some work that is worthy of a cover image on NatGeo. We love that stuff. For many of us, it’s in those few, memorable moments when we are out in the field, wearing pants that we haven’t washed in days, and we finally see our study species AND gather the necessary data, that the stars align. Those are the shining lights in a dark sea of papers, grant-writing, teaching, data management, data analysis, and coding. I’m not saying that we don’t find our desk work enjoyable; we jump for joy when our R script finally runs and we do a little dance when our paper is accepted and we definitely shed a tear of relief when funding comes through (or maybe that’s just me).

A picturesque moment of being a wildlife biologist: Alexa and her coworker, Jim, surveying migrating gray whales. Piedras Blancas Light Station, San Simeon, CA in May 2017. Source: Alexa Kownacki.

What I’m trying to get at is that we accepted our fates as the “scientists in front of computers surrounded by papers” long ago and we embrace it. It’s been almost five years since I was a senior in undergrad and saw this meme for the first time. Five years ago, I wanted to be that scientist surrounded by papers, because I knew that’s where the difference is made. Most people have heard the quote by Mahatma Gandhi, “Be the change that you wish to see in the world.” In my mind, it is that scientist combing through relevant, peer-reviewed scientific papers while writing a compelling and well-researched article who has the potential to make positive changes. For me, that scientist at the desk is being the change that he or she wishes to see in the world.

Scientists aboard the R/V Nathaniel B. Palmer using the time in between net tows to draft papers and analyze data…note the facial expressions. Antarctic Peninsula in August 2015. Source: Alexa Kownacki.

One of my favorite people to colloquially reference in the wildlife biology field is Milton Love, a research biologist at the University of California Santa Barbara, because he tells it how it is. In his oh-so-true-it-hurts website, he has a page titled, “So You Want To Be A Marine Biologist?” that highlights what he refers to as, “Three really, really bad reasons to want to be a marine biologist” and “Two really, really good reasons to want to be a marine biologist”. I HIGHLY suggest you read them verbatim on his site, whether you think you want to be a marine biologist or not because they’re downright hilarious. However, I will paraphrase if you just can’t be bothered to open up a new tab and go down a laugh-filled wormhole.

Really, Really Bad Reasons to Want to be a Marine Biologist:

  1. To talk to dolphins. Hint: They don’t want to talk to you…and you probably like your face.
  2. You like Jacques Cousteau. Hint: I like cheese…doesn’t mean I want to be cheese.
  3. Money. Hint: Lack thereof.

Really, Really Good Reasons to Want to be a Marine Biologist:

  1. Work attire/attitude. Hint: Dress for the job you want finally translates to board shorts and tank tops.
  2. You like it. *BINGO*

Alexa with colleagues showing that the “cool” part of the job is working the zooplankton net tows. This DOES have required attire: steel-toed boots, hard hat, and float coat. R/V Nathaniel B. Palmer, Antarctic Peninsula in August 2015. Source: Alexa Kownacki.

In summary, as wildlife or marine biologists we’ve taken a vow of poverty, and in doing so, we’ve committed ourselves to fulfilling lives with incredible experiences and being the change we wish to see in the world. To those of you who want to pursue a career in wildlife or marine biology—even after reading this—then do it. And to those who don’t, hopefully you have a better understanding of why wearing jeans is our version of “business formal”.

A fieldwork version of a lab meeting with Leigh Torres, Tom Calvanese (Field Station Manager), Florence Sullivan, and Leila Lemos. Port Orford, OR in August 2017. Source: Alexa Kownacki.