
Friday, July 26, 2019

Finding Regions of Interest in Ant Footage: Automating Work with Computer Vision

In the Bee Lab, our overarching goal is to study the behavior of social insects, such as bees and ants, to learn more about how they work together to accomplish the feats they do. As part of this goal, we are studying the nesting behavior of turtle ants. For example, one of the projects our lab is working on tries to understand how turtle ants, a species of ant that lives in trees, choose among available nests in different parts of a tree.

As part of this project, we are manipulating the connections between several turtle ant nests within an arena, to observe how this influences which nests the colony chooses. To monitor the ants' behavior, we currently use Raspberry Pis set up to record footage of the nests. Trying to track all of the ants by having people watch the footage would be very tedious because of the large quantity of videos produced. Every day, our experiments produce 16 hours of video footage. I don't know about you, but I'd prefer to not spend 16 hours per day in front of a computer, looking at videos of ants.

Because we're interested in how the ants choose nests, we want to measure how ants move along the paths to each nest, as well as at key decision points. We therefore isolate these locations (Regions of Interest) within the video frame and have the computer pipeline process only those regions of the video. However, given the volume of video the experiment produces, marking the Regions of Interest (ROIs) manually in every video would be tedious and slow. So I set about automatically turning the raw footage of the colony, which looks like this:
Figure 1: A frame of video from one experiment manipulating the connections between ant nests (red tubes in blue bases).
into an automatic detection of the regions of interest, which would resemble this:
Figure 2: The regions of interest, manually labeled
As you can see from the images, we outlined the regions of interest in red on the platform itself. However, writing the code to go from a video of the arena to the ROIs turned out to be a surprisingly difficult task.

The first thing I had to do was teach the computer what red is. This was much harder than I anticipated, because the actual pixel values in the red regions can vary quite a bit with the lighting, and the classic RGB color model, in which the footage is recorded and processed, does not reflect how people perceive colors.

My first task was to represent the image in a way which made it easier for the computer to identify different colors. After doing a lot of research (there are a lot of different color models to choose from), I ended up settling on the HSV Color Model:
Figure 3: The Hue-Saturation-Value color wheel

The main advantage this model offers is that it has a simple representation, and each of its three components corresponds to an aspect of color which people can intuitively grasp: the hue encodes which color it is, the value encodes how bright the color is, and the saturation encodes how pure the color is.

However, even with this improved color model, selecting pixels by their values alone produces an image like this:
Figure 4: The pixels from the video frame which are red

and this mask has several problems which must be overcome before it can be used. The first is that the red polygons we want to detect contain holes, which would prevent the code from recognizing them as continuous polygons. To fix this, I turned to morphological operations; Arya Massarat wrote a post a few months ago explaining both what they are and how we use them to track ants. By using morphological operations to fill in those random holes in the polygons, we get this image:
Figure 5: The result of closing holes with morphological operations
The next problem is that the code picked up a lot of other items that are not the ROIs we care about. To remove these, I used contour detection to convert the pixel regions into shapes. Once I had the shapes, I could delete the ones which are completely filled in, and the extraneous noise gets removed:
Figure 6: The contours detected from the above mask
However, one problem still remains: the shapes do not match the original polygons very closely. To solve this, I used morphological operations to produce a skeleton of the image, and then I computed the convex hull of the skeleton. Shown below is the skeleton of the image, which I can then combine with the contours above to improve the accuracy of the shapes.
Figure 7: The skeleton of the mask from above
Next, to clean things up a little more, the skeletons are further simplified by an approximate polygonal fitting algorithm, which reduces each skeleton to a simpler polygon whose sides (hopefully!) match the sides of the original polygon.

Lastly, the approximated polygons get converted into their convex hulls to remove inaccuracies from the previous step, which sometimes introduces unwanted indentations into the polygons. OpenCV has a function which is supposed to convert polygons to their convex hulls. However, the algorithm it implements was proven not to work for all polygons, so I had to search the literature for an algorithm that does, and then write my own code to implement it.
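As an illustration of a hull algorithm without that pitfall, here is Andrew's monotone chain, which is provably correct for arbitrary point sets (and so in particular for polygon vertices). This is a stand-in sketch, not the polygon-specific algorithm from the literature that the pipeline actually implements.

```python
def convex_hull(points):
    """Andrew's monotone chain convex hull.

    Takes an iterable of (x, y) pairs and returns the hull
    vertices in counter-clockwise order.
    """
    pts = sorted(set(map(tuple, points)))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive if o->a->b turns counter-clockwise.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:                 # build the lower hull
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):       # build the upper hull
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

# Example: interior points are discarded, corners kept.
hull = convex_hull([(0, 0), (10, 0), (10, 10), (0, 10), (5, 5), (5, 2)])
```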

And here's the final result of the algorithm:
Figure 8: The final detected polygons

which, when overlaid onto the original image, gets us this result:
Figure 9: The final image, with regions of interest automatically labeled
And now that we can detect the red polygons automatically, we can use them to describe the tracks of ants as they move through the image. For studying the behavior of ants, it is useful to know which choice the ants make when they reach key decision points, and in which direction they are crossing when near nests. By comparing the positions of the ants to the polygons, we can automatically compute which decisions the ants make and which nests they are entering or leaving, allowing the computer to process the videos without any help from people.

Further Reading:

Joblove and Greenberg, 1978. "Color spaces for computer graphics." https://doi.org/10.1145%2F965139.807362
Lee, 1983. "On finding the convex hull of a simple polygon." https://doi.org/10.1007/BF00993195
Ramer, 1972. "An iterative procedure for the polygonal approximation of plane curves." https://doi.org/10.1016/S0146-664X(72)80017-0
Saha et al., 2016. "A survey on skeletonization algorithms and their applications." https://doi.org/10.1016/j.patrec.2015.04.006
Suzuki and Abe, 1983. "Topological structural analysis of digitized binary images by border following." https://doi.org/10.1016/0734-189X(85)90016-7
Toussaint and Gindy, 1983. "A counterexample to an algorithm for computing monotone hulls of simple polygons." https://doi.org/10.1016/0167-8655(83)90028-4

Image Credits:

[1]: This image is a frame taken from a video recording of an experimental setup in the lab.
[2,4,5,6,7,8,9]: These images were made by using pieces of the pipeline to process figure 1.
[3]: Image from Wikipedia user SharkD (source, CC BY-SA 3.0 license)
