HMC Bee Lab: When Ants Get Too Close: A Guide for Splitting Up

Our lab mainly works with ants in a synthetic environment. This environment simulates their natural habitat in the trees, which helps us understand how they might move between different nests in a tree. Cameras set up around their enclosure capture their movements at specific areas in their box, like when they cross bridges. This can lead to hours of video that are impractical for humans to parse manually. A software pipeline built with Python and MATLAB’s Motion-Based Multiple Object Tracking Software helps analyze these videos and extract useful data. My role in the lab has been to test and evaluate our Ant Tracking pipeline and suggest ways to improve it.

Currently, I’m working on a problem I call “blob coalescence.” This occurs when ants moving near each other on a bridge are identified by the software as a single, large ant. In a situation like this, the algorithm loses track of one of the ants and then reassigns it a new ID after it leaves the vicinity of its neighbor. This is the equivalent of an ant disappearing in the video and another one suddenly reappearing a couple of seconds later.

Blob coalescence of two ants

Why does this happen and what can be done to fix it? I decided to dive into MATLAB’s object detection to find out. It turns out that there are only a few steps in blob detection in an image:

1. Retrieve a binary mask for the image

2. Perform morphological opening on the image with a small structuring element

3. Perform morphological closing on the image with a larger structuring element

4. Convert holes in the image to foreground

5. Find the coordinates, shape, and size of each blob in the image

The first step in blob detection is the retrieval of a binary mask from the image. A binary mask is a set of labels for each pixel in an image. If a pixel appears in the foreground of the image, it is labeled with the value 1. Otherwise, it gets labeled with a 0. The collection of all such labels is a matrix identifying only the location of moving objects in the image. In our pipeline, ants would then be labeled with a 1 and their surroundings with a 0. I will refrain from going into detail about how you might produce the binary mask in the first place because I don’t think it is important for the problem at hand. Rather, let’s dive into the next steps, which apply morphological operations to the binary mask.

A binary mask (right) for the ant pictured on the left
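To make the idea of a binary mask concrete, here is a minimal Python sketch. It assumes the simplest possible foreground test: a pixel is foreground if it differs from a known background image by more than a threshold. This is only an illustration and not how our pipeline computes its mask; the function name `binary_mask`, the threshold of 30, and the pixel values are all made up for the example.

```python
def binary_mask(frame, background, threshold=30):
    """Label a pixel 1 if it differs from the background by more than `threshold`."""
    return [
        [1 if abs(p - b) > threshold else 0 for p, b in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]


# Hypothetical 4x5 grayscale frame: a dark "ant" (values near 20)
# on a light bridge (values near 200).
background = [[200] * 5 for _ in range(4)]
frame = [
    [200, 200, 200, 200, 200],
    [200,  20,  25, 200, 200],
    [200,  22,  18, 205, 200],
    [200, 200, 200, 200, 198],
]

mask = binary_mask(frame, background)
for row in mask:
    print(row)
```

The four dark pixels are labeled 1 and everything else, including the slightly noisy bridge pixels, is labeled 0.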

Once we have retrieved a binary mask of the image, we perform morphological opening, which erodes and then dilates it. An erosion is a way of transforming a binary mask to “smooth in” clusters of foreground pixels. In an erosion, foreground pixels that are on the edge of a cluster of foreground pixels are converted to background (by changing a 1 to a 0). The algorithm for this uses something called a structuring element: a very small binary mask whose shape and size determine which neighboring pixels are examined around each pixel. In our pipeline, a small 3x3 square of 1s is used.

In erosion, the center of the structuring element is superimposed over each foreground pixel in our image. Each time this is done, all pixels that overlap with the structuring element are considered. If any of the values of those pixels differ from those in the structuring element with which they’ve been overlaid, our center pixel is classified as background (instead of foreground). Thus, erosion with a structuring element of all 1’s has the effect of keeping only the most central foreground pixels in a cluster.

Erosion with a 3x3 structuring element of all 1’s
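The erosion rule described above can be sketched in a few lines of plain Python. This is an illustrative sketch rather than the pipeline's MATLAB code: the function name `erode` is my own, the mask is a list of rows of 0s and 1s, the structuring element is assumed to be a square of all 1s, and neighbors that fall outside the image are simply ignored (an assumption about border handling).

```python
def erode(mask, size=3):
    """Erode a binary mask with a size x size structuring element of all 1s."""
    rows, cols = len(mask), len(mask[0])
    r = size // 2
    out = [[0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            if mask[y][x] == 0:
                continue  # background stays background
            # Keep the pixel only if every in-bounds neighbor under the
            # structuring element is also foreground.
            out[y][x] = int(all(
                mask[ny][nx]
                for ny in range(max(0, y - r), min(rows, y + r + 1))
                for nx in range(max(0, x - r), min(cols, x + r + 1))
            ))
    return out


blob = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 1, 0],
]

eroded = erode(blob)
for row in eroded:
    print(row)
```

Only the two pixels in the middle of the blob survive, because they are the only ones whose full 3x3 neighborhood is foreground.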

After erosion, image dilation is performed to “smooth out” clusters of foreground pixels. Like erosion, dilation superimposes the center of the structuring element over each pixel in the image. But where erosion converts a foreground pixel to background if any of the overlapping pixels fail to match the structuring element, dilation converts a background pixel to foreground if any of the overlapping pixels do match it (i.e., if any of them is foreground where the structuring element has a 1). With a structuring element of all 1’s, this has the effect of expanding a cluster of foreground pixels so that background pixels near the cluster are converted to foreground. In a sense, dilation is the opposite of erosion.

Dilation with a 3x3 structuring element of all 1’s
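Dilation can be sketched the same way. As before, this is a plain-Python illustration under my own assumptions (a hypothetical `dilate` function, a square structuring element of all 1s, and out-of-bounds neighbors ignored), not the pipeline's actual implementation.

```python
def dilate(mask, size=3):
    """Dilate a binary mask with a size x size structuring element of all 1s."""
    rows, cols = len(mask), len(mask[0])
    r = size // 2
    out = [[0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            # A pixel becomes foreground if any in-bounds neighbor under the
            # structuring element is foreground.
            out[y][x] = int(any(
                mask[ny][nx]
                for ny in range(max(0, y - r), min(rows, y + r + 1))
                for nx in range(max(0, x - r), min(cols, x + r + 1))
            ))
    return out


seed = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]

dilated = dilate(seed)
for row in dilated:
    print(row)
```

A single foreground pixel grows into a 3x3 square, which is exactly the shape of the structuring element.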

The process of performing an erosion followed by a dilation is known as morphological opening. An easy way of visualizing a morphological opening is to imagine sliding the structuring element around inside a cluster of foreground pixels. Any foreground pixels that you would be unable to cover using the structuring element will be converted to background.

Morphological opening (light blue) of the dark-blue square by the grey disk

Likewise, morphological closing is the process of performing a dilation followed by an erosion. You can visualize this by sliding the structuring element around outside the cluster of foreground pixels. Any background pixels that you would be unable to cover using the structuring element will be converted to foreground.

Morphological closing (light blue) of the dark blue squares by the grey disk
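Since opening and closing are just erosion and dilation applied in different orders, both can be sketched by composing two small helpers. The names `_morph`, `erode`, `dilate`, `opening`, and `closing` are my own, and the sketch assumes square structuring elements of all 1s with out-of-bounds neighbors ignored. The two example masks are made up: one has a stray speck of noise, the other has a one-pixel hole.

```python
def _morph(mask, size, combine):
    """Apply `combine` (all or any) over each pixel's size x size neighborhood."""
    rows, cols = len(mask), len(mask[0])
    r = size // 2
    return [
        [
            int(combine(
                mask[ny][nx]
                for ny in range(max(0, y - r), min(rows, y + r + 1))
                for nx in range(max(0, x - r), min(cols, x + r + 1))
            ))
            for x in range(cols)
        ]
        for y in range(rows)
    ]


def erode(mask, size=3):
    return _morph(mask, size, all)


def dilate(mask, size=3):
    return _morph(mask, size, any)


def opening(mask, size=3):
    """Erosion followed by dilation: removes specks smaller than the element."""
    return dilate(erode(mask, size), size)


def closing(mask, size=3):
    """Dilation followed by erosion: fills gaps smaller than the element."""
    return erode(dilate(mask, size), size)


# A 3x4 blob plus a single-pixel speck of noise at row 1, column 1.
speckled = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
]

# A 3x5 blob with a one-pixel hole in its middle.
holed = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 1, 1, 0, 0],
    [0, 0, 1, 1, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
]

opened = opening(speckled)  # the speck disappears, the blob is unchanged
closed = closing(holed)     # the hole is filled, the outline is unchanged
```

Note that the order matters: applying `opening` to the holed blob would erase it entirely, because no 3x3 square of foreground fits inside it.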

Our pipeline performs opening with a small structuring element and then closing with a larger structuring element so as to increase the size of the originally detected blobs. The overall effect is to better generalize the shape of the clusters of foreground pixels so that they can be more easily recognized. In essence, we are reducing noise in the initial foreground detections and expanding the size of the foreground shapes.

The last transformation performed on our image is that of filling holes. You can visualize this process by coloring in the background of an image, starting at its edges and never crossing a foreground pixel. Any background pixels that you cannot reach this way are considered holes. The pipeline converts such pixels to foreground, so that there aren’t any background pixels surrounded by foreground.

Filling Holes
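The "coloring in from the edges" picture corresponds to a flood fill. Below is a minimal Python sketch of that idea, assuming the background is 4-connected (the pen can move up, down, left, and right, but not diagonally). The function name `fill_holes` and the example mask are my own, and this is not the pipeline's actual code.

```python
from collections import deque


def fill_holes(mask):
    """Convert background pixels that are unreachable from the border to foreground."""
    rows, cols = len(mask), len(mask[0])
    reachable = [[False] * cols for _ in range(rows)]

    # Start the flood fill from every background pixel on the image border.
    queue = deque(
        (y, x)
        for y in range(rows)
        for x in range(cols)
        if (y in (0, rows - 1) or x in (0, cols - 1)) and mask[y][x] == 0
    )
    for y, x in queue:
        reachable[y][x] = True

    # Spread through 4-connected background pixels without crossing foreground.
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if (0 <= ny < rows and 0 <= nx < cols
                    and mask[ny][nx] == 0 and not reachable[ny][nx]):
                reachable[ny][nx] = True
                queue.append((ny, nx))

    # Anything the fill never reached is either foreground or a hole.
    return [[0 if reachable[y][x] else 1 for x in range(cols)] for y in range(rows)]


ring = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]

filled = fill_holes(ring)
for row in filled:
    print(row)
```

The single background pixel in the middle of the ring cannot be reached from the border, so it becomes foreground.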

After all transformations are complete, blob analysis is used to extract the coordinates, shape, and size of the foreground regions. This step allows us to discard foreground clusters that are too small or too large to be an ant.
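Blob analysis can be sketched as connected-component labeling followed by a size filter. The sketch below assumes 8-connectivity (diagonal neighbors belong to the same blob) and reports area, centroid, and bounding box for each blob. The function name `find_blobs` and the `min_area` and `max_area` parameters are hypothetical stand-ins for whatever limits the pipeline actually uses.

```python
def find_blobs(mask, min_area=1, max_area=None):
    """Return area, centroid, and bounding box of each 8-connected blob."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for y0 in range(rows):
        for x0 in range(cols):
            if mask[y0][x0] == 0 or seen[y0][x0]:
                continue
            # Collect every foreground pixel connected to (y0, x0).
            seen[y0][x0] = True
            stack, pixels = [(y0, x0)], []
            while stack:
                y, x = stack.pop()
                pixels.append((y, x))
                for ny in range(max(0, y - 1), min(rows, y + 2)):
                    for nx in range(max(0, x - 1), min(cols, x + 2)):
                        if mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
            area = len(pixels)
            # Discard blobs that are too small or too large to be an ant.
            if area < min_area or (max_area is not None and area > max_area):
                continue
            ys = [p[0] for p in pixels]
            xs = [p[1] for p in pixels]
            blobs.append({
                "area": area,
                "centroid": (sum(ys) / area, sum(xs) / area),  # (row, column)
                "bbox": (min(ys), min(xs), max(ys), max(xs)),  # top, left, bottom, right
            })
    return blobs


mask = [
    [1, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

blobs = find_blobs(mask, min_area=2, max_area=10)
print(blobs)
```

With `min_area=2`, the single-pixel speck in the corner is discarded and only the 2x2 blob is reported.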

So why does the algorithm merge the blobs of ants that get close to each other? My hypothesis is that a dilation step causes the two foreground clusters to merge and a later attempt at erosion fails to retrieve the background pixels between the two ants. This could be because the structuring element we use when performing morphological closing is too big to “fit” in the gap between the two ants. Another possibility is that at some point, the gap between the two ants becomes surrounded by foreground pixels. This would result in the “filling” of that gap and its conversion to foreground.
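The first hypothesis is easy to test on a toy example. The sketch below looks at a one-dimensional slice along the bridge: two ants, each 3 pixels long, separated by a gap of background pixels. It applies closing with a flat structuring element of a given width and counts how many separate blobs remain. Everything here (the function names `close_1d`, `count_runs`, and `two_ants`, and the ant and gap sizes) is made up for illustration.

```python
def close_1d(row, width):
    """Closing (dilation, then erosion) of a 1-D row with a flat element of `width` pixels."""
    r = width // 2
    n = len(row)
    dilated = [int(any(row[max(0, i - r):i + r + 1])) for i in range(n)]
    return [int(all(dilated[max(0, i - r):i + r + 1])) for i in range(n)]


def count_runs(row):
    """Count maximal runs of 1s, i.e. the number of separate blobs in the row."""
    return sum(1 for i, v in enumerate(row) if v and (i == 0 or not row[i - 1]))


def two_ants(gap):
    """Two 3-pixel ants separated by `gap` background pixels, with margins on both sides."""
    return [0] * 4 + [1] * 3 + [0] * gap + [1] * 3 + [0] * 4


for width in (3, 5):
    for gap in range(1, 6):
        blobs = count_runs(close_1d(two_ants(gap), width))
        print(f"element width {width}, gap {gap}: {blobs} blob(s)")
```

In this toy setting, the two ants merge into one blob whenever the gap is narrower than the structuring element, and stay separate once the gap is at least as wide as the element. That matches the intuition that the element is too big to "fit" between the ants.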

One possible fix might involve altering our structuring element to better represent the shape of the ant (rather than using a simple 3x3 matrix of 1’s). Although this might seem like a promising solution, I’m worried that the ants can rotate in the video, while the structuring element cannot. Alternatively, you might think we’d benefit from having a structuring element even smaller than the ant, but the usefulness of that approach is constrained by the resolution of the video, since a single ant is only a few pixels wide in some of our videos. Clearly there isn’t a single solution for all situations, but perhaps these ideas will finally help me solve this simple but exasperating problem!

Friday, March 1, 2019
