
Thursday, November 19, 2020

Template Matching: Automating “Where’s Waldo?”

Why does the Bee Lab care about template matching?


We have been interested in how ants pick their nests, so we designed the arena pictured in Figure 1. Ants were allowed to climb up onto the platforms we constructed and could choose to nest in one of the 8 new nests (the blue and red structures). We then took hundreds of hours of video of the ants to observe their behavior. In particular, we wanted to see whether the ants moved in and out of the nests and whether they used either of the “shortcut” bridges (#2 and #3). We marked these “regions of interest” (ROIs) in red.

To avoid having to watch the countless hours of ant videos ourselves, we also started developing a machine-learning program to watch the videos for us. This program works by first identifying the ROIs; it will then identify ants and track their movement through the ROIs. Currently, our software is very good at picking up the ants’ movement, but it also reports many non-existent walks: on average, around 2-6 false positives for every real walk in a given video. We have found that these false positives come from two sources: 1) bad ROI detection, and 2) inaccurate ant detection. To address the ROI detection problem, we have started exploring template matching.

Figure 1. Example output of ROI detection where the black polygons are the detected ROIs



What is template matching?


Suppose we have a “template” image and a larger “comparison” image as shown below.


Figure 2a. Template image      Figure 2b. Comparison image



The process of finding where our template image is located within the comparison image is called template matching.



That’s great, but how do we locate the template image?


Great question! There are two main ways in which we can implement template matching: the template-based approach and the feature-based approach.


Template-Based Approach

This is the “brute force” approach: we overlay our template image on top of our comparison image and compute a similarity score. There are many ways to define the similarity score, but for now, we can think of it as how similar each pixel in the template is to the corresponding pixel in the comparison image. We then iteratively slide the template image over every other possible location on the comparison image and give every location a score. We can store all of our similarity scores as a new grey-scale image - or 2D array - where each pixel’s location indicates where the template image was placed on the comparison image. We can get away with storing only the location of the template’s top-left pixel because the template image never changes size. Finally, the pixel with the highest similarity score tells us where the best match between the two images is located. However, the template-based approach is not perfect: it has trouble identifying templates that have been rotated or resized, or that were captured under different lighting conditions.


Figure 3a. Illustration of iterative comparison  Figure 3b. Example of what a similarity image may look like. The bright pixel indicates where the top-left corner of the template image should be placed to get the highest similarity score.
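To make the sliding-and-scoring process concrete, here is a minimal sketch of the template-based approach using OpenCV’s matchTemplate() in Python. The file names are placeholders, and TM_CCOEFF_NORMED is just one of several built-in similarity scores:

import cv2

# Load both images as grey-scale arrays. (File names are placeholders.)
comparison = cv2.imread("comparison.png", cv2.IMREAD_GRAYSCALE)
template = cv2.imread("template.png", cv2.IMREAD_GRAYSCALE)

# result is the grey-scale "similarity image" described above: one score
# for every possible top-left placement of the template.
result = cv2.matchTemplate(comparison, template, cv2.TM_CCOEFF_NORMED)

# The brightest pixel in result marks the best match's top-left corner.
_, best_score, _, top_left = cv2.minMaxLoc(result)
h, w = template.shape
print(f"Best match at {top_left} with score {best_score:.3f} ({w}x{h})")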


Feature-Based Approach

The feature-based approach is the most common approach to template matching. It is more complicated than the template-based approach but is known to get better results across a wider variety of template and comparison images. This approach attempts to find key features - e.g., shapes, colors, patterns - in the template and comparison images, often using algorithms such as SIFT or SURF. The program then finds the location of the template image by matching up the features of the template image with the corresponding features in the comparison image. For example, the feature detector in Figure 4 picked out many of the man’s facial features. Without the source code, it is impossible for us to know exactly how it found those features, but we can guess that his tongue was detected as a group of pink pixels. Lastly, the program overlaid the two images to match the features as closely as possible; in this case, that involved rotating and resizing the template. This approach has the added benefit of being able to match templates under different lighting conditions, since it is not comparing raw pixel brightness values.


Figure 4. Demonstration of the feature-based approach
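As a rough sketch of what the feature-based approach looks like in code, here is an example using OpenCV’s SIFT detector with a brute-force matcher and Lowe’s ratio test. The file names and the 0.75 ratio threshold are placeholder choices of ours, and cv2.SIFT_create() requires OpenCV 4.4 or newer:

import cv2

# Detect and describe key features in both images with SIFT.
# (File names are placeholders.)
template = cv2.imread("template.png", cv2.IMREAD_GRAYSCALE)
comparison = cv2.imread("comparison.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp_t, desc_t = sift.detectAndCompute(template, None)
kp_c, desc_c = sift.detectAndCompute(comparison, None)

# For each template feature, find its two nearest comparison features and
# keep only matches that clearly beat the runner-up (Lowe's ratio test).
matcher = cv2.BFMatcher()
matches = matcher.knnMatch(desc_t, desc_c, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]
print(f"{len(good)} confident feature matches")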



How would template matching help us identify ROIs?


Template matching has the potential to improve our ROI detection software in two main ways:


First, template matching would allow us to detect ROIs’ locations and exact shapes more accurately - something that is difficult to achieve with our current ROI detection method. This is especially important because if a detected ROI is just a few pixels off, we may start to detect ant movement on the ground rather than on the platform, e.g., ROI #7 in Figure 5a. With template matching, we could essentially take a screenshot from one video and have our program find the location of that screenshot in many different video clips. Assuming the program correctly locates the screenshot, we get an ROI with the exact shape of the region we drew on the arena. Additionally, we would be able to control which ROIs the program attempts to find, allowing us to avoid false ROI detections such as ROI #2 in Figure 5b.
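To illustrate the screenshot idea, here is a hypothetical sketch - the file names and polygon coordinates below are made up. We find where a saved ROI screenshot sits in a frame from a different video, then shift the hand-drawn ROI polygon to the matched location:

import cv2

# Hypothetical sketch: file names and the polygon are placeholders.
frame = cv2.imread("frame_from_new_video.png", cv2.IMREAD_GRAYSCALE)
roi_shot = cv2.imread("roi_screenshot.png", cv2.IMREAD_GRAYSCALE)

# Find where the saved screenshot sits in the new frame.
result = cv2.matchTemplate(frame, roi_shot, cv2.TM_CCOEFF_NORMED)
_, score, _, (x, y) = cv2.minMaxLoc(result)

# The ROI polygon was drawn once, relative to the screenshot's top-left
# corner; re-anchoring it at (x, y) reproduces the exact same shape.
polygon = [(10, 12), (40, 8), (52, 30), (18, 34)]  # placeholder vertices
roi_in_frame = [(px + x, py + y) for px, py in polygon]
print(f"ROI anchored at ({x}, {y}) with confidence {score:.2f}")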


Second, template matching would allow us to number our ROIs and their edges more consistently. Currently, both the ROIs and their sides get numbered in the order that our software happens to identify them. Figure 5 demonstrates how the central pentagon gets a different number in two videos of the same colony because different ROIs were identified. So, to tell which nests the ants are using, we have to look at the output of the ROI detection and manually record which real ROI corresponds to each detected ROI and its sides, for every video we recorded. If we could set up the experiment so that the ROIs and their sides are labeled ahead of time, we could skip all of that manual bookkeeping (a sketch of this idea follows Figure 5).

Figure 5a (left) and Figure 5b (right). Output of our program’s ROI detection for two videos taken from the same angle. Notice that the ROI numbering is not consistent across the two, and that ROI #3 in Figure 5b will track movement on the ground.
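To sketch the second point: if every real ROI gets its own named template image, the label travels with the template, so numbering no longer depends on detection order. Everything below is hypothetical - the template names and file paths are placeholders:

import cv2

# Hypothetical sketch: one named template per real ROI, so labels are
# fixed ahead of time instead of assigned in detection order.
roi_templates = {
    "bridge_2": "templates/bridge_2.png",
    "bridge_3": "templates/bridge_3.png",
    "central_pentagon": "templates/pentagon.png",
}

frame = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)  # placeholder
labeled_rois = {}
for name, path in roi_templates.items():
    tmpl = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    result = cv2.matchTemplate(frame, tmpl, cv2.TM_CCOEFF_NORMED)
    _, score, _, top_left = cv2.minMaxLoc(result)
    labeled_rois[name] = top_left  # same label in every video

print(labeled_rois)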


Overall, template matching has the potential to save us countless hours of watching ant videos by improving our ROI detection. Currently, we get too many bad ROIs from misidentified shapes and non-existent detections, which makes it hard to tell how much of our data is trustworthy and renders nearly all of our software’s output unusable. We hope that template matching will give us the accuracy needed to correctly detect our ROIs, so that we can accurately track the ants’ movement through them.




Further Reading:


Image Processing: Machine Vision Applications VI Paper: “Object Detection Using Feature-based Template Matching” by Bianco, Buzzelli, and Schettini, Università degli Studi di Milano-Bicocca, March 2013


HMC Bee Lab Blog Post: “Finding Regions of Interest in Ant Footage: Automating Work with Computer Vision” by Jarred Allen, July 2019.


Medium Article: “Template-based versus Feature-based Template Matching,” November 2019


OpenCV Documentation: “Introduction to SURF (Speeded-Up Robust Features)”


OpenCV Documentation: matchTemplate() and Similarity Scoring 


OpenCV Documentation: “[Template-Based Approach] Template Matching”


Scholarpedia Article: “Scale Invariant Feature Transform [SIFT]” by Tony Lindeberg, KTH Royal Institute of Technology, May 2012



Media Credits:

[1, 5a, 5b]: These images were made by the pipeline on three different videos from our experiments


[2a, 2b, 3a]: These images were taken from The Conversation’s Article: “How we found coronavirus in a cat”


[3b]: This image was adapted from OpenCV’s documentation: “[Template-Based Approach] Template Matching”


[4]: This image was taken from Medium’s Article: “Template-based versus Feature-based Template Matching”

