Introduction
How do we get computers to see what we see? You might think it is as simple as hooking a camera up to a computer. Sure, that gives us images of what we see, but getting a computer to recognize everything that we as humans can recognize in an image or video is actually quite difficult, and it is an ever-developing field of computer science. Part of the problem is that we don't fully understand how our own perceptual systems work.
For example, look at this picture of a bee hive:
| Frame from a video of an observation hive |
This is in fact a plexiglass-walled observation honey bee hive that allows us to see (and film!) honey bee hive behavior. There are many bees standing on a dark-colored honeycomb, with some of the cells capped with beeswax and some left open. There is also a light-colored wooden frame, a label reading “#3” taped to the outside of the plexiglass, and a black, white, and grey checkered reference card next to it. If you look, you can (mostly) clearly pick out the bees from the background, even in places where a bee and the comb are close to the same color. Your brain does an enormous amount of processing to see and understand all of these things, all in the blink of an eye.
Now take a look at this picture. What can you see?
| Turns out digital images are just matrices! |
Not much, other than a bunch of numbers, right? If you are familiar with the RGB color space, maybe you can tell that some areas are lighter or darker, but it’s difficult to actually see anything (unless you are Cypher from The Matrix!). This is how computers “see” the image above: as three numbers, one per color channel, for each pixel (check out Cat’s blog post for more on that!). (Or peek inside your favorite image using spreadsheets here!) So let’s talk about how to extract the information we want from an image.
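If you want to poke at those numbers yourself, here is a minimal sketch using OpenCV in Python (the filename `hive.png` is just a placeholder for whatever image you have on hand):

```python
import cv2

# Load an image; "hive.png" is a stand-in for any image file.
img = cv2.imread("hive.png")

# The image is just an array of numbers: height x width x 3 channels.
# (Note that OpenCV orders the channels blue, green, red rather than RGB.)
print(img.shape)   # e.g. (1080, 1920, 3)
print(img[0, 0])   # the three channel values of the top-left pixel
```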
What do we want from this image?
When you look at an image, you can gain a bunch of information from it at once: the objects in the image, any writing or other symbols, the background of the image vs. what is in the foreground, etc. But in computer vision, what we want from the image can drastically change the strategies used to obtain that information. In our case, we want to be able to separate the bees from their background so we can study the bees’ behavior. This is a common problem of “foreground detection,” and is something that many computer vision scientists have been working on since the beginning of the field. Our overall goal is to get the contours of the bees, as if we could manually outline each one and color it in, allowing us to separate them from their background. Foreground detection can be done in a variety of ways, but a common way is to use the following technique.
First, we convert the image to grayscale, then we apply a Gaussian blur, and finally we threshold it. So why would we blur and threshold, and how do those steps work? The blur acts like a filter, smoothing away the bright, noisy points we want to ignore; since we only care about the contours of the bees, we don’t need fine detail to separate them from their background. The thresholding then turns every pixel lighter than a certain cutoff white, and everything else black. From there, ideally, each bee shows up as a little white bean-shaped blob, and we can say that white regions within a certain size range are “bees” and the rest is background. This simple method rests on the assumption that the foreground has uniformly good contrast with the background: that is, that the background is uniformly dark and the foreground objects are uniformly light across the whole image.
| A common foreground detection algorithm: grayscale, blur, threshold |
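Here is a rough sketch of that pipeline in Python with OpenCV. The filename, blur kernel size, threshold value, and blob-area bounds are all placeholder values you would tune for your own footage, not the exact parameters used for the image above:

```python
import cv2

img = cv2.imread("hive.png")  # placeholder filename

# 1. Grayscale: collapse the three color channels into one intensity value.
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# 2. Gaussian blur: smooth away fine detail and bright, noisy points.
blurred = cv2.GaussianBlur(gray, (5, 5), 0)

# 3. Global threshold: pixels brighter than 127 become white, the rest black.
_, mask = cv2.threshold(blurred, 127, 255, cv2.THRESH_BINARY)

# Pull out the white blobs and keep only the bee-sized ones.
# (findContours returns two values in OpenCV 4.x; the area bounds are illustrative.)
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
bees = [c for c in contours if 500 < cv2.contourArea(c) < 5000]
```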
That worked pretty well, right? Well, what if we have an image like this:
| Less than ideal video conditions |
Here, our assumption that the contrast between background and foreground (comb and bees) is uniform no longer holds: some sections are much darker than others, and the contrast varies across the image. If we run the same foreground detection algorithm on this image, we get the following results.
| Far less than ideal thresholding |
Not great. Most of the bees are nearly indistinguishable from the background, and we also see artifacts from the condensation on the plexiglass cover.
Improvements
So, how can we make this better? There are several ways we could improve on this, but the two I chose were increasing the overall contrast of the original image and using adaptive thresholding, described below. To increase contrast, I used histogram equalization, which spreads the image’s pixel intensities more evenly across the full available range, making dark and bright regions easier to tell apart. Here is the effect after histogram equalization:
| After histogram equalization, better! |
| Improved thresholding |
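In OpenCV, histogram equalization on a grayscale image is a single call. A minimal sketch (the filename is, again, a placeholder):

```python
import cv2

# Load the frame directly as a single-channel grayscale image.
gray = cv2.imread("hive.png", cv2.IMREAD_GRAYSCALE)

# Stretch the intensity histogram so pixel values are spread
# more evenly across the full 0-255 range.
equalized = cv2.equalizeHist(gray)
```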
A little better, but still not where we want to be. Notice how some sections of the image are much brighter, while others stay dark. For foreground detection, that means the brighter areas get washed out and lose detail, and the darker areas lose detail too. To handle this, we can use adaptive thresholding, which computes a threshold for each pixel from its surrounding neighborhood instead of using one global threshold for the whole image. This gives much better results when the lighting isn’t uniform. This is what it looks like when we do adaptive thresholding! (Wow, so much better!)
| Adaptive thresholding to the rescue! |
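Here is a sketch of what adaptive thresholding might look like in OpenCV. The neighborhood size and the constant `C` below are illustrative values to tune, not the ones used for the image above:

```python
import cv2

gray = cv2.imread("hive.png", cv2.IMREAD_GRAYSCALE)  # placeholder filename
blurred = cv2.GaussianBlur(gray, (5, 5), 0)

# Each pixel is compared against a threshold computed from its own
# neighborhood (a Gaussian-weighted mean minus a small constant C),
# so uneven lighting across the frame matters far less.
mask = cv2.adaptiveThreshold(
    blurred, 255,
    cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
    cv2.THRESH_BINARY,
    blockSize=51,  # neighborhood size in pixels; must be odd
    C=5,           # constant subtracted from the neighborhood mean
)
```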
Now you might be thinking it’s a problem that we can still see parts of the background, like the honeycomb or the reference card. But since we’re mostly interested in the movements the bees make, we only care about things that are moving. To that end, we can take the difference between two frames to see which objects move from frame to frame, and by how much. Below are the results of that process on another set of images. As you can see, frame differencing all but eliminates the unmoving parts of the image we don’t care about, letting us focus on the bee movements. Now we can get down to work on automatically decoding the bee dances!
| Frame differencing makes movement easier to see |
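Frame differencing is also simple to sketch in OpenCV. The frame filenames and the motion threshold below are placeholders:

```python
import cv2

# Two consecutive frames from the video; the filenames are stand-ins.
frame1 = cv2.imread("frame_0001.png", cv2.IMREAD_GRAYSCALE)
frame2 = cv2.imread("frame_0002.png", cv2.IMREAD_GRAYSCALE)

# Per-pixel absolute difference: anything that didn't move comes out
# near zero, while moving bees leave bright regions behind.
diff = cv2.absdiff(frame1, frame2)

# Keep only differences large enough to count as real movement.
_, motion_mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
```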
Conclusion
So how do computers see? The answer turns out to be very complicated. We have seen one approach to foreground detection for separating bees from their background, as well as some strategies for improving it under different lighting conditions. The process shows how complex a problem “seeing” really is, and how easily sighted people can take it for granted. Computer vision scientists have come up with some (perhaps unintuitive!) strategies to replicate parts of the human visual system, which we can put to work on all sorts of problems, including studying bee behavior!
Further Reading
Automatic Analysis of Bees’ Waggle Dance
Jordan Reece, Margaret Couvillon, Christoph Grüter, Francis Ratnieks, Constantino Carlos Reyes-Aldasoro
bioRxiv 2020.11.21.354019; doi: https://doi.org/10.1101/2020.11.21.354019
What is OpenCV and Computer Vision?
https://research.nvidia.com/publication/realtime-computer-vision-opencv
Computer vision using Machine Learning techniques
