Figure 1: A 2D LoG Filter with σ = 1
In 1966, Marvin Minsky assigned a student named Gerald Jay Sussman a summer project to let a “computer describe what it saw” [1]. It turned out that the problem was not that easy.
Perception seems to be a very mundane task to humans. Our brains just automatically interpret all the information for us. It is so efficient that we don’t even realize its existence. Yet for computers, even to achieve the simplest form of perception will take a considerable amount of effort. This is also known as Moravec's paradox: “it is comparatively easy to make computers exhibit adult level performance on intelligence tests or playing checkers, and difficult or impossible to give them the skills of a one-year-old when it comes to perception and mobility” [2].
The topic of blob detection in a picture is a perfect example of such paradox. Blobs here refer to connected regions in pictures that have different intensities from the background. Blob detection is used in our lab to detect ants from the background, so that we can automatically analyze the behaviors of ants using videos.
Figure 2: Two kinds of blobs, three ants [a] vs. a black circle. They do look pretty similar.
While humans can easily recognize the blob, it takes quite a bit effort to program a computer to do so. In this blog, we are going to talk about convolution with the Laplacian of Gaussian (LoG) (see Figure 1), which is a commonly used method for finding blobs in pictures.
Intuition behind Laplacian of Gaussian and Convolution
Figure 3: A “blob” function F(x) and its first and second derivative. Notice how the second derivative crosses the zero point at the origin where the function has its maximum value
Coarsely speaking, the Laplacian can be understood as taking second derivatives. As shown in Figure 3, the second derivative of the function crosses zero at where the function reaches its maximum value. This is great if we just want to detect edges: we can just take the second derivative and find where the resulting image has a value of zero. But it doesn’t work in blob detection, as we want to find both the position of the blobs and the sizes of the blobs. Notice that the process of taking second derivatives on discrete images in practice is often done by convolving with a Laplacian kernel. (For more info on convolution, this Wikipedia page is really helpful.)
To both detect the location and size of the blobs, we need to combine Laplacian with convolution and Gaussian. First, we convolve the blob with a Gaussian to smooth out the edges, then we apply the Laplacian operator by convolving the result with a Laplacian kernel.
Figure 4: Convolution of Gaussians with different variances with a blob
Figure 5: Applying Laplacian on the results depicted in Figure 4
Why do we need to convolve the blob with a Gaussian first? Figure 5 and Figure 4 can help you understand this question. After convolution with a Gaussian of the right amount variance, the result after applying Laplacian operator will have one peak at the center of the blob as shown in Figure 5 by the red line. If we convolve the blob with a Gaussian with a variance much smaller than the radius of the blob, then the result after applying the Laplacian will look like the green and yellow functions in Figure 5: the function will cross zeros at the edges, and cannot be used to detect the blob. Intuitively, convolution with a Gaussian smooths out the edges so that the Laplacian will not have separate responses to each edge, but instead will have one response to the entire blob.
From Figure 4 and 5, it is also obvious that when the proper Gaussian is used, the resulting function will have the highest peak in magnitude. Therefore, to obtain the size of the blob, we can just convolve the blob with Gaussians of different variances, take the Laplacian, and then find the variance that produces the highest/lowest peak.
In practice, because convolution is associative, we first convolve the Gaussian with the Laplacian, then convolve the image with the resulting LoG kernel to find blobs.
A Simple Case: Applying LoG on a black circle
Figure 6: Responses from LoG convolution with different variances
Figure 6 demonstrates convolution of LoG filters with different variances on a black circle. Notice how the peak achieves the highest value at sigma = 25.75 (the bars on the right have numerical values on it, notice how the highest value for sigma = 100 graph is around 4, whereas that of sigma = 25.75 graph is around 150). Also, when sigma = 13.38 or 1, (the first & second graph from the left on the top row), instead of having a single peak, the graph has a ring of peaks surrounding the blob. Mathematically, the response from LoG convolution is the highest when
Now that we have a working blob detection algorithm, we can use the information we got to determine the properties of the blobs: we can use the variance to find the radius, and use the location of the peak to find the location of the blob.
Why are we doing this algorithm again? Well, we want to know the locations of the ants. Our lab right now is working on a project involving monitoring the activities of ants. We have already accumulated hundreds of hours of videos. Without an automated system of recording the positions of ants, it would be absolutely painful to manually watch all the videos and record the locations. This blob detection algorithm is an important building block of such monitoring system. With its outputs, we can develop all kinds of software to further analysis what’s going on in the ant colonies.
References:
[1] Crevier, Daniel. AI: The tumultuous history of the search for artificial intelligence. Basic Books, Inc., 1993.
[2] Moravec, Hans. Mind children. Vol. 375. Cambridge, MA: Harvard University Press, 1988.
Media Credits: [a] Ant image by Mia Farago-Iwamasa
Media Credits: [a] Ant image by Mia Farago-Iwamasa

No comments:
Post a Comment