If you use Amazon, Facebook, Google, or any of a multitude of other sites, you’ve seen machine learning in action. It helps you tag your friends on Facebook, recommends helpful reviews on Amazon, and helps run Google’s data centers efficiently. Machine learning algorithms are everywhere.
Machine learning is revolutionizing fields from the sciences to the humanities. It is used for everything from facial recognition to predicting strokes and seizures. As machine learning is used to solve countless problems we could never solve before, it is becoming an ever more integral part of our lives. Despite this, many people still don’t understand what it is, how it works, or how powerful a tool it can be.
The idea of machine learning is actually a very intuitive one, much like teaching a little kid something new. You don’t need to know neuroscience or understand exactly what is happening in their brain to see the process: maybe you tell them the fluffy animal over there is called a dog. Then they think all animals that look kind of similar are dogs as well, until you give them more information. Eventually, most people learn to distinguish dogs from other animals, but they don’t know exactly what their brain is doing to come to that conclusion. It just happens, and we go on with our lives, taking it for granted.
Machine learning is similar. In a supervised learning algorithm, one of several classes of machine learning, the user feeds in a set of data: for example, a group of pictures, each identified as either a dog or not a dog. The algorithm runs, and eventually comes up with some way of classifying pictures. The user doesn’t necessarily need to know how the computer is classifying them. If you want to go deep into the math and computer science you certainly can, but this is by no means a requirement for using machine learning. Many languages already have machine learning libraries with these algorithms implemented. All the user has to do is call the algorithm they want correctly, as in the sketch below. Because these algorithms are pre-implemented, all you need are some basic programming skills and a willingness to spend some time looking up the inputs for the algorithm you want to use.
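To make that concrete, here is a minimal sketch using scikit-learn, a popular Python machine learning library. The feature values and labels are invented placeholders; in practice each picture would be summarized by real measurements.

```python
# A minimal sketch of calling a pre-implemented supervised learning algorithm.
# Assumes scikit-learn is installed; the feature vectors and labels below are
# made-up placeholders standing in for real picture data.
from sklearn.ensemble import RandomForestClassifier

# Each row is one picture, summarized by a few numeric features
# (for example: average color, brightness, edge count).
training_features = [
    [0.8, 0.3, 120],   # picture 1
    [0.6, 0.4, 90],    # picture 2
    [0.1, 0.9, 300],   # picture 3
    [0.2, 0.8, 250],   # picture 4
]
training_labels = ["dog", "dog", "not dog", "not dog"]

# "Training" is a single call; the library handles the math.
classifier = RandomForestClassifier(n_estimators=10, random_state=0)
classifier.fit(training_features, training_labels)

# Classify a new, unlabeled picture.
new_picture = [[0.7, 0.35, 110]]
print(classifier.predict(new_picture))
```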
In essence, machine learning is the process of teaching a computer to do a task on its own. The programmer gives the machine a training set of data. In our project, the goal is to create a map of flower densities from a set of aerial images, so the input is a set of images, each with an associated flower density. The computer then looks for patterns and correlations in this data and finds its own algorithm for computing flower density, which it can then apply to a new set of images whose densities are unknown. Some machine learning algorithms are unsupervised. This means that rather than taking a set of training data, the algorithm just takes a large data set. It then looks for patterns and associations, without any feedback from the user, and reports what it finds. This is used in data mining situations, for example looking for correlations in what people buy at the grocery store in order to optimize product placement.
Machine learning is an incredibly powerful tool. In more traditional programming situations, the programmer must find the algorithm themselves. This can be extremely difficult and time consuming. For example, in facial recognition, finding the actual attributes of a picture that make it unique to one person is nearly impossible for a human. A computer, however, can crunch huge amounts of data and find patterns that aren’t so obvious to us. Problems like this are difficult exactly because they are so intuitive: we don’t have to think about how we do them, we just do, so we don’t know how to teach a computer to do the same thing. Machine learning helps us circumvent that problem by having the computer teach itself.
There are many machine learning algorithms, far too many to discuss in one blog post, but below is an overview of some common algorithms, representing several classes of machine learning. You will likely hear about these algorithms often as they become more and more common in everyday technology.
Artificial Neural Networks
Artificial Neural Networks (ANNs) are designed to mimic the way a human brain works. Information enters at an input layer. It is then passed on to deeper “hidden layers,” which make connections. Each of these connections is weighted, meaning that each has a different importance. These weights are calculated based on a learning rule, which changes connection weights according to patterns in the inputs.
Diagram of an Artificial Neural Network, showing each layer and the connections between layers.
Based on these connections and their weights, some output is calculated. This initial output is essentially a random guess - a starting point for the network to work from. After this first output is calculated, the network checks how far it was from the actual answer and uses that error to adjust the connection weights. This is called back propagation, because error from the output layer is sent back to the previous layers to make corrections. While not all types of ANNs use back propagation, it is the most common method.
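To give a feel for what that training loop looks like, here is a toy sketch written with NumPy. It is not how production neural networks are built (real libraries handle this for you), and the network size, learning rate, and step count are arbitrary choices; the point is just that the output error is pushed back through the layers to adjust the weights.

```python
# A toy backpropagation sketch with NumPy. The network learns XOR: output 1
# when exactly one input is 1. Sizes, learning rate, and step count are
# arbitrary choices made for illustration.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # inputs
y = np.array([[0], [1], [1], [0]], dtype=float)              # correct answers

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)   # input -> hidden weights
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # hidden -> output weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

learning_rate = 0.5
for step in range(10000):
    # Forward pass: the network's current guess for every input.
    hidden = sigmoid(X @ W1 + b1)
    output = sigmoid(hidden @ W2 + b2)

    # Back propagation: send the output error back through the layers
    # and nudge every weight to reduce it.
    output_error = (output - y) * output * (1 - output)
    hidden_error = (output_error @ W2.T) * hidden * (1 - hidden)
    W2 -= learning_rate * hidden.T @ output_error
    b2 -= learning_rate * output_error.sum(axis=0)
    W1 -= learning_rate * X.T @ hidden_error
    b1 -= learning_rate * hidden_error.sum(axis=0)

# On most runs the guesses end up close to the targets [0, 1, 1, 0].
print(np.round(output, 2))
```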
ANNs are great for situations where you need an approximation. While the outputs aren’t always precise, they are valuable for discovering patterns and solving problems when the relationships between variables are vague or not well understood. Image processing is one such situation. In many image processing applications, facial recognition for example, we do not know how the characteristics of the image relate to each other, or to the final classification. We also often don’t need to be precise: if the program misidentifies a face once in a while, nothing terrible happens in most applications.
Usually, we ask an ANN to identify an image, say of a dog. But we can also reverse the process, and ask a trained ANN to produce an image of a dog. This has led to some interesting results from Google. The images Google created demonstrate how the layers of the neural networks are interacting, and what features each layer looks for.
K-means Algorithm
K-means is a clustering algorithm. This means that it takes in a set of data, say the average red, green, and blue values for a set of pictures. It then attempts to sort this data into some number of groups, k. The goal is for each data point to be as similar to others in its group as possible. In the end, this algorithm gives you the original set of data sorted into groups based on similarity.
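As a concrete sketch, here is what that might look like with scikit-learn's KMeans implementation; the average color values below are invented for illustration.

```python
# A minimal k-means sketch, assuming scikit-learn is installed. Each row is
# a made-up average (red, green, blue) value for one picture, scaled 0-1.
from sklearn.cluster import KMeans

average_colors = [
    [0.9, 0.2, 0.1],   # reddish pictures
    [0.8, 0.3, 0.2],
    [0.1, 0.7, 0.2],   # greenish pictures
    [0.2, 0.8, 0.3],
    [0.1, 0.2, 0.9],   # bluish pictures
    [0.2, 0.1, 0.8],
]

# Ask for k = 3 groups; the algorithm decides what the groups are.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(average_colors)

print(labels)                   # which group each picture was sorted into
print(kmeans.cluster_centers_)  # the "average color" of each group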
K-means is an example of an unsupervised algorithm. This means that the user doesn’t provide any correct answers up front; they just feed the data to the computer and wait for it to find the patterns on its own.
One way this algorithm is used is to shrink a data set by reducing the number of distinct values it contains. For example, a picture that uses many unique colors can be altered to use only a few colors with little visible change in quality. Applications like this are useful in reducing the size of data so that it can be stored and processed more easily; a sketch of the idea follows.
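Here is a rough sketch of that color-reduction idea, again using scikit-learn. To stay self-contained it uses a randomly generated image; a real photo loaded as a NumPy array would work the same way.

```python
# A rough sketch of using k-means to reduce the number of colors in an image.
# The "image" here is random noise generated with NumPy so the example runs
# on its own; a real photo loaded as an array would work the same way.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
image = rng.random((64, 64, 3))          # a fake 64x64 RGB image
pixels = image.reshape(-1, 3)            # one row per pixel

# Sort every pixel into one of 8 color groups.
kmeans = KMeans(n_clusters=8, n_init=10, random_state=0).fit(pixels)

# Replace each pixel with the center of its group: only 8 colors remain.
quantized = kmeans.cluster_centers_[kmeans.labels_].reshape(image.shape)

print(len(np.unique(pixels.round(3), axis=0)), "colors before")
print(len(np.unique(quantized.round(3), axis=0)), "colors after")
```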
Support Vector Machines
Support vector machines are an example of a supervised algorithm. This means that the user inputs a set of training data - inputs matched with their correct outputs - which allows the computer to find a function that takes the input and gives the desired output. This function can then be applied to a new set of data to get predictions for what the outputs should be.
There is a lot of math behind the algorithm for Support Vector Machines (SVMs), and I won’t go into all of it here. Basically, the goal of an SVM is to draw a line separating the given data points by as large a gap as possible.
Graphic showing how a support vector machine would choose a separating hyperplane for two classes of points in 2D. H1 does not separate the classes. H2 does, but only with a small margin. H3 separates them with the maximum margin. [3]
This algorithm uses its training data set to determine the separating line (or hyperplane, in higher dimensions). It then classifies new data points based on which side of the line they fall on. Determining the correct line requires some computation, but luckily for us this algorithm has already been implemented in Python, as well as in many other languages. We don’t have to worry too much about the mathematical theory; instead, we can focus on the bigger picture of applying machine learning to the problem at hand.
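For instance, calling scikit-learn's support vector machine takes only a few lines. The 2D points below are invented to mirror the figure above: two clusters that a straight line can separate.

```python
# A minimal sketch of a support vector machine in Python with scikit-learn.
# The points and labels are invented: two classes a straight line can split.
from sklearn.svm import SVC

points = [[1, 2], [2, 3], [2, 1],      # class 0, lower-left cluster
          [6, 7], [7, 8], [8, 6]]      # class 1, upper-right cluster
labels = [0, 0, 0, 1, 1, 1]

# A linear kernel keeps the separating boundary a straight line.
svm = SVC(kernel="linear")
svm.fit(points, labels)

# New points are classified by which side of the line they fall on.
print(svm.predict([[2, 2], [7, 7]]))   # expected: [0 1]
```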
In our case, we’re trying to use machine learning to map out the densities of flowers across a large landscape, so that we can learn about what food bees have available to them and how they take advantage of it. One of the algorithms we are using is based on Support Vector Machines! We are using Support Vector Regression, a similar algorithm that outputs a continuous value rather than a class. For example, while a Support Vector Machine might classify images as either containing flowers or not containing flowers, Support Vector Regression gives a density value between 0 and 1 for each image.
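As a hedged sketch of that regression idea (the image features and density values below are invented placeholders, not our real data), Support Vector Regression can be called in scikit-learn almost the same way as a Support Vector Machine:

```python
# A sketch of Support Vector Regression: same family as an SVM, but the
# output is a continuous value instead of a class. All numbers here are
# made up; the real project computes features from actual aerial images.
from sklearn.svm import SVR

# Each row summarizes one aerial image tile with a few hypothetical features
# (for example: fraction of bright pixels, average green value, texture).
image_features = [
    [0.05, 0.60, 0.10],
    [0.20, 0.55, 0.30],
    [0.45, 0.40, 0.50],
    [0.70, 0.30, 0.80],
    [0.90, 0.20, 0.95],
]
flower_density = [0.0, 0.2, 0.5, 0.8, 1.0]   # known densities for training

regressor = SVR(kernel="rbf", C=10.0)
regressor.fit(image_features, flower_density)

# Predict a continuous density for a new, unlabeled image tile.
print(regressor.predict([[0.55, 0.35, 0.60]]))
```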
There are many more algorithms out there, each with a different specific purpose, but each serves the same goal - to teach computers to teach themselves.
Media Credits
[1] Comic from xkcd. https://xkcd.com/894/
[2] Comic from xkcd. http://xkcd.com/1425/
[3] Image by Zack Weinberg. https://commons.wikimedia.org/wiki/File:Svm_separating_hyperplanes_(SVG).svg