This summer I have been continuing my work on bee foraging maps from last year, but with a new goal of classifying flowers by species rather than calculating density. While many of the basic goals and processes have remained the same, there are several updates that I am excited to share!
Aerial Mapping
The Bee Lab has a new drone! After the unfortunate crash of our Phantom 2 (which has now been repaired), we purchased a DJI Phantom 3 Standard. This has brought a few improvements to the mapping process. We are now using the Litchi app to control the drone. Litchi allows the user to plan a flight in its online mission planner and then upload it to a tablet or cellphone that controls the drone in the field. This also means that flights can be saved and repeated each week, an important feature for looking at how an area changes over time. The Phantom 3 can also accept more waypoints per mission, while the Phantom 2 was limited to 16 at a time. With more waypoints I have more freedom in how missions are laid out, and together with the increased battery life this lets me map larger areas with each flight.
*Out in the field learning to fly the new Phantom 3*
Map Stitching
We have upgraded our map stitching software as well. Last summer we used Microsoft ICE to stitch our maps together. However, that software had several issues: it was unable to handle large numbers of pictures and required a lot of preprocessing. To make map stitching faster we switched to Agisoft PhotoScan Pro. As dedicated stitching software it can handle thousands of photos at once and produces a much higher quality stitch.
*A portion of one stitched map. Each map is 100-200 MB, so only a small portion can be displayed here.*
Data Collection
One big focus of this spring and summer was data collection. In Claremont many flower species bloom in the spring, while very few plants are in bloom later in the summer. Because of this, much of the spring and early summer was spent flying over the Bernard Field Station, creating maps and collecting training data. Collecting as much data as possible was essential for creating a usable data set. The training data set is composed of a list of flower species and corresponding aerial images of those species. This data is fed into a machine learning algorithm to train it to recognize each species. In order for this training to be successful it is important to have as many examples of each species as possible. A total of 28 maps were produced over the course of the spring and summer.
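For anyone curious how a data set like this might be organized in code, here is a rough sketch. The folder layout, file names, and function are placeholders for illustration, not the actual data set:

```python
import os
import numpy as np
from skimage import io

# Hypothetical layout: one folder per species, each holding cropped aerial
# image patches taken from the stitched maps, e.g.
#   training_data/species_a/patch_001.png
#   training_data/species_b/patch_014.png
DATA_DIR = "training_data"

def load_training_patches(data_dir=DATA_DIR):
    """Return a list of image patches and an array of species labels."""
    patches, labels = [], []
    for species in sorted(os.listdir(data_dir)):
        species_dir = os.path.join(data_dir, species)
        if not os.path.isdir(species_dir):
            continue
        for fname in sorted(os.listdir(species_dir)):
            if fname.lower().endswith((".png", ".jpg", ".tif")):
                patches.append(io.imread(os.path.join(species_dir, fname)))
                labels.append(species)
    return patches, np.array(labels)
```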
Code Updates
This summer my focus has been on classifying flowers by species rather than determining density. Despite this difference much of the same code is used. I am using the same sliding window calculations and many of the same metrics as last summer. A sliding window means that calculations are done over a fixed-size square, e.g. 100x100 pixels, and this “window” is shifted by a set amount across the image until the entire image is covered.
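Here is a minimal sketch of the sliding-window idea in Python; the 100x100 window and 50-pixel step are just illustrative values, not necessarily the parameters I use:

```python
def sliding_windows(image, window=100, step=50):
    """Yield (row, col, tile) for square tiles covering the image.

    window: side length of the square tile in pixels.
    step:   how far the window shifts each move; a step smaller than
            the window size means consecutive tiles overlap.
    """
    height, width = image.shape[:2]
    for row in range(0, height - window + 1, step):
        for col in range(0, width - window + 1, step):
            yield row, col, image[row:row + window, col:col + window]

# Example: compute some per-tile statistic over a stitched map
# tile_means = [tile.mean() for _, _, tile in sliding_windows(map_image)]
```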
I have also added new features, drawn from the gray-level co-occurrence matrix (GLCM) and from color moments:
- Contrast
- Correlation
- Energy
- Homogeneity
- Standard deviation (for Hue, Saturation, and Value)
- Skew (for Hue, Saturation and Value)
The GLCM features provide measures of texture, while the color moments describe color and how it varies within the selected window. These new features, combined with those from last summer, will be used to classify the different species.
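As an illustration, the sketch below shows how these features could be computed for a single window with scikit-image and SciPy. Recent scikit-image spells the GLCM functions graycomatrix/graycoprops (older releases use greycomatrix/greycoprops), and the GLCM distance and angle here are placeholder choices rather than my exact settings:

```python
import numpy as np
from scipy.stats import skew
from skimage.color import rgb2gray, rgb2hsv
from skimage.feature import graycomatrix, graycoprops
from skimage.util import img_as_ubyte

def texture_and_color_features(tile):
    """Return GLCM texture features and HSV color moments for one RGB tile."""
    # Texture: gray-level co-occurrence matrix on the grayscale tile
    gray = img_as_ubyte(rgb2gray(tile))
    glcm = graycomatrix(gray, distances=[1], angles=[0], levels=256,
                        symmetric=True, normed=True)
    texture = [graycoprops(glcm, prop)[0, 0]
               for prop in ("contrast", "correlation", "energy", "homogeneity")]

    # Color moments: standard deviation and skew of hue, saturation, value
    hsv = rgb2hsv(tile)
    color = []
    for channel in range(3):
        values = hsv[..., channel].ravel()
        color += [values.std(), skew(values)]

    return np.array(texture + color)
```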
I have also updated the machine learning algorithms that I use. While last summer was focused on regression because I was calculating density, this summer the focus is on classification. Because of this shift I am testing several new algorithms in order to find the best option.
- Random Forests
- K Nearest Neighbors (KNN)
- Support Vector Machine (SVM)
- Stochastic Gradient Descent (SGD)
- Decision Tree
- Perceptron
Of these algorithms I have had the most success so far with KNN and decision trees. However, this is from very preliminary testing, and there is of course a lot of work to be done before I settle on a final algorithm. Below are the tenfold cross-validation scores for each algorithm (with 1 being the best possible score).
| Algorithm | Tenfold Validation Score |
| --- | --- |
| Random Forest | 0.672 |
| KNN | 0.610 |
| SVM | 0.234 |
| SGD | 0.532 |
| Perceptron | 0.678 |
| Decision Tree | 0.551 |
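For reference, this is roughly how such tenfold scores can be produced with scikit-learn. The classifiers below use default settings, so this is a sketch rather than the exact setup behind the numbers above:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Perceptron, SGDClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

CLASSIFIERS = {
    "Random Forest": RandomForestClassifier(),
    "KNN": KNeighborsClassifier(),
    "SVM": SVC(),
    "SGD": SGDClassifier(),
    "Perceptron": Perceptron(),
    "Decision Tree": DecisionTreeClassifier(),
}

def score_classifiers(X, y):
    """Print the mean tenfold cross-validation accuracy for each algorithm.

    X: one row of window features per labeled tile; y: species labels.
    """
    for name, clf in CLASSIFIERS.items():
        scores = cross_val_score(clf, X, y, cv=10)
        print(f"{name}: {scores.mean():.3f}")
```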
Continuing Work
I’ll be continuing to work on this project for the next few months, and I am hopeful that with more tuning of parameters and careful selection of features the classification algorithm will be able to recognize several of the most abundant species at the BFS. I will be adding data from our spring experiment investigating pollinator visitation to the training set. That data set is composed of specific flowering patches rather than transects; for each patch the number of flowers was counted or estimated, and all patches were flown over each week. I am also testing each feature to see which ones are important to the classification and which might actually make performance worse. Finally, I will be testing each algorithm carefully, including varying the tile size and the overlap amount, in order to find the optimal parameters for this classification.
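One simple way to start screening features is to look at random-forest importances. The sketch below is just one possible approach, and the feature list covers only the new features as an example:

```python
from sklearn.ensemble import RandomForestClassifier

# Example feature names (new features only; last summer's features
# would be appended to this list in practice)
FEATURE_NAMES = [
    "contrast", "correlation", "energy", "homogeneity",
    "hue_std", "hue_skew", "sat_std", "sat_skew", "val_std", "val_skew",
]

def rank_features(X, y, names=FEATURE_NAMES):
    """Rank features by random-forest importance as a rough screening step."""
    forest = RandomForestClassifier(n_estimators=200).fit(X, y)
    for importance, name in sorted(zip(forest.feature_importances_, names),
                                   reverse=True):
        print(f"{name}: {importance:.3f}")
```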


Comments

Will be working on a way to use land-based drones to count bugs over larger areas using a few different methods, but need the raw data input... anyone wanna know to a healthy degree of probability what areas are covered in what bugs?

Reply: Wow, this sounds cool! Would love more info on how you plan to do this. What kind of bugs are you targeting?