AutoML: An Introduction To Get You Started


AutoML is an exciting new trend in the Machine Learning (ML) industry. It revolves around analyzing data automatically and getting meaningful insights with minimum effort. Using the ingested data, it can also build models that can later serve as predictors for new data points.

What is AutoML and what is it used for?

AutoML is a recent artificial-intelligence-based solution that has started to gain popularity because it is easy to apply and covers a wide variety of use cases. The core idea, as the name Automated Machine Learning indicates, is that you can apply ML to real-life problems even if you’re not highly skilled in the field. Thanks to the high degree of automation involved, it is a good fit if you don’t want to become an ML expert but still want to apply ML to a problem you’re facing.

The key idea behind AutoML is that you take your data and ingest it into a piece of software or a service, which then analyzes the data automatically and gives you a fully implemented machine learning model. You can use this model for anything that an ML model can be used for: for example, predicting target values from features, solving basic classification or regression problems, or detecting objects in visual data.

AutoML and Google Cloud

Now let’s dive into Google Cloud AutoML specifically. Google Cloud AutoML has a wide range of services:

  • AutoML Vision ‒ for object detection using pretrained models and custom image classification. 
  • Video Intelligence API ‒ for classifying video segments and tracking objects in videos.
  • AutoML Natural Language and AutoML Translation ‒ for processing and translating textual data.
  • AutoML Tables ‒ for prediction and classification from structured data, like databases or spreadsheets.

Companies use these services in very different ways, but three use cases are more common than the others. AutoML is most typically used for:

  • Proof of concept
  • Baseline model
  • Deploy to production

Let’s look at an example for each of these use cases.

Proof of concept

Let’s assume our company has some data. We need to decide if it’s possible to use that data to predict some important target variable. With AutoML we can make that decision. Basically, given the dataset and the desired target value, it gives us a strong indication as to whether it is even worth starting an ML project.

Let’s look at a more concrete example.

Let’s assume a webshop needs to decide, based on historical purchase data, whether it is possible to predict how many days will pass until a given customer returns to the shop and buys something else. With AutoML it is possible to get a fairly reliable indication of how accurate this kind of forecast can be.
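Before such a dataset can be handed to any AutoML service, the target value has to exist as a column. The snippet below is only a minimal sketch of how "days until the next purchase" could be derived from a purchase history. The data layout (pairs of customer ID and purchase date) and the function name are assumptions made for illustration, not something AutoML prescribes.

```python
from collections import defaultdict
from datetime import date


def days_until_next_purchase(purchases):
    """Return (customer_id, purchase_date, days_to_next) rows.

    `purchases` is an iterable of (customer_id, purchase_date) pairs
    (a hypothetical layout). A customer's most recent purchase has no
    known follow-up yet, so it produces no row.
    """
    by_customer = defaultdict(list)
    for customer_id, purchase_date in purchases:
        by_customer[customer_id].append(purchase_date)

    rows = []
    for customer_id, dates in by_customer.items():
        dates.sort()
        # Pair each purchase with the one that follows it.
        for current, following in zip(dates, dates[1:]):
            rows.append((customer_id, current, (following - current).days))
    return rows


# Made-up sample data, purely for illustration.
history = [
    ("alice", date(2021, 1, 5)),
    ("bob", date(2021, 1, 7)),
    ("alice", date(2021, 1, 19)),
    ("alice", date(2021, 3, 1)),
]
for row in days_until_next_purchase(history):
    print(row)
```

Each resulting row can then be joined with whatever customer features are available, and the last column becomes the target value the model is asked to predict.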

Baseline model

In this use case, the company already has an ongoing ML project or is just starting one. AutoML can bring a baseline model to the table here. For the Data Science team, this provides guidance as to what is achievable with the data. Depending on the specific data, the predictive performance of the AutoML model is sometimes strong and hard for Data Scientists to reproduce manually. At other times it is weaker and therefore easier to match. Either way, the AutoML model proves that its level of predictive performance is, at least in theory, attainable with the data.

Here I have to mention that some of the algorithms used by Google AI Platform’s AutoML incorporate pretrained models. This means the model may be built not just from our data, but also from information from other sources. In those cases it can be practically impossible for an in-house Data Science team to reproduce that predictive performance.

Deploy to production

Of course, it is also possible to use AutoML as an end-to-end solution. With this goal in mind, you can upload the dataset, train the model, then deploy it and use it in production. This means that anyone can have, for example, a very powerful image detection system with close to zero development time. The Google Cloud Platform will provide a REST API for the trained model, which can be used by any other backend system or application.

Let’s see AutoML in action!

Flower recognition

One cool feature of Google’s AutoML is classifying images into distinct categories, using a model built from our sample data.

We can take the well-known Kaggle Flower recognition dataset prepared by Alexander Mamaev and build an AutoML solution on that.

This dataset contains five kinds of flowers with a total of 4242 individual images.

AutoML flowers dataset

The first thing we have to do is upload all the images to Google Cloud Storage. With the gsutil tool provided by Google, this step is really straightforward. Then we have to prepare a list of all the images, each labeled with the kind of flower it contains. This file should be saved in CSV format, and it can also be stored in Cloud Storage, next to the images.
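The labeling step can be scripted. Below is a minimal sketch that assumes the images sit in one folder per flower kind, and that the import file takes one row per image with the Cloud Storage URI followed by the label. The bucket name `my-flowers-bucket` and both function names are hypothetical, so check the import format shown in the Cloud Console before relying on it.

```python
import csv
import io
from pathlib import PurePosixPath


def build_import_rows(image_paths, bucket):
    """Map 'folder/label/filename' paths to (gs:// URI, label) rows.

    Assumes the name of the folder containing an image is its label.
    """
    rows = []
    for rel_path in sorted(image_paths):
        path = PurePosixPath(rel_path)
        label = path.parent.name  # e.g. "daisy" for flowers/daisy/001.jpg
        rows.append((f"gs://{bucket}/{path}", label))
    return rows


def rows_to_csv(rows):
    """Serialize the rows as CSV text, one image per line."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Hypothetical file names; the images themselves could be uploaded with
# something like: gsutil -m cp -r flowers gs://my-flowers-bucket/
sample = [
    "flowers/daisy/001.jpg",
    "flowers/tulip/002.jpg",
    "flowers/rose/003.jpg",
]
print(rows_to_csv(build_import_rows(sample, "my-flowers-bucket")))
```

Deriving the label from the folder name keeps the CSV in sync with the directory layout, so moving an image to another folder is enough to relabel it.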

The last part is to import the dataset, then train and evaluate the model. These steps require just a few clicks in the user interface, but depending on the dataset they can take a while to complete, typically a few hours.

When the model training is finished, we see something like this in the Cloud Console:

Google Cloud AutoML

As you can see, the trained model’s evaluation scores are around 94%‒96% for distinguishing between the flower images.

With all this set up, the model can be deployed, which means Google will provide a REST API service to make predictions on new images.
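To give a feel for what calling such a service involves, here is a hedged sketch that only assembles a prediction request and does not send it. The URL pattern and the `payload.image.imageBytes` body are assumed to follow the AutoML v1 REST `predict` method. The project ID, the model ID and the function name are placeholders, and the Cloud Console shows the exact request for your own model.

```python
import base64
import json


def build_predict_request(project_id, model_id, image_bytes,
                          location="us-central1"):
    """Return (url, json_body) for an image prediction call.

    Assumption: the deployed model is served by the AutoML v1 REST API.
    Verify the endpoint and body against the example in the Cloud Console.
    """
    url = (
        f"https://automl.googleapis.com/v1/projects/{project_id}"
        f"/locations/{location}/models/{model_id}:predict"
    )
    body = {
        "payload": {
            # Image content is sent as base64-encoded text.
            "image": {
                "imageBytes": base64.b64encode(image_bytes).decode("ascii")
            }
        }
    }
    return url, json.dumps(body)


# Placeholder identifiers and fake image bytes, for illustration only.
url, body = build_predict_request("my-project", "ICN0000000000", b"fake-image")
print(url)
```

A real call would POST this body to the URL with an `Authorization: Bearer <token>` header, using whichever HTTP client the backend already has.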

This service can be useful, for example, on an e-commerce website to automatically classify images uploaded by its users.

Summary

AutoML is attracting more and more attention because it has possible use cases at several levels of a machine learning project, from proof of concept to production. Every project can make use of it to some extent. It is also really easy to start exploring the possibilities, with very little setup effort.

Considering all this, it should be no surprise that Google keeps adding new features, like the recently added AutoML Tables for tabular data.