AutoML: What is Automated Machine Learning and will AutoML transform Data Science?

Kadir Sümerkent
Apr 27, 2022

Data science is a trendy field, and the work done in this field has allowed us to make extraordinary progress in many different fields. Let’s give an example from the field of health. We use many different machine learning applications, from research areas such as molecular design, genetic research, and drug design to many clinical and surgical branches such as nuclear medicine, radiology, and even neurosurgery. Of course, machine learning has started to take an important place in daily life in many areas other than health.

Although many people see machine learning as an attractive career field, they often encounter a very different world than they imagined once they step into it. We can list the stages of a classic machine learning project as follows:

  1. Processing, standardizing, and cleaning of the existing dataset.
  2. Selection and construction of appropriate features.
  3. Selection of an appropriate model family.
  4. Optimization of the model hyperparameters.
  5. Designing of the neural network topologies (if deep learning is used).
  6. Postprocessing of the machine learning models.
  7. Analysis and validation of the obtained results.

While each of these 7 basic steps is a universe of its own, they must also work in harmony with one another. The complexity of these processes causes many people who want to step into this field to turn to other fields. Things are not perfect for the experts working in this field either. Many activities performed in these steps (for example, hyperparameter optimization) rely on experience and on results obtained through experiments, and these experiments can take a lot of time. Another problem is that, just when you think your project is progressing, an error or omission can take you back a few steps.

Why AutoML?

I actually started answering this question in the previous paragraph. Still, so as not to leave the question unanswered, we can define AutoML as a toolkit that automates machine learning processes. We should not deduce from this definition that all machine learning stages can be automated, at least not today.

As we will briefly mention in the following sections, there are many AutoML frameworks that you can use. These frameworks offer solutions for every stage, from cleaning and standardizing the raw dataset to creating a machine learning model that is ready to be deployed. They continue to be developed rapidly by the relevant companies, so we can expect many processes that cannot be fully automated today to be automated in the future.

As AutoML develops, machine learning applications that only experts can implement today will become accessible to people who do not specialize in this field. While this may not seem like good news for today’s data scientists, it means that people with business domain knowledge will be able to implement these projects themselves. Many more machine learning applications will be made available to people much faster.

Today, there are 8 prominent AutoML frameworks: Google Cloud AutoML, Apple CreateML, Microsoft Azure, IBM AutoAI, Amazon SageMaker, Auto-Keras, Auto-PyTorch, and Auto-sklearn. As you can see, five of the six big tech giants, all except Facebook, already have an AutoML solution.

How Does AutoML Work?

We said that AutoML automates many processes in machine learning projects. Each of the 7 stages I have listed for machine learning projects includes many technical operations. The operations performed or the parameters used in these stages may differ from project to project. We often need to conduct experiments that take a lot of time to reach ideal values.
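To get a feeling for why these experiments take so long, it helps to count them. The sketch below is only an illustration: the option names, their values, and the assumed ten minutes per experiment are all hypothetical and do not come from any particular framework.

```python
from itertools import product
from math import prod

# Hypothetical search space: every name and value here is illustrative.
# (A real search space is usually conditional, e.g. some options only
# make sense for some model families.)
search_space = {
    "imputation": ["mean", "median", "most_frequent"],
    "scaler": ["none", "standard", "min-max"],
    "n_features": [5, 10, 20],
    "model": ["logistic_regression", "random_forest", "gradient_boosting"],
    "regularization": [0.01, 0.1, 1.0, 10.0],
}

# One experiment is one choice per option, so the count is a product.
n_experiments = prod(len(values) for values in search_space.values())

# Assumed average cost of training and validating one candidate.
minutes_per_experiment = 10
total_hours = n_experiments * minutes_per_experiment / 60

# The same candidates, enumerated explicitly as configuration dictionaries.
configs = [dict(zip(search_space, combo))
           for combo in product(*search_space.values())]

print(n_experiments, "experiments, about", total_hours, "hours")
```

Even this small space yields 324 candidate configurations, and adding one more option with three values would triple that number.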

To briefly summarize the stages of a classic machine learning project:

Process of a Standard Machine Learning Project (1)

The first phase of almost all machine learning projects is cleaning and standardizing the data at hand, which is vital to the project’s success. Then the data is split into training, validation, and test datasets. Next comes feature engineering, followed by feature selection. We use many algorithms in machine learning projects, and the algorithm to be used must be determined based on criteria such as the type, amount, and continuity of the data. After determining the machine learning algorithm (or algorithms) we will use, we optimize its parameters, again through an experimental process. At the end of all these processes, the machine learning model is ready to be deployed.
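The splitting and standardization steps described above can be sketched in a few lines of plain Python. This is a minimal illustration under my own assumptions: the 60/20/20 ratio, the function names `split_dataset` and `fit_standardizer`, and the toy single-feature dataset are all hypothetical.

```python
import random
from statistics import mean, pstdev

def split_dataset(rows, train=0.6, validation=0.2, seed=42):
    """Shuffle rows and split them into train, validation, and test lists."""
    shuffled = rows[:]                      # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * validation)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])     # whatever remains is the test set

def fit_standardizer(values):
    """Learn mean and standard deviation from the given values."""
    mu = mean(values)
    sigma = pstdev(values) or 1.0           # guard against a constant column
    return lambda x: (x - mu) / sigma

# Toy single-feature dataset: 100 hypothetical measurements.
data = [float(i) for i in range(100)]
train_set, val_set, test_set = split_dataset(data)

# Fit on the training portion only, then apply the same transform everywhere.
standardize = fit_standardizer(train_set)
train_scaled = [standardize(x) for x in train_set]
val_scaled = [standardize(x) for x in val_set]
test_scaled = [standardize(x) for x in test_set]
```

In this sketch the scaling statistics are learned from the training portion only. That is a common precaution so that information from the validation and test data does not leak into the model.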

Each of these steps, which I have visualized as a single box in the image, consists of many iterations. An error noticed at these stages may cause a lot of time to be lost in the project, while not detecting the error can significantly reduce the model’s accuracy.

AutoML Process

The basic idea of AutoML is to automate all these processes as much as possible. Some AutoML frameworks can automate many steps, including data cleaning and standardization, although not completely. Even in scenarios where we can’t talk about full automation, AutoML is a technology that saves data scientists a lot of time.

So what exactly does AutoML do?

AutoML automates these time-consuming experiments. It generates different combinations of features, algorithms, and parameters, trains a model for each combination, and then compares the accuracy of the resulting models to determine the optimal combination for us.
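The idea can be sketched as a small search loop. This is a toy illustration and not how any specific framework is implemented: the synthetic dataset, the two simple models (k-nearest neighbours and nearest centroid), and all function names are my own assumptions. Real frameworks usually rely on smarter search strategies than trying every combination.

```python
import random
from itertools import combinations, product

def make_dataset(n, seed):
    """Toy binary data: features 0 and 1 carry signal, feature 2 is noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        x = (rng.gauss(2.0 * label, 1.0),
             rng.gauss(-2.0 * label, 1.0),
             rng.gauss(0.0, 5.0))
        rows.append((x, label))
    return rows

def distance(a, b, features):
    """Squared Euclidean distance restricted to the selected features."""
    return sum((a[i] - b[i]) ** 2 for i in features)

def knn_predict(train, x, features, k):
    """Majority vote among the k nearest training rows (k is odd)."""
    nearest = sorted(train, key=lambda row: distance(row[0], x, features))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > k else 0

def centroid_predict(train, x, features, _unused=None):
    """Assign x to the class whose mean point is closest."""
    centroids = {}
    for cls in (0, 1):
        members = [row[0] for row in train if row[1] == cls]
        centroids[cls] = [sum(m[i] for m in members) / len(members)
                          for i in range(len(x))]
    return min(centroids, key=lambda cls: distance(centroids[cls], x, features))

# Model family -> (prediction function, hyperparameter values to try).
MODELS = {
    "knn": (knn_predict, [1, 3, 5, 7]),       # hyperparameter: neighbours k
    "centroid": (centroid_predict, [None]),   # nothing to tune
}

def accuracy(predict, train, validation, features, param):
    hits = sum(predict(train, x, features, param) == y for x, y in validation)
    return hits / len(validation)

def search(train, validation, n_features=3):
    """Score every feature subset x model x hyperparameter combination."""
    feature_sets = [fs for r in range(1, n_features + 1)
                    for fs in combinations(range(n_features), r)]
    results = []
    for features, (name, (predict, params)) in product(feature_sets, MODELS.items()):
        for param in params:
            score = accuracy(predict, train, validation, features, param)
            results.append((score, features, name, param))
    best = max(results, key=lambda r: r[0])
    return best, results

train_rows = make_dataset(200, seed=0)
validation_rows = make_dataset(100, seed=1)
best, results = search(train_rows, validation_rows)
print("candidates tried:", len(results))
print("best (accuracy, features, model, hyperparameter):", best)
```

Because the winner is chosen by its validation accuracy, that score is optimistic. The selected combination should still be checked once on the untouched test set before deployment.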

AutoML Model Selection Process

Limitations of AutoML

While AutoML has the potential to save lots of time, it comes with its own limitations. I would like to highlight three limitations that are important to me.

The first limitation is the level of control you have in the AutoML environment. Most of the time, you cannot alter or further tune the generated output model. You can, however, design your own model using the AutoML model as a template, which will still save you lots of time and effort.

The second limitation I want to highlight is explainability. Most of the time, you will not be able to understand the internal dynamics of the AutoML process. Explaining a machine learning model is complex enough, and the black-box nature of AutoML makes it even harder. In fields like medicine, where you are required to explain how you process the data, a lack of explainability can be a reason not to use AutoML.

Finally, while AutoML frameworks can automate the two steps that are easiest to automate, model selection and hyperparameter tuning, they cannot replace feature engineering. Feature engineering is problem-dependent, requires human creativity combined with imagination and domain expertise, and is one of the most time-consuming and laborious stages of any data science project.

Can AutoML replace Data Scientists?

When I launched the lecture on Artificial Intelligence in Medicine, my medical school colleagues (especially radiologists) were concerned about the advance of artificial intelligence in medicine. Most of the time, I and other specialists who work in both medicine and AI hear the same question: “Will AI replace doctors?” At one event, I answered this question as follows: “No, physicians will not be unemployed because of artificial intelligence. However, I cannot say anything about physicians who do not use artificial intelligence.” Since then, I have been happy to see that many of my colleagues (especially radiologists :) ) who once approached artificial intelligence with anxiety or prejudice are now conducting academic studies on it.

I think the same is true for data scientists. AutoML is developing very fast, and this development will continue. However, just like in medicine, I do not think that the advantages provided by human-specific features such as domain knowledge, imagination, and creativity can be replaced by AutoML in the short term. Just as computers have not replaced statisticians and mathematicians, AutoML will not replace data scientists. But just as computers have changed the way statisticians and mathematicians work, AutoML will change how data scientists work. It will enable them to get results faster, do more experiments in less time and improve their results.

Conclusion

AutoML is an essential tool for the advancement of data science. It will be a stepping stone for more AI applications to be implemented by more people in more areas. It is also a significant development for data scientists themselves because it eases their workload and allows them to focus on the stages that really require creativity.

So what do you think about AutoML and how it will impact the data science world? Share your thoughts with me in the comments section or on Twitter and LinkedIn.

References

  1. I redrew this image based on the original drawing from R. Olson et al. (2016), “Evaluation of a Tree-based Pipeline Optimization Tool for Automating Data Science”.
