Awesome List for AutoML - Speed Up Model Development

autogluon, autokeras, pycaret, and more

Apr 09, 2024

AutoML (Automated Machine Learning) is a process that involves automating various aspects of the machine learning pipeline, from data preparation to model selection and hyperparameter tuning. AutoML aims to reduce the need for manual intervention in the machine learning workflow, making it more accessible to users with limited expertise in machine learning and data science.

Some key components of AutoML include:

Data Preprocessing: Automatically handling missing values, categorical encoding, feature scaling, and other data preprocessing tasks.
Feature Selection and Engineering: Automatically identifying relevant features and creating new features to improve model performance.
Model Selection: Automatically choosing the most appropriate machine learning algorithm for a given task, based on the characteristics of the data and the problem at hand.
Hyperparameter Tuning: Automatically optimizing the parameters of machine learning models to improve their performance, using techniques such as grid search, random search, or Bayesian optimization.
Model Evaluation and Validation: Automatically assessing the performance of models using cross-validation or train-test splits together with appropriate evaluation metrics.
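To make the last two components concrete, here is a minimal sketch of random search combined with k-fold cross-validation, written in plain Python. Everything in it is an illustrative assumption rather than how any particular AutoML library works: the toy dataset, the choice of a k-nearest-neighbours classifier, and the helper names (`make_blobs`, `knn_predict`, `cross_val_accuracy`, `random_search`) are all hypothetical.

```python
import random
from collections import Counter
from statistics import mean


def make_blobs(n_per_class, seed):
    """Generate a toy 2-D dataset with two Gaussian clusters (illustrative only)."""
    rng = random.Random(seed)
    data = []
    for label, (cx, cy) in enumerate([(0.0, 0.0), (3.0, 3.0)]):
        for _ in range(n_per_class):
            data.append(((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)), label))
    rng.shuffle(data)
    return data


def knn_predict(train, point, k):
    """Predict a label by majority vote among the k nearest training points."""
    nearest = sorted(
        train,
        key=lambda row: (row[0][0] - point[0]) ** 2 + (row[0][1] - point[1]) ** 2,
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def cross_val_accuracy(data, k, n_folds=5):
    """Mean accuracy of k-NN over n_folds train/validation splits."""
    scores = []
    for fold in range(n_folds):
        # Every n_folds-th row (offset by fold) is held out for validation.
        valid = data[fold::n_folds]
        train = [row for i, row in enumerate(data) if i % n_folds != fold]
        hits = sum(knn_predict(train, x, k) == y for x, y in valid)
        scores.append(hits / len(valid))
    return mean(scores)


def random_search(data, candidates, n_trials, seed):
    """Sample hyperparameter values at random and keep the best CV score."""
    rng = random.Random(seed)
    best_k, best_score = None, -1.0
    for _ in range(n_trials):
        k = rng.choice(candidates)
        score = cross_val_accuracy(data, k)
        if score > best_score:
            best_k, best_score = k, score
    return best_k, best_score


if __name__ == "__main__":
    dataset = make_blobs(n_per_class=50, seed=0)
    k, acc = random_search(dataset, candidates=list(range(1, 16, 2)), n_trials=5, seed=1)
    print(f"best k={k}, cross-validated accuracy={acc:.3f}")
```

AutoML tools wrap this same loop (propose a configuration, validate it, keep the best) around far larger search spaces and smarter search strategies.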

Many open-source tools offer AutoML, such as PyCaret, AutoGluon, and AutoKeras, as do the cloud AutoML services from GCP, AWS, and Azure. If the data science team in your company is already mature, you can also build an in-house solution for your specific needs. By using AutoML, organizations can save time and resources in developing and deploying machine learning models, while also enabling non-experts to apply machine learning to their own problems. However, while AutoML can significantly streamline machine learning workflows, it does not replace the domain expertise and human judgment needed to understand the problem, interpret results, and make decisions.

AutoML has several pros and cons that are worth considering when deciding whether to incorporate it into your machine learning workflow.

Upsides:

Accessibility: AutoML makes machine learning more accessible to users with limited expertise in data science and programming, allowing a broader range of people to benefit from it.
Time and resource efficiency: By automating many of the time-consuming tasks in the machine learning pipeline, AutoML can save time and resources, allowing data scientists to focus on more complex problems and higher-value activities.
Improved model performance: AutoML can help optimize model performance through automated feature engineering, hyperparameter tuning, and model selection, leading to better predictions and decision-making.
Scalability: AutoML helps scale the development process and monitor model performance, which makes model tracking easier to manage.

Downsides:

Less control and transparency: By automating various aspects of the machine learning workflow, AutoML can reduce the level of control and transparency users have over the process, making it harder to understand and fine-tune the models.
Limited customization: AutoML solutions may not be as flexible as custom-built models, making it difficult to address unique or domain-specific problems that require specialized approaches or algorithms.
Potential overreliance: Relying too heavily on AutoML may discourage users from developing a deep understanding of machine learning concepts and techniques, limiting their ability to tackle more complex problems or fine-tune models effectively.
Possible bias and ethical concerns: Automating the machine learning workflow can introduce or amplify biases in the data, leading to potentially unfair or unethical outcomes. Human oversight and judgment remain crucial in addressing these issues.

Despite the downsides mentioned above, these are my hand-picked open-source AutoML tools that you can try:

AutoGluon

GitHub: https://github.com/autogluon/autogluon

Docs: https://auto.gluon.ai/stable/index.html

Highlight:

Handles tabular, time series, image, and text data
Supports multimodal data
Provides minimal deployment
Supports AWS SageMaker as a backend (infra)
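As a rough illustration of the tabular workflow, the sketch below wraps AutoGluon's `TabularPredictor` in a small helper. The helper name `train_with_autogluon`, its arguments, and the 60-second default budget are assumptions made for this example; it expects a pandas DataFrame that contains the label column, and it needs the `autogluon` package installed before it can be called.

```python
def train_with_autogluon(train_df, label, time_limit=60):
    """Fit AutoGluon on a pandas DataFrame and return the fitted predictor.

    train_df   : training data, including the column named by `label`
    label      : name of the target column (hypothetical, supplied by the caller)
    time_limit : rough training budget in seconds (an arbitrary default here)
    """
    # Imported lazily so the helper can be defined without AutoGluon installed.
    from autogluon.tabular import TabularPredictor

    # AutoGluon infers the problem type from the label column, then trains
    # and ensembles several models within the time budget.
    predictor = TabularPredictor(label=label).fit(train_df, time_limit=time_limit)
    return predictor
```

With a fitted predictor, `predictor.predict(test_df)` returns predictions and `predictor.leaderboard(test_df)` lists the models that were trained along with their scores.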

AutoKeras

GitHub: https://github.com/keras-team/autokeras

Docs: https://autokeras.com/

Highlight:

Supports multi-task and multi-modal learning
Extension:
- TF Cloud (GCP Infra)
- TRAINS (ClearML model tracking)

FLAML

GitHub: https://github.com/microsoft/FLAML

Docs: https://microsoft.github.io/FLAML/

Highlight (from the docs):

FLAML enables building next-gen GPT-X applications based on multi-agent conversations with minimal effort. It simplifies the orchestration, automation and optimization of a complex GPT-X workflow. It maximizes the performance of GPT-X models and augments their weaknesses.
For common machine learning tasks like classification and regression, it quickly finds quality models for user-provided data with low computational resources. It is easy to customize or extend.
It supports fast and economical automatic tuning, capable of handling large search spaces with heterogeneous evaluation costs and complex constraints/guidance/early stopping.
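For the classification and regression use case, a minimal sketch of FLAML's `AutoML` class might look like the following. The helper name `tune_with_flaml`, the choice of accuracy as the metric, and the 60-second budget are assumptions for illustration only; the `flaml` package must be installed before the helper is called.

```python
def tune_with_flaml(X_train, y_train, time_budget=60):
    """Run FLAML's AutoML search on a classification task and return the result.

    X_train, y_train : training features and labels (for example, NumPy arrays
                       or a pandas DataFrame and Series)
    time_budget      : search budget in seconds (an arbitrary default here)
    """
    # Imported lazily so the helper can be defined without FLAML installed.
    from flaml import AutoML

    automl = AutoML()
    # FLAML searches over learners and their hyperparameters until the
    # time budget is spent, then keeps the best configuration it found.
    automl.fit(
        X_train=X_train,
        y_train=y_train,
        task="classification",
        metric="accuracy",
        time_budget=time_budget,
    )
    return automl
```

After fitting, `automl.best_estimator` and `automl.best_config` report which learner and settings won, and `automl.predict(X_test)` scores new data.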

NNI

GitHub: https://github.com/microsoft/nni

Docs: https://nni.readthedocs.io/en/stable/index.html

Highlight:

Hyperparameter Optimization
Neural Architecture Search (NAS)
Model Compression
Feature Engineering

PyCaret

GitHub: https://github.com/pycaret/pycaret

Docs: https://pycaret.gitbook.io/docs

Highlight:

End-to-end model development
Analysis and interpretability
Built-in experiment logging
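The end-to-end flow can be sketched with PyCaret's functional classification API. The helper name `compare_with_pycaret` and the seed value are assumptions for this example; it expects a pandas DataFrame that includes the target column, and the `pycaret` package must be installed before it is called.

```python
def compare_with_pycaret(df, target, seed=123):
    """Set up a PyCaret classification experiment and return the best model.

    df     : pandas DataFrame containing features and the target column
    target : name of the target column (hypothetical, supplied by the caller)
    seed   : random seed passed to PyCaret for reproducibility
    """
    # Imported lazily so the helper can be defined without PyCaret installed.
    from pycaret.classification import compare_models, setup

    # setup() prepares the data (train/test split, preprocessing) for the session.
    setup(data=df, target=target, session_id=seed)
    # compare_models() trains the available estimators with cross-validation
    # and returns the top-ranked one.
    best_model = compare_models()
    return best_model
```

The returned model can then be passed to `predict_model(best_model, data=new_df)` from the same module to score unseen data.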

Conclusion

In the end, AutoML is an important shift in the world of artificial intelligence because it lets more people use machine learning. It can help many industries work in new ways and solve hard problems. Its main strengths are accessibility, faster development, and better scalability. But it still needs to be balanced with human expertise and an awareness of its limitations, such as reduced transparency and potential bias. Keeping that balance will help us get the most out of AutoML and use AI smartly and safely.
