Agile organizations have been successful in improving collaboration and reducing waste in software development. They have also learned to automate and streamline their software delivery process. However, many teams still struggle to apply the same agile principles to their artificial intelligence (AI) initiatives. Operationalizing machine learning (ML) models is a growing challenge and a barrier to AI adoption in many companies.
For many years now, the software industry has been using practices that shorten development cycles and increase deployment velocity. Continuous integration and continuous delivery (CI/CD) help automate the process of building, testing, and deploying high-quality software releases.
In data science, AI experts usually focus on creating the best ML model to solve a business problem. But production-grade ML systems require much more than just some ML code, as shown in figure 1.
The ML code (the small box in the middle) represents only a tiny chunk of what is needed to build a business AI solution. That’s why there is a need for a rigorous ML practice that helps teams support and deliver data science projects. Such a practice must not only provide robustness and automation of core ML activities (train, test, validate, evaluate, refine), but also support a more comprehensive end-to-end ML lifecycle (from data acquisition to model execution and monitoring).
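As a rough illustration, the core activities above can be wired into a single automated flow. The sketch below is deliberately minimal and makes several assumptions: the "model" is just the mean of the training targets, and the acceptance threshold is invented. It shows only the shape of a train/validate/package step, not any particular framework's API.

```python
from statistics import mean

def train(data):
    # Placeholder for real ML code: the "model" is the mean of the training targets
    return mean(y for _, y in data)

def evaluate(model, data):
    # Mean absolute error of the constant predictor
    return mean(abs(model - y) for _, y in data)

def run_pipeline(raw):
    # 1. Data preparation (here: a simple 80/20 holdout split)
    split = int(len(raw) * 0.8)
    train_set, val_set = raw[:split], raw[split:]
    # 2. Train
    model = train(train_set)
    # 3. Validate: refuse to promote a model whose error exceeds a threshold
    error = evaluate(model, val_set)
    if error > 1.0:  # illustrative acceptance threshold
        raise ValueError(f"model rejected, validation error {error:.2f}")
    # 4. Package the model with its metrics for deployment and monitoring
    return {"model": model, "val_error": error}

data = [(x, 2.0 + 0.1 * (x % 3)) for x in range(10)]
artifact = run_pipeline(data)
print(artifact["val_error"] <= 1.0)  # → True
```

In a real pipeline each numbered step would be a separately automated, monitored stage; the point is that validation and packaging are part of the pipeline itself, not a manual afterthought.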
In most organizations I work with, ML maturity is still relatively low, and teams struggle to deliver value through AI initiatives. Those who have realized that AI is becoming a real differentiator, a way to innovate in their industry, have started to invest in AI talent. The data scientists and ML specialists in these organizations are quite good at exploring business problems with AI techniques. They leverage ML algorithms and train powerful models, but then they hand off those models to some other IT team, hoping that the AI assets will be deployed appropriately and will live happily ever after…
Unfortunately, enterprise-grade AI is not a fairy tale. An approach where teams work in silos does not scale: it is error-prone and does not ensure that a model is adequately operationalized. AI initiatives require automation and just-in-time flows to frequently create, retrain, and update models.
MLOps is an ML engineering practice that aims to unify ML system development (Dev) and ML system operation (Ops). MLOps advocates better collaboration among data scientists, data engineers, software engineers, and operations teams. By applying DevOps practices to machine learning systems, MLOps helps optimize the entire machine learning lifecycle. It reduces friction and delays in the AI value stream, shortening the time needed to convert a business need into an AI service.
With mature MLOps in place, teams are better equipped to support the entire machine learning lifecycle, from discovery and feasibility to model experimentation and operationalization. An MLOps process helps eliminate manual, error-prone activities and break silos between teams. Experiments are orchestrated, and continuous training is in place to ensure that deployed models stay accurate as data evolve over time.
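To make the idea of continuous training concrete, here is a minimal sketch. Everything in it is an illustrative assumption: a one-parameter threshold classifier stands in for a real model, the 0.9 accuracy bar is arbitrary, and the "weekly" batches are synthetic. The mechanism is what matters: the deployed model is re-evaluated on each batch of freshly labeled data and automatically retrained when its accuracy drifts below the bar.

```python
def accuracy(model, batch):
    # Fraction of labeled examples the threshold classifier gets right
    return sum((x > model) == y for x, y in batch) / len(batch)

def retrain(batch):
    # Illustrative "training": brute-force the best decision threshold on fresh data
    return max((x for x, _ in batch), key=lambda t: accuracy(t, batch))

def continuous_training(model, batches, min_accuracy=0.9):
    # Each batch is freshly labeled production data; retrain when accuracy drops below the bar
    history = []
    for batch in batches:
        acc = accuracy(model, batch)
        if acc < min_accuracy:
            model = retrain(batch)          # automated retraining, no human in the loop
            acc = accuracy(model, batch)
        history.append(acc)
    return model, history

week1 = [(x, x > 5) for x in range(10)]   # original labeling rule
week2 = [(x, x > 2) for x in range(10)]   # the data distribution has drifted
model, history = continuous_training(5, [week1, week2])
print(model, history)  # → 2 [1.0, 1.0]
```

The drift in the second batch silently degrades the original model; the monitoring check catches it and the retraining step restores accuracy without anyone opening a ticket.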
As the MLOps practice matures in an organization, teams can go even further and put mechanisms in place to automate pipeline implementation and apply CI/CD to build, test, and deploy ML pipelines. Performance monitoring of AI models can not only trigger new deployment pipelines but also spark new experiments. Feedback loops adapt systems based on behaviors captured from deployed models. And because all activities are tied together, MLOps also provides a foundation for AI transparency and explainability.
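A CI gate for an ML pipeline can be sketched in a few lines. Everything here is hypothetical: the trivial mean-based pipeline, the 0.5 error threshold, and the fixture dataset are all invented for illustration. The idea it captures is that CI runs the whole pipeline end to end on a small, versioned fixture and blocks deployment unless the resulting artifact passes the checks.

```python
def pipeline(data):
    # Trivial stand-in for an ML training pipeline: "model" is the mean of the targets
    model = sum(y for _, y in data) / len(data)
    val_error = sum(abs(model - y) for _, y in data) / len(data)
    return {"model": model, "val_error": val_error}

def ci_gate(pipeline_fn, fixture, max_error=0.5):
    # CI step: run the whole pipeline on a small fixture and gate deployment on quality
    try:
        artifact = pipeline_fn(fixture)
    except Exception as exc:               # a crashing pipeline must never reach production
        return {"deploy": False, "reason": str(exc)}
    if artifact["val_error"] > max_error:  # illustrative quality threshold
        return {"deploy": False, "reason": "quality gate failed"}
    return {"deploy": True, "artifact": artifact}

fixture = [(x, 1.0) for x in range(4)]     # tiny, versioned test dataset
result = ci_gate(pipeline, fixture)
print(result["deploy"])  # → True
```

In practice the gate would run inside a CI system on every change to the pipeline code, and a failed gate would leave the currently deployed model untouched.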
MLOps did not emerge out of nowhere, and it is not exclusive to ML practitioners. MLOps is an enterprise approach, and its adoption helps cross-functional teams work together. Early adopters of MLOps understand its value for their organizations, and while working with many different customers, I see interest coming from very diverse groups.
The first group is composed of executive teams. Usually, they have almost no ML experience, as AI initiatives are still relatively new in many companies. They don’t fully understand what an ML model is or what data scientists do all day long. But they are experienced leaders who have supported several digital transformations in the past, so they fully realize that they must stand behind teams and help them become more efficient on AI projects. It is not uncommon to see business executives and directors asking for guidance on how to reduce AI time to market. They show significant interest in approaches that improve AI business impact and are key advocates for change.
The second group excited about MLOps is often composed of IT and operations teams. For them, it is a matter of solving a very concrete problem: the operationalization of ML models. At the end of the day, IT teams’ success is measured by their ability to support the business and run systems smoothly within a defined budget. An ML model is a new kind of IT asset that requires specific deployment and monitoring procedures. Because IT and Ops teams don’t yet have the experience to manage and monitor ML models properly, they are looking for new and robust approaches to support the entire ML lifecycle.
And finally, the last group supporting MLOps adoption is composed of ML specialists and data engineers. As they develop innovative models, they want to make sure that their work becomes a valuable business asset. Data scientists are experts in AI and algorithms, but they usually have a limited software development background. They want an approach that facilitates ML activities from ideation to deployment.
There is a growing interest in MLOps among the ML community. On the one hand, technical practitioners are looking for a reproducible process to automate their work. On the other hand, line of business managers and decision-makers want to reap the benefits of artificial intelligence and quickly leverage ML for business use cases.