Deploying a machine learning model in a production environment is a complex process that involves many factors and presents many challenges. Researchers from the University of Cambridge recently catalogued the common problems that arise along the way.
In recent years, machine learning has attracted growing attention in both academic research and practical applications. However, deploying machine learning models in production systems raises many problems and concerns. Researchers from the University of Cambridge recently conducted a survey reviewing reports of machine learning deployments across a range of use cases, industries, and applications, extracting the considerations that arise at each stage of the machine learning deployment workflow.
The survey shows that machine learning practitioners face challenges at every stage of deployment. The paper's contribution is a research agenda for exploring how these challenges might be addressed.
The survey mainly considered three types of papers:
- Case study papers: Each recounts the history of a single machine learning deployment project, usually discussing in depth every challenge the authors faced and how they overcame it.
- Review articles: These describe the application of machine learning in a particular field or industry, and usually summarize the most common challenges encountered when deploying machine learning solutions in that field.
- Experience papers: The authors reflect on their own experience of deploying machine learning models in production.
To keep the survey focused on current challenges, the Cambridge researchers considered only papers published in the past five years, with a few exceptions. They also cited other types of sources, such as practice guides, interview studies, and regulations. Notably, no new interviews were conducted for this paper.
Machine learning deployment process
This paper adopts the machine learning deployment workflow definition proposed by Ashmore et al. Under this definition, developing an ML solution in an industrial setting consists of four stages:
- Data management: The focus is on preparing the data needed to build a machine learning model.
- Model learning: model selection and training.
- Model verification: Ensure that the model meets specific functional and performance requirements.
- Model deployment: Integrate the trained model into the software infrastructure required to run the model. This stage also covers model maintenance and update issues.
Each of the above stages can be further subdivided. Note, however, that the sequence may not match real-world practice exactly: it is normal for stages to run in parallel or to feed back into one another.
This article discusses the common problems that occur at each stage, as well as the cross-cutting issues that affect every stage. See the table below:
Data is an indispensable part of any machine learning solution, and the training and testing data influence the overall result no less than the algorithm does. Creating a high-quality dataset is usually the first step in a production-grade machine learning pipeline. The paper describes four steps of data management: data collection, data preprocessing, data augmentation, and data analysis.
Data collection aims to discover and understand the available data and to organize how it is stored. Discovering and locating data is a challenge in itself, especially in large production environments. Finding data sources and understanding their structure is the main task here, and it lays the groundwork for data scientists building practical applications later on.
The preprocessing step usually involves a series of data cleaning operations: imputing missing values, reducing the data to an ordered and simplified form, and mapping it from its raw format into one that is easier to process.
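The cleaning operations above can be sketched with pandas. This is a minimal illustration on synthetic records; the column names (`age`, `income`, `signup_date`) and the imputation choices are assumptions for the example, not prescriptions from the paper.

```python
import pandas as pd

# Hypothetical raw records; columns and values are illustrative only.
raw = pd.DataFrame({
    "age": [34, None, 52, 41],
    "income": ["52,000", "48,500", None, "61,200"],
    "signup_date": ["2020-01-15", "2020-03-02", "2020-02-20", None],
})

# Map from the raw string format to easier-to-process numeric/datetime types.
raw["income"] = raw["income"].str.replace(",", "").astype(float)
raw["signup_date"] = pd.to_datetime(raw["signup_date"])

# Impute missing values with simple per-column statistics.
raw["age"] = raw["age"].fillna(raw["age"].median())
raw["income"] = raw["income"].fillna(raw["income"].mean())

print(raw.dtypes)
```

In a real pipeline the imputation strategy (median, mean, model-based) depends on the feature's meaning and missingness pattern.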
There are many motivations for data augmentation, one of the most important being a lack of labels. Real-world data is usually unlabeled, and labeled data can be scarce for three reasons: limited access to domain experts, a lack of high-variance data, and sheer data volume.
Data analysis aims to uncover potential biases or unexpected distributions in the data. High-quality tooling is essential for any kind of data analysis, and visualizing data profiles remains particularly challenging.
In recent years, machine learning research has produced an ever wider range of models and methods to choose from in the model learning stage. Over the past six years, paper submissions to NeurIPS, a top machine learning conference, quadrupled from 1,678 in 2014 to 6,743 in 2019. Nevertheless, the model learning stage is still shaped by many practical constraints. The paper examines issues in three steps: model selection, training, and hyperparameter selection.
In many practical settings, model selection hinges on one key characteristic: complexity. Although deep learning and reinforcement learning are increasingly popular in research, simpler models are often chosen in practice. Commonly used choices include shallow network architectures, simple PCA-based methods, decision trees, and random forests.
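A typical selection exercise over the simple model families named above can be sketched with scikit-learn. The synthetic dataset and the three candidates are illustrative assumptions, not the survey's own experiment.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a real dataset.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Candidate models of increasing complexity, mirroring the families above.
candidates = {
    "pca+logreg": make_pipeline(PCA(n_components=5),
                                LogisticRegression(max_iter=1000)),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Compare candidates by mean cross-validated accuracy.
results = {}
for name, model in candidates.items():
    results[name] = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {results[name]:.3f}")
```

In production, accuracy is weighed against interpretability, training cost, and maintainability, which is often why the simpler candidate wins.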
One of the biggest concerns in model training is the economic cost of the computing resources required. In natural language processing (NLP), although the cost of a single floating-point operation keeps falling, the total cost of training NLP models keeps rising. Sharir et al. ran experiments on BERT, one of the state-of-the-art models, and found that depending on model size, a full training run can cost anywhere from US$50,000 to US$1.6 million, beyond the reach of most research institutions and even companies. The size of the training dataset, the number of model parameters, and the number of operations performed during training all affect the total cost. Parameter count is a particularly important factor: new NLP models have reached billions of parameters, and this number may grow further.
In addition to the parameters learned during training, many machine learning models also define hyperparameters, and hyperparameter optimization (HPO) is the process of selecting the best set of them. Most HPO techniques require multiple training runs of the model, and the size of the HPO task grows exponentially with each new hyperparameter, since each one adds a dimension to the search space. As Yang and Shami point out, these considerations make HPO expensive and resource-intensive in practice, especially for deep learning applications. Even methods such as Hyperband and Bayesian optimization, designed specifically to minimize the number of training runs required, still struggle with some problems because of model complexity and dataset size.
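The dimensionality argument above motivates sampling the search space rather than enumerating it. The following sketch uses scikit-learn's randomized search under an explicit evaluation budget; the model, parameter ranges, and budget are assumptions chosen for illustration.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Each hyperparameter adds a dimension to the search space; random search
# samples configurations instead of enumerating the full grid.
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions={
        "n_estimators": randint(10, 200),
        "max_depth": randint(2, 10),
    },
    n_iter=10,   # budget: 10 sampled configurations, each cross-validated
    cv=5,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_)
```

Note the cost structure: even this modest budget trains the model 50 times (10 configurations x 5 folds), which is exactly why HPO becomes prohibitive for large deep learning models.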
The goals of the model verification stage are multifaceted: a machine learning model should generalize well to unseen inputs, handle edge cases sensibly, be robust overall, and meet all functional requirements. The paper discusses issues in three steps of model verification: requirement encoding, formal verification, and test-based verification.
Defining the requirements of a machine learning model is a key prerequisite for any testing activity, yet in practice it often turns out that improvements in model performance do not translate into gains in business value.
Formal verification checks that the model's functionality meets the requirements defined within the scope of the project. It can include mathematical proofs of correctness or numerical estimates of output error bounds, but this is rare in practice; more commonly, requirements are formalized as quality standards set through broad regulatory frameworks.
Test-based verification aims to ensure that the model generalizes well to unseen data. Although collecting a validation dataset is usually not difficult, a single validation set may not be enough for a production deployment.
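The basic form of test-based verification is evaluation on a held-out split the model never saw during training. A minimal sketch on synthetic data, with the split ratio and metric chosen arbitrarily for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, random_state=0)

# Hold out a validation split that plays no part in training.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
val_acc = accuracy_score(y_val, model.predict(X_val))
print(f"held-out accuracy: {val_acc:.3f}")
```

In production one would go beyond a single split: stratified slices, edge-case suites, and data drawn from the live distribution, which is the gap the survey is pointing at.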
A machine learning system running in production is a complex software system that must be maintained over time, which poses new challenges for developers. Some of these challenges also arise when operating regular software services, while others are unique to machine learning.
The model integration step comprises two main activities: building the infrastructure to run the model, and implementing the model in a usable and supportable form. The former is almost entirely a systems engineering problem, while the latter belongs to machine learning and exposes important issues at the intersection of machine learning and software engineering.
Model monitoring is one of the open issues in maintaining machine learning systems. The community is still at an early stage of understanding which data and model metrics should be monitored and how alerts should be triggered. Monitoring constantly changing input data, prediction bias, and the overall performance of a model remains an unsolved problem.
Another maintenance issue highlighted in the paper concerns data-driven decision-making: feedback loops. Production machine learning models can influence their own behavior through regular retraining. While this keeps the model up to date, it also creates a feedback loop in which the model's outputs shape its future inputs, and thus its behavior.
After initial deployment, a model usually needs to be updated so that it keeps reflecting the latest trends in the data and the environment. There are several techniques for adapting a model to new data, including scheduled retraining and continuous learning, but in a production environment model updates are constrained by many practical factors.
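The continuous-learning option mentioned above can be sketched with an incrementally trainable model: instead of retraining from scratch, the deployed model is updated on each new batch. The data, batch sizes, and labeling rule here are synthetic assumptions.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
classes = np.array([0, 1])

# Initial model trained on historical data (synthetic).
X_old = rng.normal(size=(200, 5))
y_old = (X_old[:, 0] > 0).astype(int)
model = SGDClassifier(random_state=0)
model.partial_fit(X_old, y_old, classes=classes)

# Later, a new batch arrives; update incrementally rather than retraining
# the whole model from scratch.
X_new = rng.normal(size=(50, 5))
y_new = (X_new[:, 0] > 0).astype(int)
model.partial_fit(X_new, y_new)

acc = model.score(X_new, y_new)
print(f"accuracy on latest batch: {acc:.2f}")
```

This is the mechanical part only; the practical constraints the paper points to (validating each update, rollback, avoiding feedback loops) sit around this loop, not inside it.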
In addition to the issues above, tied to the four stages of the machine learning deployment workflow, the paper also discusses cross-cutting concerns of ethics, end-user trust, and security. For details, see the original paper.
Reference: https://arxiv.org/pdf/2011.09926.pdf