More and more businesses are embracing machine learning, but launching a model into the real world requires more than just building it. To realize its full benefit, an ML model needs to be deployed so it can communicate with a core software system, and draw data from a core database.
But there are issues with deployment that have nothing to do with technical complexity. Recent research from Algorithmia suggests cultural and process problems stop many companies from achieving their machine learning objectives. The study says that only about 20 per cent of companies with machine learning plans have had success moving an ML model into production.
That issue seems to resonate here in the MLOps Community. We hear again and again how difficult it can be to bridge the gap between ML model building and achieving practical deployments.
We’ve delved into the issue a few times in recent podcasts. Here’s a sampling of the organizational challenges and practical problems ML experts and data scientists commonly encounter.
CHARLES MARTIN: CALCULATION CONSULTING
‘For machine learning to work, you need some sort of deployment system, and most of them are homegrown. So they’re designed to deploy things they’ve built in the past. Machine learning is very different.
‘The stack is entirely different; the hardware requirements are different; the memory requirements are different; the swap requirements are different. It has different artefacts. So often the technical environments in a company aren’t structured to handle this new workflow.
‘It’s what we commonly see inside enterprises. They have ops systems that make it unbearably hard to get ten lines of code into production. Most of the work you end up doing is just trying to figure out how to make something fit within the workflow. And the more stubborn people are about how things ‘should be done,’ the harder it is.’
Those sorts of organizational barriers can create headaches and unnecessary costs while bogging down progress.
‘You come up against some really goofy things. MLOps is about provisioning, and if you want to provision something, you have to ask someone for permission to do it. We had one client, who had the largest Hadoop store in the world, who gave us a machine to deploy their app that didn’t have any memory.
‘Why would it need memory? Hadoop doesn’t use memory.’ Well, it turned out that the machine had no swap. So I’m like, ‘OK, what am I supposed to do with this?’ We had to hire a dedicated specialist to find a workaround.
Off-piste assumptions can create problems that cascade until they impact application performance, adds Martin. Take the basic issue of correct provisioning of machines for machine learning applications.
‘In a large organization, provisioning is part of the IT budget, meaning you can’t provision your own machine. And that can create problems.
‘In the old days when you installed Python you had to turn off the CPU throttle in order to optimize the BLAS libraries and get correct performance. But that requires root privileges, a requirement that doesn’t exist in the web engineering world. So getting it can be a huge problem, and it creates performance bottlenecks.’
‘Something like provisioning an AWS bucket can become hugely complicated if you can’t find the right person holding the right credit card for the company’s AWS account. Mixing up the keys associated with different credit cards can create huge problems.
‘This is what I mean by MLOps as an organizational problem. There are so many people charged with managing the different pieces of the deployment process, and you have to go and find them — then get their buy-in for whatever it is you’re doing.’
SATISH CHANDRA GUPTA: SLANG LABS
Ensuring there’s an agreement and shared understanding of your data model can make or break an ML application.
‘How many times have I encountered problems with a schema that broke the ML model? Sometimes when I am using curated data, the model works perfectly. But once I try and deploy it in the real world on live traffic, I get a very different result.
‘I can remember one recent example where I thought my model was broken, only to discover later from someone else that a data category I’d created wasn’t being populated.
‘You need to have the hygiene in place to ensure your data model is complete and working correctly, but that isn’t always present. You have to define the data model at the outset and get everyone’s agreement on which categories need to be populated with what.
‘People don’t always appreciate how important the issue is, but once they see the cost savings, they get on board.’
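Gupta’s point about defining the data model up front and agreeing on which categories must be populated can be made concrete with a lightweight check run on live traffic before inference. This is a minimal sketch, not anything from the interview — the field names and the required-field list are illustrative assumptions:

```python
# Minimal data-model hygiene check: verify required fields exist and are
# populated before a record reaches the model. Field names are illustrative.
REQUIRED_FIELDS = {"user_id", "device_type", "session_length"}

def validate_record(record: dict) -> list:
    """Return a list of problems; an empty list means the record is usable."""
    problems = []
    for field in sorted(REQUIRED_FIELDS):
        if field not in record:
            problems.append(f"missing field: {field}")
        elif record[field] in (None, ""):
            problems.append(f"unpopulated field: {field}")
    return problems

# Curated data may pass while live traffic fails silently -- exactly the gap
# Gupta describes, where a category stopped being populated unnoticed.
live = {"user_id": "u42", "device_type": None, "session_length": 310}
print(validate_record(live))  # -> ['unpopulated field: device_type']
```

A check like this, run as a gate in the serving path or a scheduled audit, turns the shared agreement about the data model into something enforceable rather than tribal knowledge.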
ELIZABETH CHABOT: DELOITTE
Not collecting the data you need, in the format you need, is another common issue.
‘When it comes to anything with analytics, we’re bad at knowing what data we need to collect before we build the thing.
‘Sometimes you don’t have the necessary historical data because you didn’t realize you’d need it later. In other cases, the collection processes weren’t set up correctly. Both of those are organizational issues.
‘Even if you have data, did you collect it properly? Is it high quality? Is there integrity? Is it reliable? What’s the lineage? And crucially, are you sure that a data point collected two years ago was collected the same way as the data you’ve collected recently?
‘If you’re a startup using agile, you’ve probably changed procedures and processes and even updated your business model more than once since launch. Back at the start, an ML methodology wasn’t part of the picture.
‘You need historical data if you’re going to be engaged in any kind of predictive analytics.
‘The place to start is by asking, ‘What do we want to do — and do we have the data to do it?’
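Chabot’s questions about lineage and consistency can be partially automated. One cheap signal is to summarize a metric per collection year: a sudden jump between years often flags a change in how the data was gathered (a unit change, a new instrument, a revised process). The sketch below assumes a simple list of records with illustrative column names (`collected_at`, `value`); it is not taken from the interview:

```python
from collections import defaultdict
from datetime import date

def yearly_means(records):
    """Mean of 'value' per collection year. A sharp jump between years is a
    cheap red flag that collection methodology may have changed."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for r in records:
        if r.get("value") is None:
            continue  # skip unpopulated rows rather than distorting the mean
        year = r["collected_at"].year
        sums[year] += r["value"]
        counts[year] += 1
    return {y: sums[y] / counts[y] for y in sums}

records = [
    {"collected_at": date(2019, 5, 1), "value": 10.0},
    {"collected_at": date(2019, 6, 1), "value": 12.0},
    {"collected_at": date(2021, 5, 1), "value": 110.0},  # possible unit change
]
print(yearly_means(records))  # -> {2019: 11.0, 2021: 110.0}
```

A tenfold shift like the one above doesn’t prove the data is wrong, but it is exactly the kind of question — ‘was this collected the same way two years ago?’ — that Chabot argues teams should be asking before they build anything predictive.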