You interact with a lot of machine intelligence every day. Do you know what these systems actually do with your data, or how they arrive at their predictions? Why should you trust them?
As consumers, we have been conditioned to accept services as they are, to not question how those services work, and to trust them even though they are great black boxes. Black boxes are tricky, however, because by definition they are secretive, not just private. What is being built at the crossroads of machine learning and open source can work toward increasing your trust in how machine learning uses your data.
We are generating and consuming more data every day. Our processing capacity and capabilities are evolving in lockstep, and new machine learning technologies can delight users in new ways, whether it's a better recommendation engine, a more optimized shopping experience, or more intelligent (cheaper!) pricing.
But alongside advances in performance, we must also demand advances in visibility into the inner workings of machine learning models.
There are two interesting dimensions to the democratization of machine learning. One is increasing visibility into the codebase via open source; the other is transparency and explainability of the models themselves. Arguably, both are needed and important, but let's look at each in practice and at the implications they have.
Transparency and Explainability in ML
Machine learning is not new, but it remains a high-barrier field to enter, even for a software developer or data analyst. When models are trained on massive amounts of raw data, the use of that data can raise serious questions of mistrust around inherent biases and prejudices, even if the intention was never there (just look at how we struggle with systemic racism).
Explainability is key to understanding which data points were factors in the predictions a machine made, and to spotting any over- or under-representations in the dataset. The more complex a domain is, the more important it becomes to understand the fundamentals and to be able to trace back what happened along the way to produce the result in front of you.
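To make this concrete, here is a minimal sketch of the idea using scikit-learn, a widely used open-source ML library (this is an illustrative example, not EazyML's actual API; the dataset and feature meanings are hypothetical):

```python
# Minimal sketch of model explainability with scikit-learn.
# A decision tree is naturally interpretable: we can read off both
# which features mattered overall and which path a single prediction took.
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Hypothetical dataset: 200 samples, 4 features standing in for
# survey answers in a job-satisfaction model.
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

model = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Global view: which features the model leaned on overall.
print(dict(enumerate(model.feature_importances_)))

# Local view: the decision path for a single sample shows exactly
# which internal thresholds led to this particular prediction.
node_indicator = model.decision_path(X[:1])
print("nodes visited for sample 0:", node_indicator.indices.tolist())
```

Tools like EazyML wrap this kind of tracing in a friendlier interface, but the underlying principle is the same: the prediction comes with a traceable path, not just a score.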
This is how EazyML, for example, approaches the problem: by having the machine "explain it to you."
EazyML explaining a local prediction in a machine learning predictive model on job satisfaction.
Open Source as the Fundamental Resource
In addition to being a booming market and foundational building block of future software, open-source machine learning libraries have also become a cornerstone in trustworthy machine learning and artificial intelligence software.
There are many structures in commercial open source; for example, MorphL has publicly released a "community edition" of its e-commerce AI engine. However, the basic tenet is that by publishing code openly you gain additional comfort from seeing the inner workings of the software, and you also mitigate vendor lock-in, since anyone can fork the software and build on top of it.
Of course, there are countless other benefits beyond transparency, such as open collaboration that simply produces better software and more future-proof technology.
There are a lot of interesting tangents related to ML/AI here, both in what we are doing at Prifina around user-held and user-controlled data, and in the emergence of open-source tools for generating synthetic data. Both have the power to respect users' data, leaving the raw data aside while still building sophisticated models that deliver value for the individual.
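As a rough illustration of the synthetic-data idea, here is a minimal sketch using only NumPy, assuming a simple per-column Gaussian fit (real open-source tools such as SDV model correlations and categorical columns far more carefully; the column meanings here are hypothetical):

```python
# Minimal sketch of synthetic data generation: fit simple statistics
# to real data, then sample fresh rows, so models can be trained
# without exposing the original user records.
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical "real" user data we do not want to share directly
# (columns standing in for, say, age and income).
real = rng.normal(loc=[35.0, 60000.0], scale=[8.0, 12000.0], size=(1000, 2))

# Fit per-column mean and standard deviation, then sample new rows.
mu, sigma = real.mean(axis=0), real.std(axis=0)
synthetic = rng.normal(mu, sigma, size=(500, 2))

print("real mean:     ", mu.round(1))
print("synthetic mean:", synthetic.mean(axis=0).round(1))
```

The synthetic rows preserve the aggregate statistics needed for modeling while no row corresponds to any real individual, which is the property that lets user data stay aside.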