I recently completed the Artificial Intelligence Product Manager Nanodegree Program on Udacity and I’d like to share a summary of everything I learned with you. This also includes bits from my experience as a technical product manager.
This is all a huge dump from my mind, written from the first stroke to the last on my keyboard, so kindly excuse any details I may miss or depths I didn’t hit.
It would be great to start with “why” and what motivated me to complete this program. In the past year, I’ve been working as a full-time product manager, sitting at the intersection of engineering and business and it’s been fun. Solving problems is fun. However, I’d recently been thinking deeply about the future of technology and what turns it could take. No doubt, artificial intelligence will play a huge part in this future.
With this in mind, I set out to learn how artificial intelligence could directly solve human problems, which in turn would drive the growth of businesses. The AI for Product Managers Nanodegree program on Udacity seemed a good fit. Udacity recommends 2 months of study. I completed it in 3 weeks, and you can too if you try it out. 😃
Along with building a real-world project proposal, the program focused on imparting knowledge of the following:
- What is AI and its use cases?
- When is utilizing Machine Learning relevant?
- The business cases of AI solutions
- Machine learning models
- Data Annotation and Training
- Model assessment and model performance
- Business metrics assessment
- Designing AI products for longevity
In the following sections of this letter, I’ll give a succinct explanation of each of the above-listed points.
AI, short for artificial intelligence, is the ability of machines to carry out actions that previously required human intelligence or knowledge. Simply put, if a machine is able to look at your face and tell if you’re smiling or frowning and how happy you seem, then that’s artificial intelligence.
AI is used in numerous cases; anything that previously required human effort could be augmented or replaced with AI. The key here is performance: which performs better, humans or machines? Advancements in the field have brought tremendous improvements in the performance of AI models on tasks where humans previously excelled. I expect the uptick to continue exponentially.
While AI is a body of work on intelligent machines and systems, machine learning (ML) is a subset of artificial intelligence where machines are able to learn and make decisions based on the data fed to them. AI and ML will be used interchangeably throughout this text.
An example use case of machine learning is identifying distinct, defining patterns in images. These could be medical X-rays, traffic signs, the color of a chemical, the visible quality of a substance, etc. These would normally require human effort, but with the computing power of machine learning systems, it’s done faster.
ML is currently employed in object recognition, audio/speech recognition, and language detection, amongst others.
You may wonder: why should I bother with the ML buzz when I can get someone to do the same thing for half the cost of the technology?
You are right.
Technology by itself enhances existing solutions or proffers solutions that were entirely impossible previously. AI is utilized where higher efficiency is required to replace an existing process. For example, in sorting mail during transport, a single person could properly identify and sort 20 packages a minute. However, an AI-assisted system would perform text recognition on the destination address on each package and sort 50 packages a minute, which is 2.5 times the throughput.
If the introduction of artificial intelligence, like any other technology, doesn’t improve the performance of your system and/or business, then you should look to other optimization methods.
What good is an AI system introduced to enhance a solution if it doesn’t contribute to the improvement of a business objective? No good, you must be thinking too. A crucial part of developing AI solutions for businesses as a product manager is to understand which key objectives are affected by the solution, both positively and negatively. These form the business case.
An AI solution could improve user experience but cost much more than the business can afford. Trade-offs have to be considered before deciding the usefulness of a solution.
What do you intend to achieve with an AI product or solution and how will this drive a business objective? These objectives are listed and the impact on business outcomes is evaluated.
Business objectives are specific and can be easily identified. Examples include:
- To improve user experience in the checkout process
- To increase conversion on the search page
- To reduce the number of call center agents required to assist customers.
These business objectives are beneficial to the business and/or users.
To use machine learning solutions, a model is required. A machine learning model comprises a series of complex instructions trained using data to produce specific outcomes. A model takes an input and returns an output after running the input through algorithms to make complex decisions.
In plain terms, a model is a system that is trained with data; given an input, it returns an output that depends on its training and decision process.
Examples of ML models include speech-to-text models, optical character recognition models, and computer vision models.
Models can either be built from scratch or produced by retraining an existing model, a process called transfer learning. In transfer learning, an existing trained model is retrained using the same model algorithms but with different data sets. The expected inputs are similar, and the predicted outcomes are based on the data used in retraining the model.
Transfer learning is used over developing a model from scratch for multiple reasons. Building a machine learning model from scratch requires more resources and involves more technical complexity. Transfer learning systems, on the other hand, are offered by various 3rd party providers including Google and IBM, and setting up transfer learning models requires less overhead. However, transfer learning limits how far the model can be customized and extended. There is also the issue of cost and of managing the model when huge datasets are fed through.
For the purpose of the nanodegree, transfer learning is used. Google AutoML is the platform of choice to generate a model to detect pneumonia in X-ray images, for the model development exercise. The AutoML Vision model is used to classify the images.
A machine learning model is to be created to differentiate between two classes of images, normal and pneumonia. When an image is fed into the system, it should return the class of the image with the level of confidence it has in its prediction.
A class is a decision outcome of the model. A model is fed an input (image) and a class is expected as output.
To achieve this, the model is to be trained with data from both classes and tested before use.
The following important data considerations are made when setting up a model.
The data must be unbiased. The same amount of each required outcome class is to be used in training the model. Bias in the data leads to overtraining or under-training of one class over the other. This is also reflected in the confidence of the model when making predictions.
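The balance check described above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function names, the 5% tolerance, and the label counts are made up for the example and are not part of the nanodegree or AutoML.

```python
from collections import Counter

def class_balance(labels):
    """Return each class's share of the dataset as a fraction between 0 and 1."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: count / total for cls, count in counts.items()}

def is_balanced(labels, tolerance=0.05):
    """True if every class share is within `tolerance` of an even split.

    The 0.05 default is an arbitrary, hypothetical threshold.
    """
    shares = class_balance(labels)
    even_share = 1 / len(shares)
    return all(abs(share - even_share) <= tolerance for share in shares.values())

# Made-up label counts, for illustration only.
labels = ["normal"] * 100 + ["pneumonia"] * 300
print(class_balance(labels))  # {'normal': 0.25, 'pneumonia': 0.75}
print(is_balanced(labels))    # False
```

A dataset that fails a check like this would need more examples of the under-represented class before training.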
The more data provided for a class, the more accurately and confidently the ML model is able to predict that class.
For each class, all kinds of data within the class should be provided, including all possible edge cases and forms of that kind of data. These individual subset groups should also be unbiased and in large quantities to improve the confidence of the model in making predictions.
The model is only as accurate as the data it is trained with. Avoid using the wrong data in training the model. Wrong data will lead to confusion of the model thereby reducing its performance.
Depending on the model outcome, the data required for training may require annotation. Annotation is the process of properly assigning labels to uncategorized or scattered data. These labels are required in training the model. Other forms of data may already be annotated through other means and do not require annotation by physical methods or human input.
Annotation is done by humans or existing systems (possibly AI-powered too) to accurately label the training data with the required classes. Services like Appen provide systems to annotate data.
After data is annotated and prepared with the right considerations, it is fed to the model to be trained.
Test data is set aside to test the performance of the model. This data isn’t used in training the model and is entirely new to the trained system. With this test data, the performance of the model can be accurately assessed. The test data is also required to be unbiased and should cover all test cases to ensure all outcomes and edge cases of the model are tested.
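One way to set aside such test data is a per-class (stratified) split, so the test set keeps the same class mix as the full dataset. The sketch below is my own assumption of how this could look; the function name, the 20% test fraction, and the file names are hypothetical, and platforms like AutoML can handle the split for you.

```python
import random
from collections import defaultdict

def stratified_split(examples, test_fraction=0.2, seed=0):
    """Split (item, label) pairs into train and test sets, class by class.

    Each class contributes roughly the same fraction of its examples to the
    test set, so the test set keeps the class mix of the full dataset.
    """
    by_class = defaultdict(list)
    for item, label in examples:
        by_class[label].append((item, label))

    rng = random.Random(seed)  # fixed seed so the split is repeatable
    train, test = [], []
    for group in by_class.values():
        rng.shuffle(group)
        n_test = max(1, round(len(group) * test_fraction))
        test.extend(group[:n_test])   # held out, never used in training
        train.extend(group[n_test:])
    return train, test

# Hypothetical file names, for illustration only.
examples = [(f"normal_{i}.png", "normal") for i in range(50)]
examples += [(f"pneumonia_{i}.png", "pneumonia") for i in range(50)]
train, test = stratified_split(examples)
print(len(train), len(test))  # 80 20
```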
How well is the model able to predict the desired outcome? This is measured using various parameters. Two important measures are Precision and Recall.
Precision is the share of the model’s predictions of an outcome that are actually correct. Recall is the share of the actual occurrences of an outcome that the model manages to identify.
Precision is calculated as the ratio of true positives (correct positive predictions) to all positive predictions (true positives + false positives), whereas Recall is the ratio of true positives to the total number of actual members of the class (true positives + false negatives).
For a model with multiple classes, the Recall of the model is the average of all the recalls of its individual classes. This is the same for model Precision.
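To make these definitions concrete, here is a small Python sketch that counts true positives, false positives, and false negatives for one class, then averages across classes. The function names and the sample predictions are my own, invented for illustration.

```python
def precision_recall(y_true, y_pred, positive):
    """Precision and recall for one class, treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def macro_average(y_true, y_pred):
    """Average the per-class precision and recall over all classes."""
    classes = sorted(set(y_true))
    scores = [precision_recall(y_true, y_pred, c) for c in classes]
    macro_precision = sum(p for p, _ in scores) / len(scores)
    macro_recall = sum(r for _, r in scores) / len(scores)
    return macro_precision, macro_recall

# Made-up results: 4 pneumonia cases and 6 normal cases.
y_true = ["pneumonia"] * 4 + ["normal"] * 6
y_pred = ["pneumonia"] * 3 + ["normal"] + ["pneumonia"] * 2 + ["normal"] * 4

print(precision_recall(y_true, y_pred, "pneumonia"))  # (0.6, 0.75)
print(macro_average(y_true, y_pred))
```

In this made-up case the model flags 5 images as pneumonia and 3 of them are right (precision 0.6), while it finds 3 of the 4 real pneumonia cases (recall 0.75).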
Recall and precision are both important values in measuring model performance. For certain models, such as ID verification systems, you would need a high precision value. In X-ray scanning systems for detecting pneumonia, on the other hand, you would need a high recall value: you would rather tolerate more false positives (healthy cases flagged as pneumonia) than allow false negatives (true cases predicted as normal). This way even patients with a slight chance of pneumonia are sent to the doctor for further evaluation instead of being left out.
Another metric for model evaluation is the F1 score, which combines Recall and Precision. The formula for the F1 score is (2 * precision * recall) / (precision + recall).
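As a quick illustration, the F1 formula translates directly into Python. The function name and the sample values are mine, and returning 0.0 when both inputs are zero is a convention I chose to avoid dividing by zero.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # assumed convention: avoid dividing by zero
    return 2 * precision * recall / (precision + recall)

# Made-up values: precision 0.6 and recall 0.75 give an F1 of about 0.67.
print(round(f1_score(0.6, 0.75), 2))  # 0.67
```

Because it is a harmonic mean, the F1 score stays low if either precision or recall is low, so a model cannot score well by excelling at only one of them.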
Once an AI product or feature has been developed, it needs to be tested against defined business metrics. These metrics are formulated from the business cases stated earlier. The product metrics must be specific and measurable. Examples of good metrics are:
- Increase user acquisition by 10% month on month
- Reduce operating expenditure in customer service by 30%
- Improve the NPS score by 40%
Business metrics should be measured before and after deploying a model. A way to ascertain the impact of an ML system on a product is to conduct A/B tests. These tests are run on a sample of live users. A challenger (the new model or process) is tested against the existing version. The winner, after evaluation against the set metrics, is used as the final version in production.
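When comparing the two variants of an A/B test, it helps to check that the difference is larger than what chance alone would produce. The sketch below uses a two-proportion z-test, which is one common choice and my own assumption rather than something prescribed by the program. The function name and the conversion figures are made up.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates.

    conv_a / n_a is the existing version; conv_b / n_b is the challenger.
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        return 0.0, 1.0  # no variation at all, so no detectable difference
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Made-up figures: 200 of 2000 users convert on A, 260 of 2000 on B.
z, p_value = two_proportion_z_test(200, 2000, 260, 2000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
if p_value < 0.05:  # 0.05 is a conventional, not mandatory, cut-off
    print("The difference is unlikely to be due to chance alone.")
```

A small p-value only says the difference is probably real. Whether it is large enough to justify the cost of the new model is still a business decision.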
Like every other product, multiple factors are considered to ensure that multiple strategic intents defined by all stakeholders are met. Also, product development efforts must align with the high-level product vision.
In developing ML-powered products, the data is largely considered when building for the long term. The following questions come to mind:
How will the data change over time? What is the cost of acquiring the data? Does running the model in the long term align with the product vision, or will multiple changes have to occur at certain stages in the life of the product? What does the roadmap for the AI solutions look like?
These and all other product management questions have to be answered and all should be within the frame of the product vision.
Building out successful products requires a strong feedback loop from the customer and the business, informing the product. There should be adequate mechanisms in place to ensure this feedback is incorporated in every step of an AI product iteration, from data annotation, model training, and testing to model iteration for performance.
Human in the loop (HITL) is normally introduced in ML systems to serve as a feedback mechanism where an ML system is deficient. The data from the human is used in further training the ML system for improved performance.
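A simple way to picture HITL is a confidence threshold: the model handles what it is sure about and hands everything else to a person, whose answers become new training data. This is a rough sketch under my own assumptions; the function names, the 0.9 threshold, and the dictionary keys are hypothetical.

```python
def route_prediction(label, confidence, threshold=0.9):
    """Accept confident predictions; send the rest to a human reviewer."""
    if confidence >= threshold:
        return {"label": label, "source": "model"}
    return {"label": None, "source": "human_review"}

def collect_feedback(reviewed_items):
    """Turn human-reviewed items into new (input, label) training examples."""
    return [(item["input"], item["human_label"]) for item in reviewed_items]

# Made-up predictions, for illustration only.
print(route_prediction("pneumonia", 0.97))  # {'label': 'pneumonia', 'source': 'model'}
print(route_prediction("normal", 0.62))     # {'label': None, 'source': 'human_review'}

reviewed = [{"input": "scan_17.png", "human_label": "pneumonia"}]
print(collect_feedback(reviewed))           # [('scan_17.png', 'pneumonia')]
```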
Lastly, as I like to say, “there is no one defined route/blueprint to building and shipping every product”. However, certain elements are common to all products. Pick out the ‘useful commons’, hone your process, track performance, and ship better!
Here’s to becoming better.
William.