There are a lot of things that contribute to a good story. Let’s explore them a bit:
Communication is key to getting anything done effectively. As a data scientist, it is even more important to communicate regularly with your different stakeholders.
Usually, we can divide stakeholders into four categories:
i – Users — They are keen to see the best solution in terms of experience
ii – Domain experts — They are responsible for providing help with their domain expertise
iii – Executives — They are more interested in knowing about business and revenue
iv – Tech guys — They are mostly interested in the technical side of things
To communicate effectively with each group, you need to speak in terms they understand and care about. Users are most interested in the end solution and whether it solves their pain. Executives and managers are more focused on revenue and business impact. Domain experts help you with their expertise in the relevant field, and you’ll need to leverage their knowledge to frame a good solution for the given problem.
Storytelling starts with the facts that people already know. It might be a problem that your users face daily, or it might be something that’s not so optimized and can be done intelligently. Maybe it’s something that can generate more revenue for your company.
When you start with what people already know, they can relate to it and understand it better, because they have first-hand experience of the problem.
Usually, technical concepts are abstract, vague, and complex for those who are not directly related to tech.
Storytelling comprises two parts:
i ) — Facts that help you to solidify your arguments and make it easy for your audience to understand what’s going on behind the scenes.
ii ) — Narrative that you present to persuade your audience to take action, by connecting the things presented in the story to what and how they think.
So, it’s really necessary for you to present facts in a way that paints the picture of the solution for the intended segment of stakeholders.
You may have heard the saying that “A picture is worth a thousand words”. This is true. What you can convey to your audience in a lot of words can be said with just a single image in no time.
Different graphs, figures, and visualizations can give your audience a better idea of what’s going on in the data, what the trends are, and how the data behaves in general.
Sometimes you might miss some useful information while analyzing data or applying different statistical techniques. Here visuals and graphs come into play. They help you to uncover the previously unnoticed areas, aspects, and insights of the data.
Visualizations are a chance to supplement your story with something your audience can see, but they are also easy to get wrong. For example, if you plot a lot of variables in one graph, your audience may find it too complex to follow. So it’s always best to keep visualizations simple and straightforward, so that you can leverage them to build your narrative.
Statistics can help you a lot in understanding the data and making sense of it. You can then derive many facts from it and use them to understand the data even better.
Broadly, statistics can be divided into two types: descriptive and inferential. Descriptive statistics summarizes the data you already have, while inferential statistics uses that data to draw conclusions about a larger population, such as estimating a value or testing whether a relationship between variables is likely to hold beyond your sample.
In practice, descriptive statistics gives us information about the data immediately available to us, while inferential statistics lets us go beyond it, using sampling to generalize to a population based on one or more samples.
From a storytelling perspective, statistics matters because it tells you a lot about the data that’s readily available to you and then lets you extend your conclusions to the whole population, whose properties or features might not be directly available. You can use this information to tell your audience about the insights found in the data and to form hypotheses based on them.
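To make the difference between the two types concrete, here is a minimal sketch using only Python’s standard library. The data is randomly generated and purely hypothetical (imagine order values for a sample of 200 customers), and the helper names `describe` and `mean_confidence_interval` are made up for this illustration. The confidence interval relies on the normal approximation, which is an assumption that is reasonable for larger samples.

```python
import math
import random
import statistics

# Hypothetical data: order values for a sample of 200 customers.
random.seed(42)
sample = [random.gauss(50, 12) for _ in range(200)]


def describe(values):
    """Descriptive statistics: summarize the data we actually have."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
    }


def mean_confidence_interval(values, confidence=0.95):
    """Inferential statistics: estimate a range for the population mean.

    Assumes the normal approximation holds (reasonable for large samples).
    """
    n = len(values)
    mean = statistics.mean(values)
    std_err = statistics.stdev(values) / math.sqrt(n)
    # z-score that leaves (1 - confidence) / 2 in each tail
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * std_err, mean + z * std_err


summary = describe(sample)
low, high = mean_confidence_interval(sample)
print(f"Sample mean: {summary['mean']:.2f} (n={summary['n']})")
print(f"95% interval for the population mean: {low:.2f} to {high:.2f}")
```

The first function only describes the sample, while the second makes a statement about customers you never observed. In a story, that second statement is usually the one your audience cares about, so it helps to say how confident you are in it.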
Machine learning models are often described as ‘black-box’ models. If you have worked with advanced models, you will know there is a tradeoff between performance and explainability. Linear models such as linear and logistic regression, as well as simple tree-based models, are relatively easy to explain, while more complex non-linear models such as neural networks are difficult to interpret and explain.
In the real world, people are reluctant to use something they don’t understand. Also, in areas like banking, insurance, and medicine, there are regulatory requirements for processes to be interpretable and explainable. Another aspect of explainability is the trust of different stakeholders: when we have a good idea of how something works, we are more confident using and trusting it.
The same goes for the machine learning models. People would be more interested in the techniques that make sense to them.
There are various techniques for interpreting machine learning models. Some of them are rooted in statistical inference and can help us identify key features and derive meaningful representations from our data.
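One widely used model-agnostic technique for identifying key features is permutation importance: shuffle one feature at a time and measure how much the model’s error grows. The sketch below is a simplified pure-Python version, not a library implementation. The `black_box_predict` function is a hypothetical stand-in for a trained model, and the feature names and data are invented for illustration.

```python
import random


def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, rows, targets, n_repeats=10, seed=0):
    """Average increase in error when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = mean_squared_error(targets, [predict(r) for r in rows])
    importances = []
    for j in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle column j only, leaving the other features untouched.
            column = [r[j] for r in rows]
            rng.shuffle(column)
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            error = mean_squared_error(targets, [predict(r) for r in shuffled])
            increases.append(error - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def black_box_predict(row):
    # Hypothetical model: depends strongly on feature 0, weakly on
    # feature 1, and not at all on feature 2.
    return 3.0 * row[0] + 0.5 * row[1]


# Hypothetical data: three features per row, with noisy targets.
data_rng = random.Random(1)
rows = [[data_rng.uniform(0, 10) for _ in range(3)] for _ in range(300)]
targets = [black_box_predict(r) + data_rng.gauss(0, 0.5) for r in rows]

feature_names = ["ad_spend", "discount", "weekday"]  # illustrative names
scores = permutation_importance(black_box_predict, rows, targets)
for name, score in sorted(zip(feature_names, scores), key=lambda p: -p[1]):
    print(f"{name}: error increase {score:.2f}")
```

A ranking like this is easy to turn into a narrative for non-technical stakeholders: it says which inputs the model leans on most, without requiring anyone to understand the model’s internals.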