Most property insurers today still rely on a guy with a ladder and a camera on a stick to perform physical inspections and assess risk. But smart insurers are enlisting the help of AI researchers who have developed platforms that can evaluate thousands of publicly available images and other data points on the web to deliver a risk assessment within seconds.
“We make sure that the insurer can access that data very, very quickly, especially if it’s being used in a quote engine,” said Ryan Kottenstette, CEO at Cape Analytics, a deep learning company that provides predictive risk analysis to companies that insure, lend, own or manage real property.
He added that, in under two seconds, the insurer gets back a list of features, such as the degree of tree overhang or the roof condition rated on a five-point scale.
Insurers that are late to the AI party should remember Kodak, once the world’s leading photography company that spiraled into irrelevance when it hesitated going digital.
Lloyd’s of London, the global underwriting marketplace, predicted in its 2018 Emerging Risk Report that the growing Internet of Things, with telematics, wearables and smart-home sensors, will transform the insurance industry in coming years. For now, AI’s impact is primarily in improving claims processing. But, already, it is beginning to identify, assess and underwrite emerging risks in real time.
Parsyl, a Colorado-based startup, helps insurers track the quality of perishable products as they move through the supply chain. Auto insurers, from Progressive to Geico, are using telematics to collect real-time driving data from vehicles, rewarding safe drivers with discounts and helping reconstruct accidents. Wearables such as fitness trackers and heart rate monitors may eventually help health insurers track and reward healthy habits such as regular exercise.
These new risk assessment services are the result of advances in machine learning, which allow algorithms trained on millions of images to spot various classes of risk, from overhanging trees to swimming pools, in the blink of an eye. That doesn’t make physical inspectors obsolete, but it can give insurers an instant read on what kinds of potential claims might be filed for a given property. For larger jobs, the systems can evaluate the general risk of developments or even neighborhoods with startling accuracy.
But peek behind the curtain and the real work for these new risk assessment services is in labeling huge volumes of data with which to train the AI systems. Just as a child learns to identify a tree as a tree by being told, computer vision algorithms must be trained to recognize a tree as a tree in a process called supervised learning.
Teams of workers painstakingly annotate millions of datapoints by hand, which are, in turn, fed to the algorithms. The more annotated data available to train the algorithms, the more accurate the machine-learning analysis will be. The differentiator is labeled training data.
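The supervised-learning loop described above can be sketched in miniature. This is a hypothetical illustration, not any company's actual pipeline: a nearest-neighbor classifier learns from hand-labeled feature vectors standing in for annotated images, and all data is invented.

```python
# Minimal supervised-learning sketch: hand-labeled examples teach the model.
# The feature vectors stand in for image-derived features; data is invented.

def euclidean(a, b):
    """Straight-line distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def nearest_neighbor_predict(training_data, sample):
    """Return the label of the closest hand-labeled training example."""
    best_label, best_dist = None, float("inf")
    for features, label in training_data:
        d = euclidean(features, sample)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label

# Annotators supply (features, label) pairs -- the labeled training data.
labeled = [
    ((0.9, 0.8), "tree_overhang"),
    ((0.1, 0.2), "clear_roof"),
    ((0.85, 0.7), "tree_overhang"),
    ((0.15, 0.1), "clear_roof"),
]

print(nearest_neighbor_predict(labeled, (0.8, 0.75)))  # -> tree_overhang
```

More labeled pairs give the model more reference points, which is the sense in which more annotated data yields more accurate analysis.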
“Training data is the lifeblood of this AI revolution, period,” said J.C. Arturo, CEO of Arturo, another company that uses AI to unlock value from the massive jump in aerial imagery.
Computer vision has been transforming industries since 2012, when a breakthrough with networks of artificial neurons first made its use accurate enough for commercial applications. Since then, there has been an explosion of use cases, from weed detection to self-driving cars.
At the same time, there has been a flood of geospatial imagery thanks to increasingly cheap image sensors and a plethora of miniaturized satellites, some as small as a shoebox, orbiting the earth. The startup Planet Labs has more than 130 satellites, for example, that famously photograph nearly every place on earth at a resolution of 3 to 5 meters, every day.
The convergence of increasingly accurate computer vision algorithms and the abundance of geospatial imagery has allowed companies like Cape Analytics and Arturo to catalogue potential insurance risks for every property in a target market. A client can enter an address into these systems and get back a full report on potential risks, from overhanging trees to damaged roof tiles.
But it is not enough for a system to scan imagery; it needs to know what to look for in those images. And to teach a computer how to interpret features in an image, it must be trained on massive volumes of labeled imagery: overhanging trees must be outlined and labeled, damaged roof tiles must be outlined and labeled, swimming pools, shrubbery, streams and ponds all need to be outlined and labeled, largely by hand, in thousands, hundreds of thousands or even millions of images before the algorithms can spot such features on their own.
It’s hard to label precisely at the quantities needed to make models commercially viable. As companies build up larger and larger volumes of accurately labeled data, that data becomes their most valuable IP.
“We have our own proprietary labeled data set,” said Mr. Kottenstette of Cape Analytics. His company has zoomed in on property analysis because it touches so many industries, from insurance to real estate investment.
Cape Analytics does its own labeling with its own characteristic definitions and taxonomy. For many addresses in the United States, the company has property characteristics pre-computed and can provide customers with analysis in under two seconds.
Cape Analytics maintains a database of historical claims contributed by its customers — millions of policy years in aggregate from across the country — and matches imagery that corresponds with the timeframe of each policy. It then looks at which of those properties ended up having a claim and whether any of the property characteristics correlate to a higher frequency of claims or a higher severity of a claim.
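The correlation step described above amounts to a simple aggregation: split historical policies by a property characteristic and compare claim frequencies. A hypothetical sketch (the field names and data are invented, not Cape Analytics’ schema):

```python
# Sketch: does a property characteristic correlate with claim frequency?
# Each record pairs a pre-computed property feature with claim history.
# All data is invented for illustration.

from collections import defaultdict

policies = [
    {"tree_overhang": True,  "had_claim": True},
    {"tree_overhang": True,  "had_claim": True},
    {"tree_overhang": True,  "had_claim": False},
    {"tree_overhang": False, "had_claim": False},
    {"tree_overhang": False, "had_claim": True},
    {"tree_overhang": False, "had_claim": False},
    {"tree_overhang": False, "had_claim": False},
]

def claim_frequency_by_feature(records, feature):
    """Fraction of policies with a claim, split by a boolean feature."""
    counts = defaultdict(lambda: [0, 0])  # feature value -> [claims, total]
    for r in records:
        bucket = counts[r[feature]]
        bucket[0] += r["had_claim"]
        bucket[1] += 1
    return {value: claims / total for value, (claims, total) in counts.items()}

freq = claim_frequency_by_feature(policies, "tree_overhang")
# In this toy data, overhang properties claim at 2/3 vs. 1/4 without.
```

At real scale the same comparison runs over millions of policy years, and severity is aggregated alongside frequency.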
“You need high quality infrastructure that can scale to millions of homes,” Mr. Kottenstette said.
He said the Cape Analytics process starts with defining a “gold standard data set” with a group of in-house annotation experts to clearly define and document the characteristics they are seeking to identify. This gold standard is then used to train a workforce from outsourced labeling firms, which then scale up that dataset, he said.
“Once we feel good about the quality and quantity of data that’s been labeled, we use it to train our models in-house, followed by additional iterations and testing to get the models as performant as possible,” Mr. Kottenstette said.
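One common way to run the quality gate in a workflow like this is to score each outsourced annotator against the in-house gold standard before admitting their labels into the training set. A hypothetical sketch, with an invented threshold and invented data:

```python
# Sketch: accept an annotator's work only if it agrees closely enough
# with the in-house "gold standard" labels. Threshold and data invented.

def agreement_rate(gold_labels, annotator_labels):
    """Fraction of items where the annotator matches the gold standard."""
    matches = sum(g == a for g, a in zip(gold_labels, annotator_labels))
    return matches / len(gold_labels)

def passes_quality_gate(gold_labels, annotator_labels, threshold=0.95):
    """Gate an annotator's batch on agreement with the gold standard."""
    return agreement_rate(gold_labels, annotator_labels) >= threshold

gold = ["pool", "no_pool", "pool", "pool", "no_pool"]
work = ["pool", "no_pool", "pool", "no_pool", "no_pool"]  # 4/5 agreement
print(passes_quality_gate(gold, work))  # -> False at a 95% bar
```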
Allstate Insurance uses specialized aircraft or drones to record imagery used to write policies for customers faster, or to look at damage more quickly after a catastrophe. Its auto insurance customers can send in photos from an accident scene to be analyzed by AI models, expediting claims. Orbital Insight and Flyreel are two more companies using AI and imagery to assess insurance risk.
How do these companies label their data? They all use data labeling platforms to manage the annotation process.
Like Cape Analytics, Arturo uses proprietary internal claim data from its clients for predictive analysis, calculating the probability and size of claims based on identifiable risk factors — saying, for example, that a 30 percent tree overhang in a certain market correlates to a 30 percent chance of a $20,000 claim according to historical data.
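The worked example above is an expected-loss calculation: the probability of a claim times its size. A one-line sketch using the article’s numbers:

```python
# Expected loss for the article's example: a 30% chance of a $20,000
# claim implies $6,000 of expected loss from that risk factor.

def expected_loss(claim_probability, claim_size):
    """Probability-weighted claim cost for a single risk factor."""
    return claim_probability * claim_size

print(expected_loss(0.30, 20_000))  # -> 6000.0
```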
“Over time, being able to create high quality training data is going to be a need of any Fortune 1000 business and it’ll be a tool every CIO needs to have in their vendor set,” said Mr. Arturo.