From model ready to business ready: Connecting data to ROI

Originally published in PropertyCasualty360 – Nov 2017


Insurers of all sizes are investing in predictive analytics as a way to more accurately understand the risks their companies will undertake. However, not all of the carriers have secured buy-in across the organization. In fact, at the recent InsureTech Connect event, James (Jimi) Crawford, former NASA scientist and founder of Orbital Insight, theorized that the hardest part of selling analytics solutions to insurance companies was finding a person on the innovation team who had a keen understanding of the operations side of the business (or vice versa).

Orbital Insight uses satellite imagery to help governments and businesses better understand the world. Like other technology solution providers, it struggles to find people in insurance who understand both the business and analytic sides of their organizations.

This article looks to bridge the gap between the number crunchers adjudicating policies and those responsible for profitability, identifying the vocabulary the two sides share and pointing out where common pitfalls lie.

Accessing the right data

When insurers decide to invest in predictive analytics, they’re often confronted with the same set of initial decisions. One of the most common is deciding whether to build a model exclusively with in-house data, or to enlist a third-party data vendor to fill in the gaps. Developing an underwriting analytics model with the appropriate information is essential for success, as no amount of modeling expertise can make up for a lack of data.

The internal versus third-party decision applies to large and small companies alike. While large insurers like Travelers, USAA, and AIG have enough data to build an effective model, they regularly seek third-party data sources to augment their modeling efforts. Small and mid-sized carriers often lack the volume of data needed, and rely more heavily on third-party data. Sufficient data assets are the first hurdle to clear in order to achieve the business outcome goals of a predictive model. The volume, breadth, and depth of the data are all vital. Lack of “the right” data in a model will create issues like blind spots or sample bias that render model conclusions invalid, or at the very least, untrustworthy.

This isn’t a small issue. Those on the analytical side of an insurer will typically consider “more data” to be better, but the business side may need more convincing about the expense, or about the need to participate in contributory databases where many proprietary data assets reside. Interestingly, the very data that drives profitability and account longevity can be found in these databases. In these instances, the information that data scientists lack access to is often the data that drives the numbers the business side is most interested in.

Time management

Data availability is just as important as whether an insurer has enough data. Data scientists spend 50-80% of their time wrangling data rather than modeling, according to The New York Times. Instead of locating data, they could be connecting business strategy to the analytics project or overseeing its management.

As such, here are the steps organizations must take to ensure their data is conducive to predictive analytics initiatives, and will ultimately drive the highest ROI.

Technical Inventory

A technical inventory ensures the data is usable, properly identified and organized prior to building a predictive model. Profiling the data available, looking at information like a specific population and basic summary statistics, will help data scientists make initial assessments of whether the insurer has enough to work with.
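As a rough illustration of this kind of profiling, the sketch below computes basic summary statistics per field using only the Python standard library. The record layout and the field names (`policy_id`, `state`, `payroll`) are hypothetical and chosen purely for the example; a real inventory would run against the insurer's own extracts.

```python
from statistics import mean, median


def profile_column(rows, field):
    """Summarize one field across a list of dict records."""
    values = [row.get(field) for row in rows]
    # Treat None and empty strings as missing values.
    present = [v for v in values if v not in (None, "")]
    numeric = [
        v for v in present
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]
    summary = {
        "count": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
    }
    # Only report numeric statistics when every present value is numeric.
    if numeric and len(numeric) == len(present):
        summary.update(
            min=min(numeric),
            max=max(numeric),
            mean=mean(numeric),
            median=median(numeric),
        )
    return summary


# Hypothetical sample records, for illustration only.
policies = [
    {"policy_id": "P1", "state": "CO", "payroll": 120000},
    {"policy_id": "P2", "state": "CO", "payroll": 80000},
    {"policy_id": "P3", "state": "TX", "payroll": None},
    {"policy_id": "P4", "state": "", "payroll": 100000},
]

profile = {
    field: profile_column(policies, field)
    for field in ("policy_id", "state", "payroll")
}
```

Missing counts and distinct counts per field are often enough for a first conversation about whether the data can support a model at all.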

Insurers will also want to check for structural validation (matching the fields). This can be done in different ways, but it is often a visual grid of rows and columns that sifts through each uploaded data source file to show which ‘necessary’ components are available, which are missing, and which align with what a specific predictive model needs. Many platforms support data uploads, but is there an opportunity for greater analysis? In the example below, the green check marks show a match between client data and system expectations. The yellow marks go further, taking data initially unaccounted for into consideration, providing an opportunity for more in-depth analysis. This tenet of structural validation makes sure optimal insights are gleaned from the data.

Technical inventory establishes an initial level of data quality to reuse as the data scientist iterates through the process. Ongoing monitoring and performance evaluation of a solution are important, because data acquisition is a continuous process, not just a step in the initial model build.
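Separately from the visual grid the article describes, the field-matching idea can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's implementation: the expected field list and the sample upload are hypothetical, and header names are compared case-insensitively as a simplifying assumption.

```python
import csv
import io

# Hypothetical list of fields a model expects; not taken from any real system.
EXPECTED_FIELDS = {"policy_id", "state", "class_code", "payroll"}


def check_structure(csv_text, expected=EXPECTED_FIELDS):
    """Compare a CSV header row against the fields a model expects."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    found = {name.strip().lower() for name in header}
    return {
        "matched": sorted(found & expected),     # the "green check" fields
        "missing": sorted(expected - found),     # expected but not supplied
        "unexpected": sorted(found - expected),  # supplied but not expected
    }


# Hypothetical uploaded file contents.
upload = "Policy_ID,State,Payroll,Years_In_Business\nP1,CO,120000,12\n"
report = check_structure(upload)
```

The `unexpected` list plays the role of the yellow marks: fields the system did not ask for but that may be worth a closer look.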

Data Cleansing

Once technical inventory is complete, the next step is to understand what the data will support from the functional perspective. Does the data support the conclusions an insurer wants to see? On the business side, these conclusions may mean “how much more profitable will we be,” whereas the data scientist may want to understand the effects of specific characteristics. This tension between “why something impacts profitability” and “how much it costs” is an ongoing discussion between business and data teams.

However, both can agree that a model aimed at providing insights into new geographic territories or performance over time must be given clean data with which to work.


For example, a common method to see both functional perspective and data cleanliness is a heat map. This groups and displays data along expected attributes, making it easy to identify data that may need further cleansing.
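A text-only approximation of such a heat map is sketched below: it groups records along two attributes and reports the share of missing values in each cell. The attributes (`state`, `year`) and the `loss` field are hypothetical, and a production version would more likely render colors with a plotting library.

```python
from collections import defaultdict


def missing_rate_grid(rows, row_key, col_key, field):
    """Share of records with a missing `field`, per (row_key, col_key) cell."""
    totals = defaultdict(int)
    missing = defaultdict(int)
    for row in rows:
        cell = (row[row_key], row[col_key])
        totals[cell] += 1
        if row.get(field) in (None, ""):
            missing[cell] += 1
    return {cell: missing[cell] / totals[cell] for cell in totals}


def render(grid):
    """Lay the grid out as aligned text; '-' marks cells with no records."""
    row_labels = sorted({r for r, _ in grid})
    col_labels = sorted({c for _, c in grid})
    lines = [" " * 6 + " ".join(f"{c:>6}" for c in col_labels)]
    for r in row_labels:
        cells = []
        for c in col_labels:
            rate = grid.get((r, c))
            cells.append(f"{'-':>6}" if rate is None else f"{rate:>6.0%}")
        lines.append(f"{r:<6}" + " ".join(cells))
    return "\n".join(lines)


# Hypothetical claim records, for illustration only.
claims = [
    {"state": "CO", "year": 2015, "loss": 1000},
    {"state": "CO", "year": 2015, "loss": None},
    {"state": "CO", "year": 2016, "loss": 500},
    {"state": "TX", "year": 2016, "loss": None},
]

grid = missing_rate_grid(claims, "state", "year", "loss")
print(render(grid))
```

Cells with high missing rates, or with no records at all, are the ones to examine before trusting conclusions about that territory or period.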

The key is to start simple. Don’t use all methods of data cleansing, but rather focus on which technique gives the most value based on the data.

Data lineage

As insurers go through the processes of technical and functional evaluations of data, they will encounter items that require re-processing of the data. For the business side, this should feel familiar. It is the equivalent of providing ongoing support for questions that come from the finance or audit departments.

During repeated validation and cleansing, it’s important to maintain visibility into the data history to understand the changes that may occur over time. Retaining profiles for each iteration of data provides a history of data quality.

Even in the absence of formal data governance, simply tracking data profiles will answer questions about what was considered, how many iterations of cleansing an insurer went through, and which version was used in the model data set. These answers are especially crucial if a carrier uses third-party data to supplement its own to avoid sample bias. This information must be requested of the vendor before using their data. The image below is an example of how Valen shows data lineage.
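One lightweight way to retain a profile per iteration is sketched below. It is an assumption-laden illustration rather than a description of any particular platform: the snapshot fields, the `uploaded_by` value, and the sample records are all hypothetical, and the fingerprint is simply a truncated SHA-256 hash of the serialized records.

```python
import hashlib
import json
from datetime import datetime, timezone


def snapshot(rows, note, uploaded_by):
    """Record a small, comparable profile of one iteration of the data."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return {
        # A short content hash identifies exactly which version was used.
        "fingerprint": hashlib.sha256(payload).hexdigest()[:12],
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "uploaded_by": uploaded_by,
        "note": note,
        "row_count": len(rows),
        "missing_values": sum(
            1 for row in rows for v in row.values() if v in (None, "")
        ),
    }


lineage = []

# Hypothetical raw upload and one cleansing pass.
raw = [
    {"policy_id": "P1", "payroll": 120000},
    {"policy_id": "P2", "payroll": None},
]
lineage.append(snapshot(raw, "initial upload", "analyst_a"))

cleaned = [row for row in raw if row["payroll"] is not None]
lineage.append(snapshot(cleaned, "dropped rows with missing payroll", "analyst_a"))
```

Comparing consecutive snapshots shows what each cleansing pass changed, and the fingerprint ties a model build back to one specific version of the data.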

This list shows what files were received, who uploaded them, their current status, and where the files are within our data processing workflow, including final approvals of the data. Another important view of the data as an insurer iterates is the impact of each iteration on the resulting approved data set.

Although the journey to connecting business leaders with data leaders is still in its infancy, there are universal steps insurers should take to ensure data initiatives are positioned to drive the highest ROI. After all, the two sides of the business can certainly agree that business that’s both within regulatory guidelines and more profitable should be the goal.

Following the steps above, and communicating within their framework, can help to bridge this departmental divide.