Analytics & Big Data
Mar 18, 2016

Rethinking insurance fraud detection

Insurance fraud is estimated to cost the economy more than $40 billion per year, placing a significant cost burden on insurance companies. Common insurance fraud schemes include false information and illegitimate claims by individuals; premium diversion; asset diversion; fee churning through repeated commissions; and workers' compensation fraud by companies. Because fraudulent claims are an extremely small fraction of all claims and are usually well concealed, detecting and predicting insurance fraud has always been difficult, reactive, and time-consuming, owing to the large amount of manual work involved. Today, the digital revolution, access to multiple channels and transactions, globalization, and increasingly sophisticated fraud tactics have combined to make fraud detection far more challenging. To meet these challenges, insurance companies must rethink the insurance fraud detection process.

Entity resolution and network graph analytics
Advanced analytics techniques, such as machine learning, allow us to detect fraud more effectively, efficiently, and quickly than traditional, labor-intensive manual methods. Combining advanced technologies, big data, network graph analytics, and machine learning creates a powerful approach to detecting fraud and helps address many of the challenges insurance companies face. Fraud is often a planned, deliberate act arising from a confluence of enabling factors, such as incentive, opportunity, and means, and these factors are often highly social. As a result, fraud can be detected and tracked by analyzing the social network of the people involved.

"Entity resolution" is the technique of identifying records that refer to the same entity across different data sources and connecting them by associating and grouping their identifying attributes. Network graph analytics allows a comprehensive analysis of an entity's linkages using sets of overlapping networks. Together, these methods connect the disparate data sources about an entity and its network, and provide powerful, intuitive visualizations that expose connections and hidden patterns. While each insurance claim may look legitimate on its own, fraudulent patterns become visible when the claim is viewed in a broader network. Claimants observed across different transactions, lines of business, or online activities may share names, addresses, telephone numbers, identifiers, service providers, and so on, which can help link the data to the same person or to an organized group of fraudsters. For instance, a set of fraudulent claims may come to light when a group of claimants with no previously known relationship is found to be filing high-value claims through the same auto repair shop.
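As a rough illustration of entity resolution, the sketch below groups claim records that share a normalized phone number or address, following links transitively with a union-find structure. The record fields, sample values, and the simple normalization rules are hypothetical; practical entity resolution usually adds fuzzy name matching and weighted scoring. A resulting cluster may be one person using variant names or, as the paragraph above notes, a group of distinct people who share identifiers.

```python
from collections import defaultdict

# Hypothetical claim records; field names and values are illustrative only.
RECORDS = [
    {"id": "C1", "name": "John Smith", "phone": "555-0101", "address": "12 Oak St"},
    {"id": "C2", "name": "Jon Smith", "phone": "(555) 0101", "address": "99 Elm Ave"},
    {"id": "C3", "name": "Mary Jones", "phone": "555-0199", "address": "99 elm ave."},
    {"id": "C4", "name": "Ann Lee", "phone": "555-0777", "address": "5 Pine Rd"},
]


def normalize(field, value):
    """Crude normalization so trivially different spellings still match."""
    if field == "phone":
        return "".join(ch for ch in value if ch.isdigit())
    # Addresses: lowercase, drop periods, collapse whitespace.
    return " ".join(value.lower().replace(".", "").split())


class UnionFind:
    """Disjoint-set structure used to merge records transitively."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def resolve_entities(records, keys=("phone", "address")):
    """Group records that share any normalized identifier, directly or via a chain."""
    uf = UnionFind()
    first_seen = {}  # (field, normalized value) -> id of first record carrying it
    for rec in records:
        uf.find(rec["id"])  # register every record, even ones with no matches
        for key in keys:
            token = (key, normalize(key, rec[key]))
            if token in first_seen:
                uf.union(first_seen[token], rec["id"])
            else:
                first_seen[token] = rec["id"]
    groups = defaultdict(list)
    for rec in records:
        groups[uf.find(rec["id"])].append(rec["id"])
    return sorted(groups.values())


if __name__ == "__main__":
    print(resolve_entities(RECORDS))  # [['C1', 'C2', 'C3'], ['C4']]
```

Here C1 and C2 share a phone number and C2 and C3 share an address, so all three fall into one cluster even though C1 and C3 have nothing in common directly.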

How the approach works
  • Linking information from several (often disparate) internal and external data sources, for example: multiple transactions and channels, demographics, personal information, financial data, digital data (such as email, web, and social media), and public data
  • Any two entities related to each other, whether implicitly (e.g., sharing a service provider) or explicitly (e.g., sharing the same IP address), are linked, and these links together form a network
  • Entities are depicted in network graphs as nodes (points) and their linkages as edges (lines)
  • Because these networks can be very large, the process can start by analyzing a node's direct neighborhood: linking, displaying, and identifying relationships, and creating new features (e.g., the distance between two nodes or the strength of a relationship). The analysis can then be extended gradually to each additional layer of the network, revealing further relationships and features
  • These new features, extracted using entity resolution and network graph analytics, combine to yield richer data for predicting fraud, resulting in improved accuracy, precision, and recall
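The steps above can be sketched with plain Python dictionaries, assuming a hypothetical bipartite graph in which claimants and repair shops are nodes and each claim is an edge. `hop_layers` expands a node's neighborhood one layer at a time with breadth-first search (the layer number is the distance from the starting node), `shared_shops` is a simple relationship-strength proxy, and `shop_features` counts how many distinct claimants filed high-value claims at each shop, the kind of signal behind the auto repair shop example. The function names, the sample claims, and the 10,000 USD threshold are illustrative assumptions, not the author's method.

```python
from collections import defaultdict, deque

# Hypothetical claims: (claimant, repair shop, claim amount in USD).
CLAIMS = [
    ("alice", "shop_A", 18000),
    ("bob", "shop_A", 22000),
    ("carol", "shop_A", 19500),
    ("dave", "shop_B", 1200),
    ("carol", "shop_C", 800),
]


def build_graph(claims):
    """Undirected graph: claimants and shops are nodes, each claim links the two."""
    graph = defaultdict(set)
    for claimant, shop, _amount in claims:
        graph[claimant].add(shop)
        graph[shop].add(claimant)
    return graph


def hop_layers(graph, start, max_hops):
    """Breadth-first expansion: nodes grouped by hop distance (1..max_hops) from start."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand past the requested layer
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    layers = defaultdict(set)
    for node, d in dist.items():
        if d > 0:
            layers[d].add(node)
    return dict(layers)


def shared_shops(graph, a, b):
    """Relationship-strength proxy: number of shops two claimants have in common."""
    return len(graph[a] & graph[b])


def shop_features(claims, high_value=10000):
    """Per-shop features: distinct claimants and how many filed high-value claims."""
    claimants = defaultdict(set)
    high = defaultdict(set)
    for claimant, shop, amount in claims:
        claimants[shop].add(claimant)
        if amount >= high_value:
            high[shop].add(claimant)
    return {
        shop: {"claimants": len(claimants[shop]), "high_value_claimants": len(high[shop])}
        for shop in claimants
    }


if __name__ == "__main__":
    g = build_graph(CLAIMS)
    print(hop_layers(g, "alice", 2))
    print(shared_shops(g, "alice", "carol"))
    print(shop_features(CLAIMS))
```

On this sample data, `shop_A` stands out with three distinct claimants, all filing high-value claims. In the approach described here, features like these would be combined with other claim data to predict fraud rather than read in isolation.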

Key benefits
  • Identification of previously undetected fraud and organized fraud networks through powerful, intuitive visualization
  • Greater accuracy in predicting fraud, because a richer feature set compensates for the challenges posed by the skewed distribution (fraud cases are a small proportion of the data)
  • Higher precision (fewer false positives) in fraud detection, i.e., a narrower set of high-risk cases is identified without compromising accuracy. This enables insurance companies to optimize their fraud detection and mitigation efforts, reducing costs and increasing recovery
  • Real-time fraud detection, which can be built on real-time feeds from social media and other sources
  • Mitigation of poor or incomplete data quality for all entities in the data, leading to better and faster claims processing
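The points about skewed distributions and precision can be made concrete with a short worked example. The counts below are hypothetical (10,000 claims, 1% of them fraudulent) and only show why accuracy alone is misleading when fraud is rare: a model that flags nothing still scores 99% accuracy while catching no fraud. Precision (the share of flagged claims that are fraudulent) and recall (the share of fraudulent claims that are flagged) are the more telling measures.

```python
def metrics(tp, fp, fn, tn):
    """Accuracy, precision, and recall from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0  # no flags -> define as 0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


if __name__ == "__main__":
    # Hypothetical portfolio: 10,000 claims, 100 of them fraudulent.
    # Model 1 flags nothing: high accuracy, zero fraud caught.
    print(metrics(tp=0, fp=0, fn=100, tn=9900))   # (0.99, 0.0, 0.0)
    # Model 2 flags 150 claims, 60 of which are truly fraudulent.
    print(metrics(tp=60, fp=90, fn=40, tn=9810))  # (0.987, 0.4, 0.6)
```

The second model has slightly lower accuracy yet is far more useful, since it finds 60% of the fraud and 40% of its flags are genuine; raising that precision is what lets investigators focus on a narrower set of high-risk cases.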

Author: Payel Chowdhury, Ph.D. - Chief Scientist, Analytics and Research, Chief Science Office