data analysis
2.2 DATA SOURCES

The data analytics process starts with the data. In the recent decade, there has been a tremendous increase in data sources and availability.[1] As with other professionals, actuaries are faced with large amounts of information that becomes available almost instantaneously. The volume and influx of data are not the only challenges for practitioners. A major difficulty is that the data, which sometimes must be studied in real time, comes from diverse sources, which leads to various data types and structures:

[1] Open data sources have even led to the notion of smart cities (Puiu, et al., 2016). “Smart cities are those that: adopt and promote innovative technology, processes and business models; use data with the intention of being more efficient and transparent; and increase citizen engagement to improve the prosperity and sustainability of cities” (Beswick, 2014 and 2015).

Data from tracking customers’ transactions: In a survey, Rageso (2018) found that for medium and large-sized European companies, online portal content and point of sales (with other transactional tracking tools) are the main data sources. Another example of transaction tracking is found with credit card companies, which may monitor their customers’ purchases to gather information that helps detect fraudulent activity.

Telematic data: Telematics (smart meter data) and GPS, which provide information on consumer driving habits and road usage, are also major sources of data for companies, particularly for motor insurance companies (Bellina, et al., 2018). Cellular telephone companies may study their subscribers’ calling patterns to offer tailored service (and to counter competitors’ rates).

Social media data: Social media and genetic sequencing are among the fastest-growing new sources of data being used for analysis (EMC, 2015).
For social media companies such as Twitter, Facebook, LinkedIn, and other clickstream sources, data itself is the primary product, and these companies’ value depends on the amount of data they can collect and host from their subscribers. These platforms also provide a source of information used by other companies to improve their level of service and create targeted advertisements. Some companies have designed their own in-house search platforms to attract more customers and improve their total sales (Rageso, 2018).

In health care (and life insurance), massive volumes of patient data are generated continuously. Medical practitioners (and actuaries) need to analyze these patient-related data in order to improve patient care and satisfaction and manage population health, including the prevention of disease spread (AAA, 2018; Li, et al., 2013; Raghupathi and Raghupathi, 2014).

The volume of data worldwide is growing at a rate of approximately 50% per year (Dhar, 2013). The Cloud offers storage solutions for this massive volume of data (Titus, 2017). Cloud storage and sharing allow companies to easily access, aggregate, analyze, and visualize all their data. This technology also gives decision makers access to a variety of information and reports, helping them make better decisions in real time. Practitioners are developing their skills and ability to collect such types of data and extract relevant information (Sondergeld and Purushotham, 2019).

Once the data is collected, its exploration and visualization are the first stages of the data analytics process, which are briefly reviewed next.

2.3 DATA EXPLORATION AND VISUALIZATION

“The whole point of data visualization is to provide [us] with insight about a set of data, […]. We may be using the visualization to tell a story […], or we may be using the visualization to see if there are discernable patterns in our data” (Campbell, M.P., 2017).
The main goal of data visualization is to communicate data-related information clearly and effectively through graphical means. As such, data visualization is an important tool in data analytics. In the past, the most common visualization techniques were two-dimensional graphs such as scatterplots, histograms, boxplots, or pie charts.

With the high volume of data now available, visualization tools have improved. In the SOA (2016) call for essays on visualization, Houng (2016) showed how an interactive display can be obtained using a slider. The slider, which is an alternative to a three-dimensional plot, helps the user experiment with different “what if” scenarios. In the same SOA call for essays, Hegstrom (2016) showed how a distribution of results can be displayed effectively by using a strip chart or a violin plot. Finally, Shang (2016) used word clouds and geolocation to visualize social network data.

For data with many dimensions of interest, traditional visualizations may not provide an effective display. Mortality data is an example of data with a high number of dimensions of interest (age, time, gender, country, etc.). In this case, improved visualization techniques such as heat maps, trajectory plots, and advanced projection-based methods can be used.

Heat maps (or, alternatively, surface plots) are used to describe the level of (surface) variation in a quantity connecting two variables, x and y. Heat maps are commonly used to illustrate mortality improvement rates and give a good overview of the age- and time-dependence of improvements (Brouhns, et al., 2005). An example of a heat map is shown in Case Study 2.

Trajectory plots are not commonly used in the actuarial field but are very popular in areas such as physics. The idea is to plot the development of a variable as a function of time in the form of a trajectory, with the current value of the variable on the x-axis and the rate of change on the y-axis.
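As a minimal sketch of the data preparation behind the two displays just described, the Python below computes a grid of mortality improvement rates (the values a heat map would colour) and the (value, rate of change) pairs of a trajectory plot. The function names, the definition of the improvement rate as 1 - m(x, t+1) / m(x, t), the forward-difference approximation, and the sample rates are all illustrative assumptions and are not taken from the report.

```python
from typing import List, Tuple


def improvement_rates(m: List[List[float]]) -> List[List[float]]:
    """Grid of year-on-year mortality improvement rates.

    m[i][t] is the mortality rate at age index i in year index t.
    Entry [i][t] of the result is 1 - m[i][t + 1] / m[i][t] (an assumed,
    commonly used definition); a heat map would colour this grid over
    the (year, age) plane.
    """
    return [
        [1.0 - row[t + 1] / row[t] for t in range(len(row) - 1)]
        for row in m
    ]


def trajectory_points(series: List[float], dt: float = 1.0) -> List[Tuple[float, float]]:
    """(value, rate of change) pairs for a trajectory plot.

    The rate of change is approximated by a forward difference, so the
    last observation does not produce a point.
    """
    return [
        (series[t], (series[t + 1] - series[t]) / dt)
        for t in range(len(series) - 1)
    ]


# Made-up rates for illustration: two ages (rows) over three years (columns).
rates = [
    [0.0100, 0.0098, 0.0095],
    [0.0200, 0.0197, 0.0190],
]
grid = improvement_rates(rates)       # one row per age, one column per year step
points = trajectory_points(rates[0])  # x = current value, y = rate of change
print(grid)
print(points)
```

The resulting grid and point list can be passed to any plotting library; the plotting step itself is left out so that the sketch stays dependency-free.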
Other commonly used methods for data exploration include advanced projection-based methods such as Principal Component Analysis (Jolliffe, 1986) and Multi-dimensional Scaling (Cox and Cox, 2001), which can also be used to visualize high-dimensional data in a 2D space (Ghodsi, 2006; James, et al., 2013). A self-organizing map (SOM) is a neural network-based visualization method that is also used for dimensionality reduction. Shreck, et al. (2010) provide an extensive review of dimension-reduction visualization techniques.

Following this introductory section, Section 3 deals with data analytics techniques.

Section 3: Data Analytics Techniques

Machine learning encompasses a variety of techniques used to ultimately make predictions based on a dataset. Machine-learning techniques can be classified as supervised or unsupervised.

Supervised learning is the most commonly used class of machine learning in applications and will be the class most familiar to most actuaries. In these methods, the training dataset has both one or more explanatory variables and a response variable. The goal of a supervised learning technique is to predict the response variable from new input variables as accurately as possible. These techniques can be used for regression or classification. Some supervised learning algorithms that are used in practice are regression techniques (including generalized linear models and generalized additive models), tree-based methods (including decision trees, bagging, random forests, and gradient boosting machines), and neural networks.

Unsupervised learning refers to techniques used to find hidden structure or patterns within unlabeled data (EMC, 2015).

A difference between supervised machine learning and traditional statistical modeling is that supervised machine learning prioritizes prediction rather than inference, which is the focus of statistical modeling.
This means that supervised machine learning algorithms lead to models that are better predictors but may be difficult to interpret. Shapiro (2000) is one of the first papers dealing with machine-learning methods with actuarial science applications.

3.1 SUPERVISED LEARNING

3.1.1 REGRESSION AND GENERALIZED LINEAR MODELS (GLMS)

Generalized Linear Models (GLMs) were introduced by Nelder and Wedderburn (1972) as a generalization of linear, logistic, and Poisson regression. GLMs are generally considered a standard approach to many insurance modeling applications: they are used extensively in the insurance industry for modeling insurance claims and pricing insurance products (Schirmacher, 2016; Tevet, 2016; de Jong and Heller, 2008). While these are traditional statistical techniques, they are a form of supervised learning in the sense that the models use both one or more explanatory variables and a response variable. These techniques are presented in numerous texts, including McCullagh and Nelder (1989) and Denuit, et al. (2007). In this method, a multiple linear regression model is generalized via a link function to predict variables that have non-normal distributions. This can be represented as
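To make the role of the link function concrete, the following is a minimal sketch of one common GLM: a Poisson regression with a log link and a single explanatory variable, log(E[Y]) = b0 + b1 * x, fitted by Newton-Raphson. This is the standard textbook form and is assumed here; it is not necessarily the notation or the model used in the report. The function name `fit_poisson_glm`, the starting values, and the noise-free sample data are hypothetical and chosen only so that the known coefficients can be recovered.

```python
import math
from typing import List, Tuple


def fit_poisson_glm(x: List[float], y: List[float], iterations: int = 25) -> Tuple[float, float]:
    """Fit log(E[Y]) = b0 + b1 * x by Newton-Raphson (Poisson, log link)."""
    b0 = math.log(sum(y) / len(y))  # start from the intercept-only fit
    b1 = 0.0
    for _ in range(iterations):
        # Inverse link: the fitted mean is the exponential of the linear predictor.
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector: gradient of the Poisson log-likelihood.
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Information matrix (negative Hessian), a symmetric 2x2 matrix.
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system information * step = score.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Illustrative, noise-free data: the response equals its mean exp(0.5 + 0.3 * x),
# so the fit should recover the coefficients. Real claim counts would be integers.
x_obs = [0.0, 1.0, 2.0, 3.0, 4.0]
y_obs = [math.exp(0.5 + 0.3 * xi) for xi in x_obs]
b0_hat, b1_hat = fit_poisson_glm(x_obs, y_obs)
print(round(b0_hat, 6), round(b1_hat, 6))
```

In practice a statistical package would be used instead of a hand-written solver; the sketch only shows that the linear predictor is mapped to the mean of a non-normal response through the inverse of the link function.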

Innovation and Technology

Emerging Data Analytics Techniques with Actuarial Applications

July 2019





MARIE-CLAIRE KOISSI PhD, Professor
Actuarial Science Program
University of Wisconsin-Eau Claire

HERSCHEL DAY FSA, MAAA, Associate Professor
Actuarial Science Program
University of Wisconsin-Eau Claire

VICKI WHITLEDGE PhD, Professor
Actuarial Science Program
University of Wisconsin-Eau Claire

SPONSOR
Actuarial Innovation & Technology Steering Committee




Caveat and Disclaimer

The opinions expressed and conclusions reached by the authors are their own and do not represent any official position or opinion of the Society of
Actuaries or its members. The Society of Actuaries makes no representation or warranty to the accuracy of the information


Copyright © 2019 by the Society of Actuaries. All rights reserved.




CONTENTS

Abstract

Executive Summary

Section 1: Acknowledgments

Section 2: Introduction
  2.1 DATA ANALYTICS FRAMEWORK
  2.2 DATA SOURCES
  2.3 DATA EXPLORATION AND VISUALIZATION

Section 3: Data Analytics Techniques
  3.1 SUPERVISED LEARNING
    3.1.1 REGRESSION AND GENERALIZED LINEAR MODELS (GLMS)
    3.1.2 TREES
    3.1.3 NEURAL NETWORKS
    3.1.4 PREDICTIVE MODELING
  3.2 UNSUPERVISED TECHNIQUES
    3.2.1 PRINCIPAL COMPONENT ANALYSIS
    3.2.2 CLUSTER ANALYSIS
    3.2.3 GENETIC ALGORITHMS
    3.2.4 NEURAL NETWORKS
  3.3 OTHER DATA ANALYTICS TECHNIQUES
    3.3.1 MARKOV CHAIN MONTE CARLO (MCMC) SIMULATION
    3.3.2 BAYESIAN ANALYSIS

Section 4: Emerging Data Analytic Technologies
  4.1 LIFE: MACHINE LEARNING TECHNOLOGIES FOR MORTALITY RATE FORECASTING
  4.2 HEALTH CARE: MACHINE LEARNING TECHNOLOGIES FOR HEALTH CARE CLAIMS MODELING
  4.3 LIFE / NON-LIFE: MACHINE LEARNING TECHNOLOGIES FOR RESERVES
  4.4 NON-LIFE: MACHINE LEARNING TECHNOLOGIES FOR CLAIM MODELING
  4.5 LIFE / NON-LIFE: MACHINE LEARNING TECHNOLOGIES FOR INSURANCE FRAUD AND OTHER AREAS
  4.6 SOME ACTUARIAL PACKAGES IN R AND PYTHON
    4.6.1 SOME ACTUARIAL PACKAGES IN R
    4.6.2 SOME PACKAGES IN PYTHON WITH ACTUARIAL APPLICATIONS

Section 5: Case Studies
  5.1 CASE STUDY 1: CHAINLADDER IN R
  5.2 CASE STUDY 2: CLAIMS FREQUENCY IN MOTOR INSURANCE
  5.3 CASE STUDY 3: MORTALITY (LIFE INSURANCE)

Section 6: Conclusion

References

Appendices
  A. Appendix A: R-Code for Case Study 1
  B. Appendix B: R-Code for Case Study 2
  C. Appendix C: R-Code for Case Study 3

About The Society of Actuaries





Abstract
Data analytics relies strongly on data and available computing tools. Recent years have seen an increase in
data availability and volume. Advanced computational methods and machine-learning tools have been
developed to handle this continuous flow of valuable information. The aim of this research is to survey
emerging data analytics techniques and discuss their evolution and growing use in the actuarial profession.
Applications of data analytics in life and non-life insurance are also provided.




Executive Summary
Data analytics involves a set of tools and techniques used to extract meaningful information from a dataset
(SOA, 2012). It encompasses several disciplines such as actuarial science, statistics, computer science,
mathematics, and marketing. Recent years have seen an increase in data availability and volume, leading to
an explosion in the concept of “Big Data” (AAA, 2018).




Actuaries rely heavily on data to perform analysis, make general inferences, inform decisions, and guide
predictions. They have a long history in conducting data analysis in areas such as underwriting, claim
management, pricing, risk analysis, and auditing (Shapiro and Jain, 2003; SOA, 2012). In the past, data
analysis was mainly descriptive and actuaries predominantly used programs such as Excel (SOA, 2012,
Appendix G) and C++ (Pauza and Bellomo, 2014). Although descriptive analytics is in use today, it now
represents an initial step in a more complex and data-driven analysis. Recent studies predict substantial
changes in the analytical tools used by actuaries and other professionals (Sondergeld and Purushotham,
2019; Guo, 2003; Wedel and Kannan, 2016).

Advanced data analytics packages (such as SAS, SPSS, Matlab, R, and Python) allow the user to extract more
information from a dataset, make a diagnostic analysis, and use non-standard models to make relevant
predictions. This paper aims to survey emerging data analytics techniques with potential actuarial
applications.

The remainder of the paper is organized as follows: Section 1 acknowledges the contributions of this
report’s Project Oversight Group (POG). Section 2 deals with the change in data sources and volume. This
section also reviews some of the data visualization techniques available to actuaries. In Section 3, we give a
brief overview of several data analytics techniques. In Section 4, we review some applications of emerging
data analytics technologies in actuarial science. We also briefly describe some open-source data analytics
software packages that have grown in use among actuaries. Section 5 deals with three case studies in which we use
open-source technologies for actuarial computational work. A commentary on the findings is presented in
Section 6.



Section 1: Acknowledgments
The authors gratefully acknowledge the significant contributions made by the members of the Project
Oversight Group. Special thanks are due to Dale Hall, SOA Managing Director of Research, and Mervyn
Kopinsky, SOA Experience Studies Actuary, for their leadership in guiding the project. The authors would
like to thank Korrel Crawford, SOA Senior Research Administrator, for her effective coordination of the
project and her help in getting this report ready for publication. The authors also gratefully acknowledge
the Actuarial Innovation & Technology Steering Committee of the Society of Actuaries for providing funding
for this project.

Project Oversight Group Members:

Han (Henry) Chen, FSA, MAAA, FCIA

Andrew Harris, ASA

Clinton Rheal Innes, FSA, ACIA

Karen T. Jiang, FSA, CERA, MAAA

Michael Cletus Niemerg, FSA, MAAA

Zhen Yuan, FSA



