
Chapter 3: Modeling loss severity

Zeinab Amin

September 15, 2016

3.1 Chapter preview

The traditional loss distribution approach to modeling aggregate losses
starts by separately fitting a frequency distribution to the number of
losses and a severity distribution to the size of losses. The estimated
aggregate loss distribution combines the loss frequency distribution and
the loss severity distribution by convolution. Discrete distributions,
often referred to as counting or frequency distributions, were used in
Chapter 2 to describe the number of events, such as the number of
accidents to the driver or the number of claims to the insurer.
Lifetimes, asset values, losses and claim sizes are usually modeled as
continuous random variables and as such are modeled using continuous
distributions, often referred to as loss or severity distributions.
Mixture distributions are used to model phenomena investigated in a
heterogeneous population, such as modeling more than one type of claim
in liability insurance (small frequent claims and large relatively rare
claims). In this chapter we explore the use of continuous as well as
mixture distributions to model the random size of loss. We present key
attributes that characterize continuous models and means of creating new
distributions from existing ones. We also explore the effect of coverage
modifications, which change the conditions that trigger a payment, such
as applying deductibles, limits, or adjusting for inflation, on the
distribution of individual loss amounts.

3.2 Continuous distributions for loss severity

3.2.1 Basic distributional quantities

In this section we calculate the basic distributional quantities:
moments, percentiles and generating functions.

Moments

Let $X$ be a continuous random variable with probability density
function $f_X(x)$. The $k$-th raw moment of $X$, denoted by $\mu_k'$, is
the expected value of the $k$-th power of $X$, provided it exists. The
first raw moment $\mu_1'$ is the mean of $X$, usually denoted by $\mu$.
The formula for $\mu_k'$ is given as

$$\mu_k' = E\left(X^k\right) = \int_0^\infty x^k f_X(x)\,dx.$$

The support of the random variable $X$ is assumed to be nonnegative
since actuarial phenomena are rarely negative.

The $k$-th central moment of $X$, denoted by $\mu_k$, is the expected
value of the $k$-th power of the deviation of $X$ from its mean $\mu$.
The formula for $\mu_k$ is given as

$$\mu_k = E\left[(X - \mu)^k\right] = \int_0^\infty (x - \mu)^k f_X(x)\,dx.$$

The second central moment $\mu_2$ defines the variance of $X$, denoted
by $\sigma^2$. The square root of the variance is the standard deviation
$\sigma$. A further characterization of the shape of the distribution
includes its degree of symmetry as well as its peakedness relative to
the standard normal distribution. The ratio of the third central moment
to the cube of the standard deviation, $\mu_3 / \sigma^3$, defines the
coefficient of skewness, which is a measure of asymmetry. A positive
coefficient of skewness indicates that the distribution is skewed to the
right (positively skewed). The ratio of the fourth central moment to the
fourth power of the standard deviation, $\mu_4 / \sigma^4$, defines the
coefficient of kurtosis, which is a measure of the heaviness of the
tails relative to the standard normal curve. High kurtosis is associated
with a heavy-tailed distribution.

Example 3.1 (SOA)

$X$ has a gamma distribution with mean 8 and skewness 1. Find the variance of $X$.

Solution

The probability density function of $X$ is given by
$$f_X(x) = \frac{(x/\theta)^\alpha}{x\,\Gamma(\alpha)} \exp\left(-\frac{x}{\theta}\right), \quad x > 0.$$

If $\alpha$ is an integer, then
$$\mu_k' = E\left(X^k\right) = \int_0^\infty \frac{1}{(\alpha-1)!\,\theta^\alpha}\, x^{k+\alpha-1} e^{-x/\theta}\,dx = \frac{(k+\alpha-1)!}{(\alpha-1)!}\,\theta^k.$$

Thus, $\mu_1' = E(X) = \alpha\theta$,
$\mu_2' = E\left(X^2\right) = (\alpha+1)\alpha\theta^2$,
$\mu_3' = E\left(X^3\right) = (\alpha+2)(\alpha+1)\alpha\theta^3$

and $V(X) = \alpha\theta^2$.

$$\text{Skewness} = \frac{E\left[(X-\mu_1')^3\right]}{V(X)^{3/2}} = \frac{\mu_3' - 3\mu_2'\mu_1' + 2{\mu_1'}^3}{V(X)^{3/2}} = \frac{(\alpha+2)(\alpha+1)\alpha\theta^3 - 3(\alpha+1)\alpha^2\theta^3 + 2\alpha^3\theta^3}{\left(\alpha\theta^2\right)^{3/2}} = \frac{2}{\alpha^{1/2}} = 1.$$

Hence, $\alpha = 4$. Since $E(X) = \alpha\theta = 8$, then
$\theta = 2$ and $V(X) = \alpha\theta^2 = 16$.
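The solution can be cross-checked numerically; here is a quick sketch using `scipy.stats.gamma`, where `a` is the shape parameter $\alpha$ and `scale` is $\theta$.

```python
# Numeric check of Example 3.1: Ga(alpha = 4, theta = 2) should have
# mean 8, variance 16 and skewness 2/sqrt(alpha) = 1.
from scipy import stats

alpha, theta = 4, 2
mean, var, skew = stats.gamma.stats(a=alpha, scale=theta, moments="mvs")
print(mean, var, skew)   # 8.0 16.0 1.0
```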

Quantiles

Percentiles can also be used to describe the characteristics of the
distribution of $X$. The 100$p$-th percentile of the distribution of
$X$, denoted by $\pi_p$, is the value of $X$ which satisfies

$$F_X\left(\pi_p^-\right) \leq p \leq F_X\left(\pi_p\right), \quad 0 \leq p \leq 1.$$

The 50th percentile, or the middle point of the distribution,
$\pi_{0.5}$, is the median. Unlike discrete random variables, continuous
random variables with strictly increasing distribution functions have
unique percentiles.

Example 3.2 (SOA)

Let $X$ be a continuous random variable with density function
$f_X(x) = \theta e^{-\theta x}$ for $x > 0$ and 0 elsewhere. If the
median of this distribution is $\frac{1}{3}$, find $\theta$.

Solution

$F_X(x) = 1 - e^{-\theta x}$. Then,
$F_X(\pi_{0.5}) = 1 - e^{-\theta \pi_{0.5}} = 0.5$. Thus,
$1 - e^{-\theta/3} = 0.5$ and $\theta = 3\ln 2$.
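A numeric sketch of this result using `scipy.stats.expon`, whose `scale` parameter equals $1/\theta$ under the rate parameterization used here:

```python
# With rate theta = 3*ln(2), the exponential median should equal 1/3.
import math
from scipy import stats

theta = 3 * math.log(2)
median = stats.expon.ppf(0.5, scale=1 / theta)
print(median)   # 0.333... = 1/3
```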

The moment generating function

The moment generating function, denoted by $M_X(t)$, uniquely
characterizes the distribution of $X$. While it is possible for two
different distributions to have the same moments and yet still differ,
this is not the case with the moment generating function: if two random
variables have the same moment generating function, then they have the
same distribution. The moment generating function is a real function
whose $k$-th derivative at zero is equal to the $k$-th raw moment of
$X$. The moment generating function is given by

$$M_X(t) = E\left(e^{tX}\right) = \int_0^\infty e^{tx} f_X(x)\,dx,$$

for all $t$ for which the expected value exists.

Example 3.3 (SOA)

The random variable $X$ has an exponential distribution with mean
$\frac{1}{b}$. It is found that $M_X\left(-b^2\right) = 0.2$. Find $b$.

Solution

$$M_X(t) = E\left(e^{tX}\right) = \int_0^\infty e^{tx}\, b e^{-bx}\,dx = \int_0^\infty b e^{-x(b-t)}\,dx = \frac{b}{b-t}, \quad t < b.$$

Then,
$M_X\left(-b^2\right) = \frac{b}{b+b^2} = \frac{1}{1+b} = 0.2$.
Thus, $b = 4$.
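The value of the MGF at $t = -b^2$ can be checked by direct numerical integration (a sketch):

```python
# Evaluate the exponential MGF at t = -b**2 by integration and compare
# with the closed form b/(b - t) = 1/(1 + b) = 0.2 for b = 4.
import numpy as np
from scipy import integrate

b = 4.0
mgf, _ = integrate.quad(lambda x: np.exp(-b**2 * x) * b * np.exp(-b * x), 0, np.inf)
print(mgf)   # ≈ 0.2
```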

Example 3.4

Let $X_1$, $X_2$, ..., $X_n$ be independent
$\text{Ga}(\alpha_i, \theta)$ random variables.

Find the distribution of $S = \sum_{i=1}^n X_i$.

Solution

The moment generating function of $S$ is

$$M_S(t) = E\left(e^{tS}\right) = E\left(e^{t\sum_{i=1}^n X_i}\right) = E\left(\prod_{i=1}^n e^{tX_i}\right) = \prod_{i=1}^n E\left(e^{tX_i}\right) = \prod_{i=1}^n M_{X_i}(t).$$

The moment generating function of $X_i$ is
$M_{X_i}(t) = (1 - \theta t)^{-\alpha_i}$.

Then,
$$M_S(t) = \prod_{i=1}^n (1 - \theta t)^{-\alpha_i} = (1 - \theta t)^{-\sum_{i=1}^n \alpha_i},$$
indicating that
$S \sim \text{Ga}\left(\sum_{i=1}^n \alpha_i, \theta\right)$.

By finding the first and second derivatives of $M_S(t)$ at zero, we can
show that
$$E(S) = \left.\frac{\partial M_S(t)}{\partial t}\right|_{t=0} = \alpha\theta, \quad \text{where } \alpha = \sum_{i=1}^n \alpha_i,$$
and
$$E\left(S^2\right) = \left.\frac{\partial^2 M_S(t)}{\partial t^2}\right|_{t=0} = (\alpha+1)\alpha\theta^2.$$

Hence, $V(S) = \alpha\theta^2$.
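The closure of the gamma family under summation can be illustrated by simulation; this is a sketch with illustrative shape values (not from the text):

```python
# Monte Carlo check of Example 3.4: a sum of independent Ga(alpha_i, theta)
# variables with a common scale theta is Ga(sum alpha_i, theta).
import numpy as np

rng = np.random.default_rng(0)
alphas, theta, n = [1.5, 2.0, 3.5], 2.0, 200_000
S = sum(rng.gamma(shape=a, scale=theta, size=n) for a in alphas)

# Ga(7, 2) has mean 7*2 = 14 and variance 7*2**2 = 28.
print(S.mean(), S.var())
```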

Probability generating function

The probability generating function, denoted by $P_X(z)$, also uniquely
characterizes the distribution of $X$. It is defined as

$$P_X(z) = E\left(z^X\right) = \int_0^\infty z^x f_X(x)\,dx,$$

for all $z$ for which the expected value exists.

We can also use the probability generating function to generate moments
of $X$. By taking the $k$-th derivative of $P_X(z)$ with respect to $z$
and evaluating it at $z = 1$, we get the $k$-th factorial moment

$$E\left[X(X-1)\cdots(X-k+1)\right].$$

3.2.2 Continuous distributions for modeling loss severity

In this section we explain the characteristics of distributions suitable
for modeling the severity of losses, including the gamma, Pareto,
Weibull and generalized beta of the second kind distributions.
Applications for which each distribution may be used are identified.

The Gamma distribution

The gamma distribution is commonly used in modeling claim severity. The
traditional approach in modelling losses is to fit separate models for
claim frequency and claim severity. When frequency and severity are
modeled separately it is common for actuaries to use the Poisson
distribution for claim count and the gamma distribution to model
severity. An alternative approach for modelling losses that has recently
gained popularity is to create a single model for pure premium (average
claim cost) that will be described in Chapter 4.

The continuous variable $X$ is said to have the gamma distribution with
shape parameter $\alpha$ and scale parameter $\theta$ if its probability
density function is given by

$$f_X(x) = \frac{(x/\theta)^\alpha}{x\,\Gamma(\alpha)} \exp\left(-\frac{x}{\theta}\right), \quad x > 0,\ \alpha > 0,\ \theta > 0.$$

Figures 1 and 2 demonstrate the effect of the shape and scale
parameters on the gamma density function.

Fig. (1) Gamma density functions with varying scale parameters


Fig. (2) Gamma density functions with varying shape parameters


When $\alpha = 1$ the gamma reduces to an exponential distribution, and
when $\alpha = \frac{n}{2}$ and $\theta = 2$ the gamma reduces to a
chi-square distribution with $n$ degrees of freedom. As we will see in
Section 3.6.2, the chi-square distribution is used extensively in
statistical hypothesis testing.

The distribution function of the gamma model is the incomplete gamma
function, denoted by $\Gamma\left(\alpha; \frac{x}{\theta}\right)$, and
defined as

$$F_X(x) = \Gamma\left(\alpha; \frac{x}{\theta}\right) = \frac{1}{\Gamma(\alpha)} \int_0^{x/\theta} t^{\alpha-1} e^{-t}\,dt, \quad \alpha > 0,\ \theta > 0.$$

The $k$-th moment of the gamma distributed random variable for any
positive $k$ is given by

$$E\left(X^k\right) = \frac{\theta^k\,\Gamma(\alpha+k)}{\Gamma(\alpha)}, \quad k > 0.$$

The mean and variance are given by $E(X) = \alpha\theta$ and
$V(X) = \alpha\theta^2$, respectively.

Since all moments exist for any positive $k$, the gamma distribution is
considered a light tailed distribution, which may not be suitable for
modeling risky assets as it will not provide a realistic assessment of
the likelihood of severe losses. 

The Pareto distribution

The Pareto distribution, named after the Italian economist Vilfredo
Pareto (1848–1923), has many economic and financial applications. It is
a right-skewed and heavy-tailed distribution, which makes it suitable
for modeling income, high-risk insurance claims and severity of large
casualty losses. The survival function of the Pareto distribution, which
decays slowly to zero, was first used to describe the distribution of
income, where a small percentage of the population holds a large
proportion of the total wealth. For extreme insurance claims, the tail
of the severity distribution (losses in excess of a threshold) can be
modelled using a Pareto distribution.

The continuous variable $X$ is said to have the Pareto distribution with
shape parameter $\alpha$ and scale parameter $\theta$ if its pdf is
given by

$$f_X(x) = \frac{\alpha\theta^\alpha}{(x+\theta)^{\alpha+1}}, \quad x > 0,\ \alpha > 0,\ \theta > 0.$$

Figures 3 and 4 demonstrate the effect of the shape and scale
parameters on the Pareto density function.

Fig. (3) Pareto density functions with varying scale parameters


Fig. (4) Pareto density functions with varying shape parameters


The distribution function of the Pareto distribution is given by

$$F_X(x) = 1 - \left(\frac{\theta}{x+\theta}\right)^\alpha, \quad x > 0,\ \alpha > 0,\ \theta > 0.$$

It can be easily seen that the hazard function of the Pareto
distribution is a decreasing function in $x$, another indication that
the distribution is heavy tailed.

The $k$-th moment of the Pareto distributed random variable exists if
and only if $\alpha > k$. If $k$ is a positive integer, then

$$E\left(X^k\right) = \frac{k!\,\theta^k}{(\alpha-1)\cdots(\alpha-k)}, \quad \alpha > k.$$

The mean and variance are given by

$$E(X) = \frac{\theta}{\alpha-1} \ \text{ for } \alpha > 1 \quad \text{and} \quad V(X) = \frac{\alpha\theta^2}{(\alpha-1)^2(\alpha-2)} \ \text{ for } \alpha > 2,$$

respectively.

Example 3.5

The claim size of an insurance portfolio follows the Pareto distribution
with mean and variance of 40 and 1800 respectively. Find

i. The shape and scale parameters,

ii. The 95-th percentile of this distribution.

Solution

i. $E(X) = \frac{\theta}{\alpha-1} = 40$ and
$V(X) = \frac{\alpha\theta^2}{(\alpha-1)^2(\alpha-2)} = 1800$.
Dividing the square of the first equation by the second gives
$\frac{\alpha-2}{\alpha} = \frac{40^2}{1800}$. Thus,
$\alpha = 18$ and $\theta = 40(\alpha-1) = 680$.

ii. The 95th percentile, $\pi_{0.95}$, satisfies the equation
$F_X(\pi_{0.95}) = 1 - \left(\frac{680}{\pi_{0.95}+680}\right)^{18} = 0.95$.
Thus, $\pi_{0.95} = 680\left(0.05^{-1/18} - 1\right) \approx 123.1$.
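Solving the two moment equations exactly gives $\alpha = 18$ and $\theta = 680$; a numeric sketch using `scipy.stats.lomax` (this section's Pareto, with shape `c` $= \alpha$ and `scale` $= \theta$) confirms the moments and the percentile:

```python
# Numeric check of Example 3.5 with the Pareto (Lomax) distribution.
from scipy import stats

alpha = 2 * 1800 / (1800 - 40**2)   # from (alpha - 2)/alpha = 40**2/1800
theta = 40 * (alpha - 1)
print(alpha, theta)                 # 18.0 680.0

mean, var = stats.lomax.stats(c=alpha, scale=theta, moments="mv")
print(mean, var)                    # 40.0 1800.0

p95 = stats.lomax.ppf(0.95, c=alpha, scale=theta)
print(p95)                          # ≈ 123.1
```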

The Weibull distribution

The Weibull distribution, named after the Swedish physicist Waloddi
Weibull (1887–1979), is widely used in reliability, life data analysis,
weather forecasting and general insurance claims. Truncated data arise
frequently in insurance studies, and the Weibull distribution is
particularly useful in modeling left-truncated claim severity
distributions. The Weibull distribution has been used to model
excess-of-loss treaty claims in automobile insurance as well as
earthquake inter-arrival times.

The continuous variable $X$ is said to have the Weibull distribution
with shape parameter $\alpha$ and scale parameter $\theta$ if its
probability density function is given by

$$f_X(x) = \frac{\alpha}{\theta}\left(\frac{x}{\theta}\right)^{\alpha-1} \exp\left(-\left(\frac{x}{\theta}\right)^\alpha\right), \quad x > 0,\ \alpha > 0,\ \theta > 0.$$

Figures 5 and 6 demonstrate the effect of the shape and scale
parameters on the Weibull density function.

The distribution function of the Weibull distribution is given by

$$F_X(x) = 1 - e^{-(x/\theta)^\alpha}, \quad x > 0,\ \alpha > 0,\ \theta > 0.$$

It can be easily seen that the shape parameter $\alpha$ describes the
shape of the hazard function of the Weibull distribution. The hazard
function is decreasing when $\alpha < 1$, constant when $\alpha = 1$ and
increasing when $\alpha > 1$. This behavior makes the Weibull
distribution a suitable model for a wide variety of phenomena such as
weather forecasting, electrical and industrial engineering, insurance
modeling and financial risk analysis.

Fig. (5) Weibull density functions with varying scale parameters


Fig. (6) Weibull density functions with varying shape parameters


The $k$-th moment of the Weibull distributed random variable is given by

$$E\left(X^k\right) = \theta^k\,\Gamma\left(1+\frac{k}{\alpha}\right).$$

The mean and variance are given by

$$E(X) = \theta\,\Gamma\left(1+\frac{1}{\alpha}\right)$$

and

$$V(X) = \theta^2\left\{\Gamma\left(1+\frac{2}{\alpha}\right) - \left[\Gamma\left(1+\frac{1}{\alpha}\right)\right]^2\right\},$$

respectively.

Example 3.6

Suppose that the probability distribution of the lifetime of AIDS
patients (in months) from the time of diagnosis is described by the
Weibull distribution with shape parameter 1.2 and scale parameter 33.33.

i. Find the probability that a randomly selected person from this
population survives at least 12 months.

ii. A random sample of 10 patients will be selected from this
population. What is the probability that at most two will die within
one year of diagnosis?

iii. Find the 99th percentile of this distribution.

Solution

Let $X$ be the lifetime of AIDS patients (in months).

i. $P(X \geq 12) = S_X(12) = e^{-(12/33.33)^{1.2}} = 0.746$.

ii. Let $Y$ be the number of patients who die within one year of
diagnosis. Then, $Y \sim \text{Bin}(10, 0.254)$ and
$P(Y \leq 2) = 0.514$.

iii. Let $\pi_{0.99}$ denote the 99th percentile of this distribution.
Then,

$S_X(\pi_{0.99}) = e^{-(\pi_{0.99}/33.33)^{1.2}} = 0.01$
and $\pi_{0.99} = 118.99$.
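All three parts can be checked numerically (a sketch; `scipy.stats.weibull_min` uses `c` for the shape parameter and `scale` for the scale parameter of this section):

```python
# Numeric check of Example 3.6.
from scipy import stats

shape, scale = 1.2, 33.33
p_survive = stats.weibull_min.sf(12, c=shape, scale=scale)
print(p_survive)                                   # ≈ 0.746

# Keeping full precision gives ≈ 0.513; the text's 0.514 uses p rounded to 0.254.
p_at_most_2 = stats.binom.cdf(2, n=10, p=1 - p_survive)
print(p_at_most_2)                                 # ≈ 0.51

p99 = stats.weibull_min.ppf(0.99, c=shape, scale=scale)
print(p99)                                         # ≈ 119.0
```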

The generalized beta distribution of the second kind

The generalized beta distribution of the second kind (GB2) was
introduced by Gary Venter (1983) in the context of insurance loss
modeling and by McDonald (1984) as an income and wealth distribution. It
is a very flexible four-parameter distribution that can model positively
as well as negatively skewed distributions.

The continuous variable $X$ is said to have the GB2 distribution with
parameters $a$, $b$, $\alpha$ and $\beta$ if its probability density
function is given by

$$f_X(x) = \frac{a\, x^{a\alpha-1}}{b^{a\alpha}\, B(\alpha,\beta) \left[1 + \left(\frac{x}{b}\right)^a\right]^{\alpha+\beta}}, \quad x > 0,\ a, b, \alpha, \beta > 0,$$

where the beta function $B(\alpha,\beta)$ is defined as

$$B(\alpha,\beta) = \int_0^1 t^{\alpha-1}(1-t)^{\beta-1}\,dt.$$

The GB2 provides a model for heavy- as well as light-tailed data. It
includes the exponential, gamma, Weibull, Burr, Lomax, F, chi-square,
Rayleigh, lognormal and log-logistic as special or limiting cases. For
example, by setting the parameters $a = \alpha = \beta = 1$, the GB2
reduces to the log-logistic distribution. When $a = 1$ and
$\beta \rightarrow \infty$, it reduces to the gamma distribution, and
when $\alpha = 1$ and $\beta \rightarrow \infty$, it reduces to the
Weibull distribution.

The $k$-th moment of the GB2 distributed random variable is given by

$$E\left(X^k\right) = \frac{b^k\, B\left(\alpha+\frac{k}{a},\ \beta-\frac{k}{a}\right)}{B(\alpha,\beta)}, \quad 0 < k < a\beta.$$

Earlier applications of the GB2 were to income data; more recently it
has been used to model long-tailed claims data. The GB2 has been used to
model different types of automobile insurance claims, severity of fire
losses, as well as medical insurance claim data.
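The moment formula above can be verified against the density by numerical integration; this is a sketch with illustrative parameter values (not taken from the text):

```python
# Check the GB2 k-th moment formula by integrating x**k against the density.
import numpy as np
from scipy import integrate
from scipy.special import beta as B   # the beta function B(alpha, beta)

a, b, alpha, bet = 2.0, 10.0, 3.0, 4.0

def gb2_pdf(x):
    return (a * x**(a * alpha - 1)
            / (b**(a * alpha) * B(alpha, bet) * (1 + (x / b)**a)**(alpha + bet)))

k = 1  # any 0 < k < a*bet works here
numeric, _ = integrate.quad(lambda x: x**k * gb2_pdf(x), 0, np.inf)
closed = b**k * B(alpha + k / a, bet - k / a) / B(alpha, bet)
print(numeric, closed)   # the two values agree
```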

3.3 Methods of creating new distributions

In this section we

i. understand connections among the distributions;

ii. give insights into when a distribution is preferred over
alternatives;

iii. provide foundations for creating new distributions.


3.3.1 Functions of random variables and their distributions

In Section 3.2 we discussed some elementary known distributions. In this
section we discuss means of creating new parametric probability
distributions from existing ones. Let $X$ be a continuous random
variable with a known probability function $f_X(x)$ and distribution
function $F_X(x)$. Consider the transformation $Y = g(X)$, where $g(X)$
is a one-to-one transformation defining a new random variable $Y$. We
can use the distribution function technique, the change-of-variable
technique or the moment generating function technique to find the
probability density function of the variable of interest $Y$. In this
section we apply the following techniques for creating new families of
distributions: (a) multiplication by a constant, (b) raising to a power,
(c) exponentiation and (d) mixing.

(a) Multiplication by a constant

If claim data show change over time, then such a transformation can be
useful to adjust for inflation. If the level of inflation is positive
then claim costs are rising, and if it is negative then costs are
falling. To adjust for inflation we multiply the cost $X$ by one plus
the inflation rate (negative inflation is deflation). To account for
currency effects on claim costs we can likewise use a transformation to
apply a currency conversion from a base to a counter currency.

Consider the transformation $Y = cX$, where $c > 0$. The distribution
function of $Y$ is given by

$$F_Y(y) = P(Y \leq y) = P(cX \leq y) = P\left(X \leq \frac{y}{c}\right) = F_X\left(\frac{y}{c}\right).$$

Hence, the probability density function of interest $f_Y(y)$ can be
written as

$$f_Y(y) = \frac{1}{c}\, f_X\left(\frac{y}{c}\right).$$

Suppose that $X$ has a parametric distribution and define a rescaled
version $Y = cX$, $c > 0$. If $Y$ is in the same parametric family, then
the distribution is said to be a scale distribution. The scale parameter
has the following features:

i. The parameter is changed by multiplying by $c$;

ii. All other parameters remain unchanged.

Example 3.7 (SOA)

The aggregate losses of Eiffel Auto Insurance are denominated in euros
and follow a lognormal distribution with $\mu = 8$ and $\sigma = 2$.
Given that 1 euro $=$ 1.3 dollars, find the lognormal parameters that
describe the distribution of Eiffel's losses in dollars.

Solution

Let $X$ and $Y$ denote the aggregate losses of Eiffel Auto Insurance in
euros and dollars respectively. Then, $Y = 1.3X$.

$$F_Y(y) = P(Y \leq y) = P(1.3X \leq y) = P\left(X \leq \frac{y}{1.3}\right) = F_X\left(\frac{y}{1.3}\right).$$

$X$ follows a lognormal distribution with parameters $\mu = 8$ and
$\sigma = 2$. The probability density function of $X$ is given by

$$f_X(x) = \frac{1}{x\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{\ln x - \mu}{\sigma}\right)^2}, \quad x > 0.$$

Then, the probability density function of interest $f_Y(y)$ is

$$f_Y(y) = \frac{1}{1.3}\, f_X\left(\frac{y}{1.3}\right) = \frac{1}{1.3} \cdot \frac{1.3}{y\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{\ln(y/1.3) - \mu}{\sigma}\right)^2} = \frac{1}{y\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{\ln y - (\ln 1.3 + \mu)}{\sigma}\right)^2}.$$

Then $Y$ follows a lognormal distribution with parameters
$\ln 1.3 + \mu = 8.26$ and $\sigma = 2$.

Example 3.8

Demonstrate that the gamma distribution is a scale distribution.

Solution

Let $X \sim \text{Ga}(\alpha, \theta)$ and $Y = cX$. Then

$$f_Y(y) = \frac{1}{c}\, f_X\left(\frac{y}{c}\right) = \frac{\left(\frac{y}{c\theta}\right)^\alpha}{y\,\Gamma(\alpha)} \exp\left(-\frac{y}{c\theta}\right).$$

We can see that $Y \sim \text{Ga}(\alpha, c\theta)$, indicating that the
gamma is a scale distribution and $\theta$ is a scale parameter.
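A Monte Carlo sketch of this scale property, with illustrative parameter values:

```python
# If X ~ Ga(alpha, theta), then cX should be distributed Ga(alpha, c*theta).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
alpha, theta, c = 2.0, 3.0, 1.5
y = c * rng.gamma(shape=alpha, scale=theta, size=100_000)

# The Kolmogorov-Smirnov distance to Ga(alpha, c*theta) should be near zero.
stat, pvalue = stats.kstest(y, stats.gamma(a=alpha, scale=c * theta).cdf)
print(stat)
```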

(b) Raising to a power

In the previous section we discussed the flexibility of the Weibull
distribution in fitting reliability data. Looking at the origins of the
Weibull distribution, we recognize that the Weibull is a power
transformation of the exponential distribution. This is an application
of another type of transformation, which involves raising the random
variable to a power.

Consider the transformation $Y = X^\tau$, where $\tau > 0$. The
distribution function of $Y$ is given by

$$F_Y(y) = P(Y \leq y) = P\left(X^\tau \leq y\right) = P\left(X \leq y^{1/\tau}\right) = F_X\left(y^{1/\tau}\right).$$

Hence, the probability density function of interest $f_Y(y)$ can be
written as

$$f_Y(y) = \frac{1}{\tau}\, y^{\frac{1}{\tau}-1}\, f_X\left(y^{1/\tau}\right).$$

On the other hand, if $\tau < 0$, then the distribution function of $Y$
is given by

$$F_Y(y) = P(Y \leq y) = P\left(X^\tau \leq y\right) = P\left(X \geq y^{1/\tau}\right) = 1 - F_X\left(y^{1/\tau}\right),$$

and

$$f_Y(y) = \left|\frac{1}{\tau}\right| y^{\frac{1}{\tau}-1}\, f_X\left(y^{1/\tau}\right).$$

Example 3.9

We assume that $X$ follows the exponential distribution with mean
$\theta$ and consider the transformed variable $Y = X^\tau$. Show that
$Y$ follows the Weibull distribution when $\tau$ is positive and
determine the parameters of the Weibull distribution.

Solution

$$f_X(x) = \frac{1}{\theta}\, e^{-x/\theta}, \quad x > 0.$$

$$f_Y(y) = \frac{1}{\tau}\, y^{\frac{1}{\tau}-1}\, f_X\left(y^{1/\tau}\right) = \frac{1}{\tau\theta}\, y^{\frac{1}{\tau}-1}\, e^{-y^{1/\tau}/\theta} = \frac{\alpha}{\beta}\left(\frac{y}{\beta}\right)^{\alpha-1} e^{-(y/\beta)^\alpha},$$

where $\alpha = \frac{1}{\tau}$ and $\beta = \theta^\tau$. Then $Y$
follows the Weibull distribution with shape parameter $\alpha$ and scale
parameter $\beta$.
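A simulation sketch of this power transformation, with illustrative values of $\theta$ and $\tau$:

```python
# X ~ exponential with mean theta, Y = X**tau (tau > 0) should match a
# Weibull with shape 1/tau and scale theta**tau.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
theta, tau = 2.0, 0.5
y = rng.exponential(scale=theta, size=100_000) ** tau

stat, _ = stats.kstest(y, stats.weibull_min(c=1 / tau, scale=theta**tau).cdf)
print(stat)   # small Kolmogorov-Smirnov distance: the distributions agree
```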

(c) Exponentiation

The normal distribution is a very popular model for a wide range of
applications, and when the sample size is large it can serve as an
approximate distribution for other models. If the random variable $X$
has a normal distribution with mean $\mu$ and variance $\sigma^2$, then
$Y = e^X$ has a lognormal distribution with parameters $\mu$ and
$\sigma^2$. The lognormal random variable has a lower bound of zero, is
positively skewed and has a long right tail. A lognormal distribution is
commonly used to describe distributions of financial assets such as
stock prices. It is also used in fitting claim amounts for automobile as
well as health insurance. This is an example of another type of
transformation, which involves exponentiation.

Consider the transformation $Y = e^X$. The distribution function of $Y$
is given by

$$F_Y(y) = P(Y \leq y) = P\left(e^X \leq y\right) = P(X \leq \ln y) = F_X(\ln y).$$

Hence, the probability density function of interest $f_Y(y)$ can be
written as

$$f_Y(y) = \frac{1}{y}\, f_X(\ln y).$$

Example 3.10 (SOA)

$X$ has a uniform distribution on the interval $(0, c)$ and $Y = e^X$.
Find the distribution of $Y$.

Solution

$$F_Y(y) = P(Y \leq y) = P\left(e^X \leq y\right) = P(X \leq \ln y) = F_X(\ln y).$$

Then,
$$f_Y(y) = \frac{1}{y}\, f_X(\ln y) = \frac{1}{cy}.$$
Since $0 < x < c$, then $1 < y < e^c$.
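The resulting distribution function $F_Y(y) = \ln(y)/c$ on $(1, e^c)$ can be checked by simulation (a sketch with an illustrative value of $c$):

```python
# If X ~ U(0, c), then Y = exp(X) has CDF F_Y(y) = ln(y)/c on (1, e**c).
import numpy as np

rng = np.random.default_rng(3)
c = 2.0
y = np.exp(rng.uniform(0, c, size=100_000))

for point in (1.5, 3.0, 6.0):
    print((y <= point).mean(), np.log(point) / c)   # empirical vs exact CDF
```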

3.3.2 Mixture distributions

Mixture distributions represent a useful way of modelling data that are
drawn from a heterogeneous population. This parent population can be
thought of as divided into multiple subpopulations with distinct
distributions.

Finite mixture distributions

Two-point mixture

If the underlying phenomenon is diverse and can actually be described as
two phenomena representing two subpopulations with different modes, we
can construct the two-point mixture random variable $X$. Given random
variables $X_1$ and $X_2$, with probability density functions
$f_{X_1}(x)$ and $f_{X_2}(x)$ respectively, the probability density
function of $X$ is the weighted average of the component probability
density functions $f_{X_1}(x)$ and $f_{X_2}(x)$. The probability density
function and distribution function of $X$ are given by

$$f_X(x) = a\, f_{X_1}(x) + (1-a)\, f_{X_2}(x)$$

and

$$F_X(x) = a\, F_{X_1}(x) + (1-a)\, F_{X_2}(x),$$

for $0 < a < 1$, where the mixing parameters $a$ and $(1-a)$ represent
the proportions of data points that fall under each of the two
subpopulations respectively. This weighted average can be applied to a
number of other distribution-related quantities. The $k$-th moment and
moment generating function of $X$ are given by

$$E\left(X^k\right) = a\, E\left(X_1^k\right) + (1-a)\, E\left(X_2^k\right)$$

and

$$M_X(t) = a\, M_{X_1}(t) + (1-a)\, M_{X_2}(t),$$

respectively.

Example 3.11 (SOA)

The distribution of the random variable $X$ is an equally weighted
mixture of two Poisson distributions with parameters $\lambda_1$ and
$\lambda_2$ respectively. The mean and variance of $X$ are 4 and 13
respectively. Determine $P(X > 2)$.

Solution

$E(X) = 0.5\lambda_1 + 0.5\lambda_2 = 4$;

$E\left(X^2\right) = V(X) + \left[E(X)\right]^2 = 0.5\left(\lambda_1 + \lambda_1^2\right) + 0.5\left(\lambda_2 + \lambda_2^2\right) = 13 + 16 = 29$.

Simplifying the two equations we get $\lambda_1 + \lambda_2 = 8$ and
$\lambda_1^2 + \lambda_2^2 = 50$. Then, the parameters of the two
Poisson distributions are 1 and 7.

$P(X > 2) = 0.5\, P(X_1 > 2) + 0.5\, P(X_2 > 2) = 0.5(0.0803) + 0.5(0.9704) = 0.525$.
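The exact mixture tail probability for $\lambda_1 = 1$, $\lambda_2 = 7$ can be verified directly (a sketch):

```python
# An equally weighted mixture of Poisson(1) and Poisson(7) should have
# mean 4 and variance 13, and P(X > 2) is the weighted average of the tails.
from scipy import stats

lam1, lam2 = 1, 7
mean = 0.5 * lam1 + 0.5 * lam2
var = 0.5 * (lam1 + lam1**2) + 0.5 * (lam2 + lam2**2) - mean**2
print(mean, var)                                    # 4.0 13.0

p = 0.5 * stats.poisson.sf(2, lam1) + 0.5 * stats.poisson.sf(2, lam2)
print(p)                                            # ≈ 0.525
```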

k-point mixture

In the case of finite mixture distributions, the random variable of
interest $X$ has a probability $p_i$ of being drawn from homogeneous
subpopulation $i$, where $i = 1, 2, \ldots, k$ and $k$ is the initially
specified number of subpopulations in our mixture. The mixing parameter
$p_i$ represents the proportion of observations from subpopulation $i$.
Consider the random variable $X$ generated from $k$ distinct
subpopulations, where subpopulation $i$ is modeled by the continuous
distribution $f_{X_i}(x)$. The probability distribution of $X$ is given
by

$$f_X(x) = \sum_{i=1}^k p_i\, f_{X_i}(x), \quad 0 < p_i < 1, \quad \sum_{i=1}^k p_i = 1.$$

This model is often referred to as a finite mixture or a $k$-point
mixture. The distribution function, $k$-th moment and moment generating
function of the $k$-point mixture are given as

$$F_X(x) = \sum_{i=1}^k p_i\, F_{X_i}(x),$$

$$E\left(X^k\right) = \sum_{i=1}^k p_i\, E\left(X_i^k\right),$$

$$M_X(t) = \sum_{i=1}^k p_i\, M_{X_i}(t),$$

respectively.

Example 3.12 (SOA)

$Y_1$ is a mixture of $X_1$ and $X_2$ with mixing weights $a$ and
$(1-a)$.

$Y_2$ is a mixture of $X_3$ and $X_4$ with mixing weights $b$ and
$(1-b)$.

$Z$ is a mixture of $Y_1$ and $Y_2$ with mixing weights $c$ and
$(1-c)$.

Show that $Z$ is a mixture of $X_1$, $X_2$, $X_3$ and $X_4$, and find
the mixing weights.

Solution

$$f_{Y_1}(x) = a\, f_{X_1}(x) + (1-a)\, f_{X_2}(x),$$

$$f_{Y_2}(x) = b\, f_{X_3}(x) + (1-b)\, f_{X_4}(x),$$

$$f_Z(x) = c\, f_{Y_1}(x) + (1-c)\, f_{Y_2}(x)$$

$$= c\left[a\, f_{X_1}(x) + (1-a)\, f_{X_2}(x)\right] + (1-c)\left[b\, f_{X_3}(x) + (1-b)\, f_{X_4}(x)\right]$$

$$= ca\, f_{X_1}(x) + c(1-a)\, f_{X_2}(x) + (1-c)b\, f_{X_3}(x) + (1-c)(1-b)\, f_{X_4}(x).$$

Then, $Z$ is a mixture of $X_1$, $X_2$, $X_3$ and $X_4$, with mixing
weights $ca$, $c(1-a)$, $(1-c)b$ and $(1-c)(1-b)$.

Continuous mixture

A mixture with a very large number of subpopulations ($k$ going to
infinity) approaches an infinite mixture, often referred to as a
continuous mixture. In a continuous mixture, subpopulations are not
distinguished by discrete mixing weights but by a continuous variable
$\theta$, whose density $g(\theta)$ plays the role of the weights $p_i$
in the finite mixture. Consider the random variable $X$ with a
distribution depending on a parameter $\theta$, where $\theta$ itself is
a continuous random variable. This description yields the following
model for $X$:

$$f_X(x) = \int_0^\infty f_X(x \mid \theta)\, g(\theta)\,d\theta,$$

where $f_{X}left( xleft| theta right. right)$ is the conditional
distribution of $X$ at a particular value of $theta$ and
$gleft( theta right)$ is the probability statement made about the
unknown parameter $theta$, known as the prior distribution of $theta$
(the prior information or expert opinion to be used in the analysis).

This model is often referred to as an infinite mixture or a continuous
mixture. The distribution function, $k$-th moment and moment generating
functions of the continuous mixture are given as

$F_{X}\left( x \right) = \int_{0}^{\infty}{F_{X}\left( x \middle| \theta \right)g\left( \theta \right)}\, d\theta$,

$E\left( X^{k} \right) = \int_{0}^{\infty}{E\left( X^{k} \middle| \theta \right)g\left( \theta \right)}\, d\theta$,

$M_{X}\left( t \right) = E\left( e^{tX} \right) = \int_{0}^{\infty}{E\left( e^{tX} \middle| \theta \right)g\left( \theta \right)}\, d\theta$,

respectively.

The $k$-th moments of the mixture distribution can be rewritten as

$E\left( X^{k} \right) = \int_{0}^{\infty}{E\left( X^{k} \middle| \theta \right)g\left( \theta \right)}\, d\theta = E\left\lbrack E\left( X^{k} \middle| \theta \right) \right\rbrack$.

In particular, the mean and variance of $X$ are given by

$E\left( X \right) = E\left\lbrack E\left( X \middle| \theta \right) \right\rbrack$
and
$V\left( X \right) = E\left\lbrack V\left( X \middle| \theta \right) \right\rbrack + V\left\lbrack E\left( X \middle| \theta \right) \right\rbrack$.

Example 3.13 (SOA)

$X$ has a binomial distribution with a mean of $100q$ and a variance of
$100q\left( 1 - q \right)$, and $q$ has a beta distribution with
parameters $a = 3$ and $b = 2$. Find the unconditional mean and variance
of $X$.

Solution

$E\left( q \right) = \frac{a}{a + b} = \frac{3}{5}$,
$E\left( q^{2} \right) = \frac{a\left( a + 1 \right)}{\left( a + b \right)\left( a + b + 1 \right)} = \frac{2}{5}$ and
$V\left( q \right) = E\left( q^{2} \right) - \left\lbrack E\left( q \right) \right\rbrack^{2} = \frac{1}{25}$.

$E\left( X \right) = E\left\lbrack E\left( X \middle| q \right) \right\rbrack = E\left( 100q \right) = 100E\left( q \right) = 60$,

$$V\left( X \right) = E\left\lbrack V\left( X \middle| q \right) \right\rbrack + V\left\lbrack E\left( X \middle| q \right) \right\rbrack = E\left\lbrack 100q\left( 1 - q \right) \right\rbrack + V\left( 100q \right)$$

$= 100E\left( q \right) - 100E\left( q^{2} \right) + 100^{2}V\left( q \right) = 60 - 40 + 400 = 420$.
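These iterated-expectation computations are easy to verify numerically; the following sketch (plain Python, no outside data) reproduces the figures above.

```python
# Check of Example 3.13: X | q ~ Binomial(100, q) with q ~ Beta(a = 3, b = 2),
# using E(X) = E[E(X|q)] and V(X) = E[V(X|q)] + V[E(X|q)].
a, b = 3, 2
E_q = a / (a + b)                              # 3/5
E_q2 = a * (a + 1) / ((a + b) * (a + b + 1))   # 2/5
V_q = E_q2 - E_q ** 2                          # 1/25

mean_X = 100 * E_q                             # E(100q) = 60
var_X = 100 * E_q - 100 * E_q2 + 100 ** 2 * V_q  # 60 - 40 + 400 = 420
print(mean_X, var_X)
```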

Exercise 3.14 (SOA)

Claim sizes, $X$, are uniform on $\left( \theta, \theta + 10 \right)$ for each
policyholder. $\theta$ varies by policyholder according to an exponential
distribution with mean 5. Find the unconditional distribution, mean and
variance of $X$.

Solution

The conditional distribution of $X$ is
$f_{X}\left( x \middle| \theta \right) = \frac{1}{10}$ for
$\theta < x < \theta + 10$.

The prior distribution of $\theta$ is
$g\left( \theta \right) = \frac{1}{5}e^{- \frac{\theta}{5}}$ for
$0 < \theta < \infty$.

The conditional mean and variance of $X$ are given by

$E\left( X \middle| \theta \right) = \frac{\theta + \theta + 10}{2} = \theta + 5$
and
$V\left( X \middle| \theta \right) = \frac{\left\lbrack \left( \theta + 10 \right) - \theta \right\rbrack^{2}}{12} = \frac{100}{12}$,
respectively.

Hence, since the exponential prior has $E\left( \theta \right) = 5$ and
$V\left( \theta \right) = 5^{2} = 25$, the unconditional mean and variance of
$X$ are given by

$E\left( X \right) = E\left\lbrack E\left( X \middle| \theta \right) \right\rbrack = E\left( \theta + 5 \right) = E\left( \theta \right) + 5 = 5 + 5 = 10$,
and

$V\left( X \right) = E\left\lbrack V\left( X \middle| \theta \right) \right\rbrack + V\left\lbrack E\left( X \middle| \theta \right) \right\rbrack = E\left( \frac{100}{12} \right) + V\left( \theta + 5 \right) = 8.33 + V\left( \theta \right) = 8.33 + 25 = 33.33$.

The unconditional distribution of $X$ is
$f_{X}\left( x \right) = \int{f_{X}\left( x \middle| \theta \right)g\left( \theta \right)\, d\theta}$.


$$f_{X}\left( x \right) = \left\{ \begin{matrix}
\int_{0}^{x}{\frac{1}{50}e^{- \frac{\theta}{5}}\, d\theta} = \frac{1}{10}\left( 1 - e^{- \frac{x}{5}} \right) & 0 \leq x \leq 10, \\
\int_{x - 10}^{x}{\frac{1}{50}e^{- \frac{\theta}{5}}\, d\theta} = \frac{1}{10}\left( e^{- \frac{\left( x - 10 \right)}{5}} - e^{- \frac{x}{5}} \right) & 10 < x < \infty. \\
\end{matrix} \right.$$
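The piecewise density above can be checked numerically; the sketch below uses a crude midpoint rule (with the upper tail truncated at 200, where the density is negligible) and reproduces a total mass of 1, mean 10 and variance 33.33.

```python
# Numerical check of Exercise 3.14's unconditional density.
import math

def f_X(x):
    if x <= 10:
        return (1 - math.exp(-x / 5)) / 10
    return (math.exp(-(x - 10) / 5) - math.exp(-x / 5)) / 10

def integrate(g, a, b, n=100000):
    # plain midpoint rule; accurate enough for this smooth integrand
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

total = integrate(f_X, 0, 200)
mean = integrate(lambda x: x * f_X(x), 0, 200)
var = integrate(lambda x: x * x * f_X(x), 0, 200) - mean ** 2
print(total, mean, var)
```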

3.4 Coverage Modifications

In this section we evaluate the impact of coverage modifications, namely
deductibles, policy limits, coinsurance and inflation, on the insurer's
costs.

3.4.1 Policy deductibles

Under an ordinary deductible policy, the insured (policyholder) agrees
to cover a fixed amount of an insurance claim before the insurer starts
to pay. This fixed expense paid out of pocket is called the deductible
and is often denoted by $d$. The insurer is responsible for covering the
loss $X$ less the deductible $d$. Depending on the agreement, the
deductible may apply to each covered loss or to a defined benefit period
(month, year, etc.).

Deductibles eliminate a large number of small claims, reduce costs of
handling and processing these claims, reduce premiums for the
policyholders and reduce moral hazard. Moral hazard occurs when the
insured takes more risks, increasing the chances of loss due to perils
insured against, knowing that the insurer will incur the cost (e.g. a
policyholder with collision insurance may be encouraged to drive
recklessly). The larger the deductible, the less the insured pays in
premiums for an insurance policy.

Let $X$ denote the loss incurred to the insured and $Y$ denote the
amount of claim paid by the insurer. Speaking of the benefit paid to the
policyholder, we differentiate between two variables: the payment per
loss and the payment per payment. The payment per loss variable, denoted
by $Y^{L}$, includes losses for which a payment is made as well as
losses less than the deductible and hence is defined as

$$Y^{L} = \left( X - d \right)_{+} = \left\{ \begin{matrix}
0 & X \leq d, \\
X - d & X > d. \\
\end{matrix} \right.$$

$Y^{L}$ is often referred to as the left censored and shifted variable
because the values below $d$ are not ignored and all losses are shifted
by a value $d$.

On the other hand, the payment per payment variable, denoted by $Y^{P}$,
is undefined when there is no payment and only includes losses for
which a payment is made. The variable is defined as

$$Y^{P} = \left\{ \begin{matrix}
\text{Undefined} & X \leq d, \\
X - d & X > d. \\
\end{matrix} \right.$$

$Y^{P}$ is often referred to as the left truncated and shifted variable
or the excess loss variable because claims smaller than $d$ are not
reported and values above $d$ are shifted by $d$.

Even when the distribution of $X$ is continuous, the distribution of
$Y^{L}$ is partly discrete and partly continuous. The discrete part of
the distribution is concentrated at $Y = 0$ (when $X \leq d$) and the
continuous part is spread over the interval $Y > 0$ (when $X > d$). For
the discrete part, the probability that no payment is made is the
probability that losses fall below the deductible; that is,
$P\left( Y^{L} = 0 \right) = P\left( X \leq d \right) = F_{X}\left( d \right)$.
Using the transformation $Y^{L} = X - d$ for the continuous part of the
distribution, we can find the probability density function of $Y^{L}$
given by

$$f_{Y^{L}}\left( y \right) = \left\{ \begin{matrix}
F_{X}\left( d \right) & y = 0, \\
f_{X}\left( y + d \right) & y > 0. \\
\end{matrix} \right.$$

We can see that the payment per payment variable is the payment per loss
variable conditioned on the loss exceeding the deductible; that is,
$Y^{P} = \left. Y^{L} \right| X > d$. Hence, the probability density
function of $Y^{P}$ is given by

$f_{Y^{P}}\left( y \right) = \frac{f_{X}\left( y + d \right)}{1 - F_{X}\left( d \right)}$, for $y > 0$.

Accordingly, the distribution functions of $Y^{L}$ and $Y^{P}$ are given
by

$$F_{Y^{L}}\left( y \right) = \left\{ \begin{matrix}
F_{X}\left( d \right) & y = 0, \\
F_{X}\left( y + d \right) & y > 0, \\
\end{matrix} \right.$$

and

$F_{Y^{P}}\left( y \right) = \frac{F_{X}\left( y + d \right) - F_{X}\left( d \right)}{1 - F_{X}\left( d \right)}$, for $y > 0$,

respectively.
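For a concrete illustration of these distribution functions, the sketch below assumes an exponential loss with mean 1000 and a deductible of 100 (values chosen only for illustration); by the memoryless property of the exponential, $Y^{P}$ then has the same distribution as $X$.

```python
# Per-loss and per-payment distribution functions under a deductible,
# for an assumed exponential loss X with mean 1000 and d = 100.
import math

theta, d = 1000.0, 100.0
F_X = lambda x: 1 - math.exp(-x / theta)

def F_YL(y):
    # point mass F_X(d) at y = 0, then F_X(y + d) for y > 0
    return F_X(d) if y == 0 else F_X(y + d)

def F_YP(y):
    return (F_X(y + d) - F_X(d)) / (1 - F_X(d))

# memorylessness: the per-payment variable is again exponential with mean 1000
print(F_YP(500), F_X(500))
```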

The raw moments of $Y^{L}$ and $Y^{P}$ can be found directly using the
probability density function of $X$ as follows

$E\left\lbrack \left( Y^{L} \right)^{k} \right\rbrack = \int_{d}^{\infty}{\left( x - d \right)^{k}f_{X}\left( x \right)\, dx}$,

and

$E\left\lbrack \left( Y^{P} \right)^{k} \right\rbrack = \frac{\int_{d}^{\infty}{\left( x - d \right)^{k}f_{X}\left( x \right)\, dx}}{1 - F_{X}\left( d \right)} = \frac{E\left\lbrack \left( Y^{L} \right)^{k} \right\rbrack}{1 - F_{X}\left( d \right)}$,

respectively.

We have seen that the deductible $d$ imposed on an insurance policy is
the amount of loss that has to be paid out of pocket before the insurer
makes any payment, and that it therefore reduces the insurer's expected
payment. The loss elimination ratio (LER) is the percentage decrease in
the expected payment of the insurer as a result of imposing the
deductible. The LER is defined as

$LER = \frac{E\left( X \right) - E\left( Y^{L} \right)}{E\left( X \right)}$.
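As a sketch of this definition, for an assumed exponential loss (the distribution is illustrative only) $E\left( Y^{L} \right) = \theta e^{-d/\theta}$, so the LER has a simple closed form:

```python
# LER for an assumed exponential loss with mean theta and deductible d:
# E(X) = theta and E(Y^L) = theta * exp(-d/theta), so LER = 1 - exp(-d/theta).
import math

def ler_exponential(theta, d):
    E_X = theta
    E_YL = theta * math.exp(-d / theta)
    return (E_X - E_YL) / E_X

print(ler_exponential(1000, 100))
```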

A less common type of policy deductible is the franchise deductible. The
franchise deductible applies to the policy in the same way as the
ordinary deductible, except that when the loss exceeds the deductible
$d$ the full loss is covered by the insurer. The payment per loss and
payment per payment variables are defined as

$$Y^{L} = \left\{ \begin{matrix}
0 & X \leq d, \\
X & X > d, \\
\end{matrix} \right.$$

and

$$Y^{P} = \left\{ \begin{matrix}
\text{Undefined} & X \leq d, \\
X & X > d, \\
\end{matrix} \right.$$

respectively.

Example 3.15 (SOA)

A claim severity distribution is exponential with mean 1000. An
insurance company will pay the amount of each claim in excess of a
deductible of 100. Calculate the variance of the amount paid by the
insurance company for one claim, including the possibility that the
amount paid is 0.

Solution

Let $Y^{L}$ denote the amount paid by the insurance company for one
claim.

$$Y^{L} = \left( X - 100 \right)_{+} = \left\{ \begin{matrix}
0 & X \leq 100, \\
X - 100 & X > 100. \\
\end{matrix} \right.$$

The first and second moments of $Y^{L}$ are

$E\left( Y^{L} \right) = \int_{100}^{\infty}{\left( x - 100 \right)f_{X}\left( x \right)\, dx} = \int_{100}^{\infty}{S_{X}\left( x \right)\, dx} = 1000e^{- \frac{100}{1000}}$,
and

$E\left\lbrack \left( Y^{L} \right)^{2} \right\rbrack = \int_{100}^{\infty}{\left( x - 100 \right)^{2}f_{X}\left( x \right)\, dx} = 2 \times 1000^{2}e^{- \frac{100}{1000}}$.

$V\left( Y^{L} \right) = \left( 2 \times 1000^{2}e^{- \frac{100}{1000}} \right) - \left( 1000e^{- \frac{100}{1000}} \right)^{2} = 990,944$.

The solution can be simplified if we make use of the relationship
between $X$ and $Y^{P}$. If $X$ is exponentially distributed with mean
1000, then by the memoryless property $Y^{P}$ is also exponentially
distributed with the same mean. Hence, $E\left( Y^{P} \right) = 1000$ and
$E\left\lbrack \left( Y^{P} \right)^{2} \right\rbrack = 2 \times 1000^{2}$.

Using the relationship between $Y^{L}$ and $Y^{P}$ we find

$$E\left( Y^{L} \right) = E\left( Y^{P} \right)S_{X}\left( 100 \right) = 1000e^{- \frac{100}{1000}},$$

$E\left\lbrack \left( Y^{L} \right)^{2} \right\rbrack = E\left\lbrack \left( Y^{P} \right)^{2} \right\rbrack S_{X}\left( 100 \right) = 2 \times 1000^{2}e^{- \frac{100}{1000}}$.
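The figures in this example can be reproduced with a few lines of Python:

```python
# Check of Example 3.15: exponential severity, mean 1000, deductible 100.
import math

theta, d = 1000.0, 100.0
S_d = math.exp(-d / theta)          # S_X(100)
E_YL = theta * S_d                  # E(Y^P) * S_X(100), with E(Y^P) = theta
E_YL2 = 2 * theta ** 2 * S_d        # E[(Y^P)^2] * S_X(100)
V_YL = E_YL2 - E_YL ** 2
print(round(V_YL))
```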

Example 3.16 (SOA)

For an insurance:

i. Losses have a density function
$f_{X}\left( x \right) = \left\{ \begin{matrix}
0.02x & 0 < x < 10, \\
0 & \text{elsewhere.} \\
\end{matrix} \right.$

ii. The insurance has an ordinary deductible of 4 per loss.

iii. $Y^{P}$ is the claim payment per payment random variable.

Calculate $Eleft( Y^{P} right)$.

Solution

$$Y^{P} = \left\{ \begin{matrix}
\text{Undefined} & X \leq 4, \\
X - 4 & X > 4. \\
\end{matrix} \right.$$

$E\left( Y^{P} \right) = \frac{\int_{4}^{10}{\left( x - 4 \right)0.02x\, dx}}{1 - F_{X}\left( 4 \right)} = \frac{2.88}{0.84} = 3.43$.
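Since the integrands are polynomials, the computation can be checked exactly:

```python
# Check of Example 3.16: density f(x) = 0.02x on (0, 10), deductible 4.
# numerator = integral_4^10 (x - 4)(0.02x) dx via the antiderivative
# 0.02 * (x^3/3 - 2x^2); denominator = 1 - F(4) with F(x) = 0.01 x^2.
num = 0.02 * ((10 ** 3 / 3 - 2 * 10 ** 2) - (4 ** 3 / 3 - 2 * 4 ** 2))
F4 = 0.01 * 4 ** 2
E_YP = num / (1 - F4)
print(E_YP)
```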

Example 3.17 (SOA)

You are given:

i. Losses follow an exponential distribution with the same mean in all
years.

ii. The loss elimination ratio this year is 70%.

iii. The ordinary deductible for the coming year is 4/3 of the current
deductible.

Compute the loss elimination ratio for the coming year.

Solution

The LER for the current year is
$\frac{E\left( X \right) - E\left( Y^{L} \right)}{E\left( X \right)} = \frac{\theta - \theta e^{- \frac{d}{\theta}}}{\theta} = 1 - e^{- \frac{d}{\theta}} = 0.7$.

Then, $e^{- \frac{d}{\theta}} = 0.3$.

The LER for the coming year is
$\frac{\theta - \theta e^{- \frac{4d}{3\theta}}}{\theta} = 1 - e^{- \frac{4d}{3\theta}} = 1 - \left( e^{- \frac{d}{\theta}} \right)^{\frac{4}{3}} = 1 - 0.3^{\frac{4}{3}} = 0.8$.
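The arithmetic of this example can be confirmed directly:

```python
# Check of Example 3.17: exp(-d/theta) = 0.3 this year, so next year's
# LER is 1 - exp(-(4d/3)/theta) = 1 - 0.3**(4/3).
ler_next = 1 - 0.3 ** (4 / 3)
print(round(ler_next, 2))
```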

3.4.2 Policy limits

Under a limited policy, the insurer is responsible for covering the
actual loss $X$ up to the limit of its coverage. This fixed limit of
coverage is called the policy limit and often denoted by $u$. If the
loss exceeds the policy limit, the difference $X - u$ has to be paid by
the policyholder. While a higher policy limit means a higher payout to
the insured, it is associated with a higher premium.

Let $X$ denote the loss incurred to the insured and $Y$ denote the
amount of paid claim by the insurer. Then $Y$ is defined as

$$Y = X \land u = \left\{ \begin{matrix}
X & X \leq u, \\
u & X > u. \\
\end{matrix} \right.$$

It can be seen that the distinction between $Y^{L}$ and $Y^{P}$ is not
needed under a limited policy, as the insurer always makes a payment.

Even when the distribution of $X$ is continuous, the distribution of $Y$
is partly discrete and partly continuous. The discrete part of the
distribution is concentrated at $Y = u$ (when $X > u$), while the
continuous part is spread over the interval $Y < u$ (when $X \leq u$).
For the discrete part, the probability that the benefit paid is $u$ is
the probability that the loss exceeds the policy limit $u$; that is,
$P\left( Y = u \right) = P\left( X > u \right) = 1 - F_{X}\left( u \right)$.
For the continuous part of the distribution $Y = X$, hence the
probability density function of $Y$ is given by

$$f_{Y}\left( y \right) = \left\{ \begin{matrix}
f_{X}\left( y \right) & 0 < y < u, \\
1 - F_{X}\left( u \right) & y = u. \\
\end{matrix} \right.$$

Accordingly, the distribution function of $Y$ is given by

$$F_{Y}\left( y \right) = \left\{ \begin{matrix}
F_{X}\left( y \right) & 0 < y < u, \\
1 & y \geq u. \\
\end{matrix} \right.$$

The raw moments of $Y$ can be found directly using the probability
density function of $X$ as follows

$$E\left( Y^{k} \right) = E\left\lbrack \left( X \land u \right)^{k} \right\rbrack = \int_{0}^{u}{x^{k}f_{X}\left( x \right)\, dx} + \int_{u}^{\infty}{u^{k}f_{X}\left( x \right)\, dx}$$

$= \int_{0}^{u}{x^{k}f_{X}\left( x \right)\, dx} + u^{k}\left\lbrack 1 - F_{X}\left( u \right) \right\rbrack$.

Example 3.18 (SOA)

Under a group insurance policy, an insurer agrees to pay 100% of the
medical bills incurred during the year by employees of a small company,
up to a maximum total of one million dollars. The total amount of bills
incurred, $X$, has probability density function

$$f_{X}\left( x \right) = \left\{ \begin{matrix}
\frac{x\left( 4 - x \right)}{9} & 0 < x < 3, \\
0 & \text{elsewhere,} \\
\end{matrix} \right.$$

where $x$ is measured in millions. Calculate the total amount, in
millions of dollars, the insurer would expect to pay under this policy.

Solution

$$Y = X \land 1 = \left\{ \begin{matrix}
X & X \leq 1, \\
1 & X > 1. \\
\end{matrix} \right.$$

$E\left( Y \right) = E\left( X \land 1 \right) = \int_{0}^{1}{\frac{x^{2}\left( 4 - x \right)}{9}\, dx} + \int_{1}^{3}{\frac{x\left( 4 - x \right)}{9}\, dx} = 0.935$.
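Because the density is polynomial, the limited expected value can be evaluated in closed form:

```python
# Check of Example 3.18: E(X ^ 1) for f(x) = x(4 - x)/9 on (0, 3).
# A is an antiderivative of x*f(x); B is an antiderivative of f(x).
A = lambda x: (4 * x ** 3 / 3 - x ** 4 / 4) / 9
B = lambda x: (2 * x ** 2 - x ** 3 / 3) / 9
E_Y = (A(1) - A(0)) + (B(3) - B(1))   # integral of x*f on (0,1) plus P(X > 1)
print(round(E_Y, 3))
```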

3.4.3 Coinsurance

As we have seen in Section 3.4.1, the amount of loss retained by the
policyholder can be losses up to the deductible $d$. The retained loss
can also be a percentage of the claim. The percentage $\alpha$, often
referred to as the coinsurance factor, is the percentage of the claim
the insurance company is required to cover. If the policy is subject to
an ordinary deductible and a policy limit, coinsurance refers to the
percentage of the claim the insurer is required to cover after imposing
the ordinary deductible and policy limit. The payment per loss variable,
$Y^{L}$, is defined as

$$Y^{L} = \left\{ \begin{matrix}
0 & X \leq d, \\
\alpha\left( X - d \right) & d < X \leq u, \\
\alpha\left( u - d \right) & X > u. \\
\end{matrix} \right.$$

The policy limit (the maximum amount paid by the insurer) in this case
is $alphaleft( u – d right)$, while $u$ is the maximum covered loss.

The $k$-th moment of $Y^{L}$ is given by

$E\left\lbrack \left( Y^{L} \right)^{k} \right\rbrack = \int_{d}^{u}{\left\lbrack \alpha\left( x - d \right) \right\rbrack^{k}f_{X}\left( x \right)\, dx} + \int_{u}^{\infty}{\left\lbrack \alpha\left( u - d \right) \right\rbrack^{k}f_{X}\left( x \right)\, dx}$.

A growth factor $\left( 1 + r \right)$ may be applied to $X$, resulting
in an inflated loss random variable $\left( 1 + r \right)X$ (the
prespecified $d$ and $u$ remain unchanged). The resulting per loss
variable can be written as

$$Y^{L} = \left\{ \begin{matrix}
0 & X \leq \frac{d}{1 + r}, \\
\alpha\left\lbrack \left( 1 + r \right)X - d \right\rbrack & \frac{d}{1 + r} < X \leq \frac{u}{1 + r}, \\
\alpha\left( u - d \right) & X > \frac{u}{1 + r}. \\
\end{matrix} \right.$$

The first and second moments of $Y^{L}$ can be expressed as

$E\left( Y^{L} \right) = \alpha\left( 1 + r \right)\left\lbrack E\left( X \land \frac{u}{1 + r} \right) - E\left( X \land \frac{d}{1 + r} \right) \right\rbrack$,

and

$E\left\lbrack \left( Y^{L} \right)^{2} \right\rbrack = \alpha^{2}\left( 1 + r \right)^{2}\left\{ E\left\lbrack \left( X \land \frac{u}{1 + r} \right)^{2} \right\rbrack - E\left\lbrack \left( X \land \frac{d}{1 + r} \right)^{2} \right\rbrack - 2\left( \frac{d}{1 + r} \right)\left\lbrack E\left( X \land \frac{u}{1 + r} \right) - E\left( X \land \frac{d}{1 + r} \right) \right\rbrack \right\}$,

respectively.

The formulae given for the first and second moments of $Y^{L}$ are
general. Under full coverage, $\alpha = 1$, $r = 0$, $u = \infty$,
$d = 0$ and $E\left( Y^{L} \right)$ reduces to $E\left( X \right)$. If
only an ordinary deductible is imposed, $\alpha = 1$, $r = 0$,
$u = \infty$ and $E\left( Y^{L} \right)$ reduces to
$E\left( X \right) - E\left( X \land d \right)$. If only a policy limit
is imposed, $\alpha = 1$, $r = 0$, $d = 0$ and $E\left( Y^{L} \right)$
reduces to $E\left( X \land u \right)$.
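A sketch of the general first-moment formula, for an assumed exponential loss whose limited expected value is $E\left( X \land m \right) = \theta\left( 1 - e^{-m/\theta} \right)$ (the distribution and the parameter values are illustrative only):

```python
# E(Y^L) under coinsurance alpha, inflation r, deductible d and maximum
# covered loss u, for an assumed exponential X with mean theta.
import math

def lev_exponential(theta, m):
    # limited expected value E(X ^ m) for an exponential with mean theta
    return theta * (1 - math.exp(-m / theta))

def expected_per_loss(theta, alpha, r, d, u):
    g = 1 + r
    return alpha * g * (lev_exponential(theta, u / g) - lev_exponential(theta, d / g))

# sanity check against the special cases discussed above
print(expected_per_loss(1000, 1.0, 0.0, 0.0, 1e9))   # full coverage -> E(X) = 1000
```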

Example 3.19 (SOA)

The ground up loss random variable for a health insurance policy in 2006
is modeled with $X$, an exponential random variable with mean 1000. An
insurance policy pays the loss above an ordinary deductible of 100, with
a maximum annual payment of 500. The ground up loss random variable is
expected to be 5% larger in 2007, but the insurance in 2007 has the same
deductible and maximum payment as in 2006. Find the percentage increase
in the expected cost per payment from 2006 to 2007.

Solution

$$Y_{2006}^{L} = \left\{ \begin{matrix}
0 & X \leq 100, \\
X - 100 & 100 < X \leq 600, \\
500 & X > 600. \\
\end{matrix} \right.$$

$$Y_{2007}^{L} = \left\{ \begin{matrix}
0 & X \leq 95.24, \\
1.05X - 100 & 95.24 < X \leq 571.43, \\
500 & X > 571.43. \\
\end{matrix} \right.$$

$$E\left( Y_{2006}^{L} \right) = E\left( X \land 600 \right) - E\left( X \land 100 \right) = 1000\left( 1 - e^{- \frac{600}{1000}} \right) - 1000\left( 1 - e^{- \frac{100}{1000}} \right)$$

$= 356.026$.

$$E\left( Y_{2007}^{L} \right) = 1.05\left\lbrack E\left( X \land 571.43 \right) - E\left( X \land 95.24 \right) \right\rbrack$$

$= 1.05\left\lbrack 1000\left( 1 - e^{- \frac{571.43}{1000}} \right) - 1000\left( 1 - e^{- \frac{95.24}{1000}} \right) \right\rbrack$

$= 361.659$.

$E\left( Y_{2006}^{P} \right) = \frac{356.026}{e^{- \frac{100}{1000}}} = 393.469$.

$E\left( Y_{2007}^{P} \right) = \frac{361.659}{e^{- \frac{95.24}{1000}}} = 397.797$.

There is an increase of 1.1% from 2006 to 2007.
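The computation can be reproduced with the exponential limited expected value $E\left( X \land m \right) = \theta\left( 1 - e^{-m/\theta} \right)$:

```python
# Check of Example 3.19: deductible 100, maximum payment 500, 5% inflation.
import math

theta, d, u = 1000.0, 100.0, 600.0
lev = lambda m: theta * (1 - math.exp(-m / theta))

E_L_2006 = lev(u) - lev(d)
E_L_2007 = 1.05 * (lev(u / 1.05) - lev(d / 1.05))

# per payment: divide by the probability that a payment is made
E_P_2006 = E_L_2006 / math.exp(-d / theta)
E_P_2007 = E_L_2007 / math.exp(-(d / 1.05) / theta)

increase = E_P_2007 / E_P_2006 - 1
print(round(100 * increase, 1))
```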

3.4.4 Reinsurance

In Section 3.4.1 we introduced the policy deductible, which is a
contractual arrangement under which an insured transfers part of the
risk by securing coverage from an insurer in return for an insurance
premium. Under that policy, when the loss exceeds the deductible, the
insurer is not required to pay until the insured has paid the fixed
deductible. We now introduce reinsurance, a mechanism of insurance for
insurance companies. Reinsurance is a contractual arrangement under
which an insurer transfers part of the underlying insured risk by
securing coverage from another insurer (referred to as a reinsurer) in
return for a reinsurance premium. Although reinsurance involves a
relationship between three parties: the original insured, the insurer
(often referred to as cedent or cedant) and the reinsurer, the parties
of the reinsurance agreement are only the primary insurer and the
reinsurer. There is no contractual agreement between the original
insured and the reinsurer. The reinsurer is not required to pay under
the reinsurance contract until the insurer has paid a loss to its
original insured. The amount retained by the primary insurer in the
reinsurance agreement (the reinsurance deductible) is called retention.

Reinsurance arrangements allow insurers with limited financial resources
to increase the capacity to write insurance and meet client requests for
larger insurance coverage while reducing the impact of potential losses
and protecting the insurance company against catastrophic losses.
Reinsurance also allows the primary insurer to benefit from the
underwriting skills, expertise and proficient handling of complex claim
files of the larger reinsurance companies.

Example 3.20 (SOA)

In 2005 a risk has a two-parameter Pareto distribution with $\alpha = 2$
and $\theta = 3000$. In 2006 losses inflate by 20%. Insurance on the
risk has a deductible of 600 in each year. $P_{i}$, the premium in year
$i$, equals 1.2 times expected claims. The risk is reinsured with a
deductible that stays the same in each year. $R_{i}$, the reinsurance
premium in year $i$, equals 1.1 times the expected reinsured claims.
$\frac{R_{2005}}{P_{2005}} = 0.55$. Calculate
$\frac{R_{2006}}{P_{2006}}$.

Solution

$X_{i}:$ the risk in year $i$.

$Y_{i}:$ the insured claim in year $i$.

$P_{i}:$ the insurance premium in year $i$.

$Y_{i}^{R}:$ the reinsured claim in year $i$.

$R_{i}:$ the reinsurance premium in year $i$.

$d:$ the insurance deductible in year $i$ (the insurance deductible is
fixed each year, equal to 600).

$d^{R}:$ the reinsurance deductible or retention in year $i$ (the
reinsurance deductible is fixed each year, but unknown).

For $i = 2005, 2006$,

$Y_{i} = \left\{ \begin{matrix}
0 & X_{i} \leq 600, \\
X_{i} - 600 & X_{i} > 600. \\
\end{matrix} \right.$

$$X_{2005} \sim Pa\left( 2,3000 \right)$$

$$E\left( Y_{2005} \right) = E\left\lbrack \left( X_{2005} - 600 \right)_{+} \right\rbrack = E\left( X_{2005} \right) - E\left( X_{2005} \land 600 \right)$$

$= 3000 - 3000\left( 1 - \frac{3000}{3600} \right) = 2500$,

$$P_{2005} = 1.2E\left( Y_{2005} \right) = 3000.$$

Since $X_{2006} = 1.2X_{2005}$ and the Pareto is a scale distribution
with scale parameter $\theta$, $X_{2006} \sim Pa\left( 2,3600 \right)$.

$$E\left( Y_{2006} \right) = E\left\lbrack \left( X_{2006} - 600 \right)_{+} \right\rbrack = E\left( X_{2006} \right) - E\left( X_{2006} \land 600 \right)$$

$= 3600 - 3600\left( 1 - \frac{3600}{4200} \right) = 3085.714$,

$$P_{2006} = 1.2E\left( Y_{2006} \right) = 3702.857.$$

$$Y_{i}^{R} = \left\{ \begin{matrix}
0 & X_{i} - 600 \leq d^{R}, \\
X_{i} - 600 - d^{R} & X_{i} - 600 > d^{R}. \\
\end{matrix} \right.$$

Since $\frac{R_{2005}}{P_{2005}} = 0.55$,
$R_{2005} = 3000 \times 0.55 = 1650$.

Since $R_{2005} = 1.1E\left( Y_{2005}^{R} \right)$,
$E\left( Y_{2005}^{R} \right) = \frac{1650}{1.1} = 1500$.

$$E\left( Y_{2005}^{R} \right) = E\left\lbrack \left( X_{2005} - 600 - d^{R} \right)_{+} \right\rbrack = E\left( X_{2005} \right) - E\left( X_{2005} \land \left( 600 + d^{R} \right) \right)$$

$= 3000 - 3000\left( 1 - \frac{3000}{3600 + d^{R}} \right) = 1500 \Rightarrow d^{R} = 2400.$

$$E\left( Y_{2006}^{R} \right) = E\left\lbrack \left( X_{2006} - 600 - d^{R} \right)_{+} \right\rbrack = E\left\lbrack \left( X_{2006} - 3000 \right)_{+} \right\rbrack = E\left( X_{2006} \right) - E\left( X_{2006} \land 3000 \right)$$

$= 3600 - 3600\left( 1 - \frac{3600}{6600} \right) = 1963.636,$

$$R_{2006} = 1.1E\left( Y_{2006}^{R} \right) = 1.1 \times 1963.636 = 2160.$$

Therefore $\frac{R_{2006}}{P_{2006}} = \frac{2160}{3702.857} = 0.583$.
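The solution can be verified with the Pareto limited expected value, $E\left( X \land m \right) = \frac{\theta}{\alpha - 1}\left\lbrack 1 - \left( \frac{\theta}{\theta + m} \right)^{\alpha - 1} \right\rbrack$:

```python
# Check of Example 3.20: Pareto(2, 3000) in 2005, Pareto(2, 3600) in 2006.
def lev_pareto(alpha, theta, m):
    return theta / (alpha - 1) * (1 - (theta / (theta + m)) ** (alpha - 1))

def mean_pareto(alpha, theta):
    return theta / (alpha - 1)

E_Y_2005 = mean_pareto(2, 3000) - lev_pareto(2, 3000, 600)   # 2500
P_2005 = 1.2 * E_Y_2005                                       # 3000
E_Y_2006 = mean_pareto(2, 3600) - lev_pareto(2, 3600, 600)
P_2006 = 1.2 * E_Y_2006

d_R = 2400   # retention found in the text from R_2005 / P_2005 = 0.55
E_YR_2006 = mean_pareto(2, 3600) - lev_pareto(2, 3600, 600 + d_R)
R_2006 = 1.1 * E_YR_2006
print(round(R_2006 / P_2006, 3))
```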

3.5 Maximum likelihood estimation

In this section we estimate statistical parameters using the method of
maximum likelihood. Maximum likelihood estimates in the presence of
grouping, truncation or censoring are calculated.

3.5.1 Maximum likelihood estimators for complete data

Pricing of insurance premiums and estimation of claim reserving are
among many actuarial problems that involve modeling the severity of loss
(claim size). The principles for using maximum likelihood to estimate
model parameters were introduced in Chapter 2. In
this section, we present a few examples to illustrate how actuaries fit
a parametric distribution model to a set of claim data using maximum
likelihood. In these examples we derive the asymptotic
variance of maximum-likelihood estimators of the model parameters. We
use the delta method to derive the asymptotic variances of functions of
these parameters.

Example 3.21

You are given the following:

A random sample of claim amounts: 8,000 10,000 12,000 15,000.

Claim amounts follow an inverse exponential distribution, with parameter
$theta$.

i. Calculate the maximum likelihood estimator for $theta$.

ii. Approximate the variance of the maximum likelihood estimator.

iii. Determine an approximate 95% confidence interval for $theta$.

iv. Determine an approximate 95% confidence interval for
$Pleft( X leq 9,000 right).$

Solution

$f_{X}\left( x \right) = \frac{\theta e^{- \frac{\theta}{x}}}{x^{2}}$,
$x > 0$.

The likelihood function, $L\left( \theta \right)$, can be viewed as the
probability of the observed data, written as a function of the model's
parameter $\theta$:

$L\left( \theta \right) = \prod_{i = 1}^{4}{f_{X_{i}}\left( x_{i} \right)} = \frac{\theta^{4}e^{- \theta\sum_{i = 1}^{4}\frac{1}{x_{i}}}}{\prod_{i = 1}^{4}x_{i}^{2}}$.

The loglikelihood function, $\ln L\left( \theta \right)$, is the
sum of the individual logarithms:

$\ln L\left( \theta \right) = 4\ln\theta - \theta\sum_{i = 1}^{4}\frac{1}{x_{i}} - 2\sum_{i = 1}^{4}{\ln x_{i}}$.

$\frac{d\ln L\left( \theta \right)}{d\theta} = \frac{4}{\theta} - \sum_{i = 1}^{4}\frac{1}{x_{i}}$.

The maximum likelihood estimator of $\theta$, denoted by $\hat{\theta}$,
is the solution to the equation

$\frac{4}{\hat{\theta}} - \sum_{i = 1}^{4}\frac{1}{x_{i}} = 0$. Thus,
$\hat{\theta} = \frac{4}{\sum_{i = 1}^{4}\frac{1}{x_{i}}} = 10,667$.

The second derivative of $\ln L\left( \theta \right)$ is given by

$\frac{d^{2}\ln L\left( \theta \right)}{d\theta^{2}} = \frac{- 4}{\theta^{2}}$.

The second derivative is negative for all $\theta > 0$, confirming that
$\hat{\theta}$ maximizes the loglikelihood function.

Taking the reciprocal of the negative expectation of the second
derivative of $\ln L\left( \theta \right)$, we obtain an estimate of the
variance of $\hat{\theta}$:

$\hat{V}\left( \hat{\theta} \right) = \left. \left\lbrack - E\left( \frac{d^{2}\ln L\left( \theta \right)}{d\theta^{2}} \right) \right\rbrack^{- 1} \right|_{\theta = \hat{\theta}} = \frac{{\hat{\theta}}^{2}}{4} = 28,446,222$.

It should be noted that as the sample size $n \rightarrow \infty$, the
distribution of the maximum likelihood estimator $\hat{\theta}$
converges to a normal distribution with mean $\theta$ and variance
$\hat{V}\left( \hat{\theta} \right)$. The approximate confidence
interval in this example is based on the assumption of normality,
despite the small sample size, only for the purpose of illustration.

The 95% confidence interval for $\theta$ is given by

$10,667 \pm 1.96\sqrt{28,446,222} = \left( 213.34,\ 21,120.66 \right)$.
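The estimate and its approximate variance can be reproduced directly (plain Python; the normal-approximation interval is, as noted above, for illustration only):

```python
# Check of Example 3.21: inverse exponential MLE theta_hat = n / sum(1/x_i),
# with estimated variance theta_hat^2 / n from the Fisher information.
import math

data = [8000, 10000, 12000, 15000]
n = len(data)
theta_hat = n / sum(1 / x for x in data)
var_hat = theta_hat ** 2 / n
half_width = 1.96 * math.sqrt(var_hat)
print(round(theta_hat), round(var_hat),
      round(theta_hat - half_width), round(theta_hat + half_width))
```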

The distribution function of the inverse exponential is
$F\left( x \right) = e^{- \frac{\theta}{x}}$. Then, the maximum
likelihood estimate of $g\left( \theta \right) = F\left( 9,000 \right) = e^{- \frac{\theta}{9,000}}$
is
$g\left( \hat{\theta} \right) = e^{- \frac{10,667}{9,000}} = 0.306$.

We use the delta method to approximate the variance of
$g\left( \hat{\theta} \right)$.

$\frac{dg\left( \theta \right)}{d\theta} = - \frac{1}{9,000}e^{- \frac{\theta}{9,000}}$.

$\hat{V}\left\lbrack g\left( \hat{\theta} \right) \right\rbrack = \left( - \frac{1}{9,000}e^{- \frac{10,667}{9,000}} \right)^{2}\hat{V}\left( \hat{\theta} \right) = 0.0328$.

The 95% confidence interval for $F\left( 9,000 \right)$ is given by

$0.306 \pm 1.96\sqrt{0.0328} = \left( - 0.049,\ 0.661 \right)$. Since a
probability cannot be negative, we replace the negative lower limit by
zero.

Example 3.22

A random sample of size 6 is from a lognormal distribution with
parameters $mu$ and $sigma$. The sample values are 200, 3,000, 8,000,
60,000, 60,000, 160,000.

i. Calculate the maximum likelihood estimator for $mu$ and $sigma$.

ii. Estimate the covariance matrix of the maximum likelihood estimator.

iii. Determine approximate 95% confidence intervals for $mu$ and
$sigma$.

iv. Determine an approximate 95% confidence interval for the mean of the
lognormal distribution.

Solution

$f_{X}\left( x \right) = \frac{1}{x\sigma\sqrt{2\pi}}\exp\left\lbrack - \frac{1}{2}\left( \frac{\ln x - \mu}{\sigma} \right)^{2} \right\rbrack$,
$x > 0$.

The likelihood function, $L\left( \mu,\sigma \right)$, is the product of
the pdf evaluated at each data point:

$L\left( \mu,\sigma \right) = \prod_{i = 1}^{6}{f_{X_{i}}\left( x_{i} \right)} = \frac{1}{\sigma^{6}\left( 2\pi \right)^{3}\prod_{i = 1}^{6}x_{i}}\exp\left\lbrack - \frac{1}{2}\sum_{i = 1}^{6}\left( \frac{\ln x_{i} - \mu}{\sigma} \right)^{2} \right\rbrack$.

The loglikelihood function, $\ln L\left( \mu,\sigma \right)$, is
the sum of the individual logarithms:

$\ln L\left( \mu,\sigma \right) = - 6\ln\sigma - 3\ln\left( 2\pi \right) - \sum_{i = 1}^{6}{\ln x_{i}} - \frac{1}{2}\sum_{i = 1}^{6}\left( \frac{\ln x_{i} - \mu}{\sigma} \right)^{2}$.

The first partial derivatives are

$\frac{\partial \ln L\left( \mu,\sigma \right)}{\partial\mu} = \frac{1}{\sigma^{2}}\sum_{i = 1}^{6}\left( \ln x_{i} - \mu \right)$,

$\frac{\partial \ln L\left( \mu,\sigma \right)}{\partial\sigma} = \frac{- 6}{\sigma} + \frac{1}{\sigma^{3}}\sum_{i = 1}^{6}\left( \ln x_{i} - \mu \right)^{2}$.

The maximum likelihood estimators of $\mu$ and $\sigma$, denoted by
$\hat{\mu}$ and $\hat{\sigma}$, are the solutions to the equations

$\frac{1}{{\hat{\sigma}}^{2}}\sum_{i = 1}^{6}\left( \ln x_{i} - \hat{\mu} \right) = 0$,

$\frac{- 6}{\hat{\sigma}} + \frac{1}{{\hat{\sigma}}^{3}}\sum_{i = 1}^{6}\left( \ln x_{i} - \hat{\mu} \right)^{2} = 0$.

These yield the estimates

$\hat{\mu} = \frac{\sum_{i = 1}^{6}{\ln x_{i}}}{6} = 9.38$ and
${\hat{\sigma}}^{2} = \frac{\sum_{i = 1}^{6}\left( \ln x_{i} - \hat{\mu} \right)^{2}}{6} = 5.12$.

The second partial derivatives are

$\frac{\partial^{2}\ln L\left( \mu,\sigma \right)}{\partial\mu^{2}} = \frac{- 6}{\sigma^{2}}$,
$\frac{\partial^{2}\ln L\left( \mu,\sigma \right)}{\partial\mu\partial\sigma} = \frac{- 2}{\sigma^{3}}\sum_{i = 1}^{6}\left( \ln x_{i} - \mu \right)$
and
$\frac{\partial^{2}\ln L\left( \mu,\sigma \right)}{\partial\sigma^{2}} = \frac{6}{\sigma^{2}} - \frac{3}{\sigma^{4}}\sum_{i = 1}^{6}\left( \ln x_{i} - \mu \right)^{2}$.

To derive the covariance matrix of the maximum likelihood estimators we
need the expectations of the second derivatives. Since the random
variable $X$ follows a lognormal distribution with parameters $\mu$ and
$\sigma$, $\ln X$ is normally distributed with mean $\mu$ and variance
$\sigma^{2}$.

$E\left( \frac{\partial^{2}\ln L\left( \mu,\sigma \right)}{\partial\mu^{2}} \right) = \frac{- 6}{\sigma^{2}}$,

$E\left( \frac{\partial^{2}\ln L\left( \mu,\sigma \right)}{\partial\mu\partial\sigma} \right) = \frac{- 2}{\sigma^{3}}\sum_{i = 1}^{6}\left\lbrack E\left( \ln x_{i} \right) - \mu \right\rbrack = \frac{- 2}{\sigma^{3}}\sum_{i = 1}^{6}\left( \mu - \mu \right) = 0$,

and

$E\left( \frac{\partial^{2}\ln L\left( \mu,\sigma \right)}{\partial\sigma^{2}} \right) = \frac{6}{\sigma^{2}} - \frac{3}{\sigma^{4}}\sum_{i = 1}^{6}{E\left( \ln x_{i} - \mu \right)^{2}} = \frac{6}{\sigma^{2}} - \frac{3}{\sigma^{4}}\sum_{i = 1}^{6}{V\left( \ln x_{i} \right)} = \frac{6}{\sigma^{2}} - \frac{3}{\sigma^{4}} \times 6\sigma^{2} = \frac{- 12}{\sigma^{2}}$.

Taking the negatives of these expectations we obtain the Fisher
information matrix

$$\begin{bmatrix}
\frac{6}{\sigma^{2}} & 0 \\
0 & \frac{12}{\sigma^{2}} \\
\end{bmatrix}.$$

The covariance matrix, $\Sigma$, is the inverse of the Fisher
information matrix:

$$\Sigma = \begin{bmatrix}
\frac{\sigma^{2}}{6} & 0 \\
0 & \frac{\sigma^{2}}{12} \\
\end{bmatrix}.$$

The estimated matrix is given by

$$\hat{\Sigma} = \begin{bmatrix}
0.8533 & 0 \\
0 & 0.4267 \\
\end{bmatrix}.$$

The 95% confidence interval for $\mu$ is given by
$9.38 \pm 1.96\sqrt{0.8533} = \left( 7.57,\ 11.19 \right)$.

The 95% confidence interval for $\sigma$ is given by
$2.26 \pm 1.96\sqrt{0.4267} = \left( 0.98,\ 3.54 \right)$, where
$\hat{\sigma} = \sqrt{5.12} = 2.26$.

The mean of $X$ is $\exp\left( \mu + \frac{\sigma^{2}}{2} \right)$.
Then, the maximum likelihood estimate of
$g\left( \mu,\sigma \right) = \exp\left( \mu + \frac{\sigma^{2}}{2} \right)$
is
$g\left( \hat{\mu},\hat{\sigma} \right) = \exp\left( \hat{\mu} + \frac{{\hat{\sigma}}^{2}}{2} \right) = 153,277$.

We use the delta method to approximate the variance of the MLE
$g\left( \hat{\mu},\hat{\sigma} \right)$.

$\frac{\partial g\left( \mu,\sigma \right)}{\partial\mu} = \exp\left( \mu + \frac{\sigma^{2}}{2} \right)$
and
$\frac{\partial g\left( \mu,\sigma \right)}{\partial\sigma} = \sigma\exp\left( \mu + \frac{\sigma^{2}}{2} \right)$.

Using the delta method, the approximate variance of
$g\left( \hat{\mu},\hat{\sigma} \right)$ is given by

$$\left. \hat{V}\left( g\left( \hat{\mu},\hat{\sigma} \right) \right) = \begin{bmatrix}
\frac{\partial g\left( \mu,\sigma \right)}{\partial\mu} & \frac{\partial g\left( \mu,\sigma \right)}{\partial\sigma} \\
\end{bmatrix}\Sigma\begin{bmatrix}
\frac{\partial g\left( \mu,\sigma \right)}{\partial\mu} \\
\frac{\partial g\left( \mu,\sigma \right)}{\partial\sigma} \\
\end{bmatrix} \right|_{\mu = \hat{\mu},\sigma = \hat{\sigma}}$$

$$= \begin{bmatrix}
153,277 & 346,826 \\
\end{bmatrix}\begin{bmatrix}
0.8533 & 0 \\
0 & 0.4267 \\
\end{bmatrix}\begin{bmatrix}
153,277 \\
346,826 \\
\end{bmatrix} = 71,374,380,000.$$

The 95% confidence interval for
$\exp\left( \mu + \frac{\sigma^{2}}{2} \right)$ is given by

$153,277 \pm 1.96\sqrt{71,374,380,000} = \left( -370,356,\; 676,910 \right)$.

Since the mean of the lognormal distribution cannot be negative, we
replace the negative lower limit of this interval by zero.
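The delta-method computation above can be reproduced numerically. The
following Python sketch (ours, not part of the original example) assumes
only the summary statistics reported above: $\hat{\mu} = 9.38$,
${\hat{\sigma}}^{2} = 5.12$ and $n = 6$; all variable names are
illustrative.

```python
import numpy as np

# Summary statistics from the example (assumed as given).
mu_hat, sigma2_hat, n = 9.38, 5.12, 6
sigma_hat = np.sqrt(sigma2_hat)

# Estimated covariance matrix of (mu_hat, sigma_hat): the inverse of the
# Fisher information matrix, diag(sigma^2 / n, sigma^2 / 2n).
Sigma = np.diag([sigma2_hat / n, sigma2_hat / (2 * n)])

# g(mu, sigma) = exp(mu + sigma^2 / 2), the lognormal mean, and its gradient.
g = np.exp(mu_hat + sigma2_hat / 2)
grad = np.array([g, sigma_hat * g])

# Delta-method variance and 95% confidence interval,
# with the lower limit truncated at zero as argued above.
var_g = grad @ Sigma @ grad
half_width = 1.96 * np.sqrt(var_g)
ci = (max(g - half_width, 0.0), g + half_width)
```

Small differences from the hand computation are due to the rounding of
the intermediate values 153,277 and 346,826.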

3.5.2 Maximum likelihood estimators for grouped data

In the previous section we considered the maximum likelihood estimation
of continuous models from complete (individual) data. Each individual
observation is recorded, and its contribution to the likelihood function
is the density at that value. In this section we consider the problem of
obtaining maximum likelihood estimates of parameters from grouped data.
The observations are only available in grouped form, and the
contribution of each observation to the likelihood function is the
probability of falling in a specific group (interval). Let $n_{j}$
represent the number of observations in the interval
$\left( c_{j - 1},c_{j} \right\rbrack$. The
grouped data likelihood function is thus given by

$L\left( \theta \right) = \prod_{j = 1}^{k}\left\lbrack F\left( c_{j} \mid \theta \right) - F\left( c_{j - 1} \mid \theta \right) \right\rbrack^{n_{j}}$,

where $c_{0}$ is the smallest possible observation (often set to zero)
and $c_{k}$ is the largest possible observation (often set to infinity).
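As a concrete sketch of this construction, the short Python function
below (our own illustration; the function and argument names are not
from the text) evaluates the grouped-data log-likelihood for an
arbitrary candidate distribution function, here illustrated with an
exponential CDF.

```python
import math

def grouped_loglik(theta, boundaries, counts, cdf):
    """Grouped-data log-likelihood: sum over j of n_j * ln[F(c_j) - F(c_{j-1})].

    boundaries -- c_0 < c_1 < ... < c_k (c_k may be math.inf)
    counts     -- n_1, ..., n_k, the observation counts per interval
    cdf        -- the model distribution function F(x | theta)
    """
    total = 0.0
    for j in range(1, len(boundaries)):
        p = cdf(boundaries[j], theta) - cdf(boundaries[j - 1], theta)
        total += counts[j - 1] * math.log(p)
    return total

# Illustration with an exponential CDF F(x) = 1 - exp(-x / theta),
# three intervals (0, 1], (1, 2], (2, inf) and counts 3, 2, 1.
exp_cdf = lambda x, theta: 1.0 - math.exp(-x / theta)
ll = grouped_loglik(1.5, [0.0, 1.0, 2.0, math.inf], [3, 2, 1], exp_cdf)
```

Maximizing this function over $\theta$, numerically or in closed form,
yields the grouped-data MLE.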

Example 3.23 (SOA)

For a group of policies, you are given:

i. Losses follow the distribution function
$F\left( x \right) = 1 - \frac{\theta}{x}$, $\theta < x < \infty$.

ii. A sample of 20 losses resulted in the following:

| Interval | Number of Losses |
|:---|:---|
| $x \leq 10$ | 9 |
| $10 < x \leq 25$ | 6 |
| $x > 25$ | 5 |

Calculate the maximum likelihood estimate of $theta$.

Solution

The contribution of each of the 9 observations in the first interval to
the likelihood function is the probability of $X \leq 10$; that is,
$P\left( X \leq 10 \right) = F\left( 10 \right)$. Similarly, the
contributions of each of the 6 and 5 observations in the second and
third intervals are
$P\left( 10 < X \leq 25 \right) = F\left( 25 \right) - F(10)$ and
$P\left( X > 25 \right) = 1 - F(25)$, respectively. The likelihood
function is thus given by

$$L\left( \theta \right) = \left\lbrack F\left( 10 \right) \right\rbrack^{9}\left\lbrack F\left( 25 \right) - F(10) \right\rbrack^{6}\left\lbrack 1 - F(25) \right\rbrack^{5}$$

$$= \left( 1 - \frac{\theta}{10} \right)^{9}\left( \frac{\theta}{10} - \frac{\theta}{25} \right)^{6}\left( \frac{\theta}{25} \right)^{5}$$

$$= \left( \frac{10 - \theta}{10} \right)^{9}\left( \frac{15\theta}{250} \right)^{6}\left( \frac{\theta}{25} \right)^{5}.$$

Then,
$\ln L\left( \theta \right) = 9\ln\left( 10 - \theta \right) + 6\ln\theta + 5\ln\theta - 9\ln 10 + 6\ln 15 - 6\ln 250 - 5\ln 25$.

$\frac{d\ln L\left( \theta \right)}{d\theta} = \frac{-9}{\left( 10 - \theta \right)} + \frac{6}{\theta} + \frac{5}{\theta}$.

The maximum likelihood estimator, $\hat{\theta}$, is the solution to the
equation
$\frac{-9}{\left( 10 - \hat{\theta} \right)} + \frac{11}{\hat{\theta}} = 0$,
which yields $\hat{\theta} = 5.5$.
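This solution is easy to verify numerically. In the Python sketch below
(our own check, not part of the original example), the loglikelihood is
written with its additive constants dropped, and a grid search over
$\left( 0, 10 \right)$ confirms the maximizer.

```python
import math

def loglik(theta):
    # ln L(theta) for Example 3.23 with additive constants dropped:
    # 9 ln(10 - theta) + 11 ln(theta), valid for 0 < theta < 10.
    return 9 * math.log(10 - theta) + 11 * math.log(theta)

# The score equation -9/(10 - theta) + 11/theta = 0 gives theta = 110/20 = 5.5.
theta_hat = 5.5
score = -9 / (10 - theta_hat) + 11 / theta_hat

# Grid search over (0, 10) to confirm that 5.5 maximizes ln L.
grid = [t / 1000 for t in range(1, 10000)]
best = max(grid, key=loglik)
```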

3.5.3 Maximum likelihood estimators for censored data

Another distinguishing feature of the data gathering mechanism is
censoring. While for some events of interest (losses, claims, lifetimes,
etc.) the complete data may be available, for others only partial
information is available: the information that the observation exceeds a
specific value. The limited policy introduced in Section 3.4.2 is an
example of right censoring. Any loss greater than or equal to the policy
limit is recorded at the limit. The contribution of a censored
observation to the likelihood function is the probability of the random
variable exceeding this specific limit. Note that the contributions of
both complete and censored observations share the survival function;
since $f_{X}\left( x \right) = h_{X}\left( x \right)S_{X}\left( x \right)$,
for a complete observation this survival function is multiplied by the
hazard function, but for a censored observation it is not.

Example 3.24 (SOA)

The random variable $X$ has survival function
$S_{X}\left( x \right) = \frac{\theta^{4}}{\left( \theta^{2} + x^{2} \right)^{2}}$.

Two values of $X$ are observed to be 2 and 4. One other value exceeds 4.

Calculate the maximum likelihood estimate of $theta$.

Solution

The contributions of the two observations 2 and 4 are
$f_{X}\left( 2 \right)$ and $f_{X}\left( 4 \right)$, respectively. The
contribution of the third observation, which is only known to exceed 4,
is $S_{X}\left( 4 \right)$. The likelihood function is thus given by

$L\left( \theta \right) = f_{X}\left( 2 \right)f_{X}\left( 4 \right)S_{X}\left( 4 \right)$.

The probability density function of $X$ is given by
$f_{X}\left( x \right) = \frac{4x\theta^{4}}{\left( \theta^{2} + x^{2} \right)^{3}}$.
Thus,

$L\left( \theta \right) = \frac{8\theta^{4}}{\left( \theta^{2} + 4 \right)^{3}}\frac{16\theta^{4}}{\left( \theta^{2} + 16 \right)^{3}}\frac{\theta^{4}}{\left( \theta^{2} + 16 \right)^{2}} = \frac{128\theta^{12}}{\left( \theta^{2} + 4 \right)^{3}\left( \theta^{2} + 16 \right)^{5}}$,

$\ln L\left( \theta \right) = \ln 128 + 12\ln\theta - 3\ln\left( \theta^{2} + 4 \right) - 5\ln\left( \theta^{2} + 16 \right)$,

and

$\frac{d\ln L\left( \theta \right)}{d\theta} = \frac{12}{\theta} - \frac{6\theta}{\left( \theta^{2} + 4 \right)} - \frac{10\theta}{\left( \theta^{2} + 16 \right)}$.

The maximum likelihood estimator, $\hat{\theta}$, is the solution to the
equation
$\frac{12}{\hat{\theta}} - \frac{6\hat{\theta}}{\left( {\hat{\theta}}^{2} + 4 \right)} - \frac{10\hat{\theta}}{\left( {\hat{\theta}}^{2} + 16 \right)} = 0$
or, equivalently,
$12\left( {\hat{\theta}}^{2} + 4 \right)\left( {\hat{\theta}}^{2} + 16 \right) - 6{\hat{\theta}}^{2}\left( {\hat{\theta}}^{2} + 16 \right) - 10{\hat{\theta}}^{2}\left( {\hat{\theta}}^{2} + 4 \right) = -4{\hat{\theta}}^{4} + 104{\hat{\theta}}^{2} + 768 = 0$,
which yields ${\hat{\theta}}^{2} = 32$ and $\hat{\theta} = 5.7$.
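The final quadratic in ${\hat{\theta}}^{2}$ can be checked numerically;
the short Python sketch below (our own verification) solves it with the
quadratic formula and confirms that the root satisfies the score
equation.

```python
import math

# Example 3.24: -4 t^2 + 104 t + 768 = 0 with t = theta^2,
# i.e. t^2 - 26 t - 192 = 0 after dividing through by -4.
t = (26 + math.sqrt(26**2 + 4 * 192)) / 2   # positive root of the quadratic
theta_hat = math.sqrt(t)

# Check: the derivative of ln L vanishes at theta_hat.
score = (12 / theta_hat
         - 6 * theta_hat / (theta_hat**2 + 4)
         - 10 * theta_hat / (theta_hat**2 + 16))
```

The exact root is $\hat{\theta} = \sqrt{32} \approx 5.66$, which the
text rounds to 5.7.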

3.5.4 Maximum likelihood estimators for truncated data

This section is concerned with the maximum likelihood estimation of the
continuous distribution of the random variable $X$ when the data is
incomplete due to truncation. If the values of $X$ are truncated at $d$,
we would not have been aware of the existence of these values had they
not exceeded $d$. The policy deductible introduced in Section 3.4.1 is
an example of left truncation. Any loss less than or equal to the
deductible is not recorded. The contribution to the likelihood function
of an observation $x$ truncated at $d$ is a conditional density, so
$f_{X}\left( x \right)$ is replaced by
$\frac{f_{X}\left( x \right)}{S_{X}\left( d \right)}$.

Example 3.25 (SOA)

For the single parameter Pareto distribution with $\theta = 2$, maximum
likelihood estimation is applied to estimate the parameter $\alpha$.
Find the estimated mean of the ground up loss distribution based on the
maximum likelihood estimate of $\alpha$ for the following data set:

Ordinary policy deductible of 5, maximum covered loss of 25 (policy
limit 20).

8 insurance payment amounts: 2, 4, 5, 5, 8, 10, 12, 15

2 limit payments: 20, 20.

Solution

The contributions of the different observations can be summarized as
follows:

For an exact loss: $f_{X}\left( x \right)$.

For censored observations: $S_{X}\left( 25 \right)$.

For truncated observations:
$\frac{f_{X}\left( x \right)}{S_{X}\left( 5 \right)}$.

Given that ground up losses smaller than 5 are omitted from the data
set, the contribution of every observation must be conditional on
exceeding 5. The likelihood function becomes

$L\left( \alpha \right) = \frac{\prod_{i = 1}^{8}{f_{X}\left( x_{i} \right)}}{\left\lbrack S_{X}\left( 5 \right) \right\rbrack^{8}}\left\lbrack \frac{S_{X}\left( 25 \right)}{S_{X}\left( 5 \right)} \right\rbrack^{2}$.

For the single parameter Pareto the probability density and distribution
functions are given by

$f_{X}\left( x \right) = \frac{\alpha\theta^{\alpha}}{x^{\alpha + 1}}$
and
$F_{X}\left( x \right) = 1 - \left( \frac{\theta}{x} \right)^{\alpha}$
for $x > \theta$, respectively.

Then, with each ground up loss $x_{i}$ equal to the corresponding
payment amount plus the deductible of 5, the likelihood and
loglikelihood functions are given by

$L\left( \alpha \right) = \frac{\alpha^{8}}{\prod_{i = 1}^{8}x_{i}^{\alpha + 1}}\frac{5^{10\alpha}}{25^{2\alpha}}$,

$\ln L\left( \alpha \right) = 8\ln\alpha - \left( \alpha + 1 \right)\sum_{i = 1}^{8}{\ln x_{i}} + 10\alpha\ln 5 - 2\alpha\ln 25$.

$\frac{d\ln L\left( \alpha \right)}{d\alpha} = \frac{8}{\alpha} - \sum_{i = 1}^{8}{\ln x_{i}} + 10\ln 5 - 2\ln 25$.

The maximum likelihood estimator, $\hat{\alpha}$, is the solution to the
equation

$\frac{8}{\hat{\alpha}} - \sum_{i = 1}^{8}{\ln x_{i}} + 10\ln 5 - 2\ln 25 = 0$,

which yields
$\hat{\alpha} = \frac{8}{\sum_{i = 1}^{8}{\ln x_{i}} - 10\ln 5 + 2\ln 25} = \frac{8}{(\ln 7 + \ln 9 + \ldots + \ln 20) - 10\ln 5 + 2\ln 25} = 0.785$.

The mean of the single parameter Pareto distribution exists only for
$\alpha > 1$. Since $\hat{\alpha} = 0.785 < 1$, the mean does not exist.
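The arithmetic behind $\hat{\alpha}$ can be verified directly. The
Python sketch below (our own check, not part of the original example)
rebuilds the ground up losses from the payments and the deductible and
evaluates the closed-form estimator.

```python
import math

# Example 3.25: ground up losses are the insurance payments plus the
# deductible of 5; the two limit payments of 20 correspond to losses
# censored at the maximum covered loss of 25 and enter only through S(25).
payments = [2, 4, 5, 5, 8, 10, 12, 15]
losses = [p + 5 for p in payments]          # 7, 9, 10, 10, 13, 15, 17, 20

# alpha_hat = 8 / (sum of ln x_i - 10 ln 5 + 2 ln 25)
denom = sum(math.log(x) for x in losses) - 10 * math.log(5) + 2 * math.log(25)
alpha_hat = 8 / denom
```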

3.5.5 Concluding remarks

In describing losses, actuaries fit appropriate parametric distribution
models for the frequency and severity of loss. This involves finding
statistical distributions that efficiently model the data in hand. After
fitting a distribution model to a data set, the model should be
validated. Model validation is a crucial step in the model building
sequence. It assesses how well the fitted distribution describes the
data in hand and how well we can expect the model to perform in the
future. If the selected model does not fit the data, another
distribution should be chosen. If more than one model seems to be a good
fit for the data, we then have to choose which model to use. It should
be noted, though, that the same data should not serve both purposes
(fitting and validating the model); additional data should be used to
assess the performance of the model. There are many statistical tools
for model validation. Goodness of fit tests, used to determine whether
sample data are consistent with the candidate model, will be presented
in a separate chapter.

Further readings and references

Cummins, J. D. and Derrig, R. A. 1991. Managing the Insolvency Risk of
Insurance Companies, Springer Science+Business Media, LLC.

Frees, E. W. and Valdez, E. A. 2008. Hierarchical Insurance Claims
Modeling, Journal of the American Statistical Association, 103,
1457–1469.

Klugman, S. A., Panjer, H. H. and Willmot, G. E. 2008. Loss Models: From
Data to Decisions, Wiley.

Kreer, M., Kızılersü, A., Thomas, A. W. and Egídio dos Reis, A. D. 2015.
Goodness-of-fit tests and applications for left-truncated Weibull
distributions to non-life insurance, European Actuarial Journal, 5,
139–163.

McDonald, J. B. 1984. Some Generalized Functions for the Size
Distribution of Income, Econometrica, 52, 647–663.

McDonald, J. B. and Xu, Y. J. 1995. A Generalization of the Beta
Distribution with Applications, Journal of Econometrics, 66, 133–152.

Tevet, D. 2016. Applying Generalized Linear Models to Insurance Data:
Frequency/Severity versus Premium Modeling, in: Frees, E. W., Derrig,
R. A. and Meyers, G. (Eds.) Predictive Modeling Applications in
Actuarial Science, Vol. II: Case Studies in Insurance. Cambridge
University Press.

Venter, G. 1983. Transformed Beta and Gamma Distributions and Aggregate
Losses. Proceedings of the Casualty Actuarial Society, 70, 156–193.
