In this section we estimate statistical parameters using the method of maximum likelihood. Maximum likelihood estimates in the presence of grouping, truncation or censoring are calculated.

3.5.1 Maximum Likelihood Estimators for Complete Data

Pricing of insurance premiums and estimation of claim reserves are among many actuarial problems that involve modeling the severity of loss (claim size). The principles for using maximum likelihood to estimate model parameters were introduced in Chapter 2. In this section, we present a few examples to illustrate how actuaries fit a parametric distribution model to a set of claim data using maximum likelihood. In these examples we derive the asymptotic variance of the maximum likelihood estimators of the model parameters, and we use the delta method to derive the asymptotic variances of functions of these parameters.

Example 3.21 Consider a random sample of claim amounts: 8,000, 10,000, 12,000, 15,000. You assume that claim amounts follow an inverse exponential distribution with parameter $\theta$.

Calculate the maximum likelihood estimator for $\theta$.
Approximate the variance of the maximum likelihood estimator.
Determine an approximate 95% confidence interval for $\theta$.
Determine an approximate 95% confidence interval for $\Pr\left( X \leq 9,000 \right)$.

Solution

The probability density function is
$$f_{X}\left( x \right) = \frac{\theta e^{- \frac{\theta}{x}}}{x^{2}},$$
where $x > 0$. The likelihood function, $L\left( \theta \right)$, can be viewed as the probability of the observed data, written as a function of the model’s parameter $\theta$:
$$L\left( \theta \right) = \prod_{i = 1}^{4}{f_{X_{i}}\left( x_{i} \right)} = \frac{\theta^{4}e^{- \theta\sum_{i = 1}^{4}\frac{1}{x_{i}}}}{\prod_{i = 1}^{4}x_{i}^{2}}.$$

The loglikelihood function, $\ln L\left( \theta \right)$, is the sum of the individual logarithms:
$$\ln L\left( \theta \right) = 4\ln\theta - \theta\sum_{i = 1}^{4}\frac{1}{x_{i}} - 2\sum_{i = 1}^{4}\ln x_{i}.$$

Differentiating with respect to $\theta$ gives
$$\frac{d\ln L\left( \theta \right)}{d\theta} = \frac{4}{\theta} - \sum_{i = 1}^{4}\frac{1}{x_{i}}.$$
The maximum likelihood estimator of $\theta$, denoted by $\hat{\theta}$, is the solution to the equation
$$\frac{4}{\hat{\theta}} - \sum_{i = 1}^{4}\frac{1}{x_{i}} = 0.$$
Thus,
$\hat{\theta} = \frac{4}{\sum_{i = 1}^{4}\frac{1}{x_{i}}} = 10,667$.

The second derivative of $\ln L\left( \theta \right)$ is given by
$$\frac{d^{2}\ln L\left( \theta \right)}{d\theta^{2}} = \frac{- 4}{\theta^{2}}.$$
Evaluating the second derivative of the loglikelihood function at $\hat{\theta} = 10,667$ gives a negative value, which confirms that $\hat{\theta}$ maximizes the loglikelihood function.

Taking the reciprocal of the negative expectation of the second derivative of $\ln L\left( \theta \right)$, we obtain an estimate of the variance of $\hat{\theta}$:
$\widehat{Var}\left( \hat{\theta} \right) = \left. \left\lbrack - E\left( \frac{d^{2}\ln L\left( \theta \right)}{d\theta^{2}} \right) \right\rbrack^{- 1} \right|_{\theta = \hat{\theta}} = \frac{{\hat{\theta}}^{2}}{4} = 28,446,222$.

Note that as the sample size $n \rightarrow \infty$, the distribution of the maximum likelihood estimator $\hat{\theta}$ approaches a normal distribution with mean $\theta$ and a variance that is estimated by $\widehat{Var}\left( \hat{\theta} \right)$. With only four observations, the approximate confidence intervals in this example rely on this normal approximation purely for the purpose of illustration.

The 95% confidence interval for $\theta$ is given by
$$10,667 \pm 1.96\sqrt{28,446,222} = \left( 213.34, 21,120.66 \right).$$
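The first three parts of this example can be checked numerically. The following is a minimal Python sketch, not part of the original solution; the variable names are illustrative. It keeps $\hat{\theta}$ unrounded, so the variance and interval differ slightly from the figures above, which square the rounded value 10,667.

```python
import math

# Claim amounts from Example 3.21.
claims = [8000, 10000, 12000, 15000]
n = len(claims)

# Closed-form MLE for the inverse exponential: theta_hat = n / sum(1 / x_i).
theta_hat = n / sum(1.0 / x for x in claims)

# The Fisher information is n / theta^2, so the estimated variance is theta_hat^2 / n.
var_theta_hat = theta_hat ** 2 / n

# Normal-approximation 95% interval (1.96 is the standard normal quantile).
half_width = 1.96 * math.sqrt(var_theta_hat)
ci_theta = (theta_hat - half_width, theta_hat + half_width)

print(f"theta_hat = {theta_hat:,.2f}")
print(f"variance  = {var_theta_hat:,.0f}")
print(f"95% CI    = ({ci_theta[0]:,.2f}, {ci_theta[1]:,.2f})")
```

Run as is, the sketch gives an estimate of about 10,666.67 and an interval from roughly 213 to 21,120, in line with the hand calculation.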
The distribution function of $X$ is $F\left( x \right) = e^{- \frac{\theta}{x}}$, obtained by integrating the density above. Then, the maximum likelihood estimate of $g\left( \theta \right) = F\left( 9,000 \right)$ is
$$g\left( \hat{\theta} \right) = e^{- \frac{10,667}{9,000}} = 0.306.$$
We use the delta method to approximate the variance of $g\left( \hat{\theta} \right)$.
$$\frac{dg\left( \theta \right)}{d\theta} = - \frac{1}{9,000}e^{- \frac{\theta}{9,000}}.$$

$\widehat{Var}\left\lbrack g\left( \hat{\theta} \right) \right\rbrack = \left( - \frac{1}{9,000}e^{- \frac{\hat{\theta}}{9,000}} \right)^{2}\widehat{Var}\left( \hat{\theta} \right) = 0.0328$.

The 95% confidence interval for $F\left( 9,000 \right)$ is given by
$$0.306 \pm 1.96\sqrt{0.0328} = \left( - 0.049, 0.661 \right).$$
Since a probability cannot be negative, the lower limit is replaced by zero.
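The delta-method step for $\Pr\left( X \leq 9,000 \right)$ can be sketched in Python as follows. The sketch assumes the distribution function obtained by integrating the inverse exponential density of this example, $F\left( x \right) = e^{- \theta/x}$; the helper name `delta_method_variance` is hypothetical, and clipping the interval to $\lbrack 0, 1 \rbrack$ is a choice made here for illustration.

```python
import math

def delta_method_variance(dg_dtheta, var_theta):
    """First-order delta method: Var[g(theta_hat)] ~ g'(theta_hat)^2 * Var(theta_hat)."""
    return dg_dtheta ** 2 * var_theta

claims = [8000, 10000, 12000, 15000]
theta_hat = len(claims) / sum(1.0 / x for x in claims)
var_theta_hat = theta_hat ** 2 / len(claims)
x0 = 9000

# Assumption: inverse exponential cdf F(x) = exp(-theta / x).
g_hat = math.exp(-theta_hat / x0)

# Derivative of exp(-theta / x0) with respect to theta.
dg_dtheta = -g_hat / x0

var_g = delta_method_variance(dg_dtheta, var_theta_hat)
half_width = 1.96 * math.sqrt(var_g)

# A probability lies in [0, 1], so the normal-approximation interval is clipped.
ci_prob = (max(0.0, g_hat - half_width), min(1.0, g_hat + half_width))

print(round(g_hat, 4), round(var_g, 4), ci_prob)
```

Under these assumptions the estimated probability is about 0.306, with a delta-method variance of about 0.033.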

Example 3.22 A random sample of size 6 is drawn from a lognormal distribution with parameters $\mu$ and $\sigma$. The sample values are 200, 3,000, 8,000, 60,000, 60,000, 160,000.

Calculate the maximum likelihood estimator for $\mu$ and $\sigma$.
Estimate the covariance matrix of the maximum likelihood estimator.
Determine approximate 95% confidence intervals for $\mu$ and $\sigma$.
Determine an approximate 95% confidence interval for the mean of the lognormal distribution.

Solution

The probability density function is
$$f_{X}\left( x \right) = \frac{1}{x\sigma\sqrt{2\pi}}\exp\left\lbrack - \frac{1}{2}\left( \frac{\ln x - \mu}{\sigma} \right)^{2} \right\rbrack,$$
where $x > 0$. The likelihood function, $L\left( \mu,\sigma \right)$, is the product of the pdf for each data point.
$$L\left( \mu,\sigma \right) = \prod_{i = 1}^{6}{f_{X_{i}}\left( x_{i} \right)} = \frac{1}{\sigma^{6}\left( 2\pi \right)^{3}\prod_{i = 1}^{6}x_{i}}\exp\left\lbrack - \frac{1}{2}\sum_{i = 1}^{6}\left( \frac{\ln x_{i} - \mu}{\sigma} \right)^{2} \right\rbrack.$$
The loglikelihood function, $\ln L\left( \mu,\sigma \right)$, is the sum of the individual logarithms.
$$\ln L\left( \mu,\sigma \right) = - 6\ln\sigma - 3\ln\left( 2\pi \right) - \sum_{i = 1}^{6}\ln x_{i} - \frac{1}{2}\sum_{i = 1}^{6}\left( \frac{\ln x_{i} - \mu}{\sigma} \right)^{2}.$$
The first partial derivatives are
$$\frac{\partial\ln L\left( \mu,\sigma \right)}{\partial\mu} = \frac{1}{\sigma^{2}}\sum_{i = 1}^{6}\left( \ln x_{i} - \mu \right),$$
$$\frac{\partial\ln L\left( \mu,\sigma \right)}{\partial\sigma} = \frac{- 6}{\sigma} + \frac{1}{\sigma^{3}}\sum_{i = 1}^{6}\left( \ln x_{i} - \mu \right)^{2}.$$
The maximum likelihood estimators of $\mu$ and $\sigma$, denoted by $\hat{\mu}$ and $\hat{\sigma}$, are the solutions to the equations
$$\frac{1}{{\hat{\sigma}}^{2}}\sum_{i = 1}^{6}\left( \ln x_{i} - \hat{\mu} \right) = 0,$$
$$\frac{- 6}{\hat{\sigma}} + \frac{1}{{\hat{\sigma}}^{3}}\sum_{i = 1}^{6}\left( \ln x_{i} - \hat{\mu} \right)^{2} = 0.$$
These yield the estimates

$\hat{\mu} = \frac{\sum_{i = 1}^{6}{\ln x_{i}}}{6} = 9.38$ and
${\hat{\sigma}}^{2} = \frac{\sum_{i = 1}^{6}\left( \ln x_{i} - \hat{\mu} \right)^{2}}{6} = 5.12$.

The second partial derivatives are

$\frac{\partial^{2}\ln L\left( \mu,\sigma \right)}{\partial\mu^{2}} = \frac{- 6}{\sigma^{2}}$,
$\frac{\partial^{2}\ln L\left( \mu,\sigma \right)}{\partial\mu\partial\sigma} = \frac{- 2}{\sigma^{3}}\sum_{i = 1}^{6}\left( \ln x_{i} - \mu \right)$
and
$\frac{\partial^{2}\ln L\left( \mu,\sigma \right)}{\partial\sigma^{2}} = \frac{6}{\sigma^{2}} - \frac{3}{\sigma^{4}}\sum_{i = 1}^{6}\left( \ln x_{i} - \mu \right)^{2}$.

To derive the covariance matrix of the MLE we need to find the expectations of the second derivatives. Since the random variable $X$ follows a lognormal distribution with parameters $\mu$ and $\sigma$, $\ln X$ is normally distributed with mean $\mu$ and variance $\sigma^{2}$.

$E\left( \frac{\partial^{2}\ln L\left( \mu,\sigma \right)}{\partial\mu^{2}} \right) = E\left( \frac{- 6}{\sigma^{2}} \right) = \frac{- 6}{\sigma^{2}}$,

$E\left( \frac{\partial^{2}\ln L\left( \mu,\sigma \right)}{\partial\mu\partial\sigma} \right) = \frac{- 2}{\sigma^{3}}\sum_{i = 1}^{6}E\left( \ln x_{i} - \mu \right) = \frac{- 2}{\sigma^{3}}\sum_{i = 1}^{6}\left\lbrack E\left( \ln x_{i} \right) - \mu \right\rbrack = \frac{- 2}{\sigma^{3}}\sum_{i = 1}^{6}\left( \mu - \mu \right) = 0$,

and

$E\left( \frac{\partial^{2}\ln L\left( \mu,\sigma \right)}{\partial\sigma^{2}} \right) = \frac{6}{\sigma^{2}} - \frac{3}{\sigma^{4}}\sum_{i = 1}^{6}E\left\lbrack \left( \ln x_{i} - \mu \right)^{2} \right\rbrack = \frac{6}{\sigma^{2}} - \frac{3}{\sigma^{4}}\sum_{i = 1}^{6}V\left( \ln x_{i} \right) = \frac{6}{\sigma^{2}} - \frac{3}{\sigma^{4}}\sum_{i = 1}^{6}\sigma^{2} = \frac{- 12}{\sigma^{2}}$.

Using the negatives of these expectations we obtain the Fisher information matrix $\begin{bmatrix}
\frac{6}{\sigma^{2}} & 0 \\
0 & \frac{12}{\sigma^{2}} \\
\end{bmatrix}$.

The covariance matrix, $\Sigma$, is the inverse of the Fisher information matrix: $\Sigma = \begin{bmatrix}
\frac{\sigma^{2}}{6} & 0 \\
0 & \frac{\sigma^{2}}{12} \\
\end{bmatrix}$.

The estimated matrix is given by $\hat{\Sigma} = \begin{bmatrix}
0.8533 & 0 \\
0 & 0.4267 \\
\end{bmatrix}$.

The 95% confidence interval for $\mu$ is given by $9.38 \pm 1.96\sqrt{0.8533} = \left( 7.57, 11.19 \right)$.

The 95% confidence interval for $\sigma$ is given by $2.26 \pm 1.96\sqrt{0.4267} = \left( 0.98, 3.54 \right)$, where $\hat{\sigma} = \sqrt{5.12} = 2.26$ and 0.4267 is the estimated variance of $\hat{\sigma}$.
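The estimates, the covariance matrix and the two intervals can be reproduced with a short Python sketch. It is an illustration under the assumptions of this example rather than a general-purpose routine; the function names `lognormal_mle` and `lognormal_covariance` are hypothetical. The interval for $\sigma$ is centred at $\hat{\sigma}$, because the second diagonal entry of the covariance matrix is the variance of $\hat{\sigma}$.

```python
import math

# Sample values from Example 3.22.
sample = [200, 3000, 8000, 60000, 60000, 160000]

def lognormal_mle(xs):
    """MLE of (mu, sigma): mean and (biased) standard deviation of the log data."""
    logs = [math.log(x) for x in xs]
    n = len(logs)
    mu_hat = sum(logs) / n
    sigma2_hat = sum((v - mu_hat) ** 2 for v in logs) / n
    return mu_hat, math.sqrt(sigma2_hat)

def lognormal_covariance(sigma_hat, n):
    """Inverse Fisher information: diag(sigma^2 / n, sigma^2 / (2 n))."""
    return [[sigma_hat ** 2 / n, 0.0],
            [0.0, sigma_hat ** 2 / (2 * n)]]

mu_hat, sigma_hat = lognormal_mle(sample)
cov = lognormal_covariance(sigma_hat, len(sample))

z = 1.96  # standard normal quantile for a 95% interval
ci_mu = (mu_hat - z * math.sqrt(cov[0][0]), mu_hat + z * math.sqrt(cov[0][0]))
ci_sigma = (sigma_hat - z * math.sqrt(cov[1][1]), sigma_hat + z * math.sqrt(cov[1][1]))

print(round(mu_hat, 2), round(sigma_hat ** 2, 2))
print(ci_mu, ci_sigma)
```

Because the sketch does not round intermediate values, its output can differ from the hand calculation in the third or fourth significant digit.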

The mean of $X$ is $\exp\left( \mu + \frac{\sigma^{2}}{2} \right)$. Then, the maximum likelihood estimate of
$$g\left( \mu,\sigma \right) = \exp\left( \mu + \frac{\sigma^{2}}{2} \right)$$
is
$$g\left( \hat{\mu},\hat{\sigma} \right) = \exp\left( \hat{\mu} + \frac{{\hat{\sigma}}^{2}}{2} \right) = 153,277.$$

We use the delta method to approximate the variance of the MLE
$g\left( \hat{\mu},\hat{\sigma} \right)$.

$\frac{\partial g\left( \mu,\sigma \right)}{\partial\mu} = \exp\left( \mu + \frac{\sigma^{2}}{2} \right)$
and
$\frac{\partial g\left( \mu,\sigma \right)}{\partial\sigma} = \sigma\exp\left( \mu + \frac{\sigma^{2}}{2} \right)$.

Using the delta method, the approximate variance of
$g\left( \hat{\mu},\hat{\sigma} \right)$ is given by

$$\widehat{Var}\left\lbrack g\left( \hat{\mu},\hat{\sigma} \right) \right\rbrack = \left. \begin{bmatrix}
\frac{\partial g\left( \mu,\sigma \right)}{\partial\mu} & \frac{\partial g\left( \mu,\sigma \right)}{\partial\sigma} \\
\end{bmatrix}\Sigma\begin{bmatrix}
\frac{\partial g\left( \mu,\sigma \right)}{\partial\mu} \\
\frac{\partial g\left( \mu,\sigma \right)}{\partial\sigma} \\
\end{bmatrix} \right|_{\mu = \hat{\mu},\sigma = \hat{\sigma}}$$

$= \begin{bmatrix}
153,277 & 346,826 \\
\end{bmatrix}\begin{bmatrix}
0.8533 & 0 \\
0 & 0.4267 \\
\end{bmatrix}\begin{bmatrix}
153,277 \\
346,826 \\
\end{bmatrix} = 71,374,380,000$.

The 95% confidence interval for $\exp\left( \mu + \frac{\sigma^{2}}{2} \right)$ is given by

$153,277 \pm 1.96\sqrt{71,374,380,000} = \left( - 370,356, 676,910 \right)$.

Since the mean of the lognormal distribution cannot be negative, we replace the negative lower limit of this interval by zero.
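The two-parameter delta-method calculation can be sketched in Python as below. The function name `lognormal_mean_ci` is hypothetical, and the sketch starts from the rounded estimates $\hat{\mu} = 9.38$ and ${\hat{\sigma}}^{2} = 5.12$ quoted above, so its output agrees with the hand calculation only up to rounding. Because the covariance matrix is diagonal, the quadratic form reduces to a sum of two terms.

```python
import math

def lognormal_mean_ci(mu_hat, sigma_hat, n, z=1.96):
    """Delta-method interval for exp(mu + sigma^2 / 2), with lower limit floored at zero."""
    mean_hat = math.exp(mu_hat + sigma_hat ** 2 / 2)

    # Gradient of g(mu, sigma) = exp(mu + sigma^2 / 2).
    dg_dmu = mean_hat
    dg_dsigma = sigma_hat * mean_hat

    # Diagonal covariance matrix of (mu_hat, sigma_hat).
    var_mu = sigma_hat ** 2 / n
    var_sigma = sigma_hat ** 2 / (2 * n)

    # Quadratic form gradient' * Sigma * gradient for a diagonal Sigma.
    var_mean = dg_dmu ** 2 * var_mu + dg_dsigma ** 2 * var_sigma

    half_width = z * math.sqrt(var_mean)
    return mean_hat, var_mean, (max(0.0, mean_hat - half_width), mean_hat + half_width)

mean_hat, var_mean, ci_mean = lognormal_mean_ci(9.38, math.sqrt(5.12), 6)
print(round(mean_hat), round(var_mean), ci_mean)
```

Flooring the lower limit at zero mirrors the adjustment made in the text; the interval remains very wide, which reflects the small sample and the heavy right tail of the fitted distribution.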

3.5.2 Maximum Likelihood Estimators for Grouped Data

In the previous section we considered the maximum likelihood estimation of continuous models from complete (individual) data. Each individual observation is recorded, and its contribution to the likelihood function is the density at that value. In this section we consider the problem of obtaining maximum likelihood estimates of parameters from grouped data. The observations are only available in grouped form, and the contribution of each observation to the likelihood function is the probability of falling in a specific group (interval). Let $n_{j}$ represent the number of observations in the interval $\left( c_{j - 1},c_{j} \right\rbrack$. The grouped data likelihood function is thus given by
$$L\left( \theta \right) = \prod_{j = 1}^{k}\left\lbrack F\left( \left. c_{j} \right|\theta \right) - F\left( \left. c_{j - 1} \right|\theta \right) \right\rbrack^{n_{j}},$$
where $c_{0}$ is the smallest possible observation (often set to zero) and $c_{k}$ is the largest possible observation (often set to infinity).

Example 3.23 (SOA) For a group of policies, you are given that losses follow the distribution function $F\left( x \right) = 1 - \frac{\theta}{x}$, for $\theta \lt x \lt \infty$. Further, a sample of 20 losses resulted in the following:
$$
{\small \begin{matrix}\hline
\text{Interval} & \text{Number of Losses} \\ \hline
x \leq 10 & 9 \\
10 \lt x \leq 25 & 6 \\
x \gt 25 & 5 \\ \hline
\end{matrix}}$$
Calculate the maximum likelihood estimate of $\theta$.

Solution
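One way to approach this example is to write the grouped-data loglikelihood as $9\ln\left( 1 - \frac{\theta}{10} \right) + 6\ln\left( \frac{\theta}{10} - \frac{\theta}{25} \right) + 5\ln\left( \frac{\theta}{25} \right)$ and maximize it over $0 \lt \theta \lt 10$. The Python sketch below does this numerically. It is an illustration rather than the official solution: the helper names and the golden-section search are choices made here, and the search relies on the loglikelihood having a single interior maximum.

```python
import math

def cdf(x, theta):
    """F(x) = 1 - theta / x for x > theta, and 0 at or below theta."""
    if x <= theta:
        return 0.0
    if math.isinf(x):
        return 1.0
    return 1.0 - theta / x

def grouped_loglik(theta, boundaries, counts):
    """Sum of n_j * ln[F(c_j) - F(c_{j-1})] over the groups."""
    total = 0.0
    for lo, hi, n_j in zip(boundaries[:-1], boundaries[1:], counts):
        total += n_j * math.log(cdf(hi, theta) - cdf(lo, theta))
    return total

def golden_section_max(f, lo, hi, tol=1e-8):
    """Maximize a unimodal function on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) > f(d):
            b = d
        else:
            a = c
    return (a + b) / 2

boundaries = [0.0, 10.0, 25.0, math.inf]  # group end points c_0, ..., c_3
counts = [9, 6, 5]                        # observed losses per group

# theta must lie below 10 so that the first group has positive probability.
theta_hat = golden_section_max(
    lambda t: grouped_loglik(t, boundaries, counts), 1e-6, 10.0 - 1e-6)
print(round(theta_hat, 4))
```

Setting the derivative $- \frac{9}{10 - \theta} + \frac{11}{\theta}$ to zero gives $\hat{\theta} = 5.5$, which is the value the numerical search converges to.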

3.5.3 Maximum Likelihood Estimators for Censored Data

Another distinguishing feature of the data-gathering mechanism is censoring. While for some events of interest (losses, claims, lifetimes, etc.) the complete data may be available, for others only partial information is available: all that is known is that the observation exceeds a specific value. The limited policy introduced in Section 3.4.2 is an example of right censoring. Any loss greater than or equal to the policy limit is recorded at the limit. The contribution of a censored observation to the likelihood function is the probability of the random variable exceeding this specific limit. Note that the contributions of both complete and censored data share the survivor function. For a complete observation this survivor function is multiplied by the hazard function, but for a censored observation it is not.

Example 3.24 (SOA) The random variable $X$ has survival function:
$$S_{X}\left( x \right) = \frac{\theta^{4}}{\left( \theta^{2} + x^{2} \right)^{2}}.$$
Two values of $X$ are observed to be 2 and 4. One other value exceeds 4.
Calculate the maximum likelihood estimate of $\theta$.
Solution
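A sketch of one way to work this example: the two complete observations contribute the density $f_{X}\left( x \right) = - S_{X}'\left( x \right) = \frac{4x\theta^{4}}{\left( \theta^{2} + x^{2} \right)^{3}}$, and the censored observation contributes $S_{X}\left( 4 \right)$. The Python code below maximizes the resulting loglikelihood numerically. The helper names, the search interval and the golden-section search are assumptions made for illustration, and this is not the official solution.

```python
import math

def survival(x, theta):
    """S(x) = theta^4 / (theta^2 + x^2)^2."""
    return theta ** 4 / (theta ** 2 + x ** 2) ** 2

def density(x, theta):
    """f(x) = -dS/dx = 4 x theta^4 / (theta^2 + x^2)^3."""
    return 4 * x * theta ** 4 / (theta ** 2 + x ** 2) ** 3

def censored_loglik(theta, observed, censored_at):
    """Density terms for complete points, survival terms for right-censored points."""
    ll = sum(math.log(density(x, theta)) for x in observed)
    ll += sum(math.log(survival(c, theta)) for c in censored_at)
    return ll

def golden_section_max(f, lo, hi, tol=1e-8):
    """Maximize a unimodal function on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) > f(d):
            b = d
        else:
            a = c
    return (a + b) / 2

observed = [2, 4]   # fully observed values
censored_at = [4]   # one value known only to exceed 4

theta_hat = golden_section_max(
    lambda t: censored_loglik(t, observed, censored_at), 0.01, 100.0)
print(round(theta_hat, 4))
```

Analytically, the score equation $\frac{12}{\theta} - \frac{6\theta}{\theta^{2} + 4} - \frac{10\theta}{\theta^{2} + 16} = 0$ reduces to $\theta^{4} - 26\theta^{2} - 192 = 0$, whose only positive solution is $\hat{\theta} = \sqrt{32} \approx 5.66$, matching the numerical result.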

3.5.4 Maximum Likelihood Estimators for Truncated Data

This section is concerned with the maximum likelihood estimation of the continuous distribution of the random variable $X$ when the data are incomplete due to truncation. If the values of $X$ are truncated at $d$, then we would not have been aware of the existence of these values had they not exceeded $d$. The policy deductible introduced in Section 3.4.1 is an example of left truncation. Any loss less than or equal to the deductible is not recorded. The contribution to the likelihood function of an observation $x$ truncated at $d$ is a conditional probability, so the density $f_{X}\left( x \right)$ is replaced by $\frac{f_{X}\left( x \right)}{S_{X}\left( d \right)}$.

Example 3.25 (SOA) For the single parameter Pareto distribution with $\theta = 2$, maximum likelihood estimation is applied to estimate the parameter $\alpha$. Find the estimated mean of the ground up loss distribution based on the maximum likelihood estimate of $\alpha$ for the following data set:

Ordinary policy deductible of 5, maximum covered loss of 25 (policy limit 20)
8 insurance payment amounts: 2, 4, 5, 5, 8, 10, 12, 15
2 limit payments: 20, 20.

Solution
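A hedged sketch of how the truncated and censored contributions combine in this example, assuming the single parameter Pareto forms $f_{X}\left( x \right) = \frac{\alpha\theta^{\alpha}}{x^{\alpha + 1}}$ and $S_{X}\left( x \right) = \left( \frac{\theta}{x} \right)^{\alpha}$ for $x \gt \theta$. Each payment is converted to a ground-up loss by adding the deductible, each uncensored loss contributes $\frac{f_{X}\left( x \right)}{S_{X}\left( 5 \right)}$, and each limit payment contributes $\frac{S_{X}\left( 25 \right)}{S_{X}\left( 5 \right)}$. The variable names are illustrative and this is not the official solution.

```python
import math

theta = 2.0               # known Pareto scale parameter
deductible = 5.0          # left truncation point d
max_covered_loss = 25.0   # right censoring point u
payments = [2, 4, 5, 5, 8, 10, 12, 15]
n_limit = 2               # payments at the policy limit (losses of at least 25)

# Ground-up losses for the uncensored payments.
losses = [p + deductible for p in payments]

def loglik(alpha):
    """Loglikelihood with left truncation at d and right censoring at u."""
    log_s_d = alpha * math.log(theta / deductible)
    log_s_u = alpha * math.log(theta / max_covered_loss)
    ll = 0.0
    for x in losses:
        log_f = math.log(alpha) + alpha * math.log(theta) - (alpha + 1) * math.log(x)
        ll += log_f - log_s_d
    ll += n_limit * (log_s_u - log_s_d)
    return ll

# Setting the derivative of loglik to zero gives a closed form for alpha_hat.
alpha_hat = len(losses) / (
    sum(math.log(x / deductible) for x in losses)
    + n_limit * math.log(max_covered_loss / deductible))

# The single parameter Pareto mean alpha * theta / (alpha - 1) is finite only for alpha > 1.
mean_hat = alpha_hat * theta / (alpha_hat - 1) if alpha_hat > 1 else math.inf

print(round(alpha_hat, 4), mean_hat)
```

Under these assumptions the estimate is about 0.785. Since this is below 1, the fitted ground-up distribution does not have a finite mean, which the sketch reports as infinity.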

3.6 Concluding Remarks

In describing losses, actuaries fit appropriate parametric distribution models for the frequency and severity of loss. This involves finding statistical distributions that model the data at hand efficiently. After a distribution model has been fitted to a data set, the model should be validated. Model validation is a crucial step in the model building sequence. It assesses how well these statistical distributions fit the data at hand and how well we can expect the model to perform in the future. If the selected model does not fit the data, another distribution should be chosen. If more than one model seems to be a good fit for the data, we then have to choose which model to use. Note that the same data should not serve both purposes (fitting and validating the model); additional data should be used to assess the performance of the model. There are many statistical tools for model validation. Alternative goodness of fit tests, used to determine whether sample data are consistent with the candidate model, will be presented in a separate chapter.
