3.5 Maximum Likelihood Estimation

In this section we estimate statistical parameters using the method of maximum likelihood. We also calculate maximum likelihood estimates when the data are grouped, truncated, or censored.

3.5.1 Maximum Likelihood Estimators for Complete Data

Pricing insurance premiums and estimating claim reserves are among the many actuarial problems that involve modeling the severity of loss (claim size). The principles for using maximum likelihood to estimate model parameters were introduced in Chapter 2. In this section, we present a few examples to illustrate how actuaries fit a parametric distribution model to a set of claim data using maximum likelihood. In these examples we derive the asymptotic variance of the maximum likelihood estimators of the model parameters, and we use the delta method to derive the asymptotic variances of functions of these parameters.

Example 3.21 Consider a random sample of claim amounts: 8,000 10,000 12,000 15,000. You assume that claim amounts follow an inverse exponential distribution, with parameter $\theta$.

  1. Calculate the maximum likelihood estimator for $\theta$.
  2. Approximate the variance of the maximum likelihood estimator.
  3. Determine an approximate 95% confidence interval for $\theta$.
  4. Determine an approximate 95% confidence interval for $\Pr\left( X \leq 9{,}000 \right)$.

Solution
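
The four parts of this example can be checked numerically. The sketch below is one possible working, not the book's own solution. It assumes the usual loss-modeling parameterization of the inverse exponential distribution, $f(x) = \theta e^{-\theta/x}/x^{2}$ and $F(x) = e^{-\theta/x}$, under which the log-likelihood is $n\ln\theta - \theta\sum 1/x_{i} - 2\sum \ln x_{i}$. All variable names are illustrative.

```python
import math
from statistics import NormalDist

claims = [8000, 10000, 12000, 15000]
n = len(claims)

# Part 1: the score equation n/theta - sum(1/x_i) = 0 has a closed-form root.
theta_hat = n / sum(1.0 / x for x in claims)

# Part 2: the second derivative of the log-likelihood is -n/theta^2, so the
# asymptotic variance is approximated by theta_hat^2 / n.
var_theta = theta_hat ** 2 / n
se_theta = math.sqrt(var_theta)

# Part 3: Wald-type 95% interval based on asymptotic normality.
z = NormalDist().inv_cdf(0.975)
ci_theta = (theta_hat - z * se_theta, theta_hat + z * se_theta)

# Part 4: delta method for g(theta) = Pr(X <= 9000) = exp(-theta / 9000),
# whose derivative is g'(theta) = -exp(-theta / 9000) / 9000.
limit = 9000
p_hat = math.exp(-theta_hat / limit)
dg = -p_hat / limit
se_p = abs(dg) * se_theta
ci_p = (p_hat - z * se_p, p_hat + z * se_p)

print(f"theta_hat = {theta_hat:.2f}, var = {var_theta:.1f}")
print(f"95% CI for theta: ({ci_theta[0]:.2f}, {ci_theta[1]:.2f})")
print(f"p_hat = {p_hat:.4f}, se = {se_p:.4f}")
print(f"95% CI for Pr(X <= 9000): ({ci_p[0]:.4f}, {ci_p[1]:.4f})")
```

With only four observations the normal approximation is rough: the interval for the probability can extend below zero, in which case it would typically be truncated at zero.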

Example 3.22 A random sample of size 6 is from a lognormal distribution with parameters $\mu$ and $\sigma$. The sample values are 200, 3,000, 8,000, 60,000, 60,000, 160,000.

  1. Calculate the maximum likelihood estimator for $\mu$ and $\sigma$.
  2. Estimate the covariance matrix of the maximum likelihood estimator.
  3. Determine approximate 95% confidence intervals for $\mu$ and $\sigma$.
  4. Determine an approximate 95% confidence interval for the mean of the lognormal distribution.

Solution
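
A numerical sketch of this example follows; it is one way to carry out the calculation rather than the book's own working. It relies on the fact that the logarithms of lognormal observations are normal, so the estimators are the sample mean and the (biased, divide-by-$n$) standard deviation of the logs. The covariance matrix is taken from the inverse Fisher information of the normal model in $(\mu, \sigma)$, which is diagonal with entries $\sigma^{2}/n$ and $\sigma^{2}/(2n)$. Variable names are illustrative.

```python
import math
from statistics import NormalDist

sample = [200, 3000, 8000, 60000, 60000, 160000]
n = len(sample)
logs = [math.log(x) for x in sample]

# Part 1: MLEs are the mean and the divide-by-n standard deviation of log(x).
mu_hat = sum(logs) / n
sigma_hat = math.sqrt(sum((y - mu_hat) ** 2 for y in logs) / n)

# Part 2: estimated asymptotic covariance matrix of (mu_hat, sigma_hat).
var_mu = sigma_hat ** 2 / n
var_sigma = sigma_hat ** 2 / (2 * n)
cov_matrix = [[var_mu, 0.0], [0.0, var_sigma]]

# Part 3: Wald-type 95% intervals.
z = NormalDist().inv_cdf(0.975)
ci_mu = (mu_hat - z * math.sqrt(var_mu), mu_hat + z * math.sqrt(var_mu))
ci_sigma = (sigma_hat - z * math.sqrt(var_sigma),
            sigma_hat + z * math.sqrt(var_sigma))

# Part 4: delta method for the lognormal mean g = exp(mu + sigma^2 / 2).
# Partial derivatives: dg/dmu = g and dg/dsigma = sigma * g.
mean_hat = math.exp(mu_hat + sigma_hat ** 2 / 2)
grad = (mean_hat, sigma_hat * mean_hat)
var_mean = grad[0] ** 2 * var_mu + grad[1] ** 2 * var_sigma
se_mean = math.sqrt(var_mean)
ci_mean = (mean_hat - z * se_mean, mean_hat + z * se_mean)

print(f"mu_hat = {mu_hat:.4f}, sigma_hat = {sigma_hat:.4f}")
print(f"cov matrix = {cov_matrix}")
print(f"95% CI for mu: {ci_mu}, for sigma: {ci_sigma}")
print(f"mean_hat = {mean_hat:.0f}, 95% CI: {ci_mean}")
```

Because the off-diagonal covariance is zero, the delta-method variance is just the sum of the two squared-gradient terms. With six observations the resulting interval for the mean is very wide and its lower end is negative, which signals how little the sample says about the mean.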

3.5.2 Maximum Likelihood Estimators for Grouped Data

In the previous section we considered the maximum likelihood estimation of continuous models from complete (individual) data. Each individual observation is recorded, and its contribution to the likelihood function is the density at that value. In this section we consider the problem of obtaining maximum likelihood estimates of parameters from grouped data. The observations are only available in grouped form, and the contribution of each observation to the likelihood function is the probability of falling in a specific group (interval). Let $n_{j}$ represent the number of observations in the interval $\left( c_{j - 1}, c_{j} \right\rbrack$. The grouped data likelihood function is thus given by
$$L\left( \theta \right) = \prod_{j = 1}^{k}\left\lbrack F\left( \left. c_{j} \right|\theta \right) - F\left( \left. c_{j - 1} \right|\theta \right) \right\rbrack^{n_{j}},$$
where $c_{0}$ is the smallest possible observation (often set to zero) and $c_{k}$ is the largest possible observation (often set to infinity).
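
In practice it is usually the logarithm of this product that is maximized. Writing $\ell$ for the log-likelihood (a symbol introduced here for convenience), the grouped-data version becomes a weighted sum over the intervals:
$$\ell\left( \theta \right) = \ln L\left( \theta \right) = \sum_{j = 1}^{k} n_{j} \ln\left\lbrack F\left( \left. c_{j} \right|\theta \right) - F\left( \left. c_{j - 1} \right|\theta \right) \right\rbrack .$$
Intervals with $n_{j} = 0$ drop out of the sum, so only the occupied intervals affect the estimate.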

Example 3.23 (SOA) For a group of policies, you are given that losses follow the distribution function $F\left( x \right) = 1 - \frac{\theta}{x}$, for $\theta \lt x \lt \infty$. Further, a sample of 20 losses resulted in the following:
$$
{\small \begin{matrix}\hline
\text{Interval} & \text{Number of Losses} \\ \hline
x \leq 10 & 9 \\
10 \lt x \leq 25 & 6 \\
x \gt 25 & 5 \\ \hline
\end{matrix}}$$
Calculate the maximum likelihood estimate of $\theta$.

Solution
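
The grouped likelihood for this example can be maximized numerically, as in the sketch below. It is an illustration, not the book's solution. The helper `golden_section_max`, the boundary list, and the search interval are all assumptions made here; the interval $(0, 10)$ is used because nine losses fall at or below 10, which requires $F(10) > 0$ and hence $\theta < 10$.

```python
import math

def cdf(x, theta):
    """F(x) = 1 - theta / x for x > theta, and 0 otherwise."""
    return 1.0 - theta / x if x > theta else 0.0

# Interval boundaries c_0 < c_1 < c_2 < c_3 and the counts n_j in (c_{j-1}, c_j].
boundaries = [0.0, 10.0, 25.0, math.inf]
counts = [9, 6, 5]

def grouped_loglik(theta):
    """Sum of n_j * log[F(c_j) - F(c_{j-1})] over the intervals."""
    total = 0.0
    for j, n_j in enumerate(counts):
        prob = cdf(boundaries[j + 1], theta) - cdf(boundaries[j], theta)
        total += n_j * math.log(prob)
    return total

def golden_section_max(f, lo, hi, tol=1e-8):
    """Maximize a unimodal function on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) > f(d):
            b = d  # the maximum lies in [a, d]
        else:
            a = c  # the maximum lies in [c, b]
    return (a + b) / 2

theta_hat = golden_section_max(grouped_loglik, 1e-9, 10.0 - 1e-9)
print(f"theta_hat = {theta_hat:.4f}")
```

As a cross-check, the likelihood here is proportional to $(10 - \theta)^{9}\theta^{11}$, so setting the derivative of its logarithm to zero gives $11(10 - \theta) = 9\theta$, that is, $\theta = 5.5$, which the numerical search should reproduce.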

3.5.3 Maximum Likelihood Estimators for Censored Data

Another distinguishing feature of the data-gathering mechanism is censoring. For some events of interest (losses, claims, lifetimes, etc.) the complete data may be available, while for others only partial information is available, namely that the observation exceeds a specific value. The limited policy introduced in Section 3.4.2 is an example of right censoring: any loss greater than or equal to the policy limit is recorded at the limit. The contribution of a censored observation to the likelihood function is the probability that the random variable exceeds this limit. Note that the contributions of both complete and censored observations share the survivor function; for a complete observation the survivor function is multiplied by the hazard function, whereas for a censored observation it is not.
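
In symbols, suppose there are $m$ exact observations $x_{1}, \ldots, x_{m}$ and $r$ observations right censored at $u_{1}, \ldots, u_{r}$ (this notation is introduced here for illustration). The likelihood described above can then be written as
$$L\left( \theta \right) = \prod_{i = 1}^{m} f_{X}\left( \left. x_{i} \right|\theta \right) \prod_{j = 1}^{r} S_{X}\left( \left. u_{j} \right|\theta \right), \qquad f_{X}\left( \left. x \right|\theta \right) = h_{X}\left( \left. x \right|\theta \right) S_{X}\left( \left. x \right|\theta \right),$$
where $h_{X}$ denotes the hazard function. The second identity makes the remark about the hazard function explicit: every observation contributes a survivor-function factor, and only the exact ones carry the extra hazard factor.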

Example 3.24 (SOA) The random variable $X$ has survival function:
$$S_{X}\left( x \right) = \frac{\theta^{4}}{\left( \theta^{2} + x^{2} \right)^{2}}.$$
Two values of $X$ are observed to be 2 and 4. One other value exceeds 4.
Calculate the maximum likelihood estimate of $theta$.
Solution
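
One way to work this example is to solve the score equation numerically, as sketched below; this is an illustration under stated assumptions, not the book's solution. The density is obtained by differentiating the survival function, $f_{X}(x) = -S_{X}'(x) = 4x\theta^{4}/(\theta^{2} + x^{2})^{3}$. The bisection helper and the bracketing interval $[0.1, 100]$ are choices made here.

```python
import math

observed = [2.0, 4.0]   # values recorded exactly
censored_at = [4.0]     # one value known only to exceed 4

def score(theta):
    """Derivative of the log-likelihood with respect to theta.

    d/dtheta log f(x) = 4/theta - 6*theta/(theta^2 + x^2)   (exact value)
    d/dtheta log S(u) = 4/theta - 4*theta/(theta^2 + u^2)   (censored value)
    """
    total = 0.0
    for x in observed:
        total += 4.0 / theta - 6.0 * theta / (theta ** 2 + x ** 2)
    for u in censored_at:
        total += 4.0 / theta - 4.0 * theta / (theta ** 2 + u ** 2)
    return total

def bisect_root(g, lo, hi, tol=1e-12):
    """Find a root of g in [lo, hi], assuming g changes sign on the interval."""
    g_lo = g(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        g_mid = g(mid)
        if (g_mid > 0) == (g_lo > 0):
            lo, g_lo = mid, g_mid  # root lies in the upper half
        else:
            hi = mid               # root lies in the lower half
    return (lo + hi) / 2

theta_hat = bisect_root(score, 0.1, 100.0)
print(f"theta_hat = {theta_hat:.4f}")
```

The score can also be solved by hand: multiplying through by $\theta$ and substituting $u = \theta^{2}$ leads to $u^{2} - 26u - 192 = 0$, whose positive root is $u = 32$, so the numerical answer should agree with $\sqrt{32} \approx 5.657$.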

3.5.4 Maximum Likelihood Estimators for Truncated Data

This section is concerned with the maximum likelihood estimation of the continuous distribution of the random variable $X$ when the data are incomplete due to truncation. If the values of $X$ are truncated at $d$, we observe a value only if it exceeds $d$; values that do not exceed $d$ leave no record, so we are not even aware of their existence. The policy deductible introduced in Section 3.4.1 is an example of left truncation: any loss less than or equal to the deductible is not recorded. The contribution to the likelihood function of an observation $x$ truncated at $d$ is therefore conditional on exceeding $d$, and $f_{X}\left( x \right)$ is replaced by $\frac{f_{X}\left( x \right)}{S_{X}\left( d \right)}$.
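
For a sample $x_{1}, \ldots, x_{n}$ of recorded values, all left truncated at the same point $d$ (an assumption made here to keep the notation simple), the likelihood takes the form
$$L\left( \theta \right) = \prod_{i = 1}^{n} \frac{f_{X}\left( \left. x_{i} \right|\theta \right)}{S_{X}\left( \left. d \right|\theta \right)}, \qquad x_{i} \gt d.$$
If an observation is both left truncated at $d$ and right censored at $u$, the same conditioning applies to the survivor function, and its contribution is $S_{X}\left( \left. u \right|\theta \right) / S_{X}\left( \left. d \right|\theta \right)$.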

Example 3.25 (SOA) For the single parameter Pareto distribution with $\theta = 2$, maximum likelihood estimation is applied to estimate the parameter $\alpha$. Find the estimated mean of the ground up loss distribution based on the maximum likelihood estimate of $\alpha$ for the following data set:

  • Ordinary policy deductible of 5, maximum covered loss of 25 (policy limit 20)
  • 8 insurance payment amounts: 2, 4, 5, 5, 8, 10, 12, 15
  • 2 limit payments: 20, 20.

Solution
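
This example combines left truncation at the deductible with right censoring at the maximum covered loss. The sketch below is one possible working and not the book's own solution. It assumes the single parameter Pareto survival function $S_{X}(x) = (\theta/x)^{\alpha}$ for $x \gt \theta$, converts each payment back to a ground up loss by adding the deductible, and treats the two limit payments as losses known only to be at least 25. Variable names are illustrative.

```python
import math

theta = 2.0
deductible = 5.0
max_covered_loss = 25.0
payments = [2, 4, 5, 5, 8, 10, 12, 15]
n_limit_payments = 2

# Ground up losses behind the uncensored payments: payment + deductible.
losses = [p + deductible for p in payments]

# Each exact loss contributes f(x)/S(d) = alpha * d^alpha / x^(alpha + 1);
# each limit payment contributes S(u)/S(d) = (d/u)^alpha. Theta cancels
# because the deductible exceeds theta. The log-likelihood is
#   m*log(alpha) - alpha * [sum log(x_i/d) + r*log(u/d)] - sum log(x_i),
# which is maximized in closed form.
denominator = (sum(math.log(x / deductible) for x in losses)
               + n_limit_payments * math.log(max_covered_loss / deductible))
alpha_hat = len(losses) / denominator

# The single parameter Pareto mean alpha*theta/(alpha - 1) is finite only
# when alpha > 1; otherwise the mean is reported as infinite.
if alpha_hat > 1:
    mean_hat = alpha_hat * theta / (alpha_hat - 1)
else:
    mean_hat = math.inf

print(f"alpha_hat = {alpha_hat:.4f}, estimated mean = {mean_hat}")
```

Under these assumptions the estimate of $\alpha$ falls below 1, so the fitted ground up distribution has no finite mean. The guard in the code avoids reporting the negative number that the mean formula would otherwise produce.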

3.6 Concluding Remarks

In describing losses, actuaries fit appropriate parametric distribution models for the frequency and severity of loss. This involves finding statistical distributions that model the data at hand efficiently. After a distribution model has been fitted to a data set, the model should be validated. Model validation is a crucial step in the model-building sequence: it assesses how well the fitted distribution describes the data at hand and how well we can expect the model to perform in the future. If the selected model does not fit the data, another distribution should be chosen. If more than one model seems to fit the data well, we then have to choose which model to use. Note, however, that the same data should not serve both purposes (fitting and validating the model); additional data should be used to assess the performance of the model. There are many statistical tools for model validation. Alternative goodness-of-fit tests, which are used to determine whether sample data are consistent with the candidate model, will be presented in a separate chapter.

