March 2023

Confusing the Matrix: Actuarial adaptations to a well-known construct

In Brief
In a Society of Actuaries article, RGA's Ryan Holt discusses the confusion matrix and how actuaries have leveraged it in the accelerated underwriting (AU) space. To learn more, listen to the SOA's follow-up.

I have some things I need to get off my chest related to this construct and how actuaries, including me, have leveraged it in the accelerated underwriting (AU) space. First off, let's address where the confusion matrix came from. Actuaries who have been involved in predictive analytics should be quick to recognize the confusion matrix as an essential tool for evaluating machine-learning classification models. Medical studies have also used it to determine the efficacy of different diagnostic tests.

On the surface, a confusion matrix is nothing more than a table showing counts of actual and predicted results for a given number of categories. However, using this matrix, other disciplines have created a variety of metrics to help choose between different models or evaluate the performance of a diagnostic test. For those interested in a very straightforward summary of machine-learning methods and analysis, including the confusion matrix, see Jeff Heaton's article in the July 2016 issue of the Predictive Analytics and Futurism newsletter.

How did the confusion matrix make it to the AU space? Some might assume that AU uses machine-learning classification models and that actuaries therefore use the confusion matrix to help determine the most appropriate model to classify risks. While this can be true for some programs, the more widespread use of the confusion matrix is tied to AU auditing programs. For the purposes of this article, I'm going to assume that the reader is familiar with random holdout (RHO) and post-issue attending physician statement (APS) audits. For background on these topics, you can read the article that Taylor Pickett and I wrote for the July 2019 issue of Product Matters!

So, why did we start using the confusion matrix to summarize AU audit results? Audit results consist of cases that have been underwritten twice: once using a more traditional method and then again using the accelerated process. This data structure, an applicant with two observed outcomes, seemed to fit the confusion matrix construct well, with the traditional underwriting decision being the "actual" outcome and the accelerated decision the "predicted" outcome. One wrinkle here is that a confusion matrix traditionally has the same number of rows and columns. In other words, the list of actual categories is identical to the list of predicted categories. For AU audits, we typically have a situation where certain actual risk classes are not available as a predicted risk class. This is because AU decisions are typically limited to standard or better risks, whereas the actual risk class decision can be substandard or even a decline.
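
As a minimal sketch of this structure, the Python snippet below tallies audit cases into a non-square confusion matrix. The class names, the `confusion_matrix` helper, and the sample cases are all hypothetical and chosen only for illustration; they are not drawn from any actual audit.

```python
from collections import Counter

# Hypothetical class lists: AU can only assign standard-or-better classes,
# while full underwriting can also produce substandard or decline outcomes.
ACTUAL_CLASSES = ["Best", "Preferred", "Standard", "Substandard", "Decline"]
PREDICTED_CLASSES = ["Best", "Preferred", "Standard"]


def confusion_matrix(cases):
    """Count audit cases by (actual, predicted) pair.

    `cases` is an iterable of (actual_class, predicted_class) tuples.
    Returns a dict of dicts: matrix[actual][predicted] -> count.
    """
    counts = Counter(cases)
    return {
        actual: {pred: counts[(actual, pred)] for pred in PREDICTED_CLASSES}
        for actual in ACTUAL_CLASSES
    }


# Made-up audit sample, for illustration only.
audit_cases = [
    ("Best", "Best"), ("Best", "Best"), ("Preferred", "Best"),
    ("Preferred", "Preferred"), ("Standard", "Preferred"),
    ("Substandard", "Standard"), ("Decline", "Best"),
]

matrix = confusion_matrix(audit_cases)
for actual in ACTUAL_CLASSES:
    print(f"{actual:<12}", [matrix[actual][p] for p in PREDICTED_CLASSES])
```

The matrix has five rows and three columns, which reflects the categorical asymmetry described above: the substandard and decline rows have no matching predicted column.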

With the actual and predicted risk classes defined, categorical asymmetry aside, the confusion matrix was a natural fit for summarizing results from an audit sample. So, now we can leverage all those metrics that other disciplines have created for analyzing model performance, right? Wrong. Unfortunately, actuaries have largely ignored most of those metrics in our analysis of audit results. The reason for this is twofold. First, many of the metrics are designed for choosing between multiple models. In the case of an audit sample, we only have one AU process, and the result is what it is; there is no need to compare the metric to an alternate process. Second, one of the main uses of audit results is to try to quantify the mortality impact for an AU program. This problem is unique to our discipline, and as such, it has spurred the creation of other metrics and calculations, which have a variety of assumptions associated with them. It is those assumptions that I would like to spend the remainder of this article discussing.

Mortality Impact

Now that we've established the origins of the confusion matrix and how actuaries have borrowed it for the purposes of AU, let's dig into the assumptions behind calculating the mortality impact from audit results. In particular, let's talk about relative mortality, assigning actual audit results, and lastly, on-top adjustments. Let me start by giving a short definition of what I mean by mortality impact. At the policy level, mortality impact is simply the relative mortality for the actual class divided by the relative mortality for the predicted class.

Table 1: Relative Mortality

Risk Class    Relative Mortality
Best          80%
Preferred     100%
Standard      120%
Decline       480%

Table 2: Mortality Impacts (rows are actual classes, columns are predicted classes)

Actual       Predicted: Best    Predicted: Preferred    Predicted: Standard
Best         100%               80%                     67%
Preferred    125%               100%                    83%
Standard     150%               120%                    100%
Decline      600%               480%                    400%
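
The factors in Table 2 follow directly from Table 1 and the definition of mortality impact. A short Python sketch, using the illustrative Table 1 values (the variable and function names are my own), reproduces them:

```python
# Relative mortality by risk class, taken from Table 1 (illustrative values).
RELATIVE_MORTALITY = {"Best": 0.80, "Preferred": 1.00, "Standard": 1.20, "Decline": 4.80}
PREDICTED_CLASSES = ["Best", "Preferred", "Standard"]  # AU can only assign these


def mortality_impact(actual, predicted):
    """Relative mortality of the actual class divided by that of the predicted class."""
    return RELATIVE_MORTALITY[actual] / RELATIVE_MORTALITY[predicted]


# One row per actual class, one column per predicted class, as in Table 2.
impact_table = {
    actual: {pred: mortality_impact(actual, pred) for pred in PREDICTED_CLASSES}
    for actual in RELATIVE_MORTALITY
}

for actual, row in impact_table.items():
    print(f"{actual:<10}", "  ".join(f"{row[p]:>5.0%}" for p in PREDICTED_CLASSES))
```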

In Tables 1 and 2, you can see an illustrative example of relative mortalities and the associated mortality impacts for a selection of actual and predicted classes. With this definition of mortality impact, the result is a factor that could be applied to our pre-AU mortality expectation to adjust for the impact from misclassification associated with AU. It is also possible for these factors to be less than 100 percent, implying that sometimes the AU decision may be more conservative than what traditional underwriting would have been. With each audit case assigned a mortality impact, you can then determine the mortality impact for any given cohort, such as by predicted class, by summing up the mortality impacts for that cohort and dividing by the number of observations in that cohort (a simple average). You can also weight the mortality impact by face amount to get a view of mortality impact by amount. I'd like to point out here that with this methodology, we can calculate mortality impact at a policy level and in aggregate without even referencing a confusion matrix. Thus, a confusion matrix is a useful way to visualize audit results and provide some shortcuts to the calculations for mortality impact, but it is not synonymous with mortality impact nor necessary to calculate it.
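
To make the cohort calculation concrete, here is a rough Python sketch that averages policy-level impacts by predicted class, both by count and by face amount, without building a confusion matrix. The relative mortalities are the illustrative values from Table 1; the audit cases, face amounts, and the `cohort_impacts` helper are hypothetical.

```python
from collections import defaultdict

RELATIVE_MORTALITY = {"Best": 0.80, "Preferred": 1.00, "Standard": 1.20, "Decline": 4.80}

# Hypothetical audit cases: (actual class, predicted class, face amount).
audit_cases = [
    ("Best", "Best", 250_000),
    ("Preferred", "Best", 500_000),
    ("Best", "Preferred", 100_000),
    ("Standard", "Preferred", 1_000_000),
    ("Decline", "Preferred", 250_000),
]


def cohort_impacts(cases):
    """Return {predicted class: (impact by count, impact by amount)}."""
    # Running totals per cohort: impact sum, case count,
    # face-weighted impact sum, total face amount.
    sums = defaultdict(lambda: [0.0, 0, 0.0, 0.0])
    for actual, predicted, face in cases:
        impact = RELATIVE_MORTALITY[actual] / RELATIVE_MORTALITY[predicted]
        totals = sums[predicted]
        totals[0] += impact
        totals[1] += 1
        totals[2] += impact * face
        totals[3] += face
    return {cls: (t[0] / t[1], t[2] / t[3]) for cls, t in sums.items()}


results = cohort_impacts(audit_cases)
for cls, (by_count, by_amount) in results.items():
    print(f"{cls}: {by_count:.1%} by count, {by_amount:.1%} by amount")
```

In this made-up sample, the two views differ noticeably for the preferred cohort because the single decline sits on a relatively small face amount, which is the kind of difference the by-amount view is meant to surface.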

Relative Mortality

With mortality impact outlined, the first question becomes: what is relative mortality? In simplest terms, relative mortality is the mortality outcome of one group relative to the mortality outcome of a reference group. For life insurance, the reference group is typically aggregate standard (non-rated) mortality. In terms of a traditional mortality study, one could create relative mortality percentages by taking the actual-to-expected ratio for each risk class and dividing it by the overall actual-to-expected ratio (excluding substandard risks). This shows how each risk class performed, on average, relative to the overall result of the study.
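
The study-based calculation can be sketched in a few lines of Python. The claim figures below are invented purely for illustration (they happen to return the Table 1 percentages), and `relative_mortality` is a hypothetical helper name.

```python
# Hypothetical study results: actual and expected claims by risk class.
# Substandard business is excluded, as described in the text.
study = {
    "Best":      {"actual": 72.0,  "expected": 100.0},
    "Preferred": {"actual": 135.0, "expected": 150.0},
    "Standard":  {"actual": 108.0, "expected": 100.0},
}


def relative_mortality(study):
    """Divide each class's A/E ratio by the overall A/E ratio of the study."""
    total_actual = sum(c["actual"] for c in study.values())
    total_expected = sum(c["expected"] for c in study.values())
    overall_ae = total_actual / total_expected
    return {
        cls: (c["actual"] / c["expected"]) / overall_ae
        for cls, c in study.items()
    }


relatives = relative_mortality(study)
for cls, value in relatives.items():
    print(f"{cls:<10} {value:.0%}")
```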

When it comes to determining relative mortalities, there are a few key considerations. The source for relative mortality could be from a mortality study as I just mentioned, or it could be from an internal mortality assumption. When using mortality assumptions, should you use the duration 1 difference between your preferred classes or take an actuarial present value? An important consideration with duration 1 differences is that they may vary by age, as seen with the 2015 VBT RR tables. It's very common for AU demographics to skew toward younger ages, so should you adjust the duration 1 relative mortality to account for this? Fortunately, there are no wrong answers here. These are instead items that should be discussed by the actuaries and underwriters involved in the AU program to ensure all are comfortable with the estimated mortality impacts.

With actuaries keenly dialed into mortality assumption setting, determining the relative mortality for standard or better classes can be straightforward. Declines, on the other hand, are one area where we don't typically have mortality experience. We need a relative mortality assumption for declines because it is one of the possible "actual" results for our audit cases. The question is, what is the mortality for a declined case? There are a few factors that I think about when considering how to set the relative mortality assumption for declines. Number one, what is the maximum table rating that your company issues? Number two, what are common reasons for cases being declined? Number three, what is the mix (if known) of tobacco and non-tobacco applicants for declined cases? The maximum table rating helps set the stage for what the highest possible issued medical risk might look like. Because not all decline reasons are medical, looking at the common decline reasons could lead you to pull back your assumption from the maximum table rating. Taking the maximum table rating, adjusted for decline reasons, and blending based on an assumed mix of non-tobacco and tobacco users, we can land on an estimate for the relative mortality of declines.
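
One way to picture this blending is the Python sketch below. It is only one possible reading of the steps above: treating the decline-reason adjustment as a single multiplicative factor is my assumption, and every input value is invented (tuned so the result lands near the illustrative 480% in Table 1, not derived from any real program).

```python
def decline_relative_mortality(max_rating_nontobacco, max_rating_tobacco,
                               decline_reason_factor, tobacco_share):
    """Blend the maximum issued table rating across tobacco status.

    max_rating_*: relative mortality at the highest table rating issued,
        on the same basis as the other relative mortalities (assumed inputs).
    decline_reason_factor: multiplier (at most 1.0) pulling the assumption
        back because not every decline is for medical reasons (an assumption
        about how to apply the adjustment, not a prescribed method).
    tobacco_share: assumed proportion of declined applicants who use tobacco.
    """
    blended = ((1.0 - tobacco_share) * max_rating_nontobacco
               + tobacco_share * max_rating_tobacco)
    return decline_reason_factor * blended


# Hypothetical inputs, for illustration only.
estimate = decline_relative_mortality(
    max_rating_nontobacco=5.0,
    max_rating_tobacco=10.0,
    decline_reason_factor=0.8,
    tobacco_share=0.2,
)
print(f"Estimated relative mortality for declines: {estimate:.0%}")
```

Writing the inputs out this way makes it easier for actuaries and underwriters to challenge each one separately rather than debating a single unexplained decline factor.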

Actual Results

I'd like to turn my attention now to assigning the "actual" audit results, specifically as it relates to post-issue APS audits. In determining the mortality impact as I've defined it, we have an actual and predicted relative mortality. Relative mortalities for "actual" decisions are grounded in the risk class decisions associated with how underwriting has been performed historically for those classes. For full underwriting at the ages and face amounts typical of AU, these decisions are usually based on a paramedical exam and an insurance lab panel. An APS, while possible, is not a typical requirement for the core ages and amounts associated with AU.

With this understanding, the question becomes, can an underwriter recreate the decision they would make with a paramedical exam and labs using just an APS? For the purposes of calculating our mortality impact, we assume the answer is yes. In practice, however, we know that an APS could have more information or less information than a traditional exam and labs. This will lead to discrepancies between what the "actual" decision would be using traditional evidence vs. the "actual" decision using the APS. One saving grace here is that these discrepancies tend to go both ways, and in total, they could net out to no impact overall. For this assumption to have the best chance of holding together, underwriters performing audits should use the APS to re-underwrite the case using all the information available to them. If post-issue APS audits are exclusively used to check for material misrepresentation, then it is less likely that audit results would tie back to the actual class needed to calculate the mortality impact.

Outside Adjustments

Lastly, I'd like to talk about adjustments to mortality impact that occur on top of the impact from misclassification. Tim Morant and Philip Janz wrote an article in the July 2019 issue of Reinsurance News. In this article, they illustrate how some tools, like credit-based mortality scores, can identify risks that have better mortality outcomes relative to their risk class. For example, preferred risks with a score below a given threshold may exhibit mortality outcomes of 90 percent of all preferred risks, while preferred risks over that threshold would exhibit mortality outcomes that are 120 percent of all preferred risks (see Graph 3 from their article for a clear example of this). If this threshold is part of how applicants are selected to be accelerated, should we include the impact from these scores in the mortality impact calculation? One item to note is that the two impacts together should balance out in total. The threshold is just a way of further segmenting the policies within a given risk class; we have not created or removed any mortality just by dividing up the preferred class (although this may change if placement rates vary by accelerated status). If you do choose to include this type of impact, you could simply multiply the actual relative mortality by the anticipated adjustment related to the risk selection tool. The important thing to recognize, however, is that you must reflect both sides of this impact, the upside and downside. In the case of audit results, we're typically only thinking about the upside, or applying a discount based on selecting the best risks for acceleration. If our estimate of the mortality impact from the audits includes this adjustment, we must then apply the residual load to our expectation for cases that are not accelerated.
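
The balancing requirement can be illustrated with a small Python sketch. It assumes the class as a whole must still average 100 percent of its own relative mortality, ignores the placement-rate caveat noted above, and uses a hypothetical accelerated share of two-thirds, chosen because it is the share at which the 90 percent and 120 percent figures from the example balance. The `residual_load` helper is my own naming.

```python
def residual_load(accelerated_share, accelerated_factor):
    """Factor for non-accelerated cases so the class still averages 100%.

    Solves: share * accelerated_factor + (1 - share) * load = 1.0
    where `accelerated_share` is the assumed proportion of the class's
    expected mortality that falls on accelerated cases.
    """
    if not 0.0 <= accelerated_share < 1.0:
        raise ValueError("accelerated_share must be in [0, 1)")
    return (1.0 - accelerated_share * accelerated_factor) / (1.0 - accelerated_share)


# Hypothetical: two-thirds of preferred risks are accelerated at a 90% factor.
load = residual_load(accelerated_share=2 / 3, accelerated_factor=0.90)
print(f"Residual load on non-accelerated cases: {load:.0%}")
```

With these inputs the residual load comes back to roughly 120 percent, matching the example: the discount taken on accelerated cases has to reappear as a load on the cases that were not accelerated.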

Conclusion

Mortality impact as I've discussed here is very dependent on the relative mortality that has been cultivated through years of fully underwritten business. These relative mortalities are based on the average result of any given risk class, and there can be a range of mortality outcomes within any given risk class. With underwriting programs adopting new evidence such as medical claims, clinical labs, electronic health records, and who knows what else, will we start to shift our underwriting outcomes such that the prevailing risk class decisions no longer align with those that created the historical relative mortality outcomes? Put another way, what if the evidence used in a new process can better classify mortality compared to our prior underwriting practices, thus narrowing or even shifting the range of results for a given class? If this is the case, how do we determine relative mortality and mortality impacts for new underwriting programs? Personally, I believe that this is a likely outcome, and just as actuaries borrowed the confusion matrix to assist with accelerated underwriting, we will need to adapt and create new ways to solve this evolving challenge.




Meet the Authors & Experts

Ryan LaMar Holt
Author
Actuary, U.S. Individual Life, RGA


Posted with permission of the © Society of Actuaries, Schaumburg, Illinois.