Function validation

In the dcr module, we have included the validation() function, which bundles the validation techniques that we apply throughout this book. The function produces an output of four panels:

  • Upper left panel: summary table of validation metrics;
  • Upper right panel: real-fit diagram by time, i.e., a comparison of the average observed and fitted outcome variable over time;
  • Lower left panel: histogram of the fitted values;
  • Lower right panel: real-fit diagram by deciles of the fitted values.

This summary includes the visuals and metrics that we find most useful in validating models for default, payoff, loss rates given default and exposures.

The function validation() has three input arguments:

  • Fitted variable (may be binary or metric);
  • Outcome variable (may be binary or metric);
  • Time variable.

All variables may be numpy arrays or pandas data frames. In a perfect PD model, the fitted values equal the observed outcomes, and the validation function produces the following output:

default_time = data.default_time.values
time = data.time.values

validation(default_time, default_time, time)

In a perfect model, the mean outcome and the mean fitted value match, performance measures equal one, distance measures equal zero, and p-values are high, as the null hypothesis cannot be rejected.
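This can be verified directly on toy data. The following minimal sketch (not the dcr implementation) uses scikit-learn's roc_auc_score and r2_score to stand in for the AUC and R-squared entries of the summary table:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, r2_score

# Toy check: when the fitted values equal the observed outcomes,
# the performance and distance measures reach their ideal values.
outcome = np.array([0, 0, 1, 0, 1, 1, 0, 1])
fit = outcome.astype(float)  # a perfect model reproduces the outcomes

auc = roc_auc_score(outcome, fit)               # discrimination: 1.0
r2 = r2_score(outcome, fit)                     # calibration: 1.0
rmse = np.sqrt(np.mean((outcome - fit) ** 2))   # distance: 0.0
print(auc, r2, rmse)
```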

The summary table (upper left panel) reports observation counts, means for outcomes and fitted values, discrimination and calibration measures. Calibration measures are separated into R-squared, distances and p-values. The following metrics are included:

  • Counts: number of observations;
  • Mean outcome: average of outcome variable;
  • Mean fit: average of fit variable; if this value is close to the mean outcome, this is a good indication of calibration;
  • AUC: area under the ROC, values are between zero (low fit) and one (high fit);
  • OLS R-squared: this is the R-squared of an OLS regression of the outcome on the fit variable. The number is equal to the square of the Pearson correlation coefficient. Values are between zero (low fit) and one (high fit);
  • scikit-learn R-squared: coefficient of determination as described above. Values are between minus infinity (low fit) and plus one (high fit);
  • RMSE/SQRT(Brier score): square root of the mean of the squared differences between outcome and fit variable;
  • Log-loss: negative log-likelihood. Values are positive. The greater the value, the lower the fit;
  • P-value of the binomial test. The null hypothesis is that the PD is lower than or equal to the average predicted PD for the entire sample. Values are positive. The greater the value, the lower the chance of rejecting the null hypothesis;
  • P-value of the Jeffreys prior test of whether the PD is lower than or equal to the average predicted PD for the entire sample. Values are positive. The greater the value, the lower the chance of rejecting the null hypothesis.
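The metrics above can be reproduced individually with standard libraries. The following sketch on simulated data (an illustration, not the dcr implementation) assumes the common formulations of the binomial and Jeffreys tests: the binomial test compares the observed default count against the average predicted PD, and the Jeffreys test reads the p-value off the Beta(k + 1/2, n - k + 1/2) posterior:

```python
import numpy as np
from scipy.stats import beta, binomtest
from sklearn.metrics import log_loss, roc_auc_score

# Simulated portfolio: predicted PDs and defaults drawn from them
rng = np.random.default_rng(0)
n = 1000
fit = rng.uniform(0.01, 0.10, n)     # predicted PDs
outcome = rng.binomial(1, fit)       # simulated default indicators

mean_outcome = outcome.mean()
mean_fit = fit.mean()
auc = roc_auc_score(outcome, fit)
rmse = np.sqrt(np.mean((outcome - fit) ** 2))  # = sqrt of the Brier score
ll = log_loss(outcome, fit)

# Binomial test: H0 is that the PD is <= the average predicted PD
k = int(outcome.sum())
p_binomial = binomtest(k, n, p=mean_fit, alternative='greater').pvalue

# Jeffreys test: with a Beta(1/2, 1/2) prior the posterior is
# Beta(k + 1/2, n - k + 1/2); the p-value is the posterior probability
# that the PD is <= the average predicted PD
p_jeffreys = beta.cdf(mean_fit, k + 0.5, n - k + 0.5)
print(mean_outcome, mean_fit, auc, rmse, ll, p_binomial, p_jeffreys)
```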

The real-fit diagram by time (upper right panel) shows that the default rate and mean fitted values match perfectly as they fully overlap.

The histogram (lower left panel) shows that the fitted values are either zero (higher number of observations) or one (lower number of observations).

The real-fit diagram by deciles of the fit variable (lower right panel) shows that the default outcomes and deciles of fitted values match perfectly.
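The decile comparison behind the lower-right panel can be sketched with pandas (a minimal illustration on simulated data, not the dcr implementation): sort observations into deciles of the fitted values and compare the mean outcome with the mean fit within each decile.

```python
import numpy as np
import pandas as pd

# Simulated data: defaults drawn from the predicted PDs
rng = np.random.default_rng(1)
fit = rng.uniform(0.0, 0.2, 5000)    # predicted PDs
outcome = rng.binomial(1, fit)       # simulated default indicators

df = pd.DataFrame({'outcome': outcome, 'fit': fit})
df['decile'] = pd.qcut(df['fit'], 10, labels=False)  # 0 = lowest fit decile
real_fit = df.groupby('decile')[['outcome', 'fit']].mean()
print(real_fit)  # in a calibrated model the two columns track each other
```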

Validation Study

Here is a validation example for a logistic regression using the following features:

  • Liquidity (cep_time);
  • Equity (equity_time);
  • Loan contract rate (interest_rate_time);
  • FICO credit score (FICO_orig_time);
  • GDP growth rate (gdp_time);
  • Principal component 1 (PCA1_1);
  • Principal component 2 (PCA2_1);
  • Principal component 3 (PCA3_1);
  • Principal component 4 (PCA4_1);
  • Principal component 5 (PCA5_1);
  • Cluster 1 (cluster_1).

These features include scaled features, the first five principal components from an analysis of the state-level default rates and one cluster dummy from K-means clustering with two clusters. Note that the other cluster is the reference category, indicated by the cluster dummy equaling zero.

All future estimated coefficients are ordered accordingly. We show the shape (number of observations and features) and default rate for the training and test dataset.
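This check can be sketched as follows. The arrays here are hypothetical stand-ins; in the book, X_train_scaled, y_train and their test counterparts come from the earlier preprocessing steps:

```python
import numpy as np

# Hypothetical stand-ins for the book's train/test arrays
rng = np.random.default_rng(2)
X_train_scaled = rng.normal(size=(800, 11))
y_train = rng.binomial(1, 0.025, 800)
X_test_scaled = rng.normal(size=(200, 11))
y_test = rng.binomial(1, 0.025, 200)

print('Train shape:', X_train_scaled.shape, 'default rate:', round(float(y_train.mean()), 4))
print('Test shape: ', X_test_scaled.shape, 'default rate:', round(float(y_test.mean()), 4))
```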

model_lr1 = LogisticRegression(penalty='none', fit_intercept=True, solver='saga', tol=1e-15, max_iter=10000)
model_lr1.fit(X_train_scaled, y_train)

print('Coefficients:', model_lr1.coef_.round(decimals=4))
print('Intercept', model_lr1.intercept_.round(decimals=4))
Coefficients: [[-0.2817 -0.2565  0.2965 -0.8004  0.0186 -0.0263  0.09    0.0147 -0.0978
  -0.056  -0.2978]]
Intercept [-4.9873]
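Since the coefficient vector follows the feature order listed above, pairing the two makes the output easier to read (the values are copied from the printed output; the feature names are as used in the text):

```python
import pandas as pd

# Pair each estimated coefficient with its feature name
features = ['cep_time', 'equity_time', 'interest_rate_time', 'FICO_orig_time',
            'gdp_time', 'PCA1_1', 'PCA2_1', 'PCA3_1', 'PCA4_1', 'PCA5_1',
            'cluster_1']
coefs = [-0.2817, -0.2565, 0.2965, -0.8004, 0.0186, -0.0263, 0.09, 0.0147,
         -0.0978, -0.056, -0.2978]
coef_table = pd.Series(coefs, index=features, name='coefficient')
print(coef_table)
```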

Next, we compute in-sample (in-time) predictions for the training data, and out-of-sample (out-of-time) predictions for the test data.

For the training data, we obtain a detailed report for discrimination and PD predictions. The time-series real-fit chart shows a good fit of average predictions to realized default rates. The R-squared is slightly greater than zero and neither test rejects the null hypothesis. The calibration curve shows that the average predictions are close to the default rates by decile.

predictions_train = model_lr1.predict_proba(X_train_scaled)[:, 1]

validation(predictions_train, y_train, data_train.loc[:,'time'].values)