Probability of default modelling using logistic regression

roysaikat98

New Member
Hi David,

I am using a dataset to model probability of default which looks like this:
ID  date2  default_1y_sco  ratio1  ratio2
1   1      0               0.2     0.5
1   2      0               0.3     0.6
1   3      0               0.8     0.2
2   1      0               0.1     0.3
2   2      1               0.9     0.1
3   5      0               0.1     0.7
4   3      0               0.5     0.1
4   8      1               0.2     0.4

In the dataset,
ID - account ID
date2 - yearly reference date (range of values 1-8)
default_1y_sco - default flag (1 if the customer defaults during the 12-month observation period after the reference date, 0 otherwise)
ratio variables - explanatory variables

Can I use a simple logistic regression to predict the probability of default for each account and reference date combination, or am I missing something here?
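For concreteness, here is a minimal sketch of what a single pooled logistic regression on the sample rows above might look like. It assumes every account/date row is treated as an independent observation, which is itself a simplification because the same ID appears several times. The helper names (`fit_logistic`, `predict_pd`) and the learning rate and iteration count are illustrative choices, not part of the original dataset; in practice a library such as statsmodels or scikit-learn would normally be used instead of hand-written gradient descent.

```python
import math

# Sample rows from the post: (ID, date2, default_1y_sco, ratio1, ratio2)
rows = [
    (1, 1, 0, 0.2, 0.5),
    (1, 2, 0, 0.3, 0.6),
    (1, 3, 0, 0.8, 0.2),
    (2, 1, 0, 0.1, 0.3),
    (2, 2, 1, 0.9, 0.1),
    (3, 5, 0, 0.1, 0.7),
    (4, 3, 0, 0.5, 0.1),
    (4, 8, 1, 0.2, 0.4),
]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_pd(w, x):
    """PD for one row: sigmoid(intercept + sum of coefficient * ratio)."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

def fit_logistic(X, y, lr=0.5, n_iter=5000):
    """Fit intercept and coefficients by batch gradient descent on the mean log-loss.

    lr and n_iter are illustrative assumptions, not tuned values.
    """
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)  # w[0] is the intercept
    for _ in range(n_iter):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            err = predict_pd(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi, start=1):
                grad[j] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

X = [[r[3], r[4]] for r in rows]   # ratio1, ratio2
y = [r[2] for r in rows]           # default_1y_sco

weights = fit_logistic(X, y)
pds = [predict_pd(weights, x) for x in X]

for r, p in zip(rows, pds):
    print(f"ID={r[0]} date2={r[1]} PD={p:.3f}")
```

With only eight rows and two defaults, the fitted coefficients are not meaningful; the sketch only shows the mechanics of getting one PD per account and reference date.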
 

Attachments

  • Eher2.png

Lu Shu Kai FRM

Well-Known Member
Hi @roysaikat98 ,

Hopefully I understand your question: the PD is the dependent variable and the ratios are the independent variables. Since you are asking about computing a PD model for each account and reference date combination, I assume that leaves us with 8 different logistic regression models, judging from your picture alone?

That said, I would strongly advise against going with these models, as there seems to be too little data, judging from your picture alone. If you have more data, it would probably be fine to build them. In my experience, though, the result might not be a very good model. In practice, PD models can be extremely complicated, and even well-established PD models are not completely accurate. For example, https://nuscri.org/en/home/ has its own proprietary PD model.
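To make the data concern concrete, the rough sketch below tallies how many observations and defaults each reference date has in the eight sample rows from the question. It assumes the sample is representative of how the rows would be split if one model were fitted per reference date; the variable name `per_date` is purely illustrative.

```python
from collections import defaultdict

# Sample rows from the question: (ID, date2, default_1y_sco, ratio1, ratio2)
rows = [
    (1, 1, 0, 0.2, 0.5),
    (1, 2, 0, 0.3, 0.6),
    (1, 3, 0, 0.8, 0.2),
    (2, 1, 0, 0.1, 0.3),
    (2, 2, 1, 0.9, 0.1),
    (3, 5, 0, 0.1, 0.7),
    (4, 3, 0, 0.5, 0.1),
    (4, 8, 1, 0.2, 0.4),
]

# per_date[date2] = [number of observations, number of defaults]
per_date = defaultdict(lambda: [0, 0])
for _id, date2, default_flag, _r1, _r2 in rows:
    per_date[date2][0] += 1
    per_date[date2][1] += default_flag

for date2 in sorted(per_date):
    n_obs, n_def = per_date[date2]
    print(f"date2={date2}: observations={n_obs}, defaults={n_def}")
```

In this sample, reference dates 5 and 8 have a single observation each, and only dates 2 and 8 contain a default, so a separate model per date could not be estimated from these rows alone.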
 