Propensity Scoring and Inverse Probability Weighting (IPW)

Introduction and Business Understanding

We will use propensity scoring and inverse probability weighting (IPW) to obtain an unbiased estimate of the magnitude of a potential gender and race bias in FHA-backed loans. The question is what other unknown confounders could explain the differences and prevent us from interpreting gender and race bias causally. We start with the Excel file “HMDA data,” which we import into JMP.

Data Understanding

The HMDA data for Louisiana are used for this analysis. Gender and Race are considered treatments. Taking Gender first, we want to know whether it has any effect on the loan being denied. We also have several covariates, each of which falls into one of three groups, and we need to determine which group each one belongs to. The group we are most interested in is confounders, which affect both Gender (treatment) and denied (outcome). Blocking variables affect only the outcome and reduce noise in the regression model. Instrumental variables affect only the treatment and not the outcome. Because denied is a binary variable, we need logistic regression to model the relationship between Gender and denied. To get unbiased estimates, we use inverse probability weighting, which requires us to create weights based on the inverse of the probability of treatment. We run a logistic regression with the confounders as independent variables and Gender as the dependent variable, which gives us predicted probabilities. We then take the inverse of these probabilities as weights and compute weighted averages, which give us estimates that are unbiased with respect to the confounders.
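To make the weighting idea concrete, here is a minimal Python sketch of the arithmetic, not the JMP workflow used in this analysis. It assumes the propensity scores have already been estimated; the function names and the toy numbers are hypothetical and are not taken from the HMDA file.

```python
def ipw_weights(treated, propensity):
    """Return one inverse-probability weight per row.

    treated    -- list of 0/1 flags (1 = treatment group, e.g. female)
    propensity -- estimated P(treated = 1 | confounders), each in (0, 1)
    """
    weights = []
    for t, p in zip(treated, propensity):
        # Weight each row by the inverse of the probability of the
        # group it actually belongs to.
        weights.append(1.0 / p if t == 1 else 1.0 / (1.0 - p))
    return weights


def weighted_rate(outcome, weights, treated, group):
    """Weighted mean of a 0/1 outcome within one treatment group."""
    num = sum(w * y for y, w, t in zip(outcome, weights, treated) if t == group)
    den = sum(w for w, t in zip(weights, treated) if t == group)
    return num / den


# Hypothetical toy data for illustration only.
treated = [1, 1, 1, 0, 0, 0]
denied = [1, 0, 1, 0, 1, 0]
propensity = [0.8, 0.5, 0.4, 0.5, 0.2, 0.6]

weights = ipw_weights(treated, propensity)
rate_treated = weighted_rate(denied, weights, treated, group=1)
rate_control = weighted_rate(denied, weights, treated, group=0)
ipw_difference = rate_treated - rate_control
print(rate_treated, rate_control, ipw_difference)
```

The difference between the two weighted denial rates is the IPW estimate of the treatment effect under these assumptions.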

Analysis

After importing our Excel file into JMP, we code Denied = 1 and not denied = 0. We set “County_fips” and “tract” to nominal. We also exclude 16 rows with missing values, as well as some outliers. In total, we excluded 46 rows.

GENDER

In the first step, we set “denied” to nominal and then use Tabulate to get the frequencies of “denied” for females and males, along with the risk ratio and the difference. We copy the results to Excel to check significance (Appendix A). The usual critical t value for a large sample is 1.96, so if the t value is greater than about 2, we reject the null hypothesis that the two proportions are equal. Here there is a statistically significant difference (p-value less than 0.05) between the two proportions, about 3% on average. Because we have confounders in our data set, we now want to reduce confounding and get better estimates of the bias difference, which may be driven not by Gender but by some other factors we have collected. To obtain less biased estimates, we have to compute inverse-probability-weighted averages.
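The comparison of two denial rates can be sketched in Python as follows. This is an illustrative version of the calculation done in Excel, assuming a pooled large-sample z statistic (which plays the role of the t value above). The function name and the counts are hypothetical and are not the figures from Appendix A.

```python
import math


def compare_proportions(denied_a, total_a, denied_b, total_b):
    """Compare two denial rates: difference, risk ratio, and z statistic."""
    p_a = denied_a / total_a
    p_b = denied_b / total_b
    # Pooled proportion under the null hypothesis that both rates are equal.
    pooled = (denied_a + denied_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    return {
        "difference": p_a - p_b,
        "risk_ratio": p_a / p_b,
        "z": (p_a - p_b) / std_err,
    }


# Hypothetical counts for illustration only.
result = compare_proportions(denied_a=1800, total_a=10000,
                             denied_b=1500, total_b=10000)
significant = abs(result["z"]) > 1.96  # two-sided test at the 5% level
print(result, significant)
```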

The next step is to prepare a model that predicts Gender from the confounders. Since we do not know the confounders yet, we run one regression model of all variables (“instrumental,” “blocking,” “confounder”) on “gender” and another on “denied.” Only the predictors that are statistically significant at 0.05 in both regression models are confounders, and we have to include them in the model to predict “gender.” The first regression model is on “SexDescription” (Gender) and the second is on “Denied” (Appendix B). We combined all models as suggested in the Excel sheet. “Race” is statistically significant (C), and “property type,” “med income,” “income,” and “loan amount” are also statistically significant. Therefore they are confounders. Something like “lien” would be a blocking (B) variable, as it is not statistically significant in one of the models.
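The selection rule can be written down as a small Python helper. This is only a sketch of the rule of thumb described here, assuming we already have one p-value per covariate from the treatment model and one from the outcome model. The function name, the covariate names, and the p-values are hypothetical.

```python
def classify_covariate(p_treatment, p_outcome, alpha=0.05):
    """Assign a covariate to a group from its two p-values.

    p_treatment -- p-value in the model predicting the treatment (Gender)
    p_outcome   -- p-value in the model predicting the outcome (denied)
    """
    sig_treatment = p_treatment < alpha
    sig_outcome = p_outcome < alpha
    if sig_treatment and sig_outcome:
        return "confounder"      # affects both treatment and outcome
    if sig_outcome:
        return "blocking"        # affects only the outcome
    if sig_treatment:
        return "instrumental"    # affects only the treatment
    return "neither"


# Hypothetical p-values: (treatment model, outcome model).
p_values = {
    "covariate_1": (0.001, 0.003),
    "covariate_2": (0.40, 0.01),
    "covariate_3": (0.02, 0.70),
    "covariate_4": (0.30, 0.50),
}
roles = {name: classify_covariate(*p) for name, p in p_values.items()}
print(roles)
```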

The last step is to run a model on “SexDescription” that includes the confounders (Appendix C) to obtain the probabilities; the weights are the inverse of these probabilities. Finally, we save the probability formula and create a new column, IPW-Sex-gender, to compute weighted averages in Tabulate (Appendix D). We set “denied” back to continuous.
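For readers without JMP, the propensity model and the weight column can be sketched in plain Python. This is a minimal illustration, assuming a logistic regression fitted by simple batch gradient descent on a single standardized confounder. JMP uses its own fitting routine, and the function names and toy data here are hypothetical.

```python
import math


def predict_probability(coefs, x):
    """Estimated P(treated = 1 | x) under the fitted model."""
    linear = coefs[0] + sum(c * v for c, v in zip(coefs[1:], x))
    return 1.0 / (1.0 + math.exp(-linear))


def fit_logistic(rows, labels, learning_rate=0.1, epochs=2000):
    """Fit logistic regression by batch gradient descent.

    rows   -- list of feature lists (the confounders, ideally standardized)
    labels -- list of 0/1 treatment flags
    Returns the coefficients, intercept first.
    """
    coefs = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grads = [0.0] * len(coefs)
        for x, y in zip(rows, labels):
            error = predict_probability(coefs, x) - y
            grads[0] += error
            for j, value in enumerate(x):
                grads[j + 1] += error * value
        # Step against the average gradient of the log-loss.
        coefs = [c - learning_rate * g / len(rows) for c, g in zip(coefs, grads)]
    return coefs


# Hypothetical toy data: one standardized confounder per row.
confounders = [[-1.5], [-1.0], [-0.5], [0.0], [0.5], [1.0], [1.5], [-0.2], [0.3]]
is_treated = [0, 0, 1, 0, 1, 1, 1, 1, 0]

coefs = fit_logistic(confounders, is_treated)
propensity = [predict_probability(coefs, x) for x in confounders]
# The weight column: inverse probability of the group each row is in.
ipw = [1 / p if t == 1 else 1 / (1 - p) for t, p in zip(is_treated, propensity)]
print(coefs, ipw)
```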

The 95% confidence intervals tell us that there is no longer a statistically significant difference between genders. Using the confounders to create an inverse-probability-weighted estimate gives us an unbiased assessment and shows us that Gender has no effect.

RACE

Next, we repeat exactly the same steps for “Race.” Again, we reject the null hypothesis if the t value is greater than 2. Here it is not, so there is no statistically significant difference for “Race” (Appendix E). Next, we look for confounders. We again combined all models in the Excel sheet (Appendix F). Race is not statistically significant, with a p-value larger than 0.05. However, three variables are statistically significant: “property type,” “min pop,” and “income.” These are the confounders. Next, we run a regression model again, this time for “Race” with the three confounders, to obtain the probabilities for the weights (Appendix G).

We create a new column from the probability formula, IPW-RACE, to compute averages in Tabulate. We set “denied” back to continuous (Appendix H). The result tells us that there is still a statistically significant difference between races.

The 95% confidence intervals show that there is not much of a relationship between “Race” and “denied.” Even so, using the confounders to create an inverse-probability-weighted estimate gives us a less biased estimate and shows us that Race still has some effect, unlike Gender.

Conclusion

The question was what other unknown confounders could explain the differences and prevent us from interpreting gender and race bias causally. After running two separate models on “Gender” and “Race,” we found another potential confounder, “MinorityPop,” in the last run; the two other confounders were the same as in the Gender model.

From the unweighted Excel results, Gender appears to have more effect on loan denial than Race. After inverse probability weighting with the confounders, however, there is not much of a relationship between “Race” and “denied,” yet Race still has some effect and Gender does not. Race has a statistically significant impact on Denied, but Gender does not.

Research Provided by Andrey Fateev

Appendix A. Female vs. Male SF. GENDER