By: Sian Townson
This article first appeared in Harvard Business Review on November 6, 2020.
As banks increasingly deploy artificial intelligence tools to make credit decisions, they are having to revisit an unwelcome fact about the practice of lending: Historically, it has been riddled with biases against people with protected characteristics, such as race, gender, and sexual orientation. Such biases are evident in institutions’ choices about who gets credit and on what terms. In this context, relying on algorithms to make credit decisions instead of deferring to human judgment seems like an obvious fix. What machines lack in warmth, they surely make up for in objectivity, right?
Sadly, what’s true in theory has not been borne out in practice. Lenders often find that artificial-intelligence-based engines exhibit many of the same biases as humans. That is often because the engines have been trained on a diet of biased credit decision data, drawn from decades of inequities in housing and lending markets. Left unchecked, they threaten to perpetuate prejudice in financial decisions and extend the world’s wealth gaps.
The problem of bias is an endemic one, affecting financial services start-ups and incumbents alike. A landmark 2018 study conducted at UC Berkeley found that even though fintech algorithms discriminate about 40% less on average than face-to-face lenders when pricing mortgages, they still charge extra mortgage interest to borrowers who are members of protected classes. Recently, Singapore, the United Kingdom, and some European countries issued guidelines requiring firms to promote fairness in their use of AI, including in lending. Many aspects of fairness in lending are legally regulated in the United States, but banks still have to decide which fairness metrics to prioritize or de-prioritize and how to apply them.
So how can financial institutions turning to AI reverse past discrimination and, instead, foster a more inclusive economy? In our work with financial services companies, we find the key lies in building AI-driven systems designed to favor greater equity over faithfully reproducing historical decisions. That means training and testing them not merely on the loans or mortgages issued in the past, but on how the money should have been lent in a more equitable world.
The trouble is that humans often cannot detect the unfairness that exists in the massive data sets that machine-learning systems analyze. So lenders increasingly rely on AI to identify, predict, and remove the biases against protected classes that are inadvertently baked into algorithms.
Remove bias from data before a model is built.
An intuitive way to remove bias from a credit decision is to strip discrimination from the data before the model is created. But this requires more than simply removing variables that clearly indicate gender or ethnicity, because past bias has effects that ripple throughout the data. For example, samples of loan data for women are usually smaller because, in decades past, financial institutions approved proportionally fewer and smaller loans to women than to men with equivalent credit scores and income. This leads to more frequent errors and false inferences for female applicants, who are both under-represented in the data and were treated differently when it was generated. Manual attempts to correct the bias in the data can also become self-fulfilling prophecies, as any mistakes or assumptions made along the way may be repeated and amplified.
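To illustrate why dropping the protected column is not enough, here is a minimal sketch with made-up applicants. The scoring rule never sees gender, but it penalizes a proxy feature (a hypothetical `part_time` flag) that, in this toy data, correlates with gender. The names `Applicant`, `toy_score`, and the 0.6 threshold are all illustrative assumptions, not anything from a real lender.

```python
from dataclasses import dataclass


@dataclass
class Applicant:
    gender: str      # protected attribute; never passed to the scoring rule
    income: float    # annual income in thousands (made-up values)
    part_time: bool  # hypothetical proxy that correlates with gender in this toy data


def features_without_gender(a: Applicant) -> dict:
    """'Fairness through unawareness': drop the protected column, keep the rest."""
    return {"income": a.income, "part_time": a.part_time}


def toy_score(features: dict) -> float:
    """Hypothetical rule learned from biased history: it penalizes part-time work."""
    score = features["income"] / 100
    if features["part_time"]:
        score -= 0.3
    return score


def approval_rate(applicants, gender, threshold=0.6):
    """Share of one gender's applicants whose gender-blind score clears the threshold."""
    group = [a for a in applicants if a.gender == gender]
    approved = [a for a in group if toy_score(features_without_gender(a)) >= threshold]
    return len(approved) / len(group)


# Men and women have identical incomes; only the proxy flag differs.
applicants = [Applicant("M", inc, False) for inc in (50, 65, 80, 95)] + [
    Applicant("F", 50, True), Applicant("F", 65, True),
    Applicant("F", 80, True), Applicant("F", 95, False),
]
print(approval_rate(applicants, "M"), approval_rate(applicants, "F"))  # 0.75 0.25
```

Even though gender was removed, the approval gap survives through the correlated feature, which is why the data itself, not just the column list, needs correcting.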
To avoid this, banks can now use AI to spot and correct patterns of historic discrimination against women in raw data, compensating for changes over time by deliberately altering the data to reflect an artificial, more equitable probability of approval. For example, by using AI, one lender discovered that, historically, women needed to earn 30% more than men on average to be approved for loans of equivalent size. It then used AI to retroactively rebalance the data used to develop and test its AI-driven credit decision model. It shifted the female distribution so that the proportion of loans made to women moved closer to that for men with an equivalent risk profile, while retaining the relative ranking of applicants. Because the data now better represented how loan decisions should have been made, the resulting algorithm approved loans more in line with how the bank wished to extend credit equitably in the future.
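The article does not say exactly how the lender shifted the distribution. One simple way to approximate the idea is a rank-preserving relabeling: within each risk band, approve the under-served group's applicants in descending score order until their approval rate matches the reference group's. The sketch below assumes a hypothetical `score` per applicant and hypothetical risk bands "A" and "B"; it is an illustration, not the lender's method.

```python
from collections import defaultdict


def rebalance_labels(applicants, reference_rates):
    """Relabel historical outcomes for an under-approved group.

    applicants: list of (risk_band, score) tuples for that group; score is any
        creditworthiness measure where higher is better (an assumption here).
    reference_rates: approval rate of the reference group in each risk band.

    Within each band, applicants are approved in descending score order until
    the band's approval rate matches the reference rate, so the relative
    ranking inside the group is preserved.
    """
    by_band = defaultdict(list)
    for idx, (band, score) in enumerate(applicants):
        by_band[band].append((score, idx))

    labels = [False] * len(applicants)
    for band, members in by_band.items():
        k = round(reference_rates[band] * len(members))  # approvals needed in this band
        for _score, idx in sorted(members, reverse=True)[:k]:
            labels[idx] = True
    return labels


# Hypothetical data: scores for women, and men's approval rates per risk band.
women = [("A", 0.72), ("A", 0.55), ("A", 0.81), ("A", 0.60), ("A", 0.47),
         ("B", 0.40), ("B", 0.35)]
men_rates = {"A": 0.6, "B": 0.5}
print(rebalance_labels(women, men_rates))
# [True, False, True, True, False, True, False]
```

The relabeled outcomes, rather than the raw historical ones, would then be used to train and test the credit model.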
Pick better goals for models that discriminate.
Yet even after data is adjusted, banks often need an extra layer of defense to keep bias, or remaining traces of its effects, from creeping in. To achieve this, they “regularize” an algorithm so that it aims not just to fit historical data but also to score well on some measure of fairness. They do this by adding a penalty term to the model’s objective, weighted by a tunable parameter, that grows when the model treats protected classes differently.
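One common way to write such a fairness-regularized objective is sketched below; it is a generic formulation, not necessarily the one any particular bank used. Here f_θ is the credit model with parameters θ, ℓ is an ordinary prediction loss such as log loss, U(θ) is an unfairness score (for instance, the gap in average predicted approval between protected and non-protected applicants with similar risk), and λ ≥ 0 is an assumed weight that sets how strongly unfairness is penalized.

```latex
\min_{\theta}\;
  \underbrace{\frac{1}{n}\sum_{i=1}^{n} \ell\big(y_i,\, f_\theta(x_i)\big)}_{\text{fit to historical data}}
  \;+\;
  \underbrace{\lambda\, U(\theta)}_{\text{fairness penalty}},
  \qquad \lambda \ge 0
```

Setting λ = 0 recovers ordinary training; raising λ trades some fit to historical decisions for a smaller unfairness score.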
For example, one bank used AI to discover that very young and very old applicants were not getting equal access to credit. To encourage fairer credit decisions, the bank designed a model that required its algorithm to minimize an unfairness score. The score was based on the gap between outcomes for people in different age brackets with the same risk profile, including intersections between subgroups, such as older women. By taking this approach, the final AI-driven model narrowed the mathematical gap between how similar people from different groups are treated by 20%.
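An unfairness score of this kind could be computed roughly as follows. This sketch assumes the score is the largest approval-rate gap between subgroups of the same kind (age bracket, gender, or age-by-gender intersection), compared only within the same risk band; the field names, brackets, and the choice of "largest gap" are illustrative assumptions, since the article does not give the bank's exact formula.

```python
from collections import defaultdict


def approval_rates(records, key):
    """Approval rate for each subgroup defined by key(record)."""
    counts = defaultdict(lambda: [0, 0])  # subgroup -> [approved, total]
    for r in records:
        c = counts[key(r)]
        c[0] += int(r["approved"])
        c[1] += 1
    return {group: approved / total for group, (approved, total) in counts.items()}


# Subgroup definitions: single attributes plus their intersection.
VIEWS = {
    "age": lambda r: r["age_bracket"],
    "gender": lambda r: r["gender"],
    "age_x_gender": lambda r: (r["age_bracket"], r["gender"]),
}


def unfairness_score(records):
    """Largest approval-rate gap between subgroups of the same kind,
    comparing only applicants within the same risk band."""
    bands = defaultdict(list)
    for r in records:
        bands[r["risk_band"]].append(r)
    worst = 0.0
    for band_records in bands.values():
        for key in VIEWS.values():
            rates = approval_rates(band_records, key)
            if len(rates) > 1:
                worst = max(worst, max(rates.values()) - min(rates.values()))
    return worst


# Hypothetical decisions for one risk band: (age bracket, gender, approved).
rows = [("young", "F", 0), ("young", "F", 0), ("young", "M", 1), ("young", "M", 0),
        ("mid", "F", 1), ("mid", "F", 1), ("mid", "M", 1), ("mid", "M", 1)]
records = [{"risk_band": "low", "age_bracket": a, "gender": g, "approved": bool(y)}
           for a, g, y in rows]
print(unfairness_score(records))  # 1.0, driven by the young-women intersection
```

Looking at gender alone, the gap here is only 0.25, which shows why the intersections matter. For use inside training, a differentiable version built on predicted probabilities rather than hard approvals would normally be needed, and very small subgroups make the score noisy.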
Introduce an AI-driven adversary.
Even after correcting the data and regularizing the model, it is still possible to have an apparently neutral model that continues to have a disparate impact on protected and non-protected classes. So many financial institutions go one step further and build an additional, so-called “adversarial” AI-driven model to see whether it can predict protected-class membership from decisions made by the first model. If the adversarial challenger can infer any protected characteristic, such as race, ethnicity, religion, gender, sexuality, disability, marital status, or age, from the way the first credit model treats an applicant, then the original model is corrected.
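Production adversarial debiasing typically trains a learned adversary (often a neural network) alongside the credit model. As a deliberately tiny stand-in, the sketch below uses a one-threshold classifier as the "adversary": if a single cutoff on the credit model's outputs predicts the protected attribute much better than always guessing the majority group, the outputs are leaking that attribute. The credit limits and group labels are made up.

```python
def adversary_accuracy(outputs, protected):
    """Best accuracy of a one-threshold adversary that guesses the protected
    attribute (True/False) from the credit model's outputs alone."""
    n = len(outputs)
    best = 0.0
    for t in sorted(set(outputs)):
        high = [o >= t for o in outputs]
        acc = sum(h == p for h, p in zip(high, protected)) / n
        # The adversary may read "high output" as either group, so try both.
        best = max(best, acc, 1 - acc)
    return best


def majority_baseline(protected):
    """Accuracy of always guessing the more common group."""
    share = sum(protected) / len(protected)
    return max(share, 1 - share)


limits = [5000, 5200, 8000, 8300, 4900, 7900]       # hypothetical credit limits
protected = [True, True, False, False, True, False]   # hypothetical group labels
leak = adversary_accuracy(limits, protected) - majority_baseline(protected)
print(leak)  # 0.5: the limits alone reveal group membership perfectly
```

A leak near zero means this particular adversary cannot tell the groups apart; a stronger, learned adversary could still succeed, which is why real systems use more capable challengers.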
For example, adversarial AI-driven models can often detect zip codes with large ethnic-minority populations from the outputs of a proposed credit model. This is often due to confounding: lower salaries tend to be concentrated in the same zip codes. Indeed, we have seen adversarial models show that an original model is likely to offer lower limits to applicants from zip codes associated with an ethnic minority, even when race or ethnicity was neither an input to the original model nor available in the data.
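A simple complementary check (not an adversarial model itself) helps separate the salary confound from a genuine zip-code effect: compare average credit limits for minority-associated and other zip codes only among applicants in the same salary band. The salary bands, limits, and `minority_zip` flag below are hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def limit_gap_within_salary_bands(applicants):
    """applicants: list of (salary_band, minority_zip, credit_limit) tuples.

    Returns {salary_band: mean limit in other zips - mean limit in
    minority-associated zips} for bands that contain both kinds of zip code.
    A persistent positive gap means salary alone does not explain the difference.
    """
    groups = defaultdict(lambda: {True: [], False: []})
    for band, minority_zip, limit in applicants:
        groups[band][minority_zip].append(limit)
    return {band: fmean(g[False]) - fmean(g[True])
            for band, g in groups.items() if g[True] and g[False]}


applicants = [("30-50k", True, 4000), ("30-50k", True, 4200),
              ("30-50k", False, 5000), ("30-50k", False, 5200),
              ("50-70k", True, 6000), ("50-70k", False, 7000)]
print(limit_gap_within_salary_bands(applicants))  # {'30-50k': 1000.0, '50-70k': 1000.0}
```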
In the past, these issues would have been dealt with by manually changing the original model’s parameters. But now we can use AI as an automated way to re-tune the model, increasing the influence of variables that contribute to equity and reducing the influence of those that contribute to bias, partly by aggregating segments, until the challenger model can no longer predict ethnicity by using zip codes as a proxy. In one instance, this resulted in a model that still differentiated between zip codes but reduced the mortgage approval rate gap for some ethnicities by as much as 70%.
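The "aggregating segments" step can be sketched in isolation: coarsen zip codes to shorter prefixes until a simple adversary, which guesses the protected attribute by majority vote within each segment, does no better than the majority baseline plus a small margin. The zip codes, labels, and 0.05 margin are hypothetical, and this covers only aggregation, not the re-weighting of other variables the article also describes.

```python
from collections import Counter, defaultdict


def proxy_accuracy(segments, protected):
    """Accuracy of an adversary that predicts the protected attribute by
    majority vote within each segment."""
    by_segment = defaultdict(list)
    for seg, p in zip(segments, protected):
        by_segment[seg].append(p)
    correct = sum(Counter(v).most_common(1)[0][1] for v in by_segment.values())
    return correct / len(protected)


def coarsen_until_blind(zips, protected, margin=0.05):
    """Return the longest zip prefix length at which the segment no longer
    predicts the protected attribute beyond baseline + margin (0 if none)."""
    baseline = max(Counter(protected).values()) / len(protected)
    for digits in range(len(zips[0]), 0, -1):
        segments = [z[:digits] for z in zips]
        if proxy_accuracy(segments, protected) <= baseline + margin:
            return digits
    return 0


zips = ["10001", "10002", "10011", "10012", "10101", "10102", "10111", "10112"]
protected = [True, True, False, False, True, True, False, False]
print(coarsen_until_blind(zips, protected))  # 3
```

At three digits the model can still distinguish regions ("100" versus "101"), echoing the article's point that the re-tuned model still differentiated between zip codes, but those coarser segments no longer reveal group membership in this toy data.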
To be sure, financial institutions should lend wisely, based on whether people are willing and able to repay debt. But lenders must not treat people differently if they have similar risk profiles, whether that decision is made by artificial neural networks or by human brains. Reducing bias is not just a socially responsible pursuit; it also makes for more profitable business. The early movers in reducing bias through AI will gain a real competitive advantage on top of doing their moral duty.
Algorithms can’t tell us which definitions of fairness to use or which groups to protect. Left to their own devices, machine-learning systems may cement the very biases we want them to eliminate.
But AI need not go unchecked. Armed with a deeper awareness of bias lurking in the data and with objectives that reflect both financial and social goals, we can develop models that do well and that do good.
There is measurable evidence that lending decisions based on machine-learning systems vetted and adjusted by the steps outlined above are fairer than those made previously by people. One decision at a time, these systems are forging a more financially equitable world.
This article is posted with permission of Harvard Business Publishing. Any further copying, distribution, or use is prohibited without written consent from HBP.