Automated Decision Making: A hidden blessing for uncovering systemic bias?

Commentary

If executed properly, systematic audits of automated decision making can not only provide a mechanism to keep algorithmic systems “in check”, but may even offer systems and institutions an opportunity to better understand, and eventually overcome, the structural and human biases inherent to their existing practices.


Introduction

As automated analysis of data (via a suite of tools known as machine learning) becomes more prominent, examples of undesirable outcomes abound. Notable examples, which are part of a much longer and ever-growing list, include Google Photos mistakenly labeling a black couple as “gorillas” and Apple Card assigning women lower credit limits than their husbands, despite their having higher credit scores or incomes. This has led to the formation of a research agenda around “fair machine learning”, in the hope of establishing best practices that allow such behaviors to be prevented in the first place rather than discovered post-deployment.

The story of Amazon is another interesting recent example: the company attempted to use machine learning to automate its hiring process, only to scrap the resulting tool upon discovering that it inadvertently penalized resumes containing the term “women’s” (such as resumes of candidates who attended an all-women’s college).

Amazon’s hiring tool is, first and foremost, an example of why we should not simply “invoke” machine learning in sensitive domains without taking great care to ensure that the resulting models do not reproduce existing patterns of discrimination, inherit the prejudice of prior decision makers, or simply reflect the widespread biases that persist in society. But what if there were also a different lesson in store?

Establishing discrimination in the presence of automated decision making

Machine learning is an umbrella term for a collection of techniques for fitting statistical models to complex data. This means that the unfair patterns that emerged in the process of fitting a statistical model to Amazon’s historical hiring data are merely a reflection of Amazon’s real hiring practices in the decades that preceded the development of the tool.

We can therefore entertain the existence of two different worlds – world A, in which Amazon screens candidates as usual, and world B, in which it uses the automated tool to do so. This allows us to examine how discrimination would be identified and addressed in each world. For instance, consider Alice, an American woman who believes she has been discriminated against by Amazon’s hiring process. If Alice were to file a discrimination claim, how would it proceed? Would the answer differ depending on whether we are in world A or world B?

Generally speaking, filing a Title VII disparate treatment claim would require Alice to demonstrate that a similarly situated person who is not a member of a protected class would not have suffered the same fate. In this case, she would presumably need to show that a man with qualifications similar to hers would generally have passed the resume screening stage at Amazon.

Probing a single black box may be easier than uncovering human biases in complex systems

In the absence of explicit disparaging remarks made by an Amazon employee directly to Alice, she would presumably have difficulty corroborating circumstantial evidence of discriminatory intent. And even in the unlikely event that such evidence existed, Amazon could always attribute the incident to the one particular individual who mistreated Alice and present this as accountability. Statements such as “the actions of X do not represent our company policies” provide cover for a large company such as Amazon, which can quickly excuse itself from any accountability for an unjust or illegal decision. Indeed – and somewhat luckily for Amazon – in world A, “Amazon’s hiring process” isn’t one unified entity with a prescribed agenda (be it discriminatory or not). It is a hierarchy of recruiters, hiring managers and interviewers, each with their own biases and prejudices. In other words, even if Alice could miraculously establish that she encountered discrimination in her hiring process, the potentially systematic aspects at play would likely go unnoticed.

In world B, things are different. By “transforming” all of Amazon’s historical hiring decisions into a single algorithm, Amazon has willfully created a well-specified mapping from inputs (individuals) to outcomes (hiring decisions) that purportedly represents Amazon’s notion of what a “successful” employee is. This mapping could be arbitrarily complex – a “black box” for all we know – but probing one such black box is potentially much easier than probing the decisions made by hundreds of different human decision makers. Put differently, by training a machine learning model on its historical hiring decisions, Amazon captured the systematic aspects of its hiring procedure, specifically those aspects that go beyond the idiosyncrasies of one hiring manager or another.

If Amazon were to use an automated hiring tool, the tasks required to establish discrimination would become better defined. For example, given access to the model, one could examine how its behavior changes when different inputs (individuals, potentially entirely fictitious ones) are “fed” into the black box it represents. Continuing the above example, Alice could use this to demonstrate that an equally qualified male candidate would have received a more favourable outcome. In this way, auditing the algorithmic tool gives a much stronger basis for hypothesizing about the discrimination present in Amazon’s past hiring practices. Indeed, in Amazon’s case, the algorithmic tool revealed that aspects of the hiring process were discriminatory towards some candidates – a claim that would have been significantly harder to establish without it.
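
To make this concrete, the sketch below shows what such a paired (“counterfactual”) probe of a black-box screening tool might look like. The function score_resume is a hypothetical stand-in for the audited model – in a real audit it would be the deployed system, queried through whatever access the auditing procedure grants – and the profiles are illustrative, not real data.

```python
# A minimal sketch of a paired audit of a black-box screening model.
# `score_resume` is a hypothetical stand-in for the audited tool; it is NOT
# Amazon's actual model, only a toy illustration of the probing idea.

def score_resume(resume: dict) -> float:
    """Toy stand-in for the black-box model under audit (illustrative only)."""
    score = 0.5 + 0.05 * resume["years_experience"]
    # The kind of learned artifact reportedly found in Amazon's tool.
    if "women's" in resume["extracurriculars"].lower():
        score -= 0.3
    return score


def paired_probe(base_profile: dict, field: str, variant_a: str, variant_b: str) -> float:
    """Score two profiles identical except for one field; return the score gap."""
    profile_a = {**base_profile, field: variant_a}
    profile_b = {**base_profile, field: variant_b}
    return score_resume(profile_a) - score_resume(profile_b)


if __name__ == "__main__":
    profile = {
        "years_experience": 6,
        "education": "B.Sc. Computer Science",
        "extracurriculars": "",  # overwritten by the probe below
    }
    gap = paired_probe(
        profile,
        field="extracurriculars",
        variant_a="captain, women's chess club",
        variant_b="captain, chess club",
    )
    print(f"Score gap attributable to the flipped field: {gap:+.2f}")
```

In practice an auditor would repeat such probes over many profiles and aggregate the gaps, but the basic operation – holding everything fixed except a protected attribute or its proxies – is the same.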

A path forward: platforms for systematic audits of automated decision making

Of course, this ability to uncover biases in automated decision making processes holds only in a fictitious world in which society has complete access to the algorithmic tools in question. While full access to corporate models is not a realistic objective, given legitimate claims regarding copyright and intellectual property, there is a sufficiently good alternative. Regulators should strive towards clear procedures for external auditing of algorithmic tools, in order to create accountability for automated decision making systems. Decisions made or informed by automated tools should either be transparent to the individuals affected by them, or those individuals must know that the tool itself underwent the scrutiny of appropriate experts and was carefully inspected across a host of concerns, such as privacy and non-discrimination.
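
As one illustration of what such an inspection might compute, the sketch below checks selection rates across groups in a tool’s decisions against the “four-fifths” rule of thumb used in US employment contexts. The data is invented for illustration; a real audit would use the tool’s actual decision records and a much broader battery of tests.

```python
# A minimal sketch of one check an external auditor might run: comparing
# selection rates across groups in the tool's decisions ("four-fifths rule").
# The audit sample below is invented for illustration.

from collections import defaultdict


def selection_rates(records):
    """records: iterable of (group, selected) pairs; returns selection rate per group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [selected, total]
    for group, selected in records:
        counts[group][0] += int(selected)
        counts[group][1] += 1
    return {g: sel / total for g, (sel, total) in counts.items()}


def four_fifths_check(records) -> bool:
    """True if every group's selection rate is at least 80% of the highest rate."""
    rates = selection_rates(records)
    highest = max(rates.values())
    return all(rate >= 0.8 * highest for rate in rates.values())


if __name__ == "__main__":
    # Hypothetical audit sample: (group, was the candidate advanced?)
    audit_sample = [("men", True)] * 40 + [("men", False)] * 60 \
                 + [("women", True)] * 24 + [("women", False)] * 76
    print(selection_rates(audit_sample))   # {'men': 0.4, 'women': 0.24}
    print(four_fifths_check(audit_sample)) # False: 0.24 < 0.8 * 0.4
```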

Meeting this challenge requires progress on multiple fronts, notably from policy makers, regulators, and the interdisciplinary academic community, in order to establish appropriate standards for what such auditing procedures should look like. Although this is a far-reaching endeavour, if executed properly, it can not only provide a mechanism to keep algorithmic systems “in check”, but may even offer systems and institutions an opportunity to better understand, and eventually overcome, the structural and human biases inherent to their existing practices.


The opinions expressed in this text are solely those of the author/s and do not necessarily reflect the views of the Heinrich Böll Stiftung Tel Aviv and/or its partners.