If you think principles are the way to regulate AI behavior, you can code them:
- O: obligatory
- P: permissible
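As a minimal sketch of what "coding" such principles might look like (the rules and action names below are hypothetical, not from the course), each action can be given a deontic status and checked against the rule base:

```python
from enum import Enum

class Deontic(Enum):
    OBLIGATORY = "O"    # O: the action must be performed
    PERMISSIBLE = "P"   # P: the action may be performed

# Hypothetical rule base mapping actions to their deontic status.
RULES = {
    "report_harm": Deontic.OBLIGATORY,
    "share_anonymized_data": Deontic.PERMISSIBLE,
}

def is_allowed(action: str) -> bool:
    """An action is allowed if the rules mark it obligatory or permissible."""
    return RULES.get(action) in (Deontic.OBLIGATORY, Deontic.PERMISSIBLE)

print(is_allowed("report_harm"))        # True
print(is_allowed("delete_audit_logs"))  # False: not covered by any rule
```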
Week 2
Biases in AI
Back in 2009, there was already an example of the technology of the time behaving differently for different users:
“HP computers are racist”.
The webcam face tracking recognized the white face but not the black face.
A very basic example that shows how biases can exist in AI:
- Crime Forecasts
Risk score according to COMPAS (software)
There’s software used across the country to predict future criminals (it is used in criminal sentencing), and it’s biased against blacks.
Examples of unfair predictions between white and black defendants.
The questionnaire doesn’t ask about skin color, but it asks many questions that are highly correlated with skin color (e.g., living in a bad neighborhood).
Is COMPAS biased with respect to skin color?
Machine Translation
Google translate in 2017
Translations from Turkish were wrong: the female pronoun was associated with “cook” and “nurse” and the male pronoun with “doctor” and “lawyer”, even though Turkish pronouns make no gender distinction.
In 2018, "[...] Google announced that they were taking the first steps to address the prevalence of gender
bias in machine translation after Google Translate was shown to have a gender bias when translating from
gender-neutral Turkish to English.”
This is a bias that can be easily fixed.
But not all of them are:
Gender bias in language
The source of the difficulty: how would we even tell what’s unfair?
→ We don’t really agree on what a bias is, because... we don’t really agree on what fairness is.
Who has access to the “ingredients” and to the “recipe”?
What are biases / fairness?
Biases in computer systems is not a new idea, back in the 1980s…
“In the 1980s, however, most of the airlines brought before the Antitrust Division of the United States
Justice Department allegations of anticompetitive practices by American and United Airlines whose
reservation systems—Sabre and Apollo, respectively—dominated the field. It was claimed, among other
things, that the two reservations systems are biased [Schrifin 1985].’’
Biases
just as with moral dilemmas...
“[...] freedom from bias should be counted among the select set of criteria—including reliability,
accuracy, and efficiency—according to which the quality of systems in use in society should be
judged.”
“if one wants to develop criteria for judging the quality of systems in use—which we do—then
criteria must be delineated in ways that speak robustly yet precisely to relevant social matters.”
Fairness
just as with moral dilemmas...
“[...] as the impact of AI increases across sectors and societies, it is critical to work towards systems
that are fair and inclusive for all. This is a hard task. First, ML models learn from existing data
collected from the real world, and so an accurate model may learn or even amplify problematic
pre-existing biases in the data based on race, gender, religion or other characteristics. For example, a
job-matching system might learn to favor male candidates for CEO interviews, or assume female
pronouns when translating words like “nurse” or “babysitter” into Spanish, because that matches
historical data.
Second, even with the most rigorous and cross-functional training and testing, it is a challenge to
ensure that a system will be fair across all situations. For example, a speech recognition system that
was trained on US adults may be fair and inclusive in that context. When used by teenagers, however,
the system may fail to recognize evolving slang words or phrases. If the system is deployed in the
United Kingdom, it may have a harder time with certain regional British accents than others. And
even when the system is applied to US adults, we might discover unexpected segments of the
population whose speech it handles poorly, for example people speaking with a stutter. Use of the
system after launch can reveal unintentional, unfair blind spots that were difficult to predict.”
Fairness
just as with moral dilemmas...
“Third, there is no standard definition of fairness, whether decisions are made by humans or
machines. Identifying appropriate fairness criteria for a system requires accounting for user
experience, cultural, social, historical, political, legal, and ethical considerations, several of which
may have tradeoffs. Is it more fair to give loans at the same rate to two different groups, even if they
have different rates of payback, or is it more fair to give loans proportional to each group’s payback
rates? Is neither of these the most fair approach? At what level of granularity should groups be
defined, and how should the boundaries between groups be decided? When is it fair to define a group
at all versus better factoring on individual differences? Even for situations that seem simple, people
may disagree about what is fair, and it may be unclear what point of view should dictate policy,
especially in a global setting.”
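The loan question in the quote can be made concrete with a small worked example. All numbers below are made up: two hypothetical groups of equal size with different repayment rates. The sketch only shows that the two notions of fairness lead to different outcomes, not which one is right.

```python
# Hypothetical numbers: two groups of equal size, different repayment rates.
applicants     = {"group_A": 1000, "group_B": 1000}
repayment_rate = {"group_A": 0.90, "group_B": 0.60}  # fraction who would pay back

def expected_defaults(approval_rate):
    """Expected number of defaulted loans under per-group approval rates."""
    return sum(applicants[g] * approval_rate[g] * (1 - repayment_rate[g])
               for g in applicants)

# Notion 1: give loans at the same rate to both groups.
same_rate = {"group_A": 0.50, "group_B": 0.50}

# Notion 2: give loans in proportion to each group's payback rate.
proportional = {g: 0.50 * repayment_rate[g] / repayment_rate["group_A"]
                for g in applicants}

print(expected_defaults(same_rate))     # ~250 defaults, equal access to credit
print(expected_defaults(proportional))  # ~183 defaults, unequal access to credit
```

Neither policy is obviously “the fair one”; which trade-off is acceptable is exactly the kind of question the quote says has no standard answer.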
Types of biases?
- preexisting: “Preexisting bias has its roots in social institutions, practices, and attitudes.”
- technical: “Technical bias arises from technical constraints or considerations.”
- emergent: “Emergent bias arises in a context of use.”
Preexisting bias: Bias caused by training data
Historical biases in the training data will be learned by the algorithm: past discrimination will be reflected in the machine’s decisions.
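A minimal synthetic illustration of this point (all data invented): a “model” that simply learns the historical acceptance rate of each group reproduces the past discrimination.

```python
# Synthetic "historical" hiring data: (group, hired) pairs.
# The two groups are equally qualified here; the gap is pure past discrimination.
history = ([("A", 1)] * 80 + [("A", 0)] * 20
           + [("B", 1)] * 40 + [("B", 0)] * 60)

def learned_rate(group):
    """The acceptance rate a naive model would learn for this group."""
    outcomes = [hired for g, hired in history if g == group]
    return sum(outcomes) / len(outcomes)

print(learned_rate("A"))  # 0.8 -> the historical bias is reflected in the model
print(learned_rate("B"))  # 0.4
```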
Preexisting biases: mind the proxy attributes
Race, gender, and age are typically not legitimate (or even legally allowed) features for use in decision
making.
Without using such sensitive features directly, an algorithm might use closely correlated data as a
proxy that stands in for them.
This is what happens with the COMPAS crime forecasting tool.
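A sketch of how a proxy works (synthetic data, hypothetical feature names): the scoring rule never sees the sensitive attribute, but a strongly correlated feature such as the neighborhood carries much of the same information, so the scores still differ by group.

```python
import random
random.seed(1)

# Synthetic population: race is never given to the model, but neighborhood
# is strongly correlated with it (a proxy attribute).
people = []
for _ in range(10_000):
    race = random.choice(["black", "white"])
    # Hypothetical correlation: most of one group lives in neighborhood "N1".
    p_n1 = 0.8 if race == "black" else 0.2
    neighborhood = "N1" if random.random() < p_n1 else "N2"
    people.append((race, neighborhood))

def risk_score(neighborhood):
    """A scoring rule that only looks at the neighborhood, never at race."""
    return 0.7 if neighborhood == "N1" else 0.3

for group in ("black", "white"):
    scores = [risk_score(n) for r, n in people if r == group]
    print(group, round(sum(scores) / len(scores), 2))
# -> average scores differ by group even though race was never an input
```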
Biases of people:
- judges
- doctors
- teachers (when they evaluate students)
- students (when they evaluate teachers)
- loan providers
- selection committee members
- etc.
But are AI systems better than people?