DATA C104 - Midterm Notes
\Abstraction - ANS-- Data can be made from people -- but the very work of making the
data abstracts away from them
- Abstraction serves purposes beyond just generalization -- looking aside from personal
suffering, turning people into fungible units or commodities
- How do we ethically acknowledge -- and, where appropriate, counteract -- the abstraction
that's intrinsically part of representation in data?
\Algorithmic Inequality / Machine Bias - ANS-Algorithms that reproduce and reinforce
structural inequalities, marginalizations, and privileges that are already present and known in
society
\Case Study: Amazon-Whole Foods Deal (Lecture 1.6 & Reading Notes #3) - ANS-- Data as
capital worth investing in
- Rather than Amazon making this deal to gain control in the retail world, it was about
extracting data: the deal allows Amazon to analyze and understand its customers' offline
shopping habits, which are then converted into products and personalized advertisements
- This extends to the idea of smart stores that can track nearly everything about an
individual, whose data can then be used for advertisements and predictions about personal
spending
\Case Study: ASA & ACM Code of Ethics (Lecture 3.1) - ANS-ASA (American Statistical
Association) code
- Professional best practices for statistical studies
- Doing scientifically good statistics in complex social contexts
- Responsibilities to stakeholders, fellow statistical practitioners
ACM (Association for Computing Machinery) code
- General ethical principles, professional responsibilities, professional leadership principles,
compliance with the code
- Design and implement systems that are robustly and usably secure
\Case Study: Bay Area Air Quality Management District (Lecture 1.3) - ANS-- AQI (Air
Quality Index): a useful and obligatory abstraction for standardization, communication, and
action
- Flare Reports
- Informative, transparent, contains database
- Representation ~ work on data (world-making) to explain the abstractions and phenomena
of air quality, pollution, flares, etc. (knowledge), which also causes changes to the world
(power)
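The AQI abstraction mentioned above is, concretely, a piecewise-linear rescaling of a pollutant concentration onto a standardized 0-500 index. A minimal sketch of that computation follows; the PM2.5 breakpoint values are illustrative of an older EPA table and may not match current regulations, so treat them as assumptions for demonstration rather than an official tool:

```python
# Sketch of an EPA-style AQI computation: linearly interpolate the index
# within whichever concentration band contains the measured value.
# Breakpoints below are illustrative (assumed), not authoritative.

PM25_BREAKPOINTS = [
    # (conc_low, conc_high, index_low, index_high, category)
    (0.0,   12.0,   0,   50, "Good"),
    (12.1,  35.4,  51,  100, "Moderate"),
    (35.5,  55.4, 101,  150, "Unhealthy for Sensitive Groups"),
    (55.5, 150.4, 151,  200, "Unhealthy"),
]

def pm25_aqi(conc: float) -> int:
    """Map a PM2.5 concentration (ug/m^3) to an AQI value."""
    for c_lo, c_hi, i_lo, i_hi, _cat in PM25_BREAKPOINTS:
        if c_lo <= conc <= c_hi:
            # Linear interpolation between the band's index endpoints
            return round((i_hi - i_lo) / (c_hi - c_lo) * (conc - c_lo) + i_lo)
    raise ValueError("concentration outside breakpoint table")

print(pm25_aqi(10.0))  # falls in the "Good" band
print(pm25_aqi(35.0))  # near the top of the "Moderate" band
```

The standardization the lecture highlights lives in the breakpoint table: many different raw measurements collapse onto one communicable scale.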
\Case Study: Cell Phones with Metadata - Carpenter v. United States, 2018 (Lecture 2.4) - ANS-- Cell
site location information (CSLI) triangulates the location of every cell phone (collected
automatically, and by third parties)
- Contains metadata information (non-content) such as CSLI
- CSLI used to identify and convict Timothy Carpenter in armed robberies of T-Mobile and
RadioShack stores based on historical records showing his cell phone nearby. Appealed to
the US Supreme Court.
- 4th Amendment protects against unreasonable searches and seizures, but CSLI is
metadata and is given up voluntarily to the 3rd-party cell provider. Is it private?
- Metadata can tell us a lot and inferences of it can reveal many things about the individual,
and control over personal information is hard when huge amounts of metadata get collected
automatically
\Case Study: COMPAS (Lecture 1.4 & Reading Notes #2) - ANS-- Calculates a recidivism score
(how likely it is that a defendant will be arrested and charged with a crime again in the future)
to help judges make decisions (algorithms/risk assessment tools in courtrooms)
- ProPublica claims that it is biased against African Americans (rated higher risk, but didn't
re-offend), compared to White defendants (rated lower risk, but did re-offend)
- Questions of fairness (a sociotechnical concept) regarding algorithmic inequality; it is
impossible to satisfy both ProPublica's and Northpointe's definitions of fairness at once
- "Garbage in, garbage out": biased input data yields biased output data
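The fairness tension above can be made concrete with a toy calculation: when two groups have different base rates of re-offense, the same scoring rule can yield unequal false-positive rates even without any explicit bias in the rule. All numbers below are made up purely for illustration and have no relation to actual COMPAS data:

```python
# Hypothetical sketch of the ProPublica/Northpointe dispute: error rates
# can diverge across groups with different base rates. Synthetic data only.

def error_rates(records):
    """records: list of (predicted_high_risk, actually_reoffended) booleans.
    Returns (false-positive rate, false-negative rate)."""
    fp = sum(1 for pred, actual in records if pred and not actual)
    fn = sum(1 for pred, actual in records if not pred and actual)
    negatives = sum(1 for _, actual in records if not actual)
    positives = sum(1 for _, actual in records if actual)
    return fp / negatives, fn / positives

# Two synthetic groups, same scoring rule, different base rates of re-offense:
group_a = [(True, True)] * 60 + [(True, False)] * 20 + [(False, False)] * 20
group_b = [(True, True)] * 20 + [(True, False)] * 10 + [(False, False)] * 70

fpr_a, _ = error_rates(group_a)  # 20 false positives out of 40 non-reoffenders
fpr_b, _ = error_rates(group_b)  # 10 false positives out of 80 non-reoffenders
print(fpr_a, fpr_b)  # unequal false-positive rates despite one shared rule
```

This is the shape of the impossibility result discussed in lecture: equalizing one fairness metric across groups with different base rates generally breaks another.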
\Case Study: Coordinated Entry System in LA County (Lecture 2.2) - ANS-- Collects
information on unhoused individuals
- A digital registry stores data; an algorithm processes it to determine a VI-SPDAT score from
1-17 (a score of 8+ qualifies a person to be assessed for permanent supportive housing)
- Another algorithm matches people who qualify with housing opportunities
- Survey questions (use of emergency medical, mental health crisis, suicide prevention
services, sex work, drug use, self-harm, photograph)
- Protected personal information (social security, name, DOB, demographic, veteran,
immigration, residency)
- Institution's view from above: a rational system, making a data-based case for services,
good intentions all around - solve a social problem with a data-centered technical system
- Concerns? People may not be told their VI-SPDAT scores, the scores could be wrong, scores
can affect people's long-term options, and people give up sensitive data but may get nothing
in return
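The scoring-and-threshold step described above can be sketched in a few lines. Only the 1-17 score range and the 8+ threshold come from the notes; everything else (function names, the error handling) is a hypothetical illustration, not the actual system:

```python
# Minimal sketch of the VI-SPDAT threshold logic from the notes (assumed
# structure; the real Coordinated Entry System is far more complex).

PSH_THRESHOLD = 8  # per the notes: 8+ -> assessed for permanent supportive housing

def qualifies_for_psh_assessment(score: int) -> bool:
    """Return whether a VI-SPDAT score triggers a housing assessment."""
    if not 1 <= score <= 17:
        raise ValueError("VI-SPDAT scores fall in the 1-17 range")
    return score >= PSH_THRESHOLD

print(qualifies_for_psh_assessment(8))  # at the threshold
print(qualifies_for_psh_assessment(7))  # one point below
```

Even this tiny sketch surfaces the concern above: a single point of (possibly erroneous) score separates qualifying from not qualifying, with long-term consequences for the person scored.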
\Case Study: Copernican Revolution (Lecture 1.7 & Reading Notes #3) - ANS--
Accumulation of 'normal' science knowledge and theories leads to a critical point in human
history: a scientific revolution
- It truly transformed the scientific imagination in ways that transformed the world
- Thomas Kuhn's notion of paradigm shifts brought a new image of the history of science
and of the relationship between science and society
\Case Study: Crisis Text Line 2022 (Lecture 2.4) - ANS-- Crisis Text Line stopped sharing
conversation data with the AI company Loris.AI after facing scrutiny from data privacy experts
- Monetization of sensitive personal data for machine learning, even though Crisis Text Line
is a not-for-profit service
\Case Study: Diana's Onlife World (Lecture 1.8 & Reading Notes #4) - ANS-- Rise of smart,
preemptive technological infrastructures that will heavily impact modern society by regulating
our behaviors in certain ways
- A narrative story about Diana and PDAs (personal digital assistants) - a type of AI that can
do just about anything based on a person's preferences, and that primarily regulates a
person's behavior by always being convenient and staying one step ahead
- Personal data can be at stake depending on how such technology is designed; society may
end up in an onlife world like Diana's.
\Case Study: Facebook "Emotional Contagion" Experiment (Lecture 3.2) - ANS-Over one
week in 2012, researchers from Facebook and Cornell University changed the content of
news feeds for a random sample of Facebook users (over 600,000)
- One group of users: removed content containing positive words
- Another group of users: removed content containing negative words