COMM 159 FINAL STUDY GUIDE
WEEK 5:
Monday
● You need to create a training data set. The model will not do the task properly without any
training.
● Training Dataset
○ You may not know the specific algorithm of the neural network, but if you want to use it,
you have to understand what type of data set you need to train the model.
○ This data set has two components: x (input) and y (label)
■ X is the data you need to provide for every single input neuron. It contains a lot
of information (all the features of the house; if a house has all these features what
is the selling price [y]) (dating website - all the features of a human)
■ Y is the ground truth (the selling price of the house): the output you are expecting
from this model. (dating website - how popular you will be)
○ You need to collect a lot of data (data points) and get as much information as possible
(thousands of houses [features and prices]). This will make the neural network more
accurate.
○ The more data you have the better the model performance will be
● This data is incredibly important and more precious than the model.
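As a rough illustration of the x/y structure described above, a training dataset can be sketched as a list of (features, label) pairs. The feature names and all the numbers below are made up for illustration; they are not from the lecture.

```python
# Hypothetical training dataset for the house-price example.
# Each data point pairs x (the input features) with y (the ground-truth label).
training_data = [
    # x: (square_feet, bedrooms, age_in_years)   y: selling price in dollars
    ((1500, 3, 20), 300_000),
    ((2200, 4, 5), 450_000),
    ((900, 2, 40), 180_000),
]

# Split the pairs into the two components the network needs.
X = [features for features, _ in training_data]
y = [price for _, price in training_data]

# Every input must come with exactly one label.
assert len(X) == len(y)
print(f"{len(X)} data points, {len(X[0])} features each")
```

A real dataset would have thousands of rows like these, one per house.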
● Training a Neural Network: training a neural network means using a training data set to adjust
the weights and biases within the network so that its outputs closely approximate the
“ground truth” data provided in the dataset.
○ Prediction and Ground Truth (in the real world what you actually want to predict)
○ At the beginning the prediction is going to be random and will not be accurate. You
compare the model prediction and the ground truth which leads to an error.
● Compare the model prediction and the ground truth. Then, if there is an error, you update only
the weights (w) and biases (b) to reduce the error. That should make the model more
accurate. The prediction will eventually get very close to the ground truth.
○ The first time you run the model, it starts from random values
○ Train until the model is 99.99% accurate (you will never get a zero error); minimize the
error of your model to the lowest possible value
○ In most cases, the data are labeled by humans; there has to be someone labeling the data
■ There are a lot of companies whose job is to label data
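The compare-and-update cycle described above can be sketched for the simplest possible case: one neuron with one input. This is only a minimal sketch. The data, the learning rate, the number of passes, and the specific update rule (plain gradient descent on the squared error) are assumptions, not details from the lecture.

```python
import random

random.seed(0)  # make the "random" starting point repeatable

# (x, ground truth) pairs; these made-up points follow y = 2x + 1
data = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

# Start from random values, so the first predictions are poor.
w = random.uniform(-1.0, 1.0)
b = random.uniform(-1.0, 1.0)
learning_rate = 0.05


def mean_squared_error(w, b):
    """Average squared gap between prediction and ground truth."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


initial_error = mean_squared_error(w, b)

for _ in range(2000):
    for x, y in data:
        prediction = w * x + b
        error = prediction - y  # compare prediction with ground truth
        # Only w and b change; the training data stay fixed.
        # (The constant factor 2 of the gradient is folded into the learning rate.)
        w -= learning_rate * error * x
        b -= learning_rate * error

final_error = mean_squared_error(w, b)
print(f"w={w:.3f}, b={b:.3f}, error {initial_error:.3f} -> {final_error:.6f}")
```

The error shrinks toward zero over many passes but, with real noisy data, would not reach exactly zero.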
● Each output of the neural network has an activation, and the output with the highest activation
is the prediction
○ Before the model is updated, the neural network has the highest activation for “grass”, so
that is the prediction, even though the ground truth is “flower”
○ In the updated model, the highest activation will be for “flower” after the weights and
biases are adjusted
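A small sketch of how the prediction is read off from the activations. The activation values and the extra "tree" label are hypothetical; only the "grass" and "flower" labels come from the notes.

```python
# Hypothetical output activations for one image whose ground truth is "flower".
before_update = {"grass": 0.7, "flower": 0.2, "tree": 0.1}
after_update = {"grass": 0.15, "flower": 0.8, "tree": 0.05}


def predict(activations):
    """Return the label whose output has the highest activation."""
    return max(activations, key=activations.get)


print(predict(before_update))  # grass  (wrong: ground truth is flower)
print(predict(after_update))   # flower (correct after training)
```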
● You are using a training dataset to train a neural network, which essentially updates and changes
the weight and bias.
● How do we do the training? Backpropagation
○ It must start from the output layer because it is where the error was first detected (the
output layer is closest to the error)
○ Layer by layer backwards. Neurons in the same layer can adjust their weights and biases
simultaneously.
● Backpropagation calculation of weight and bias: (28 min)
● Where does 5 come from: (8*0.5) + (2*0.5) = 5
○ You make the output smaller by making the weight smaller. In this case, you would make
w1 smaller because 8 is making the output too large and w2 should be larger. Keep
adjusting until y=3.
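The worked example above (inputs 8 and 2, both weights 0.5, ground truth 3) can be run as a short loop. The update rule here is plain gradient descent with an assumed learning rate of 0.01; the adjustment shown in lecture may differ. Under this particular rule both weights shrink, and w1 shrinks more because its input (8) contributes most to the oversized output.

```python
# Inputs and starting weights from the notes; target is the ground truth y = 3.
x1, x2 = 8.0, 2.0
w1, w2 = 0.5, 0.5
target = 3.0
learning_rate = 0.01  # assumed value, chosen small enough to converge

first_prediction = x1 * w1 + x2 * w2
print(first_prediction)  # 5.0

for step in range(50):
    prediction = x1 * w1 + x2 * w2
    error = prediction - target
    # Each weight moves in proportion to the input it multiplies.
    w1 -= learning_rate * error * x1
    w2 -= learning_rate * error * x2

final_prediction = x1 * w1 + x2 * w2
print(round(final_prediction, 4), round(w1, 4), round(w2, 4))
```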
● Error = Prediction (must be a number) - Ground Truth (corresponds to some number)
○ The error is treated as a non-negative quantity (a negative error does not make sense), so
what counts is the size of the gap between the prediction and the ground truth
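One common way to keep the error from going negative is to square the difference. This is a sketch of that idea; the function names are made up and the lecture may have used a different measure, such as the absolute difference.

```python
def signed_difference(prediction, ground_truth):
    """Raw difference; negative when the prediction is too low."""
    return prediction - ground_truth


def squared_error(prediction, ground_truth):
    """Squaring removes the sign, so the error is never negative."""
    return (prediction - ground_truth) ** 2


print(signed_difference(5, 3), squared_error(5, 3))  # 2 4
print(signed_difference(1, 3), squared_error(1, 3))  # -2 4
```

Predicting too high (5) and too low (1) are equally far from the ground truth (3), and the squared error reports the same value for both.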
● Sometimes data is easy to collect, but the hard part is labeling the ground truth. In most cases the
data are labeled by humans.
● You provide labeling and feedback. You are training the model.
● There are a lot of companies out there whose entire purpose is to label data.
● Ethical issues of collecting data for training AI
○ Consent
■ The AI field often pays little attention to consent
■ Traditionally, collecting human data requires strict protocols, and explicit consent
must be obtained from individuals before data collection.
● “Failure to detect changes to people during a real-world interaction” study:
○ They found that people pay very little attention to the person they are actually talking
to.
○ The person talking to the subject is swapped for a different person while an obstruction
blocks the subject's view, and the subject does not notice.
○ The study was initially highly criticized because the researchers collected human data
without asking for consent.
■ They solved this problem by having participants sign a consent form, then switching
the person proctoring the signing once the form was signed.
● Does the field of AI consistently follow the rule of obtaining human consent before collecting
data?
● NIST (National Institute of Standards and Technology): defines and supports standards, and this
now includes developing standards for artificial intelligence. One of the testing infrastructures
it maintains is for biometric data.
○ They are going to make criteria for AI
○ They released the first facial recognition data set
● After the terrorist attacks of September 11, 2001, NIST became part of the national response to
create biometric standards to verify and track people entering the United States. This was a
turning point for research on facial recognition, which widened from a focus on law enforcement
to controlling people crossing national borders.
○ Because mug shots are taken at the time of arrest, it’s not clear if these people were
charged, acquitted, or imprisoned.
○ NIST has run competitions with these mug shots in which researchers compete to see
whose algorithm is the fastest and most accurate
○ The winners celebrate these victories; they can bring fame, job offers, and industry-wide
recognition.
○ The NIST databases foreshadow the emergence of a logic that has now thoroughly
pervaded the tech sector: the unswerving belief that everything is data and is there for the
taking.
● Under this logic, it doesn’t matter where a photograph was taken, or whether it reflects a moment
of vulnerability or pain or represents a form of shaming the subject. It has become so normalized
across the industry to take and use whatever is available that few stop to question the underlying
politics.
○ Machine learning systems are trained every day on images like those in the NIST
databases: images that were taken from the internet or from state institutions without
context and without consent.
● Face Recognition Technology (FERET) as an example of data collection with consent: The first set
of photographs was taken between 1993 and 1994 as part of army research at George Mason
University. Each subject was briefed about the project and signed a release form that had been
approved by the university’s ethics review board. Subjects knew what they were participating in
and gave full consent.
● By contrast, machine learning systems are trained every day on images like those in the NIST
databases: images that were taken from the internet or from state institutions without context
and without consent.
● ANYTHING THAT CAN BE CAPTURED IS DATA
● Hurtful AI:
○ CU Colorado Springs students were secretly photographed for government-backed
facial-recognition research: A professor installed a camera on the main walkway of the
campus and secretly captured photos of more than seventeen hundred students and
faculty—all to train a facial recognition system of his own.
○ The professor, Boult, tried to justify it: “One of the important questions we asked was how
well the algorithms could handle non-cooperating subjects.”
○ According to Boult, people’s faces change slightly if they know they are being
photographed, so making the subjects aware of the study by getting permission would have
defeated the purpose. He also touched on being able to recognize car bombers and vest
bombers and on protecting the women and men in the fighting forces. It's always for
national security reasons.
○ He also noted that some of the people making comments are posting them next to their
profile pictures on Facebook.