CSCI 544 QUESTIONS & ANSWERS
Given a fixed word vocabulary, we should expand the vocabulary when one word is not
included in the vocabulary during preprocessing
A) True
B) False - Answers - True
Which of the following procedures is NOT a standard step for classification tasks?
A) Human study
B) Evaluation
C) Preprocessing
D) Feature extraction - Answers - Human Study
Which of the following methods could be used for preprocessing?
A) Contracting and standardizing
B) Removing extra spaces
C) Substituting words with their synonyms
D) Stemming - Answers - Contracting and standardizing
Removing extra spaces
Stemming
Which of the following tasks is NOT a classification task?
Grouping images into different categories such as tree, flower, etc
Sentiment analysis
Predicting book genres from one of labels, e.g., fiction, narrative, etc
Predicting stock price in the next few days - Answers - Predicting stock price in the next
few days (this is a regression task)
Given the vocabulary [first, second, third, Monday, Tuesday, January, February,
equality, the], what's the corresponding bag-of-words vector representation for text
"Martin Luther King Jr. Day, observed on the third Monday of January, honors the civil
rights leader's legacy of promoting equality, justice, and nonviolent activism."
[0, 0, 1, 1, 0, 1, 0, 1, 1]
[0, 0, 0, 1, 1, 1, 1, 0, 3]
[0, 0, 1, 1, 0, 1, 0, 1, 3]
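[1, 1, 1, 0, 0, 0, 0, 0, 0] - Answers - None of the listed options is correct. The right
answer is [0, 0, 1, 1, 0, 1, 0, 1, 2]: "third", "Monday", "January", and "equality" each
appear once, and "the" appears twice.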
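The count can be checked with a minimal sketch like the one below. The function name `bag_of_words` and the tokenization rule (case-sensitive, splitting on anything that is not a letter or apostrophe) are assumptions made for illustration, not something the question specifies.

```python
import re
from collections import Counter

def bag_of_words(text, vocabulary):
    """Count how often each vocabulary word occurs in the text."""
    # Assumed tokenization: case-sensitive runs of letters and apostrophes.
    tokens = re.findall(r"[A-Za-z']+", text)
    counts = Counter(tokens)
    # Counter returns 0 for words that never appear.
    return [counts[word] for word in vocabulary]

vocabulary = ["first", "second", "third", "Monday", "Tuesday",
              "January", "February", "equality", "the"]
text = ("Martin Luther King Jr. Day, observed on the third Monday of January, "
        "honors the civil rights leader's legacy of promoting equality, justice, "
        "and nonviolent activism.")

vector = bag_of_words(text, vocabulary)
print(vector)  # [0, 0, 1, 1, 0, 1, 0, 1, 2]
```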
The outputs of feature extractors are feature vectors
True
False - Answers - True
Which statement about feature extraction is WRONG?
Stop words are useless for sentiment analysis
Deep learning automatically learns feature representations for different tasks
Capitalization is a significant feature for sentiment analysis
Negation words are important for sentiment analysis - Answers - Stop words are
useless for sentiment analysis, or Capitalization is a significant feature for sentiment
analysis
Considering a classification task where ground-truth labels are [1, 0, 0, 1], suppose we
have a model with predicted labels [1, 1, 1, 0], then precision of predictions is
1/3
1/4
1/2
3/4 - Answers - 1/3 (Precision = TP / (TP + FP); here TP = 1 and FP = 2)
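The arithmetic can be reproduced with a short sketch. The helper name `precision` and the convention of returning 0.0 when nothing is predicted positive are assumptions for illustration.

```python
def precision(y_true, y_pred, positive=1):
    """Precision = TP / (TP + FP) for the given positive label."""
    pairs = list(zip(y_true, y_pred))
    # True positives: predicted positive and actually positive.
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    # False positives: predicted positive but actually negative.
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    if tp + fp == 0:
        return 0.0  # assumed convention when there are no positive predictions
    return tp / (tp + fp)

y_true = [1, 0, 0, 1]
y_pred = [1, 1, 1, 0]
print(precision(y_true, y_pred))  # TP = 1, FP = 2, so 1/3
```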
Which of the following statements about the multi-modal GPT-4o is CORRECT?
GPT-4o cannot generate code
GPT-4o can understand audio input
GPT-4o can generate stories
GPT-4o can receive video input - Answers - GPT-4o can understand audio input
GPT-4o can generate stories
GPT-4o can receive video input
Which of the following is NOT a natural language?
English
First order logic
C#
C++ - Answers - First order logic
C#
C++
Which of the following is NOT a hyperparameter when training a NN?
Weight decay
Learning rate
Batch size
Weights of each layer - Answers - Weights of Each Layer
Which of the following statements accurately describes a step in the backpropagation
algorithm for training a neural network?
Forward pass, where inputs are passed through the network to compute the output.
Adjusting weights and biases based on the difference between predicted and actual
outputs.
Initializing network parameters randomly before training.
Calculating the gradient of the loss with respect to the weights and biases - Answers -
Calculating the gradient of the loss with respect to the weights and biases
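As a rough illustration of the gradient step, the sketch below applies the chain rule by hand to a single linear unit with squared-error loss. The model, the loss, and the names `forward` and `gradients` are assumptions chosen to keep the example small; a real network repeats the same chain rule layer by layer.

```python
def forward(w, b, x):
    """Forward pass of a single linear unit: y_hat = w * x + b."""
    return w * x + b

def gradients(w, b, x, y):
    """Gradients of L = (y_hat - y)^2 with respect to w and b."""
    y_hat = forward(w, b, x)
    dloss_dyhat = 2.0 * (y_hat - y)   # dL/dy_hat
    dloss_dw = dloss_dyhat * x        # chain rule: dy_hat/dw = x
    dloss_db = dloss_dyhat * 1.0      # chain rule: dy_hat/db = 1
    return dloss_dw, dloss_db

# Hypothetical values: w = 1.0, b = 0.5, input x = 2.0, target y = 1.0.
grad_w, grad_b = gradients(1.0, 0.5, 2.0, 1.0)
print(grad_w, grad_b)  # 6.0 3.0
```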
What are the advantages of CNN?
Parallel computation
Simple architecture
Good at modeling long dependencies
Small context window - Answers - Parallel Computation and good at modeling long
dependencies
The hyperparameters in feedforward neural networks are learnable parameters.
True
False - Answers - False
What is/are true about word embeddings?
It can be used to compute the similarity or distance between two different words
It is a dense vector per word
We can perform linear operations on top of word embeddings
It contains contextual information from the sentence - Answers - It can be used to
compute the similarity or distance between two different words
It is a dense vector per word
We can perform linear operations on top of word embeddings
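The similarity and linear-operation properties can be sketched with toy vectors. The three-dimensional embeddings below are made-up values for illustration only, not trained vectors, and `cosine_similarity` is a hypothetical helper.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical toy embeddings (one dense vector per word).
embeddings = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.2, 0.8],
}

# Linear operation on embeddings: king - man + woman.
analogy = [k - m + w for k, m, w in zip(embeddings["king"],
                                        embeddings["man"],
                                        embeddings["woman"])]

# The word whose vector is most similar to the result.
nearest = max(embeddings, key=lambda word: cosine_similarity(analogy, embeddings[word]))
print(nearest)  # queen
```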
In the context of Word2Vec, what is the purpose of sub-sampling?
Sub-sampling is a technique to augment the training dataset with additional samples for
better model generalization.
Sub-sampling refers to adjusting the learning rate during training to control the
convergence speed of the model.
Sub-sampling is used to randomly remove a portion of the training data to reduce model
overfitting.
Sub-sampling involves skipping frequent words during training to improve efficiency and
focus on informative words. - Answers - Sub-sampling involves skipping frequent words
during training to improve efficiency and focus on informative words.
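A minimal sketch of the idea, assuming the discard rule from the original Word2Vec paper, P(discard w) = 1 - sqrt(t / f(w)), where f(w) is the word's relative frequency and t is a small threshold. The function names, the default t = 1e-5, and the fixed seed are assumptions; actual implementations may use a slightly different formula.

```python
import math
import random
from collections import Counter

def discard_probability(word_frequency, threshold=1e-5):
    """Probability of skipping a word with the given relative frequency."""
    # Words rarer than the threshold are never discarded.
    return max(0.0, 1.0 - math.sqrt(threshold / word_frequency))

def subsample(tokens, threshold=1e-5, seed=0):
    """Randomly drop frequent tokens; rare tokens are mostly kept."""
    rng = random.Random(seed)
    counts = Counter(tokens)
    total = len(tokens)
    kept = []
    for token in tokens:
        p_discard = discard_probability(counts[token] / total, threshold)
        if rng.random() >= p_discard:
            kept.append(token)
    return kept

# A more frequent word gets a higher discard probability.
print(discard_probability(0.01), discard_probability(0.001))
```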
ReLU function: - Answers - max(0, x)
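Written out as code, the definition is a one-liner. The name `relu` is just an illustrative choice.

```python
def relu(x):
    """Rectified linear unit: passes positive inputs, zeroes out the rest."""
    return max(0.0, x)

print([relu(v) for v in (-2.0, 0.0, 3.5)])  # [0.0, 0.0, 3.5]
```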
Word2Vec is a Factorization-based word embedding approach.
True
False - Answers - False
A feedforward neural network (FNN) is good at modeling sentences of varying lengths.
True
False - Answers - False
What is False about non-convex optimization problems?
There are multiple local optimal points
Initialization will impact the final convergence point
DNNs are non-convex
It's guaranteed to converge to a global optimum - Answers - It's guaranteed to converge
to a global optimum