Exam (elaborations)

CS 7641 - MIDTERM EXAM QUESTIONS ANSWERED CORRECTLY LATEST UPDATE 2026

Pages
9
Grade
A+
Uploaded on
17-04-2026
Written in
2025/2026

In general, when choosing a hypothesis space, is a very large hypothesis space preferable to a smaller one? - Answers: False

Can a network of perceptrons with linear activation functions be simplified into a single unit perceptron computing the same function? - Answers: True

Should the nearest neighbor method be used over a decision tree learning method for a learning problem with over 1000 attributes, only a few of which are probably relevant? - Answers: False

Is your target concept an element of your hypothesis space? - Answers: True

Does the Boosting algorithm have the advantage of not overfitting? - Answers: False

What are potential issues with very deep decision trees?
- Overfitting to training data
- Being insensitive to feature scaling
- Underfitting due to simplicity
- Reduced interpretability
- Long computation times during predictions
- Always providing the best accuracy
Answers:
Correct:
- Reduced interpretability
- Long computation times during predictions
- Being insensitive to feature scaling
- Overfitting to training data
Incorrect:
- Underfitting due to simplicity
- Always providing the best accuracy

When deciding on a split for a continuous variable in decision trees, what is true?
- The split always divides data into equal parts
- A threshold is determined for splitting instances into two groups
- The variable is always discretized into categories
- The split relies on a fixed global threshold for all nodes
- The split aims to increase the homogeneity of child nodes
- The data is often sorted by that variable's values
Answers:
Correct:
- A threshold is determined for splitting instances into two groups
- The split aims to increase the homogeneity of child nodes
- The data is often sorted by that variable's values
Incorrect:
- The split always divides data into equal parts
- The split relies on a fixed global threshold for all nodes
- The variable is always discretized into categories

Why might pruning be applied to a decision tree?
- To ensure the tree is balanced
- To remove branches that provide little to no predictive power
- To simplify the model and improve interpretability
- To always achieve the best accuracy
- To increase tree depth
- To reduce overfitting
Answers:
Correct:
- To remove branches that provide little to no predictive power
- To simplify the model and improve interpretability
- To reduce overfitting
Incorrect:
- To increase tree depth
- To always achieve the best accuracy
- To ensure the tree is balanced

Which of the following are ensemble methods used in machine learning for improving model accuracy and robustness?
- Random Forest
- Simple Linear Regression
- Gradient Boosting Machines (GBM)
- Logistic Regression
- K-Means Clustering
- Support Vector Machines (SVMs)
Answers:
Correct:
- Random Forest
- Gradient Boosting Machines (GBM)
Incorrect:
- Simple Linear Regression
- K-Means Clustering
- Logistic Regression
- Support Vector Machines (SVMs)

Which algorithms are primarily used for classification tasks?
- Polynomial Regression
- Decision Trees
- LASSO Regression
- Ridge Regression
- Linear Regression
- Support Vector Machines (SVM)
Answers:
Correct:
- Decision Trees
- Support Vector Machines (SVM)
Incorrect:
- Polynomial Regression
- LASSO Regression
- Ridge Regression
- Linear Regression

Which of the following algorithms can be used for classification tasks?
- K-Nearest Neighbors (KNN)
- Neural Networks
- Support Vector Machines (SVM)
- Decision Trees
- Naive Bayes Classifier
- Linear Regression
Answers:
Correct:
- K-Nearest Neighbors (KNN)
- Neural Networks
- Support Vector Machines (SVM)
- Decision Trees
- Naive Bayes Classifier
Incorrect:
- Linear Regression

What activation functions can be found in neural networks?
- Sigmoid
- Laplacian
- Hyperbolic Tangent (tanh)
- Softmax
- Gaussian
- ReLU (Rectified Linear Unit)
Answers:
Correct:
- Sigmoid
- Hyperbolic Tangent (tanh)
- Softmax
- ReLU (Rectified Linear Unit)
Incorrect:
- Laplacian
- Gaussian

In the context of neural networks, which of the following can help in preventing overfitting?
- Increasing learning rate
- Data augmentation
- Adding dropout layers
- Use k-means clustering
- One-hot encoding
- Regularization (e.g., L2 regularization)
Answers:
Correct:
- Data augmentation
- Adding dropout layers
- Regularization (e.g., L2 regularization)
Incorrect:
- Increasing learning rate
- Use k-means clustering
- One-hot encoding

What techniques can be used to improve the generalization of neural networks?
- Early stopping
- Decision tree boosting
- Feature scaling using SVM
- One-hot encoding of features
- Dropout
- Decision tree pruning
Answers:
Correct:
- Early stopping
- Dropout
Incorrect:
- Decision tree boosting
- Feature scaling using SVM
- One-hot encoding of features
- Decision tree pruning

In the context of instance-based learning, which of the following could be challenges?
- Gradient descent optimization
- Storage requirements due to retaining all training instances
- Sensitivity to noisy data
- Deciding the depth of decision trees
- Suffering from the curse of dimensionality
- Selecting the right activation function
Answers:
Correct:
- Storage requirements due to retaining all training instances
- Sensitivity to noisy data
- Suffering from the curse of dimensionality
Incorrect:
- Gradient descent optimization
- Deciding the depth of decision trees
- Selecting the right activation function

When using KNN as an instance-based learning algorithm, what are important considerations?
- Learning rate of the algorithm
- Depth of the decision tree used
- Handling of ties when multiple classes have the same vote count
- The way distances between instances are calculated
- The architecture of the underlying neural network
- Choice of k (number of neighbors)
Answers:
Correct:
- Handling of ties when multiple classes have the same vote count
- The way distances between instances are calculated
- Choice of k (number of neighbors)
Incorrect:
- Learning rate of the algorithm
- Depth of the decision tree used
- The architecture of the underlying neural network

How does the performance of instance-based learners like KNNs typically change as we add more training data?
- The algorithm's training phase becomes much slower
- The model starts to discard older data automatically
- The algorithm becomes less sensitive to the choice of k
- Query time (prediction time) generally increases
- The model can better generalize to new data with increased training instances
- The distance metric becomes less relevant
Answers:
Correct:
- Query time (prediction time) generally increases
- The model can better generalize to new data with increased training instances
Incorrect:
- The algorithm's training phase becomes much slower
- The model starts to discard older data automatically
- The algorithm becomes less sensitive to the choice of k
- The distance metric becomes less relevant

In the context of gradient boosting, which of the following statements are accurate?
- It relies heavily on data normalization before training
- It leverages the gradient of the loss function to guide the ensemble process
- It always outperforms other ensemble methods regardless of the dataset
- It builds trees sequentially, with each new tree trying to correct errors made by the previous one
- It constructs each tree using a completely random subset of data
- It is an ensemble method exclusively for neural networks
Answers:
Correct:
- It leverages the gradient of the loss function to guide the ensemble process
- It builds trees sequentially, with each new tree trying to correct errors made by the previous one
Incorrect:
- It relies heavily on data normalization before training
- It always outperforms other ensemble methods regardless of the dataset
- It constructs each tree using a completely random subset of data
- It is an ensemble method exclusively for neural networks

In the context of Random Forests, which are a popular ensemble method, which of the following statements are true?
- They always use a boosting approach
- They use bootstrapped samples of the data for each tree
- They are an ensemble of decision trees
- They require distance metrics for making predictions
- Each tree is trained on all features of the dataset
- They introduce randomness in feature selection for nodes to create diversity among trees
Answers:
Correct:
- They use bootstrapped samples of the data for each tree
- They are an ensemble of decision trees
- They introduce randomness in feature selection for nodes to create diversity among trees
Incorrect:
- They always use a boosting approach
- They require distance metrics for making predictions
- Each tree is trained on all features of the dataset

For boosting algorithms like AdaBoost, which of the following are characteristic features?
- They operate primarily on the principle of diversity through data subsetting
- They combine weak learners sequentially to form a strong learner
- They adjust the weights of misclassified instances to focus on them in subsequent models
- They require normalization of data before training
- They always use decision trees with a depth greater than 10
- They involve random feature selection for each learner
Answers:
Correct:
- They combine weak learners sequentially to form a strong learner
- They adjust the weights of misclassified instances to focus on them in subsequent models
Incorrect:
- They operate primarily on the principle of diversity through data subsetting
- They require normalization of data before training
- They always use decision trees with a depth greater than 10
- They involve random feature selection for each learner

What are the primary reasons for using kernel methods in SVMs?
- To decrease the number of support vectors in the model
- To map input data into a higher-dimensional space
- To find a hyperplane that maximizes the margin between classes in the transformed space
- To reduce the computational complexity of the SVM algorithm
- To tackle non-linearly separable data using SVMs
- To handle missing values in the data
Answers:
Correct:
- To map input data into a higher-dimensional space
- To find a hyperplane that maximizes the margin between classes in the transformed space
- To tackle non-linearly separable data using SVMs
Incorrect:
- To decrease the number of support vectors in the model
- To reduce the computational complexity of the SVM algorithm
- To handle missing values in the data

In the context of SVMs, which statements are true about the decision boundary?
- It always passes through the origin of the data space
- It's determined by maximizing the margin between the closest data points of the two classes
- The data points that lie on the edge of the margin are called support vectors
- Increasing the dimensionality of the data always leads to a better decision boundary
- The decision boundary is always linear, irrespective of the kernel used
- A large margin always indicates a model with high bias and low variance
Answers:
Correct:
- It's determined by maximizing the margin between the closest data points of the two classes
- The data points that lie on the edge of the margin are called support vectors
Incorrect:
- It always passes through the origin of the data space
- Increasing the dimensionality of the data always leads to a better decision boundary
- The decision boundary is always linear, irrespective of the kernel used
- A large margin always indicates a model with high bias and low variance

What are considerations or challenges when working with SVMs and kernel methods?
- SVMs can become computationally expensive with large datasets
- Choosing the appropriate kernel and its parameters can significantly influence model performance
- SVMs are ideally suited for multi-class classification out of the box
- Regularization is crucial to prevent overfitting, especially with more flexible kernels
- SVMs inherently handle missing data without any preprocessing
- Kernel methods always make the model more interpretable
Answers:
Correct:
- SVMs can become computationally expensive with large datasets
- Choosing the appropriate kernel and its parameters can significantly influence model performance
- Regularization is crucial to prevent overfitting, especially with more flexible kernels
Incorrect:
- SVMs are ideally suited for multi-class classification out of the box
- SVMs inherently handle missing data without any preprocessing
- Kernel methods always make the model more interpretable

Regarding the bias-variance trade-off in the context of computational learning theory, which of the following are true?
- Bias refers to the error introduced by approximating a real-world problem by a too-simple model
- Overfitting can be a result of too low bias and too high variance
- A model with high bias always has a low VC dimension
- Variance refers to the error introduced by a model's sensitivity to small fluctuations in the training set
- Bias and variance are independent, and changing one does not affect the other
- High variance is always desirable as it ensures the model adapts well to the training data
Answers:
Correct:
- Bias refers to the error introduced by approximating a real-world problem by a too-simple model
- Overfitting can be a result of too low bias and too high variance
- Variance refers to the error introduced by a model's sensitivity to small fluctuations in the training set
Incorrect:
- A model with high bias always has a low VC dimension
- Bias and variance are independent, and changing one does not affect the other
- High variance is always desirable as it ensures the model adapts well to the training data

Which of the following are essential components of the PAC (Probably Approximately Correct) learning framework?
- A fixed set of features to represent all possible inputs
- A hypothesis space from which hypotheses are drawn
- A confidence parameter representing the probability that a hypothesis will perform worse than the error measure
- A specific learning algorithm, such as a neural network or SVM
- A sample complexity determining the number of examples required to achieve a certain error and confidence level
- An error measure representing the probability that a hypothesis will misclassify a randomly drawn instance
Answers:
Correct:
- A hypothesis space from which hypotheses are drawn
- A confidence parameter representing the probability that a hypothesis will perform worse than the error measure
- A sample complexity determining the number of examples required to achieve a certain error and confidence level
- An error measure representing the probability that a hypothesis will misclassify a randomly drawn instance
Incorrect:
- A fixed set of features to represent all possible inputs
- A specific learning algorithm, such as a neural network or SVM

Concerning the No Free Lunch Theorem in computational learning theory, which statements are correct?
- It underscores the importance of empirical evaluations and domain knowledge in selecting appropriate algorithms
- It's a proof that unsupervised learning algorithms always outperform supervised ones
- It states that no learning algorithm is universally better than all other learning algorithms across all possible problems
- It suggests that ensemble methods are always the best choice for any problem
- It implies that algorithm performance is problem-dependent
- The theorem indicates that neural networks are the most versatile learners
Answers:
Correct:
- It underscores the importance of empirical evaluations and domain knowledge in selecting appropriate algorithms
- It states that no learning algorithm is universally better than all other learning algorithms across all possible problems
- It implies that algorithm performance is problem-dependent
Incorrect:
- It's a proof that unsupervised learning algorithms always outperform supervised ones
- It suggests that ensemble methods are always the best choice for any problem
- The theorem indicates that neural networks are the most versatile learners

Concerning the relationship between the Vapnik-Chervonenkis (VC) dimension and sample complexity in PAC learning, which of the following are true?
- VC dimension has no influence on the PAC learnability of a hypothesis class
- Both VC dimension and sample complexity play roles in determining the number of examples needed for learning to meet desired guarantees
- VC dimension and sample complexity are inversely proportional; as one increases, the other decreases
- Higher VC dimensions generally require higher sample complexities to achieve the same learning guarantees
- A hypothesis class is PAC learnable if and only if its VC dimension is 1
- Sample complexity refers to the number of examples required to achieve certain error and confidence levels
Answers:
Correct:
- Both VC dimension and sample complexity play roles in determining the number of examples needed for learning to meet desired guarantees
- Sample complexity refers to the number of examples required to achieve certain error and confidence levels
- Higher VC dimensions generally require higher sample complexities to achieve the same learning guarantees
Incorrect:
- VC dimension has no influence on the PAC learnability of a hypothesis class
- VC dimension and sample complexity are inversely proportional; as one increases, the other decreases
- A hypothesis class is PAC learnable if and only if its VC dimension is 1

In computational learning theory, the Vapnik-Chervonenkis (VC) dimension is a critical concept. Which of the following statements about VC dimension are accurate?
- Lower VC dimension always guarantees better model performance on new data
- It is always equal to the number of features in the dataset
- It measures the capacity or complexity of a hypothesis class
- A high VC dimension can be an indicator of a model's potential to overfit
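
The PAC and VC-dimension questions above state that sample complexity grows with the required error and confidence guarantees and with the capacity of the hypothesis class. The sketch below illustrates this numerically with two commonly quoted textbook upper bounds: m >= (1/epsilon)(ln|H| + ln(1/delta)) for a consistent learner over a finite hypothesis space, and m >= (1/epsilon)(4 log2(2/delta) + 8 VC(H) log2(13/epsilon)) for the VC-based case. These formulas are an assumption brought in for illustration and are not part of the exam material; the function names are hypothetical, and the bounds are loose upper bounds rather than exact requirements.

```python
import math


def pac_samples_finite(h_size: int, epsilon: float, delta: float) -> int:
    """Upper bound on examples needed by a consistent learner over a finite
    hypothesis space of size h_size (assumed textbook bound, illustrative)."""
    return math.ceil((math.log(h_size) + math.log(1.0 / delta)) / epsilon)


def pac_samples_vc(vc_dim: int, epsilon: float, delta: float) -> int:
    """Upper bound on examples needed in terms of the VC dimension
    (assumed textbook bound, illustrative; known to be loose)."""
    term_conf = 4.0 * math.log2(2.0 / delta)
    term_cap = 8.0 * vc_dim * math.log2(13.0 / epsilon)
    return math.ceil((term_conf + term_cap) / epsilon)


if __name__ == "__main__":
    eps, delta = 0.1, 0.05  # error and confidence parameters (example values)
    print("finite H, |H| = 2**10:", pac_samples_finite(2 ** 10, eps, delta))
    for d in (3, 10, 30):
        # A larger VC dimension gives a larger bound for the same guarantees.
        print(f"VC dimension {d}:", pac_samples_vc(d, eps, delta))
```

Under these assumed bounds, tightening epsilon or delta, or increasing the VC dimension, never reduces the number of examples the bound asks for, which matches the answers marked correct above.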

Institution
CS 7641
Course
CS 7641

Contains
Questions & answers

Get to know the seller

Seller avatar
Reputation scores are based on the amount of documents a seller has sold for a fee and the reviews they have received for those documents. There are three levels: Bronze, Silver and Gold. The better the reputation, the more your can rely on the quality of the sellers work.
TutorJosh Chamberlain College Of Nursing