Data Mining Exam 2 Study Guide

Uploaded on
02-07-2024
Written in
2023/2024

X - correct answer-attribute, predictor, independent variable, input

y - correct answer-class, response, dependent variable, output

Classification - correct answer-predicts categorical labels

Prediction - correct answer-predicts continuous values

Decision Tree - correct answer-a non-parametric supervised learning
algorithm, which is utilized for both classification and regression tasks. It has a
hierarchical, tree structure, which consists of a root node, branches, internal
nodes and leaf nodes.

K-Nearest Neighbors - correct answer-A data mining method that predicts
(classifies or estimates) an observation i's outcome value based on the k
observations most similar to observation i with respect to the input variables.
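
The k-nearest-neighbors idea above can be sketched in a few lines. This is a minimal illustration, assuming Euclidean distance and majority voting for classification; the function name `knn_predict` and the toy data are made up for this example.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training observation
    distances = [
        (math.dist(x, query), label)
        for x, label in zip(train_X, train_y)
    ]
    # Keep the k closest observations
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote over the labels of those k neighbors
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy data (hypothetical): two well-separated groups
train_X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 7.5), (6.5, 8.0)]
train_y = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(train_X, train_y, (2.0, 2.0), k=3))
print(knn_predict(train_X, train_y, (6.0, 7.0), k=3))
```

Both hyperparameters show up directly here: `k` is an argument, and the distance function could be swapped for another metric.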

Naive Bayes Classifier - correct answer-an algorithm that uses Bayes' theorem
to predict the probability of each outcome from prior occurrences of related
events, assuming the input variables are independent of each other given the
class
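
As a rough sketch of how such probabilities can be computed from prior occurrences, the code below counts how often each feature value appeared with each class and multiplies the resulting frequencies. It assumes categorical features and adds Laplace (add-one) smoothing; the function names and the toy weather data are hypothetical.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    # value_counts[(feature_index, label)][value] = number of occurrences
    value_counts = defaultdict(Counter)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
    # Number of distinct values per feature, used for Laplace smoothing
    n_values = [len({row[i] for row in rows}) for i in range(len(rows[0]))]
    return class_counts, value_counts, n_values

def predict_proba(model, row):
    """Return P(class | row) for every class, treating features as independent."""
    class_counts, value_counts, n_values = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior P(class)
        for i, value in enumerate(row):
            # Laplace-smoothed likelihood P(value | class)
            score *= (value_counts[(i, label)][value] + 1) / (count + n_values[i])
        scores[label] = score
    norm = sum(scores.values())
    return {label: s / norm for label, s in scores.items()}

# Toy data (hypothetical): (outlook, temperature) -> play?
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"),
        ("rainy", "cool"), ("overcast", "hot"), ("overcast", "cool")]
labels = ["no", "no", "yes", "no", "yes", "yes"]

model = train_naive_bayes(rows, labels)
print(predict_proba(model, ("overcast", "mild")))
```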

Support Vector Machine - correct answer-Supervised learning tool that seeks a
dividing hyperplane in any number of dimensions; can be used for
classification or regression

Neural Networks - correct answer-a method in artificial intelligence that
teaches computers to process data in a way that is inspired by the human
brain.

Decision Tree Hyperparameters - correct answer-Many. Examples (scikit-learn
parameter names) include min_samples_leaf, min_samples_split, max_leaf_nodes,
and min_impurity_decrease

K-Nearest Neighbor Hyperparameters - correct answer-K-value and distance
function

Decision tree disadvantages - correct answer--Prone to overfitting; sensitive
to outliers and noise
-Tree can grow very complex when trained on complex datasets

K-Nearest Neighbor disadvantages - correct answer--K has to be chosen
carefully
-Large computation cost at prediction time if the sample size is large

What are two variable selection criteria? - correct answer--Entropy and
Information Gain
-Gini Index
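
For the Gini index, a minimal sketch is shown here. It assumes the usual definition, one minus the sum of squared class proportions, and scores a candidate split by the size-weighted Gini of its groups; the function names are illustrative.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def weighted_gini(groups):
    """Size-weighted average Gini impurity of the groups produced by a split."""
    total = sum(len(group) for group in groups)
    return sum(len(group) / total * gini(group) for group in groups)

mixed = ["a", "a", "b", "b"]
print(gini(mixed))                              # evenly mixed node
print(weighted_gini([["a", "a"], ["b", "b"]]))  # split that separates the classes
```

A lower weighted Gini after the split means the candidate variable separates the classes better.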

Pure when Entropy = - correct answer-0

Impure when Entropy = - correct answer-1 (the maximum for two equally likely
classes)

Entropy - correct answer-a measure of the disorder (impurity) of the class
labels in a set of observations: 0 when every observation has the same class,
highest when the classes are evenly mixed.

Why the minus in the Entropy formula - correct answer-Probabilities are
always between 0 and 1.
log(x) where 0 < x < 1 is negative.
Each term in the sum is therefore negative (or zero), so the sum is negative;
the minus sign makes the result positive.
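
The sign argument can be checked numerically. The sketch below assumes the standard formula, entropy = -sum(p * log2(p)) over the class proportions p, and lists the individual terms so their sign is visible; the helper names are made up for this example.

```python
import math
from collections import Counter

def entropy_terms(labels):
    """Return the individual p * log2(p) terms, one per class."""
    n = len(labels)
    return [(count / n) * math.log2(count / n) for count in Counter(labels).values()]

def entropy(labels):
    """Entropy in bits: the leading minus flips the non-positive sum."""
    return -sum(entropy_terms(labels))

pure = ["yes", "yes", "yes", "yes"]
mixed = ["yes", "yes", "no", "no"]

print(entropy_terms(mixed))  # every term is <= 0
print(entropy(mixed))        # two classes, 50/50 split
print(entropy(pure) == 0)    # a pure node has zero entropy
```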

Information Gain - correct answer-the reduction in entropy achieved by
splitting the data on a variable: the entropy of the parent node minus the
weighted average entropy of the child nodes
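
In decision trees, information gain is usually computed as the parent node's entropy minus the size-weighted entropy of the child nodes. The following is a small sketch under that assumption; the function names and labels are illustrative only.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy in bits of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

parent = ["yes", "yes", "no", "no"]
perfect_split = [["yes", "yes"], ["no", "no"]]  # children are pure
useless_split = [["yes", "no"], ["yes", "no"]]  # children are as mixed as the parent

print(information_gain(parent, perfect_split))
print(information_gain(parent, useless_split))
```

The variable whose split yields the largest gain would be chosen at that node.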

Random forests - correct answer--for supervised machine learning, where
there is a labeled target variable
-used for solving regression (numeric target variable) and classification
(categorical target variable) problems
-an ensemble method, meaning they combine predictions from other models
-Each of the smaller models in the random forest ensemble is a decision tree
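
The ensemble step described above can be illustrated with a sketch of the aggregation only: a majority vote for classification and an average for regression. The "trees" here are hypothetical hand-written rules standing in for trained decision trees; training them on bootstrap samples is not shown.

```python
from collections import Counter
from statistics import mean

def forest_classify(trees, x):
    """Classification: each tree votes, the most common label wins."""
    votes = Counter(tree(x) for tree in trees)
    return votes.most_common(1)[0][0]

def forest_regress(trees, x):
    """Regression: average the numeric predictions of all trees."""
    return mean(tree(x) for tree in trees)

# Hypothetical stand-ins for trained classification trees
class_trees = [
    lambda x: "spam" if x["links"] > 2 else "ham",
    lambda x: "spam" if x["caps_ratio"] > 0.5 else "ham",
    lambda x: "spam" if x["links"] > 5 else "ham",
]

# Hypothetical stand-ins for trained regression trees
value_trees = [lambda x: 10.0, lambda x: 12.0, lambda x: 14.0]

message = {"links": 4, "caps_ratio": 0.7}
print(forest_classify(class_trees, message))
print(forest_regress(value_trees, message))
```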

What is the best hyperplane? - correct answer-The one that maximizes the
distance from the hyperplane to the nearest data points of each class

Margin - correct answer-the distance between the hyperplane and the nearest
data points

What is the name for the points closest to the hyperplane - correct
answer-Support Vectors
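
The hyperplane, margin, and support-vector cards can be tied together with a short sketch. It does not train an SVM; it only evaluates a given hyperplane w·x + b = 0 by measuring the distance to each point, taking the smallest as the margin and the closest points as the support vectors. The function names, points, and candidate hyperplanes are assumptions for illustration.

```python
import math

def distance_to_hyperplane(w, b, x):
    """Perpendicular distance from point x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

def margin_and_support_vectors(w, b, points, tol=1e-9):
    """Margin = smallest distance; support vectors = points at that distance."""
    distances = [distance_to_hyperplane(w, b, p) for p in points]
    margin = min(distances)
    support = [p for p, d in zip(points, distances) if d - margin <= tol]
    return margin, support

# Toy 2-D points (hypothetical); two candidate vertical separating lines
points = [(1.0, 0.0), (2.0, 5.0), (4.0, 1.0), (6.0, 2.0)]
candidate_a = ((1.0, 0.0), -3.0)  # the line x1 = 3
candidate_b = ((1.0, 0.0), -2.5)  # the line x1 = 2.5

margin_a, support_a = margin_and_support_vectors(*candidate_a, points)
margin_b, support_b = margin_and_support_vectors(*candidate_b, points)
print(margin_a, support_a)
print(margin_b, support_b)
```

Of the two candidates, the one with the larger margin is the better separating hyperplane.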
