Exam (elaborations)

DATA MINING QUESTIONS & ANSWERS|| 2026 LATEST UPDATE|| VERIFIED A+

Pages: 28
Grade: A+
Uploaded on: 05-02-2026
Written in: 2025/2026

Institution: DATA MINING
Course: DATA MINING

Content preview

What is data mining? - ANSWER: The process of sorting through large data sets to
identify patterns and establish relationships to solve problems through data analysis.

What are the steps involved in data mining when viewed as a process of knowledge
discovery? - ANSWER:
Data Cleaning
Data Integration
Data Selection
Data Transformation
Data Mining
Pattern Evaluation
Knowledge Presentation
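
The seven steps above can be sketched on a toy data set. This is only an illustration under stated assumptions: the records, the field names, the "average price per item" pattern, and the support threshold of two records are all hypothetical and do not come from the study guide.

```python
from collections import defaultdict

# Hypothetical raw records from two hypothetical sources.
store_a = [
    {"id": 1, "item": "milk", "price": "2.50"},
    {"id": 2, "item": "bread", "price": None},   # missing value
    {"id": 3, "item": "milk", "price": "2.70"},
]
store_b = [
    {"id": 4, "item": "eggs", "price": "3.10"},
    {"id": 5, "item": "milk", "price": "2.60"},
]

def clean(records):
    """Step 1, data cleaning: drop records that contain missing values."""
    return [r for r in records if all(v is not None for v in r.values())]

# Step 2, data integration: combine the cleaned sources.
integrated = clean(store_a) + clean(store_b)

# Step 3, data selection: keep only the attributes relevant to the task.
selected = [(r["item"], r["price"]) for r in integrated]

# Step 4, data transformation: convert price strings to numbers.
transformed = [(item, float(price)) for item, price in selected]

# Step 5, data mining: here simply the average price per item.
by_item = defaultdict(list)
for item, price in transformed:
    by_item[item].append(price)
patterns = {item: (len(p), sum(p) / len(p)) for item, p in by_item.items()}

# Step 6, pattern evaluation: keep patterns backed by at least 2 records.
interesting = {item: avg for item, (n, avg) in patterns.items() if n >= 2}

# Step 7, knowledge presentation: report the surviving patterns.
for item, avg in interesting.items():
    print(f"{item}: average price {avg:.2f}")
```

In a real project each step is far more involved; the point here is only the order in which the steps feed one another.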

What are the data mining functionalities? - ANSWER:
Characterization and discrimination
Mining of frequent patterns, associations, and correlations
Classification and regression
Clustering analysis
Outlier analysis

Data Characterization - ANSWER: A summary of the general characteristics or features
of a target class of data. The data corresponding to the user-specified class is typically
collected by a query. For example, to study the characteristics of software products with
sales that increased by 10% in the previous year, the data related to such products can
be collected by executing an SQL query on the sales database.
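
A minimal sketch of collecting such a target class with an SQL query, using Python's built-in `sqlite3` module. The table `sales`, its columns, the product names, and the figures are all hypothetical, and "increased by 10%" is read here as "grew by at least 10% over the prior year", which is an assumption.

```python
import sqlite3

# In-memory database with a hypothetical sales table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (product TEXT, category TEXT, year INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("EditorPro", "software", 2023, 100.0),
        ("EditorPro", "software", 2024, 115.0),   # +15%
        ("PhotoFix", "software", 2023, 200.0),
        ("PhotoFix", "software", 2024, 205.0),    # +2.5%
        ("Desk Lamp", "hardware", 2023, 50.0),
        ("Desk Lamp", "hardware", 2024, 80.0),    # not software
    ],
)

# Self-join each product's 2024 row with its 2023 row and keep the
# software products whose sales grew by at least 10%.
query = """
SELECT cur.product
FROM sales AS cur
JOIN sales AS prev
  ON prev.product = cur.product AND prev.year = cur.year - 1
WHERE cur.category = 'software'
  AND cur.year = 2024
  AND cur.amount >= 1.10 * prev.amount
ORDER BY cur.product
"""
target_class = [row[0] for row in conn.execute(query)]
print(target_class)
```

The rows returned by the query form the target class whose characteristics would then be summarized.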

Data discrimination - ANSWER: A comparison of the general features of the target class
with those of one or a set of comparative (contrasting) classes.

Data mining methodology challenges - ANSWER:
Mining various and new kinds of knowledge
Mining knowledge in multidimensional space
Integrating new methods from multiple disciplines
Boosting the power of discovery in a networked environment
Handling uncertainty, noise, or incompleteness of data
Pattern evaluation and pattern- or constraint-guided mining

Explain one challenge of mining a huge amount of data in comparison with mining a
small amount of data. - ANSWER: Scalability. Algorithms must scale well so that even
vast amounts of data can be handled efficiently, with running times that stay short
and predictable as the data grows.

What is an outlier? - ANSWER: A data object that does not comply with the general
behavior or model of the data.

Does an outlier always need to be discarded? - ANSWER: No. Many data mining methods
discard outliers as noise or exceptions. However, in some applications, such as fraud
detection, the rare events are the interesting ones, so outliers are useful.

The mode is the only measure of central tendency that can be used for nominal
attributes. (T/F) - ANSWER: True. An example would be hair color, with categories
such as black, brown, blond, and red. The mode is the most common category.
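
The hair-color example can be made concrete with a short sketch. The sample values below are hypothetical; only the category names come from the answer above.

```python
from collections import Counter

# Hypothetical observations of a nominal attribute.
hair_colors = ["black", "brown", "brown", "blond", "red", "brown", "black"]

# Count how often each category occurs; the mode is the most frequent one.
counts = Counter(hair_colors)
mode_value, mode_count = counts.most_common(1)[0]
print(mode_value, mode_count)
```

No arithmetic is performed on the values themselves, which is why the mode works for nominal data while the mean and median do not.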

Nominal attribute - ANSWER: Its values are symbols or names of things (categorical).
The values can also be represented with numbers, but those numbers are not meant to
be used quantitatively. It has no meaningful mean or median, but it has a mode.

Binary Attributes - ANSWER: A nominal attribute with only two categories or states,
0 or 1, where 0 typically means that the attribute is absent and 1 means that it is
present.

Ordinal Attributes - ANSWER: An attribute with possible values that have a meaningful
order or ranking among them, but the magnitude between successive values is not known.
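
A small sketch of what "meaningful order, unknown magnitude" allows in practice. The attribute values and their ordering are hypothetical. Because the values can be ranked, they can be sorted and a middle value can be picked, but differences between ranks should not be read as distances.

```python
# Hypothetical ordinal attribute: drink size, from lowest to highest rank.
size_order = ["small", "medium", "large"]
rank = {value: position for position, value in enumerate(size_order)}

observed = ["large", "small", "medium", "medium", "small"]

# Sorting by rank uses only the order of the values, not any magnitude.
sorted_observed = sorted(observed, key=lambda v: rank[v])

# With an odd number of observations the median is the middle element.
median_value = sorted_observed[len(sorted_observed) // 2]
print(sorted_observed, median_value)
```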

Numeric Attributes - ANSWER: Quantitative; that is, a measurable quantity represented
in integer or real values. Can be interval-scaled or ratio-scaled.

Discrete Attribute - ANSWER: Has a finite or countably infinite set of values.

Continuous Attributes - ANSWER: Attributes that are not discrete. Their values are
real numbers, typically represented as floating-point variables.

The mean is in general affected by outliers (T/F) - ANSWER: True
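
A quick numeric check of this statement, using hypothetical values. One extreme value pulls the mean far away from the bulk of the data, while the median barely moves.

```python
import statistics

# Hypothetical salaries (in thousands), then the same list plus one outlier.
salaries = [30, 32, 35, 36, 38]
with_outlier = salaries + [500]

mean_before = statistics.mean(salaries)
mean_after = statistics.mean(with_outlier)

median_before = statistics.median(salaries)
# Even count: the median is the average of the two middle values.
median_after = statistics.median(with_outlier)

print(mean_before, mean_after)      # the mean shifts strongly
print(median_before, median_after)  # the median shifts only slightly
```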

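Not all numerical data sets have a median. (T/F) - ANSWER: False. Every non-empty
numeric data set has a median: the middle value when the count is odd, or the average
of the two middle values when the count is even.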

What are the differences between the measures of central tendency and the measures
of dispersion? - ANSWER: The measures of central tendency are the mean, median,
mode, and midrange. They measure the location of the middle or center of the data
distribution, that is, where most values fall. The measures of dispersion are the
range, quartiles, interquartile range, five-number summary, boxplots, variance, and
standard deviation. They describe how the data are spread out and help to identify
outliers.
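
The two families of measures can be computed side by side with Python's standard `statistics` module. The data values are hypothetical, and the population variance and standard deviation are used here as one possible convention.

```python
import statistics

# Hypothetical numeric data, already sorted for readability.
data = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]

# Measures of central tendency.
mean = statistics.mean(data)
median = statistics.median(data)
modes = statistics.multimode(data)        # every value tied for most frequent
midrange = (min(data) + max(data)) / 2

# Measures of dispersion.
value_range = max(data) - min(data)
variance = statistics.pvariance(data)     # population variance
std_dev = statistics.pstdev(data)         # population standard deviation

print(mean, median, modes, midrange)
print(value_range, variance, std_dev)
```

Note how the single large value 110 pulls the midrange well above the mean and median, which illustrates why several measures are usually reported together.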

How would you classify a boxplot, as a measure of dispersion or as a data visualization
aid? Why? - ANSWER: As a data visualization aid. The boxplot shows visually how the
boundaries relate to each other: where the minimum and maximum values lie, and where
the interquartile range lies, with a line marking the median. It does not give you a
single measure, but it lets you visualize the spread of the data set. For example, in
a boxplot of the grades in a class, a box that sits close to the minimum boundary
shows that most scores were low.
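
The numbers a boxplot draws can be computed directly. The sketch below uses hypothetical values, the default quartile method of `statistics.quantiles` (other methods give slightly different quartiles), and the common 1.5 × IQR rule for flagging outliers, which is a convention assumed here and not something the answer above specifies.

```python
import statistics

# Hypothetical data values.
values = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]

# Quartiles Q1, Q2 (median), Q3 and the interquartile range.
q1, q2, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1

# Five-number summary: minimum, Q1, median, Q3, maximum.
five_number_summary = (min(values), q1, q2, q3, max(values))

# Values beyond 1.5 * IQR from the box are flagged as potential outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in values if v < lower_fence or v > upper_fence]

print(five_number_summary)
print(outliers)
```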

What do we understand by a similarity measure? - ANSWER: It quantifies how alike two
objects are. Usually, large values indicate similar objects and zero or negative
values indicate dissimilar objects.
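
Cosine similarity is one measure that behaves this way, and it is used below purely as an illustration; the answer above does not name a specific measure. The vectors are hypothetical.

```python
import math

def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)

same_direction = cosine_similarity([1, 2, 3], [2, 4, 6])   # close to 1
orthogonal = cosine_similarity([1, 0], [0, 1])             # 0
opposite = cosine_similarity([1, 2], [-1, -2])             # close to -1
print(same_direction, orthogonal, opposite)
```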

What is the importance of similarity measures? - ANSWER: They help us see patterns in
the data and give us knowledge about it. They are used in clustering algorithms:
similar data points are put into the same cluster, and dissimilar points are placed
into different clusters.

What do we understand by a dissimilarity measure and what is its importance? -
ANSWER: It measures the difference between two objects: the greater the difference
between the two objects, the higher the value.

What is the importance of dissimilarity measures? - ANSWER: In some instances, a low
dissimilarity between two objects can signal a problem. For example, two exam
submissions that are nearly identical may indicate cheating.

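Discuss one of the distance measures that are commonly used for computing the
dissimilarity of objects described by numeric attributes. - ANSWER: For objects i and j
described by p numeric attributes:
Euclidean distance: $d(i, j) = \sqrt{(x_{i1} - x_{j1})^2 + (x_{i2} - x_{j2})^2 + \cdots + (x_{ip} - x_{jp})^2}$
Manhattan distance: $d(i, j) = |x_{i1} - x_{j1}| + |x_{i2} - x_{j2}| + \cdots + |x_{ip} - x_{jp}|$
Minkowski distance: $d(i, j) = \sqrt[h]{|x_{i1} - x_{j1}|^h + |x_{i2} - x_{j2}|^h + \cdots + |x_{ip} - x_{jp}|^h}$, with $h \ge 1$
Supremum distance: $d(i, j) = \max_{f = 1, \ldots, p} |x_{if} - x_{jf}|$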
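
All four distances can be written as one function, since Manhattan and Euclidean are the Minkowski distance with h = 1 and h = 2, and the supremum distance is its limit as h grows without bound. The sketch below is a minimal illustration; the function name `minkowski` and the two sample points are assumptions.

```python
import math

def minkowski(x, y, h):
    """Minkowski distance of order h between two equal-length vectors.

    h = 1 gives Manhattan, h = 2 gives Euclidean, and h = math.inf
    gives the supremum distance.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if math.isinf(h):
        return max(diffs)
    return sum(d ** h for d in diffs) ** (1.0 / h)

# Two hypothetical objects described by two numeric attributes.
x1, x2 = (1, 2), (3, 5)

manhattan = minkowski(x1, x2, 1)          # |1-3| + |2-5|
euclidean = minkowski(x1, x2, 2)          # sqrt(2^2 + 3^2)
supremum = minkowski(x1, x2, math.inf)    # max(2, 3)
print(manhattan, euclidean, supremum)
```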

In many real-life databases, objects are described by a mixture of attribute types. How
can we compute the dissimilarity between objects of mixed attribute types? -
ANSWER: There are two main approaches. The first groups the attributes by type and
performs a separate data mining analysis for each type. This is acceptable only if the
separate analyses give consistent results, which is unlikely in real-life projects
because analyzing the attribute types separately will most likely generate different
results. The second approach is preferable: it processes all attribute types together
and performs a single analysis by combining the attributes into one dissimilarity
matrix.
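
One common way to realize the second approach is to compute a per-attribute dissimilarity in [0, 1] and average it over the attributes that are present in both objects. The sketch below assumes that scheme and handles only numeric and nominal attributes; the function name, attribute names, value range, and sample objects are all hypothetical.

```python
def mixed_dissimilarity(a, b, kinds, ranges):
    """Average per-attribute dissimilarity between two objects.

    kinds maps attribute name -> "numeric" or "nominal".
    ranges maps each numeric attribute -> (max - min) over the data set.
    Attributes missing (None) in either object are skipped.
    """
    total, used = 0.0, 0
    for name, kind in kinds.items():
        va, vb = a.get(name), b.get(name)
        if va is None or vb is None:
            continue                      # attribute contributes nothing
        if kind == "numeric":
            d = abs(va - vb) / ranges[name]   # scaled into [0, 1]
        else:
            d = 0.0 if va == vb else 1.0      # nominal: match or mismatch
        total += d
        used += 1
    if used == 0:
        raise ValueError("no attribute is present in both objects")
    return total / used

kinds = {"color": "nominal", "price": "numeric"}
ranges = {"price": 100.0}                 # assumed max - min of price

obj_a = {"color": "red", "price": 20.0}
obj_b = {"color": "blue", "price": 70.0}
obj_c = {"color": "red", "price": None}   # price is missing

d_ab = mixed_dissimilarity(obj_a, obj_b, kinds, ranges)   # (1 + 0.5) / 2
d_ac = mixed_dissimilarity(obj_a, obj_c, kinds, ranges)   # color only
print(d_ab, d_ac)
```

Computing this value for every pair of objects fills the single dissimilarity matrix mentioned in the answer.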

What do we understand by data quality and what is its importance? - ANSWER: Data have
quality when they satisfy the requirements of the intended use. Quality has many
factors, including accuracy, completeness, consistency, timeliness, believability, and
interpretability. It also depends on the intended use of the data: the same data set
may be acceptable to one user but too inconsistent or too hard to interpret for
another.

Seller: shantelleG, West Virginia University