1. The data mining process: Selection - preprocessing - transformation - data mining -
interpretation/evaluation
2. What is the relationship between objects and attributes?: A collection of attributes describes
an object
An attribute is a characteristic/property of an object (aka variable, field, feature, ...)
3. What's the relationship between attributes and attribute values?: The same attribute can be
mapped to different values. Different attributes can be mapped to the same set of values. Note
that the properties of attribute values can differ (age vs id)
4. Core properties that separate the types of attributes?: - Distinctness: {=, ≠}
- Order: {<, >}
- Addition: {+, -}
- Multiplication: {*, /}
5. What are the different types of attributes? What properties does each of the attributes have?:
Nominal (D)
Ordinal (DO)
Interval (DOA) --> unit of measurement exists
Ratio (DOAM)
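A minimal Python sketch of the D/O/A/M shorthand above, written as a lookup table. The names `ATTRIBUTE_TYPES` and `supports` are made up for illustration and do not come from any library.

```python
# Operations supported by each attribute type (D, DO, DOA, DOAM).
# ATTRIBUTE_TYPES and supports() are hypothetical names used only for this sketch.
ATTRIBUTE_TYPES = {
    "nominal": {"distinctness"},
    "ordinal": {"distinctness", "order"},
    "interval": {"distinctness", "order", "addition"},
    "ratio": {"distinctness", "order", "addition", "multiplication"},
}


def supports(attr_type: str, operation: str) -> bool:
    """Return True if the attribute type supports the given operation."""
    return operation in ATTRIBUTE_TYPES[attr_type]


if __name__ == "__main__":
    for name, ops in ATTRIBUTE_TYPES.items():
        print(name, sorted(ops))
```

For example, `supports("interval", "multiplication")` is false, which is why ratios of interval values are not meaningful.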
6. Temperatures in Celsius, Fahrenheit, Kelvin: which one is Interval, which one is Ratio?: Only temp in
Kelvin is Ratio; Celsius and Fahrenheit are Interval
7. What is the difference between Interval and Ratio?: In Interval, only the differences are
meaningful
In Ratio, both the differences and the ratios are meaningful.
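The Interval vs Ratio distinction can be checked numerically with temperatures. This is a rough sketch with arbitrarily chosen sample values (10, 20, 30 degrees Celsius); the helper names are hypothetical. Comparisons of differences agree across all three scales, while ratios of raw values change between Celsius and Fahrenheit, so they carry no meaning there.

```python
import math


def celsius_to_fahrenheit(c: float) -> float:
    return c * 9 / 5 + 32


def celsius_to_kelvin(c: float) -> float:
    return c + 273.15


# Arbitrary sample temperatures in Celsius (an assumption for the example).
low_c, mid_c, high_c = 10.0, 20.0, 30.0

scales = {
    "celsius": lambda c: c,
    "fahrenheit": celsius_to_fahrenheit,
    "kelvin": celsius_to_kelvin,
}


def difference_ratio(convert) -> float:
    """Compare two differences: (high - mid) / (mid - low) on the given scale."""
    return (convert(high_c) - convert(mid_c)) / (convert(mid_c) - convert(low_c))


def value_ratio(convert) -> float:
    """Compare two raw values: mid / low on the given scale."""
    return convert(mid_c) / convert(low_c)


if __name__ == "__main__":
    for name, convert in scales.items():
        print(name, round(difference_ratio(convert), 6), round(value_ratio(convert), 4))
```

The difference comparison comes out as 1 on every scale, but "20 is twice 10" holds only in Celsius numbers. Only the Kelvin ratio reflects a physical quantity, because Kelvin has a true zero.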
8. Define Median: middle value of the sorted data if odd number of values, avg of middle 2 values if even number
of values
A summary of the frequency distribution
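The median rule above, sketched in Python. The function name `median` is chosen for this example; the standard library's `statistics.median` applies the same odd/even rule.

```python
def median(values):
    """Middle value of the sorted data, or the average of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd count: a single middle value exists.
        return ordered[mid]
    # Even count: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


if __name__ == "__main__":
    print(median([3, 1, 2]))
    print(median([4, 1, 3, 2]))
```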
9. What is the arithmetic mean and weighted arithmetic mean?: Arith mean: x mean = sum(x(i))/n
Weighted Arith mean: x mean = sum(w(i)*x(i))/sum(w(i))
Only applies to numeric data
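Both means can be sketched in a few lines of Python. The function names and the sample numbers are assumptions made for illustration.

```python
def arithmetic_mean(xs):
    """x mean = sum(x(i)) / n"""
    if not xs:
        raise ValueError("mean of empty data is undefined")
    return sum(xs) / len(xs)


def weighted_mean(xs, ws):
    """x mean = sum(w(i) * x(i)) / sum(w(i))"""
    if len(xs) != len(ws):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(ws)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(xs, ws)) / total_weight


if __name__ == "__main__":
    print(arithmetic_mean([1, 2, 3, 4]))
    print(weighted_mean([1, 2, 3], [1, 1, 2]))
```

With all weights equal, the weighted mean reduces to the plain arithmetic mean.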
10. What is variance?: A measure of the spread of the data with reference to the mean (Xbar)
sum[ (Xi - Xbar)^2 ]/(n-1)
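A short Python sketch of the variance formula above, using the n-1 divisor. The function name `sample_variance` and the sample data are illustrative; the standard library's `statistics.variance` uses the same n-1 divisor and serves as a cross-check here.

```python
import statistics


def sample_variance(xs):
    """sum[(Xi - Xbar)^2] / (n - 1)"""
    n = len(xs)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    x_bar = sum(xs) / n
    return sum((x - x_bar) ** 2 for x in xs) / (n - 1)


if __name__ == "__main__":
    data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample data
    print(sample_variance(data))
    print(statistics.variance(data))  # should agree with the line above
```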
11. What is Covariance?: A measure of how much two dimensions vary from their means
with respect to each other.
cov(X,Y) = sum[ (Xi - Xbar)(Yi - Ybar) ]/(n-1)
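The covariance formula can be sketched the same way. The function name `sample_covariance` and the two small data sets are assumptions for the example; they are picked so that one pair moves together and the other moves in opposite directions.

```python
def sample_covariance(xs, ys):
    """cov(X, Y) = sum[(Xi - Xbar)(Yi - Ybar)] / (n - 1)"""
    n = len(xs)
    if n != len(ys):
        raise ValueError("both dimensions need the same number of values")
    if n < 2:
        raise ValueError("sample covariance needs at least two pairs")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


if __name__ == "__main__":
    xs = [1, 2, 3, 4]
    print(sample_covariance(xs, [2, 4, 6, 8]))  # dimensions rise together
    print(sample_covariance(xs, [8, 6, 4, 2]))  # one rises while the other falls
    print(sample_covariance(xs, xs))            # cov(X, X) is the variance of X
```

Note that cov(X, X) equals the variance of X, since both formulas use the same n-1 divisor.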
12. What does it mean if Cov is pos, neg, zero?: POS: both dimensions increase or decrease
together.