complete updated questions & answers
Why is data visualization helpful? - answer 1. amplifies cognition
2. expands working memory
3. reduces search time
4. improves pattern detection
5. controls attention
Describe the difference between data processing and querying - answer
In both cases, the user knows what they want. The difference is that
with querying they can only describe it, whereas with data processing they
actually have a way to compute it.
Describe the difference between data exploration and navigation -
answer In exploration, the user does not know what they want but
wants to get an idea about the data. In navigation, the user DOES know
what they want but does not know how to describe/locate it.
What do these acronyms describe? INS, 3Vs, HMLE - answer Data
challenges
What does INS stand for and mean? - answer INS is a list of data
challenges: Imprecision, Noise, Sparsity
What does 3Vs stand for and mean? - answer 3Vs is a list of data
challenges: Volume, Velocity, Variety
What does HMLE stand for and mean? - answer HMLE is a list of data
challenges: High-dimensional, Multi-modal, Inter-Linked, Evolving
What is a data schema? - answer A set of constraints that...
1. describe the "properties" of data
2. describe the structure of data
3. enable validation & efficient storage of data
4. enable querying and retrieval of data
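As a rough illustration of point 3 (validation), a schema can be sketched as a mapping from field names to expected types. The field names and the `validate` helper below are hypothetical, chosen only for this example; real database schemas carry many more kinds of constraints.

```python
# Hypothetical schema: field name -> expected Python type.
SCHEMA = {"name": str, "age": int}


def validate(record, schema=SCHEMA):
    """Return True if the record has exactly the schema's fields,
    each holding a value of the expected type."""
    if set(record) != set(schema):
        return False  # missing or unexpected fields
    return all(isinstance(record[field], expected)
               for field, expected in schema.items())


ok = validate({"name": "Ada", "age": 36})          # conforms to the schema
missing = validate({"name": "Ada"})                # "age" is missing
wrong_type = validate({"name": "Ada", "age": "36"})  # age is a string
```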
Advantages of a structured database? - answer 1. Easier to query
2. Easier to optimize
3. Easier to explore
Advantages of semi-structured database? - answer Data organization is
flexible/malleable (easier to integrate and exchange).
Describe the curse of dimensionality - answer The more dimensions we
have, the more data we need to discover patterns (and to prevent overfitting).
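One common way to picture this (an illustration, not part of the original answer) is to split every dimension into a fixed number of bins and count how many cells the data would have to cover. The choice of 10 bins per dimension is an arbitrary assumption.

```python
def grid_cells(bins_per_dim, dims):
    """Number of cells in a grid with bins_per_dim bins along each of dims axes."""
    return bins_per_dim ** dims


# With 10 bins per dimension, the number of cells (and so the number of
# points needed to put at least one point in each cell) grows exponentially.
for dims in (1, 2, 3, 10):
    print(dims, grid_cells(10, dims))
```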
True or False: The distance between two points is equal to the length of
the distance vector. - answer True
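A minimal sketch of why this is true, assuming Euclidean distance: subtract the two points component by component to get the distance (difference) vector, then take its length. The function name `distance` is chosen only for this example.

```python
import math


def distance(p, q):
    """Euclidean distance between p and q, computed as the length
    of the vector that points from p to q."""
    diff = [b - a for a, b in zip(p, q)]        # the distance (difference) vector
    return math.sqrt(sum(d * d for d in diff))  # its length (Euclidean norm)


d = distance((0, 0), (3, 4))  # difference vector is [3, 4], length 5.0
```

The standard library's `math.dist` (Python 3.8+) gives the same value for the same pair of points.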
Give an example of data transformation - answer A single Gender column
containing "M" or "F" is transformed into two columns (one for M and
one for F) holding 0s and 1s that indicate which value each row had
(one-hot encoding).
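The Gender example above can be sketched in plain Python as follows. The helper name `one_hot` and the sample values are assumptions made for illustration.

```python
def one_hot(values, categories):
    """Turn one categorical column into one 0/1 column per category."""
    return {c: [1 if v == c else 0 for v in values] for c in categories}


gender = ["M", "F", "F", "M"]          # original single column
encoded = one_hot(gender, ["M", "F"])  # two new columns
# encoded["M"] is [1, 0, 0, 1]
# encoded["F"] is [0, 1, 1, 0]
```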
Which aspects of data should be handled by a scalable data exploration
system? - answer 1. The amount of data
2. The diversity of the data types
3. The speed of new data generated
Give an example of prefix search - answer Find all strings that start with
"tab"
• "table"; "tabular"; "tablet";...
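A minimal sketch of prefix search over a small word list, using a linear scan. The word list is made up for this example; larger collections would more likely use a sorted index or a trie.

```python
def prefix_search(words, prefix):
    """Return every word that starts with the given prefix, in input order."""
    return [w for w in words if w.startswith(prefix)]


words = ["table", "tabular", "stab", "tablet", "cat"]
matches = prefix_search(words, "tab")
# matches is ["table", "tabular", "tablet"]; "stab" contains "tab" but does not start with it
```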
Give an example of subsequence search - answer Find all strings that
contain the subsequence "ark"
• "marketing"; "spark"; "quark";...
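In the examples above, "ark" appears as a contiguous run of characters, so the sketch below assumes that reading and uses a plain substring test. A stricter definition of "subsequence" allows gaps between the characters; `is_subsequence` shows that variant for comparison. Both function names and the word list are hypothetical.

```python
def contains_search(words, pattern):
    """Return every word that contains pattern as a contiguous run."""
    return [w for w in words if pattern in w]


def is_subsequence(pattern, word):
    """True if the characters of pattern appear in word in order,
    possibly with other characters in between."""
    remaining = iter(word)
    # Each membership test consumes the iterator up to the matched character,
    # so later characters must be found further to the right.
    return all(ch in remaining for ch in pattern)


words = ["marketing", "spark", "quark", "airlock", "karma"]
contiguous = contains_search(words, "ark")
# contiguous is ["marketing", "spark", "quark"]
gapped = [w for w in words if is_subsequence("ark", w)]
# gapped also includes "airlock" (a..r...k), but not "karma"
```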