Why is Data Mining Useful? - Answers -there has been an explosive growth of data (from terabytes to
petabytes)
-data collection and data availability make it useful
-We are drowning in Data, but starving for knowledge!
-We are Data Rich, but information poor
What are major sources of data? - Answers -Business: Web, e-commerce, transactions, stocks
-Science: remote sensing, bioinformatics, scientific simulation
-Society and everyone: news, digital cameras, YouTube
-Internet of Things
Data Mining (4 qualities)
-n/t
-i
-pr/un
-po/us - Answers Knowledge Discovery from Data
-Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) patterns or
knowledge from huge amounts of data
What are alternative names for Data Mining? - Answers -knowledge discovery in databases (KDD)
-knowledge extraction
-data/pattern analysis
-information harvesting
-business intelligence
Knowledge Discovery from Data Process (KDD)
(7 steps) - Answers -View from Typical Database Systems and Data warehousing communities
Databases -> Data Cleaning -> Data Warehouse -> Selection of Task Relevant Data -> Data Mining ->
Pattern Evaluation -> Knowledge
-Data mining plays an essential role in the knowledge discovery process
KDD Process: A Typical View from ML and Statistics
5 Steps - Answers Input Data -> Data Pre-Processing -> Data Mining -> Post-Processing -> Patterns /
Information / Knowledge
Data Pre-Processing - Answers Data Integration, normalization, feature selection, dimension
reduction
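As an illustration of the normalization step named above, here is a minimal sketch of min-max scaling, one common normalization choice. The function name `min_max_normalize` and the income values are made up for this example and are not from the notes.

```python
def min_max_normalize(values):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical: map everything to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical attribute values, used only to show the effect of scaling.
incomes = [20000, 35000, 50000, 80000]
scaled = min_max_normalize(incomes)
print(scaled)
```

After scaling, attributes measured in very different units become comparable, which matters for distance-based mining methods such as clustering.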
Data Mining - Answers Pattern discovery, association & correlation, classification, clustering, outlier
analysis
Post- Processing - Answers Pattern evaluation, pattern selection, pattern interpretation, pattern
visualization
Data Mining in Business Intelligence
6 Steps - Answers Data Sources (Papers, files, web docs, science experiments, database systems) ->
Data preprocessing/ Integration, Data warehouses -> Data Exploration (statistical summary, querying,
and reporting) -> Data Mining (information Discovery) -> Data Presentation (Visualization Techniques)
-> Decision Making
Data Scientists - Answers -exploring, asking questions, doing "what if" analysis, questioning existing
assumptions and processes
Business Analyst - Answers combine deep analytical skills with strong communication skills and
a strategic mindset to transform data into a competitive asset
Data to be Mined - Answers Database data (extended relational, object-oriented, heterogeneous,
legacy), data warehouse, transactional data, stream, spatiotemporal, time-series, sequence, text and
web, multi-media, graphs & social and information networks
Knowledge to be Mined (or Data Mining Functions) - Answers -Characterization, discrimination,
association, classification, clustering, trend/deviation, outlier analysis
-descriptive v predictive data mining
-multiple/integrated functions and mining at multiple levels
Techniques Utilized - Answers Data-intensive, data warehouse (OLAP), machine learning, statistics,
pattern recognition, visualization, high-performance
Applications Adapted - Answers Retail, telecommunication, banking, fraud analysis, bio-data mining,
stock market analysis, text mining, web mining
What kind of Data can be mined? - Answers -Database-oriented data sets and applications like:
1.) relational database, SQL, trends, data patterns
2.) Data warehouse: organized around subjects, historical perspective, summarized
3.) Transactional db
-Advanced data sets and advanced applications
1.) data streams and sensor data
2.) Time-series data, temporal data, sequence data
3.) Structured data, graphs, social networks, multi-linked data
4.) object-relational databases
5.) heterogeneous db and legacy db
6.) Spatial data and spatiotemporal data
7.) Multimedia db
8.) Text db
9.) World Wide Web
First Function of Data Mining: Generalization - Answers Information integration and data warehouse
construction -> data cleaning, transformation, integration, and multidimensional data model
-Multidimensional concept description: characterization and discrimination -> generalize, summarize,
and contrast data characteristics (eg: dry v. wet region)
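Characterization and discrimination can be pictured as summarizing each class and then contrasting the summaries. The sketch below does this for the dry v. wet region example; the records, field names, and the helper `characterize` are hypothetical and only illustrate the idea.

```python
from statistics import mean

# Made-up records for illustration; the attributes are assumptions.
records = [
    {"region": "dry", "rainfall_mm": 100, "crop_yield": 2.0},
    {"region": "dry", "rainfall_mm": 140, "crop_yield": 3.0},
    {"region": "wet", "rainfall_mm": 900, "crop_yield": 5.0},
    {"region": "wet", "rainfall_mm": 1100, "crop_yield": 7.0},
]


def characterize(records, region):
    """Characterization: summarize one class by its average attribute values."""
    rows = [r for r in records if r["region"] == region]
    return {
        "rainfall_mm": mean(r["rainfall_mm"] for r in rows),
        "crop_yield": mean(r["crop_yield"] for r in rows),
    }


dry = characterize(records, "dry")
wet = characterize(records, "wet")

# Discrimination: contrast the two class summaries attribute by attribute.
contrast = {key: wet[key] - dry[key] for key in dry}
print(dry, wet, contrast)
```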
Second Function of Data Mining: Association and Correlation Analysis - Answers -Frequent Patterns ->
What items are frequently purchased together in your Walmart?
-Association, correlation v causality -> A typical association rule
-How to mine such patterns and rules efficiently in large datasets?
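Association rules are usually scored by support (how often the items occur together) and confidence (how often the consequent appears when the antecedent does). The sketch below computes both by brute force on a made-up set of transactions. The item names and the helpers `support` and `confidence` are assumptions for illustration, and this is not an efficient mining algorithm for large datasets.

```python
# Hypothetical shopping baskets, used only to illustrate the two measures.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) from the transactions."""
    both = support(set(antecedent) | set(consequent), transactions)
    base = support(antecedent, transactions)
    return both / base if base > 0 else 0.0


pair_support = support({"diapers", "beer"}, transactions)
rule_confidence = confidence({"diapers"}, {"beer"}, transactions)
print(pair_support, rule_confidence)
```

A high confidence only shows that the items co-occur; as the card notes, association and correlation are not causality.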
Third Function of Data Mining: Classification - Answers -Classification and label prediction (construct
models based on some training examples, describe and distinguish classes or concepts for future
prediction; predict class label)
-Typical Methods (decision trees, naive Bayesian classification, support vector machines, neural
networks, rule-based classification, pattern-based classification, logistic regression)
-Typical Applications (credit card fraud detection, direct marketing, classifying stars, diseases, web
pages)
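To make "construct a model from training examples, then predict class labels" concrete, here is a minimal sketch of a decision stump, that is, a decision tree with a single split on one numeric feature. The transaction amounts, the labels, and the names `train_stump` and `predict` are hypothetical; real fraud detection uses many features and far richer models.

```python
def train_stump(xs, labels):
    """Fit a one-level decision tree on a single numeric feature.

    Tries every observed value as a threshold and every pair of class
    labels for the two sides, keeping the split with the fewest
    training errors. Returns (threshold, label_below, label_above).
    """
    classes = sorted(set(labels))
    best = None
    for threshold in sorted(set(xs)):
        for below in classes:
            for above in classes:
                errors = sum(
                    1
                    for x, y in zip(xs, labels)
                    if (below if x <= threshold else above) != y
                )
                if best is None or errors < best[0]:
                    best = (errors, threshold, below, above)
    _, threshold, below, above = best
    return threshold, below, above


def predict(stump, x):
    """Predict the class label of a new example x."""
    threshold, below, above = stump
    return below if x <= threshold else above


# Made-up training examples: transaction amount and known class label.
amounts = [12, 25, 40, 900, 1500, 2200]
labels = ["ok", "ok", "ok", "fraud", "fraud", "fraud"]

stump = train_stump(amounts, labels)
print(stump, predict(stump, 30), predict(stump, 1000))
```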
Fourth Function of Data Mining: Cluster Analysis - Answers -Unsupervised
learning (class label is unknown)
-Group data to form categories (clusters), e.g., cluster consumers to find patterns
-Principle: Maximizing intra-class similarity and minimizing interclass similarity
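The clustering principle can be checked numerically on a given grouping. The sketch below uses Euclidean distance as an inverse stand-in for similarity (an assumption made for this example), so a good clustering shows a small average distance within clusters and a large average distance between clusters. The points and function names are made up.

```python
from itertools import combinations
from math import dist

# Hypothetical 2-D points already assigned to two clusters.
clusters = {
    "A": [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5)],
    "B": [(8.0, 8.0), (8.5, 9.0), (9.0, 8.5)],
}


def mean_intra_distance(clusters):
    """Average distance between pairs of points in the same cluster."""
    pairs = [
        dist(p, q)
        for points in clusters.values()
        for p, q in combinations(points, 2)
    ]
    return sum(pairs) / len(pairs)


def mean_inter_distance(clusters):
    """Average distance between pairs of points in different clusters."""
    pairs = [
        dist(p, q)
        for (_, a), (_, b) in combinations(clusters.items(), 2)
        for p in a
        for q in b
    ]
    return sum(pairs) / len(pairs)


intra = mean_intra_distance(clusters)
inter = mean_inter_distance(clusters)
print(intra, inter)
```

Here low intra-cluster distance corresponds to high intra-class similarity, and high inter-cluster distance corresponds to low interclass similarity.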