LECTURE NOTES
ON
DATA WAREHOUSING
AND DATA MINING
III B. Tech I semester (R18)
By
M.RAVI
ASSISTANT PROFESSOR
IT DEPARTMENT
JBIET
UNIT I
What motivated data mining? Why is it important?
The major reason that data mining has attracted a great deal of attention in the information
industry in recent years is the wide availability of huge amounts of data and the
imminent need to turn such data into useful information and knowledge. The
information and knowledge gained can be used for applications ranging from business
management, production control, and market analysis to engineering design and science
exploration.
The evolution of database technology
Data Collection and Database Creation (1960s and earlier)
1) Primitive file processing
Database Management Systems (1970s-early 1980s)
1) Hierarchical and network database systems
2) Relational database systems
3) Data modeling tools: entity-relationship models, etc.
4) Indexing and accessing methods: B-trees, hashing, etc.
5) Query languages: SQL, etc.
6) User interfaces, forms and reports
7) Query processing and query optimization
8) Transactions, concurrency control and recovery
9) Online transaction processing (OLTP)
Advanced Database Systems (mid 1980s-present)
1) Advanced data models: extended relational, object-relational, etc.
2) Advanced applications: spatial, temporal, multimedia, active, stream and sensor,
knowledge-based
Advanced Data Analysis: Data Warehousing and Data Mining (late 1980s-present)
1) Data warehouse and OLAP systems
2) Data mining and knowledge discovery: generalization, classification, association,
clustering, frequent pattern, outlier analysis, etc.
3) Advanced data mining applications: stream data mining, bio-data mining, text
mining, web mining, etc.
Web-based Databases (1990s-present)
1) XML-based database systems
2) Integration with information retrieval
3) Data and information integration
New Generation of Integrated Data and Information Systems (present-future)
What is data mining?
Data mining refers to extracting or "mining" knowledge from large amounts of data. There are
many other terms related to data mining, such as knowledge mining, knowledge extraction,
data/pattern analysis, data archaeology, and data dredging. Many people treat data mining
as a synonym for another popularly used term, "Knowledge Discovery in
Databases", or KDD.
Essential step in the process of knowledge discovery in databases
Knowledge discovery as a process is depicted in the following figure and consists of an
iterative sequence of the following steps:
1. Data cleaning: to remove noise or irrelevant data.
2. Data integration: where multiple data sources may be combined.
3. Data selection: where data relevant to the analysis task are retrieved from the
database.
4. Data transformation: where data are transformed or consolidated into forms
appropriate for mining by performing summary or aggregation operations.
5. Data mining: an essential process where intelligent methods are applied in order to
extract data patterns.
6. Pattern evaluation: to identify the truly interesting patterns representing knowledge,
based on some interestingness measures.
7. Knowledge presentation: where visualization and knowledge representation
techniques are used to present the mined knowledge to the user.
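The steps above can be sketched as a small pipeline. This is only an illustrative sketch, not a prescribed implementation: the sample records, the function names, and the 0.6 support threshold are all assumptions made for the example, and the "mining" step is reduced to simple counting.

```python
from collections import Counter

# Hypothetical raw records from two sources; None marks a missing value.
store_sales = [
    {"customer": "c1", "item": "milk", "amount": 3.0},
    {"customer": "c2", "item": "bread", "amount": None},  # noisy record
    {"customer": "c1", "item": "bread", "amount": 2.0},
]
web_sales = [
    {"customer": "c2", "item": "milk", "amount": 3.5},
    {"customer": "c3", "item": "milk", "amount": 3.0},
]

def clean(records):
    """Data cleaning: drop records with a missing amount."""
    return [r for r in records if r["amount"] is not None]

def integrate(*sources):
    """Data integration: combine several sources into one list."""
    return [r for source in sources for r in source]

def select(records, fields):
    """Data selection: keep only the fields relevant to the task."""
    return [{f: r[f] for f in fields} for r in records]

def transform(records):
    """Data transformation: aggregate into the set of items per customer."""
    baskets = {}
    for r in records:
        baskets.setdefault(r["customer"], set()).add(r["item"])
    return baskets

def mine(baskets):
    """Data mining: count how many customers bought each item."""
    return Counter(item for items in baskets.values() for item in items)

def evaluate(patterns, total, min_support):
    """Pattern evaluation: keep items whose support meets the threshold."""
    return {item: n / total for item, n in patterns.items()
            if n / total >= min_support}

def present(patterns):
    """Knowledge presentation: format the results for the user."""
    return [f"{item}: bought by {support:.0%} of customers"
            for item, support in sorted(patterns.items())]

records = integrate(clean(store_sales), clean(web_sales))
baskets = transform(select(records, ["customer", "item"]))
interesting = evaluate(mine(baskets), total=len(baskets), min_support=0.6)
report = present(interesting)
```

In a real system the process is iterative: the result of pattern evaluation may send the analyst back to selection or transformation with different parameters.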
Architecture of a typical data mining system/Major Components
Data mining is the process of discovering interesting knowledge from large amounts of data
stored either in databases, data warehouses, or other information repositories. Based on this
view, the architecture of a typical data mining system may have the following major
components:
1. A database, data warehouse, or other information repository, which consists of a set
of databases, data warehouses, spreadsheets, or other kinds of information
repositories containing the data to be mined (for example, student and course information).
2. A database or data warehouse server, which fetches the relevant data based on
users' data mining requests.
3. A knowledge base that contains the domain knowledge used to guide the search or to
evaluate the interestingness of resulting patterns. For example, the knowledge
base may contain metadata which describes data from multiple heterogeneous
sources.
4. A data mining engine, which consists of a set of functional modules for tasks such as
characterization, association, classification, cluster analysis, and evolution and
deviation analysis.
5. A pattern evaluation module that works in tandem with the data mining
modules by employing interestingness measures to help focus the search
towards interesting patterns.
6. A graphical user interface that allows the user an interactive approach to the data
mining system.
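As a rough illustration of how these components hand work to one another, the sketch below wires them together for a toy association task. Everything in it is an assumption made for the example: the transactions, the function names, the request ("transactions containing milk"), and the use of pair support as the interestingness measure.

```python
from collections import Counter
from itertools import combinations

# 1. Repository: hypothetical transactions (sets of purchased items).
repository = [
    {"milk", "bread"},
    {"milk", "bread", "eggs"},
    {"milk", "eggs"},
    {"bread"},
]

# 3. Knowledge base: domain knowledge, here only an interestingness threshold.
knowledge_base = {"min_support": 0.5}

def server_fetch(repo, required_item):
    """2. Server: fetch only the data relevant to the user's request."""
    return [t for t in repo if required_item in t]

def mining_engine(transactions):
    """4. Mining engine: count co-occurring item pairs (a simple association task)."""
    counts = Counter()
    for t in transactions:
        counts.update(combinations(sorted(t), 2))
    return counts

def pattern_evaluation(counts, n_transactions, kb):
    """5. Pattern evaluation: keep pairs whose support meets the threshold."""
    return {pair: c / n_transactions for pair, c in counts.items()
            if c / n_transactions >= kb["min_support"]}

def user_interface(patterns):
    """6. User interface: a stand-in that renders patterns as text lines."""
    return [f"{a} & {b}: support {s:.2f}"
            for (a, b), s in sorted(patterns.items())]

relevant = server_fetch(repository, "milk")
patterns = pattern_evaluation(mining_engine(relevant), len(relevant), knowledge_base)
lines = user_interface(patterns)
```

Keeping the threshold in the knowledge base rather than inside the engine mirrors the separation described above: the engine generates candidate patterns, and the evaluation module decides which ones are worth showing.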
How is a data warehouse different from a database? How are they similar?
• Differences between a data warehouse and a database: A data warehouse is a repository
of information collected from multiple sources over a period of time, stored under a
unified schema, and used for data analysis and decision support, whereas a database is a
collection of interrelated data that represents the current status of the stored data. There
could be multiple heterogeneous databases where the schema of one database may not
agree with the schema of another. A database system supports ad hoc queries and on-line
transaction processing. For more details, please refer to the section “Differences
between operational database systems and data warehouses.”
• Similarities between a data warehouse and a database: Both are repositories of
information, storing huge amounts of persistent data.
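The contrast can be made concrete with a small sketch using Python's built-in sqlite3 module. The table names, columns, and figures are hypothetical, and a single in-memory SQLite database stands in for both kinds of system purely for illustration. A real warehouse would be a separate system loaded from several operational sources.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Operational database: one row per account, holding only its current status.
cur.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL)")
cur.execute("INSERT INTO account VALUES (1, 100.0)")

# OLTP-style transaction: a short update that overwrites the current state.
cur.execute("UPDATE account SET balance = balance - 30.0 WHERE id = 1")
conn.commit()
current_balance = cur.execute(
    "SELECT balance FROM account WHERE id = 1"
).fetchone()[0]

# Warehouse-style table: historical facts from several sources, one unified schema.
cur.execute("CREATE TABLE sales_fact (year INTEGER, source TEXT, amount REAL)")
cur.executemany(
    "INSERT INTO sales_fact VALUES (?, ?, ?)",
    [
        (2022, "store", 120.0),
        (2022, "web", 80.0),
        (2023, "store", 150.0),
        (2023, "web", 110.0),
    ],
)

# Analytical query for decision support: summarize over the stored history.
yearly_totals = cur.execute(
    "SELECT year, SUM(amount) FROM sales_fact GROUP BY year ORDER BY year"
).fetchall()
conn.close()
```

The operational table keeps no trace of the earlier balance, while the fact table retains every year's figures so they can be aggregated and compared.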
Data mining: on what kind of data? / Describe the following advanced
database systems and applications: object-relational databases, spatial
databases, text databases, multimedia databases, the World Wide Web.
In principle, data mining should be applicable to any kind of information repository. This
includes relational databases, data warehouses, transactional databases, advanced
database systems, flat files, and the World Wide Web. Advanced database systems include
object-oriented and object-relational databases, and specific application-oriented databases, such as