Class notes

Data mining

Pages
243
Uploaded on
25-02-2024
Written in
2023/2024

Providing in-depth explanations of all the concepts in data mining.


LECTURE NOTES

ON

DATA WAREHOUSING
AND DATA MINING
III B. Tech I semester (R18)

By

M.RAVI

ASSISTANT PROFESSOR
IT DEPARTMENT
JBIET

UNIT I

What motivated data mining? Why is it important?

Data mining has attracted a great deal of attention in the information industry in recent
years, mainly because of the wide availability of huge amounts of data and the imminent
need to turn such data into useful information and knowledge. The information and
knowledge gained can be used for applications ranging from business management,
production control, and market analysis to engineering design and science exploration.

The evolution of database technology

Data Collection and Database Creation (1960s and earlier)
Primitive file processing

Database Management Systems (1970s-early 1980s)
1) Hierarchical and network database systems
2) Relational database systems
3) Data modeling tools: entity-relationship models, etc.
4) Indexing and accessing methods: B-trees, hashing, etc.
5) Query languages: SQL, etc.
User interfaces, forms and reports
6) Query processing and query optimization
7) Transactions, concurrency control and recovery
8) Online transaction processing (OLTP)

Advanced Database Systems (mid 1980s-present)
1) Advanced data models: extended relational, object-relational, etc.
2) Advanced applications: spatial, temporal, multimedia, active, stream and sensor,
knowledge based

Advanced Data Analysis: Data Warehousing and Data Mining (late 1980s-present)
1) Data warehouse and OLAP systems
2) Data mining and knowledge discovery: generalization, classification, association,
clustering, frequent pattern, outlier analysis, etc.
3) Advanced data mining applications: stream data mining, bio-data mining, text mining,
web mining, etc.

Web-Based Databases (1990s-present)
1) XML-based database
2) Integration with information retrieval
3) Data and information integration

New Generation of Integrated Data and Information Systems (present-future)

What is data mining?

Data mining refers to extracting or "mining" knowledge from large amounts of data. There are
many other terms related to data mining, such as knowledge mining, knowledge extraction,
data/pattern analysis, data archaeology, and data dredging. Many people treat data mining
as a synonym for another popularly used term, "Knowledge Discovery in Databases", or KDD.

Essential step in the process of knowledge discovery in databases

Knowledge discovery as a process is depicted in the following figure and consists of an
iterative sequence of the following steps:

data cleaning: to remove noise or irrelevant data
data integration: where multiple data sources may be combined
data selection: where data relevant to the analysis task are retrieved from the
database
data transformation: where data are transformed or consolidated into forms
appropriate for mining by performing summary or aggregation operations
data mining: an essential process where intelligent methods are applied in order to
extract data patterns
pattern evaluation: to identify the truly interesting patterns representing knowledge,
based on some interestingness measures
knowledge presentation: where visualization and knowledge representation
techniques are used to present the mined knowledge to the user
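
The steps above can be sketched as a small pipeline. This is only an illustrative sketch, not a method from the notes: the two sources, the record fields, the function names, and the "bought by at least N customers" interestingness measure are all assumptions made up for the example.

```python
from collections import Counter, defaultdict

# Hypothetical raw records from two sources; all names and values are made up.
STORE_A = [
    {"customer": "c1", "item": "milk", "amount": 2.5},
    {"customer": "c1", "item": "bread", "amount": 1.5},
    {"customer": "c1", "item": "bag", "amount": 0.1},
    {"customer": "c2", "item": "milk", "amount": None},  # noisy: amount missing
]
STORE_B = [
    {"customer": "c2", "item": "bread", "amount": 1.5},
    {"customer": "c3", "item": "milk", "amount": 2.5},
    {"customer": "c3", "item": "pen", "amount": 0.5},
]

def clean(records):
    """Data cleaning: drop records that contain missing values."""
    return [r for r in records if all(v is not None for v in r.values())]

def integrate(*sources):
    """Data integration: combine several sources into one list."""
    return [r for source in sources for r in source]

def select(records, relevant_items):
    """Data selection: keep only records relevant to the analysis task."""
    return [r for r in records if r["item"] in relevant_items]

def transform(records):
    """Data transformation: aggregate total spend per (customer, item)."""
    totals = defaultdict(float)
    for r in records:
        totals[(r["customer"], r["item"])] += r["amount"]
    return dict(totals)

def mine(totals):
    """Data mining: count distinct customers per item (a toy pattern)."""
    return Counter(item for (_customer, item) in totals)

def evaluate(patterns, min_customers):
    """Pattern evaluation: keep patterns that meet the assumed threshold."""
    return {item: n for item, n in patterns.items() if n >= min_customers}

def present(patterns):
    """Knowledge presentation: render the patterns as readable lines."""
    return [f"{item}: bought by {n} customers" for item, n in sorted(patterns.items())]

def kdd(sources, relevant_items, min_customers):
    """Run the steps in the order listed in the notes."""
    data = integrate(*(clean(s) for s in sources))
    data = select(data, relevant_items)
    totals = transform(data)
    return present(evaluate(mine(totals), min_customers))

report = kdd([STORE_A, STORE_B], {"milk", "bread", "pen"}, min_customers=2)
print("\n".join(report))
```

In practice the sequence is iterative: an unsatisfying result from pattern evaluation would normally send the analyst back to an earlier step, for example to select different data or change the threshold.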

Architecture of a typical data mining system/Major Components

Data mining is the process of discovering interesting knowledge from large amounts of data
stored either in databases, data warehouses, or other information repositories. Based on this
view, the architecture of a typical data mining system may have the following major
components:

1. A database, data warehouse, or other information repository, which consists of the set
of databases, data warehouses, spreadsheets, or other kinds of information
repositories containing the data to be mined.
2. A database or data warehouse server which fetches the relevant data based on
users' data mining requests.
3. A knowledge base that contains the domain knowledge used to guide the search or to
evaluate the interestingness of resulting patterns. For example, the knowledge
base may contain metadata which describes data from multiple heterogeneous
sources.
4. A data mining engine, which consists of a set of functional modules for tasks such as
characterization, association, classification, cluster analysis, and evolution and
deviation analysis.
5. A pattern evaluation module that works in tandem with the data mining
modules by employing interestingness measures to help focus the search
towards interesting patterns.
6. A graphical user interface that allows the user an interactive approach to the data
mining system.
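
One way to picture how these six components fit together is the toy sketch below. Everything in it is an assumption for illustration: the transaction data, the function names, the single "co-occurring item pairs" mining module, and the use of a minimum support count as the interestingness measure held in the knowledge base. A text-returning function stands in for the graphical user interface.

```python
from collections import Counter
from itertools import combinations

# 1. Information repository: a toy set of transactions (made-up data).
REPOSITORY = [
    {"region": "north", "items": {"milk", "bread"}},
    {"region": "north", "items": {"milk", "bread", "eggs"}},
    {"region": "north", "items": {"milk", "eggs"}},
    {"region": "south", "items": {"pen", "paper"}},
]

# 3. Knowledge base: domain knowledge used when evaluating patterns
#    (here just an assumed minimum support count).
KNOWLEDGE_BASE = {"min_support": 2}

def server_fetch(region):
    """2. Server: fetch only the data relevant to the user's request."""
    return [t["items"] for t in REPOSITORY if t["region"] == region]

def mine_pairs(transactions):
    """4. Mining engine (one functional module): count co-occurring item pairs."""
    counts = Counter()
    for items in transactions:
        counts.update(combinations(sorted(items), 2))
    return counts

def evaluate(patterns, knowledge_base):
    """5. Pattern evaluation: keep pairs that meet the support threshold."""
    threshold = knowledge_base["min_support"]
    return {pair: n for pair, n in patterns.items() if n >= threshold}

def user_interface(region):
    """6. User interface: take a request and return readable results."""
    patterns = evaluate(mine_pairs(server_fetch(region)), KNOWLEDGE_BASE)
    return [f"{a} + {b}: {n} transactions" for (a, b), n in sorted(patterns.items())]

north_report = user_interface("north")
print("\n".join(north_report))
```

Keeping the threshold in the knowledge base rather than inside the mining module mirrors the separation described in the list: the engine finds candidate patterns, and the evaluation module decides which ones are worth showing.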




How is a data warehouse different from a database? How are they similar?

• Differences between a data warehouse and a database: A data warehouse is a repository
of information collected from multiple sources, over a history of time, stored under a
unified schema, and used for data analysis and decision support, whereas a database is a
collection of interrelated data that represents the current status of the stored data. There
could be multiple heterogeneous databases where the schema of one database may not
agree with the schema of another. A database system supports ad hoc queries and online
transaction processing. For more details, please refer to the section “Differences
between operational database systems and data warehouses.”

• Similarities between a data warehouse and a database: Both are repositories of
information, storing huge amounts of persistent data.
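
The contrast can be made concrete with a minimal sketch using Python's built-in sqlite3 module. The table names, columns, store names, dates, and quantities are all hypothetical; the point is only that the operational table keeps the current status, while the warehouse-style table keeps history from several sources under one schema.

```python
import sqlite3

# Operational database: holds only the current status; an update overwrites it.
oltp = sqlite3.connect(":memory:")
oltp.execute("CREATE TABLE stock (item TEXT PRIMARY KEY, qty INTEGER)")
oltp.execute("INSERT INTO stock VALUES ('milk', 10)")
oltp.execute("UPDATE stock SET qty = 7 WHERE item = 'milk'")  # old value is gone

# Data warehouse: history from multiple sources stored under a unified schema.
dw = sqlite3.connect(":memory:")
dw.execute(
    "CREATE TABLE stock_history (source TEXT, day TEXT, item TEXT, qty INTEGER)"
)
dw.executemany(
    "INSERT INTO stock_history VALUES (?, ?, ?, ?)",
    [
        ("store_a", "2024-01-01", "milk", 10),
        ("store_a", "2024-01-02", "milk", 7),
        ("store_b", "2024-01-01", "milk", 4),
        ("store_b", "2024-01-02", "milk", 6),
    ],
)

# Transaction-style question: what is the quantity right now?
current = oltp.execute("SELECT qty FROM stock WHERE item = 'milk'").fetchone()[0]

# Analysis-style question: how did the total across sources change over time?
trend = dw.execute(
    "SELECT day, SUM(qty) FROM stock_history "
    "WHERE item = 'milk' GROUP BY day ORDER BY day"
).fetchall()

print(current)
print(trend)
```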

Data mining: on what kind of data? / Describe the following advanced
database systems and applications: object-relational databases, spatial
databases, text databases, multimedia databases, the World Wide Web.

In principle, data mining should be applicable to any kind of information repository. This
includes relational databases, data warehouses, transactional databases, advanced
database systems, flat files, and the World-Wide Web. Advanced database systems include
object-oriented and object-relational databases, and specific application-oriented
databases, such as
