Class notes

"Comprehensive Notes on Data Mining and Data Warehousing Techniques for Effective Information Extraction and Analysis"

Pages: 5
Uploaded on: 30-04-2023
Written in: 2021/2022

Data mining and data warehousing are two related fields that deal with the process of extracting useful information from large datasets. Your DMDW notes might cover a wide range of topics, including data preprocessing, data mining techniques such as clustering, classification, and association rule mining, and data warehousing concepts such as OLAP and data cubes. Your notes may also include information on how to use data mining and data warehousing techniques to solve real-world problems in various industries, such as finance, healthcare, retail, and manufacturing. Additionally, you may have notes on the challenges and ethical considerations involved in working with large datasets. Overall, your DMDW notes are likely to be a comprehensive guide to the techniques and tools used in data mining and data warehousing, with a focus on effective information extraction and analysis.


Data Mining – Cluster Analysis
Cluster analysis is the process of finding groups of similar
objects in order to form clusters. It is an unsupervised machine
learning technique that operates on unlabelled data. Data points
that are similar to one another are grouped together to form a
cluster, so that all the objects in a cluster belong to the same
group.
Cluster:
The given data is divided into groups by combining similar
objects. Each such group is a cluster: a collection of similar
data objects that have been grouped together.
For example, consider a dataset that contains information about
different vehicles such as cars, buses, bicycles, etc. Because
this is unsupervised learning, the vehicles carry no class labels
such as Cars or Bikes; all the data is mixed together and has no
structure.
Our task is to convert the unlabelled data into labelled data,
and this can be done using clusters.
The main idea of cluster analysis is to arrange the data points
into clusters: a cars cluster that contains all the cars, a bikes
cluster that contains all the bikes, and so on. Put simply, it is
the partitioning of unlabelled data into groups of similar
objects.
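The vehicle example can be sketched in a few lines of Python. This is only an illustration, not a method taken from the notes: it uses a simple "leader" scheme, in which a point joins the first cluster whose first member lies within a distance threshold and otherwise starts a new cluster. The two features (number of wheels, number of seats), the sample values, and the threshold of 3.0 are all assumptions made for the example.

```python
import math


def leader_clustering(points, threshold):
    """Group points without labels: each point joins the first cluster
    whose leader (its first member) is within `threshold`, otherwise it
    starts a new cluster of its own."""
    leaders = []   # one representative point per cluster
    clusters = []  # clusters[i] holds the members led by leaders[i]
    for p in points:
        for i, leader in enumerate(leaders):
            if math.dist(p, leader) <= threshold:
                clusters[i].append(p)
                break
        else:
            # No existing cluster is close enough: open a new one.
            leaders.append(p)
            clusters.append([p])
    return clusters


# Hypothetical unlabelled vehicles described as (wheels, seats).
vehicles = [(4, 5), (2, 1), (6, 40), (4, 4), (2, 1), (6, 42), (4, 7)]
clusters = leader_clustering(vehicles, threshold=3.0)
for i, members in enumerate(clusters):
    print(f"cluster {i}: {members}")
```

The clusters come out unnamed. Calling them "cars", "bicycles", and "buses" is an interpretation step performed afterwards by whoever inspects the groups.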
Properties of Clustering:
1. Clustering Scalability: Datasets today are vast, so clustering
often has to deal with huge databases. In order to handle
extensive databases, the clustering algorithm should be scalable.
If it is not scalable, it may fail to produce an appropriate
result or may produce misleading ones.
2. High Dimensionality: The algorithm should be able to handle
high-dimensional spaces as well as data of small size.
3. Algorithm Usability with multiple data kinds: Clustering
algorithms should work with different kinds of data. They should
be capable of dealing with discrete, categorical, interval-based,
and binary data, among others.
4. Dealing with unstructured data: Some databases contain
missing values and noisy or erroneous data. Algorithms that are
sensitive to such data may produce poor-quality clusters. A
clustering algorithm should therefore be able to handle
unstructured data and give it some structure by organising it
into groups of similar data objects. This makes it easier for the
data expert to process the data and discover new patterns.
5. Interpretability: The clustering outcomes should be
interpretable, comprehensible, and usable. Interpretability
reflects how easily the results can be understood.
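Properties 3 and 4 can be made concrete with a dissimilarity measure that accepts mixed attribute kinds and tolerates missing values. The sketch below follows the spirit of Gower's dissimilarity: numeric attributes contribute a range-scaled absolute difference, categorical and binary attributes contribute 0 on a match and 1 on a mismatch, and attributes missing in either record are skipped. The attribute layout, the records, and the range of 50 seats are assumptions chosen for illustration.

```python
def mixed_dissimilarity(a, b, kinds, ranges):
    """Average per-attribute dissimilarity in [0, 1] between two records.

    kinds[i] is "numeric" or "categorical" (binary values are treated as
    categorical); ranges[i] is the value range of a numeric attribute and
    is ignored otherwise. Attributes that are None in either record are
    skipped."""
    total, used = 0.0, 0
    for x, y, kind, value_range in zip(a, b, kinds, ranges):
        if x is None or y is None:
            continue  # missing value: leave this attribute out
        if kind == "numeric":
            total += abs(x - y) / value_range
        else:
            total += 0.0 if x == y else 1.0
        used += 1
    if used == 0:
        raise ValueError("no attribute is present in both records")
    return total / used


# Hypothetical records: (seats, fuel type, has an engine).
kinds = ("numeric", "categorical", "categorical")
ranges = (50.0, None, None)  # assumed range of the seats attribute
car = (5, "petrol", True)
bus = (45, "diesel", True)
unknown = (None, "petrol", True)  # seats value is missing

d_car_bus = mixed_dissimilarity(car, bus, kinds, ranges)
d_car_unknown = mixed_dissimilarity(car, unknown, kinds, ranges)
print(d_car_bus, d_car_unknown)
```

A function like this can stand in for Euclidean distance in a clustering algorithm when the records are not purely numeric.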
Clustering Methods:
The clustering methods can be classified into the following
categories:
• Partitioning Method
• Hierarchical Method
• Density-based Method
• Grid-Based Method
• Model-Based Method
• Constraint-based Method
Partitioning Method: This method partitions the data in order to
form clusters. If “n” partitions are made on “p” objects of the
database, then each partition represents a cluster and n < p. A
partitioning clustering method must satisfy two conditions:
• Each object must belong to exactly one group.
• Each group must contain at least one object.
The partitioning method uses a technique called iterative
relocation, in which an object is moved from one group to
another to improve the partitioning.
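The two conditions and the idea of iterative relocation can be sketched as follows. This is a minimal illustration under stated assumptions, not the notes' own algorithm: groups are represented by their centroids (as in k-means), an object is relocated when another group's centroid is strictly closer, and a move is refused if it would leave a group empty. The function names and the sample points are hypothetical.

```python
import math


def is_valid_partition(assignment, n_groups):
    """assignment[i] is the single group id of object i, so every object
    belongs to exactly one group by construction. What remains to check
    is that the ids are 0..n_groups-1 and that no group is empty."""
    return set(assignment) == set(range(n_groups))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def relocate(points, assignment, n_groups, max_rounds=100):
    """Iterative relocation: move objects between groups until no move
    brings an object closer to the centre of its group."""
    if not is_valid_partition(assignment, n_groups):
        raise ValueError("initial assignment is not a valid partition")
    assignment = list(assignment)
    for _ in range(max_rounds):
        centres = [
            centroid([p for p, g in zip(points, assignment) if g == k])
            for k in range(n_groups)
        ]
        moved = False
        for i, p in enumerate(points):
            cur = assignment[i]
            best = min(range(n_groups), key=lambda k: math.dist(p, centres[k]))
            closer = math.dist(p, centres[best]) < math.dist(p, centres[cur])
            # Relocate only if it helps and the old group stays non-empty.
            if closer and assignment.count(cur) > 1:
                assignment[i] = best
                moved = True
        if not moved:
            break
    return assignment


points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
start = [0, 1, 0, 1, 0, 1]  # deliberately poor initial partition
result = relocate(points, start, n_groups=2)
print(result, is_valid_partition(result, 2))
```

Starting from the deliberately mixed assignment, the objects are relocated until the three points near (1, 1) share one group and the three points near (8, 8) share the other.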
Hierarchical Method: In this method, a hierarchical
decomposition of the given set of data objects is created.
Hierarchical methods are classified on the basis of how the
hierarchical decomposition is formed.
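One common way to form such a decomposition is bottom-up (agglomerative): every object starts as its own cluster and the two closest clusters are merged repeatedly until one cluster remains. The sketch below assumes single linkage (the distance between two clusters is the distance between their closest members) and uses made-up one-dimensional points; it illustrates the idea and is not a procedure taken from the notes.

```python
import math


def agglomerative(points):
    """Bottom-up hierarchical clustering with single linkage.

    Returns the merges in order, each as (cluster_a, cluster_b, distance),
    where a cluster is a tuple of points. The sequence of merges is the
    hierarchical decomposition."""
    clusters = [(p,) for p in points]  # start: one cluster per object
    merges = []
    while len(clusters) > 1:
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(
                    math.dist(a, b)
                    for a in clusters[i]
                    for b in clusters[j]
                )
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Hypothetical one-dimensional data, written as 1-tuples.
points = [(1.0,), (2.0,), (6.0,), (7.5,), (15.0,)]
merges = agglomerative(points)
for a, b, d in merges:
    print(f"merge {a} + {b} at distance {d}")
```

Cutting the merge sequence at a chosen distance yields a flat clustering. For example, stopping before any merge at a distance above 2 would leave three clusters in this data.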

Seller: ajijnadaf (Shri shahaji chhatrapati mahavidya)