Geschreven door studenten die geslaagd zijn Direct beschikbaar na je betaling Online lezen of als PDF Verkeerd document? Gratis ruilen 4,6 TrustPilot
logo-home
Tentamen (uitwerkingen)

Data mining Notes

Beoordeling
-
Verkocht
-
Pagina's
82
Cijfer
A+
Geüpload op
12-05-2024
Geschreven in
2023/2024

Documents provided are about Data mining for Btech Students

Instelling
Vak

Voorbeeld van de inhoud

Chapter-1



What Is Data Mining?

Data mining refers to extracting or mining knowledge from large amountsof data. The term is
actually a misnomer. Thus, data miningshould have been more appropriately named as
knowledge mining which emphasis on mining from large amounts of data.


It is the computational process of discovering patterns in large data sets involving methods at the
intersection of artificial intelligence, machine learning, statistics, and database systems.
The overall goal of the data mining process is to extract information from a data set and
transform it into an understandable structure for further use.


The key properties of data mining are
Automatic discovery of patterns
Prediction of likely outcomes
Creation of actionable information
Focus on large datasets and databases


The Scope of Data Mining


Data mining derives its name from the similarities between searching for valuable business
information in a large database — for example, finding linked products in gigabytes of store
scanner data — and mining a mountain for a vein of valuable ore. Both processes require either
sifting through an immense amount of material, or intelligently probing it to find exactly where
the value resides. Given databases of sufficient size and quality, data mining technology can
generate new business opportunities by providing these capabilities:

,Automated prediction of trends and behaviors. Data mining automates the process of finding
predictive information in large databases. Questions that traditionally required extensive hands-
on analysis can now be answered directly from the data — quickly. A typical example of a
predictive problem is targeted marketing. Data mining uses data on past promotional mailings to
identify the targets most likely to maximize return on investment in future mailings. Other
predictive problems include forecasting bankruptcy and other forms of default, and identifying
segments of a population likely to respond similarly to given events.


Automated discovery of previously unknown patterns. Data mining tools sweep through
databases and identify previously hidden patterns in one step. An example of pattern discovery is
the analysis of retail sales data to identify seemingly unrelated products that are often purchased
together. Other pattern discovery problems include detecting fraudulent credit card transactions
and identifying anomalous data that could represent data entry keying errors.


Tasks of Data Mining
Data mining involves six common classes of tasks:
Anomaly detection (Outlier/change/deviation detection) – The identification of
unusual data records, that might be interesting or data errors that require further
investigation.
Association rule learning (Dependency modelling) – Searches for relationships
between variables. For example a supermarket might gather data on customer purchasing
habits. Using association rule learning, the supermarket can determine which products are
frequently bought together and use this information for marketing purposes. This is
sometimes referred to as market basket analysis.
Clustering – is the task of discovering groups and structures in the data that are in some
way or another "similar", without using known structures in the data.


Classification – is the task of generalizing known structure to apply to new data. For
example, an e-mail program might attempt to classify an e-mail as "legitimate" or as
"spam".
Regression – attempts to find a function which models the data with the least error.

, Summarization – providing a more compact representation of the data set, including
visualization and report generation.



Architecture of Data Mining

A typical data mining system may have the following major components.




1. Knowledge Base:


This is the domain knowledge that is used to guide the search orevaluate the
interestingness of resulting patterns. Such knowledge can include concepthierarchies,

, used to organize attributes or attribute values into different levels of abstraction.
Knowledge such as user beliefs, which can be used to assess a pattern’s
interestingness based on its unexpectedness, may also be included. Other examples of
domain knowledge are additional interestingness constraints or thresholds, and
metadata (e.g., describing data from multiple heterogeneous sources).


2. Data Mining Engine:


This is essential to the data mining systemand ideally consists ofa set of functional
modules for tasks such as characterization, association and correlationanalysis,
classification, prediction, cluster analysis, outlier analysis, and evolutionanalysis.


3. Pattern Evaluation Module:


This component typically employs interestingness measures interacts with the data
mining modules so as to focus thesearch toward interesting patterns. It may use
interestingness thresholds to filterout discovered patterns. Alternatively, the pattern
evaluation module may be integratedwith the mining module, depending on the
implementation of the datamining method used. For efficient data mining, it is highly
recommended to pushthe evaluation of pattern interestingness as deep as possible into
the mining processso as to confine the search to only the interesting patterns.


4. User interface:


Thismodule communicates between users and the data mining system,allowing the
user to interact with the system by specifying a data mining query ortask, providing
information to help focus the search, and performing exploratory datamining based on
the intermediate data mining results. In addition, this componentallows the user to
browse database and data warehouse schemas or data structures,evaluate mined
patterns, and visualize the patterns in different forms.

Geschreven voor

Instelling
Vak

Documentinformatie

Geüpload op
12 mei 2024
Aantal pagina's
82
Geschreven in
2023/2024
Type
Tentamen (uitwerkingen)
Bevat
Vragen en antwoorden

Onderwerpen

$7.39
Krijg toegang tot het volledige document:

Verkeerd document? Gratis ruilen Binnen 14 dagen na aankoop en voor het downloaden kun je een ander document kiezen. Je kunt het bedrag gewoon opnieuw besteden.
Geschreven door studenten die geslaagd zijn
Direct beschikbaar na je betaling
Online lezen of als PDF

Maak kennis met de verkoper
Seller avatar
shivampal

Maak kennis met de verkoper

Seller avatar
shivampal SRIST
Volgen Je moet ingelogd zijn om studenten of vakken te kunnen volgen
Verkocht
-
Lid sinds
2 jaar
Aantal volgers
0
Documenten
5
Laatst verkocht
-
Engineering study notes

Here we provide study notes for engineering students easily.

0.0

0 beoordelingen

5
0
4
0
3
0
2
0
1
0

Recent door jou bekeken

Waarom studenten kiezen voor Stuvia

Gemaakt door medestudenten, geverifieerd door reviews

Kwaliteit die je kunt vertrouwen: geschreven door studenten die slaagden en beoordeeld door anderen die dit document gebruikten.

Niet tevreden? Kies een ander document

Geen zorgen! Je kunt voor hetzelfde geld direct een ander document kiezen dat beter past bij wat je zoekt.

Betaal zoals je wilt, start meteen met leren

Geen abonnement, geen verplichtingen. Betaal zoals je gewend bent via iDeal of creditcard en download je PDF-document meteen.

Student with book image

“Gekocht, gedownload en geslaagd. Zo makkelijk kan het dus zijn.”

Alisha Student

Bezig met je bronvermelding?

Maak nauwkeurige citaten in APA, MLA en Harvard met onze gratis bronnengenerator.

Bezig met je bronvermelding?

Veelgestelde vragen