SOLUTION MANUAL
Contents
1 Introduction
1.11 Exercises
2 Data Preprocessing
2.8 Exercises
3 Data Warehouse and OLAP Technology: An Overview
3.7 Exercises
4 Data Cube Computation and Data Generalization
4.5 Exercises
5 Mining Frequent Patterns, Associations, and Correlations
5.7 Exercises
6 Classification and Prediction
6.17 Exercises
7 Cluster Analysis
7.13 Exercises
8 Mining Stream, Time-Series, and Sequence Data
8.6 Exercises
9 Graph Mining, Social Network Analysis, and Multirelational Data Mining
9.5 Exercises
10 Mining Object, Spatial, Multimedia, Text, and Web Data
10.7 Exercises
11 Applications and Trends in Data Mining
11.7 Exercises
Chapter 1
Introduction
1.11 Exercises
1.1. What is data mining? In your answer, address the following:
(a) Is it another hype?
(b) Is it a simple transformation of technology developed from databases, statistics, and machine learning?
(c) Explain how the evolution of database technology led to data mining.
(d) Describe the steps involved in data mining when viewed as a process of knowledge discovery.
Answer:
Data mining refers to the process or method that extracts or “mines” interesting knowledge or patterns from large amounts of data.
(a) Is it another hype?
Data mining is not another hype. Instead, the need for data mining has arisen due to the wide availability of huge amounts of data and the imminent need for turning such data into useful information and knowledge. Thus, data mining can be viewed as the result of the natural evolution of information technology.
(b) Is it a simple transformation of technology developed from databases, statistics, and machine learning?
No. Data mining is more than a simple transformation of technology developed from databases, statistics, and machine learning. Instead, data mining involves an integration, rather than a simple transformation, of techniques from multiple disciplines such as database technology, statistics, machine learning, high-performance computing, pattern recognition, neural networks, data visualization, information retrieval, image and signal processing, and spatial data analysis.
(c) Explain how the evolution of database technology led to data mining.
Database technology began with the development of data collection and database creation mechanisms that led to the development of effective mechanisms for data management including data storage and retrieval, and query and transaction processing. The large number of database systems offering query and transaction processing eventually and naturally led to the need for data analysis and understanding. Hence, data mining began its development out of this necessity.
(d) Describe the steps involved in data mining when viewed as a process of knowledge discovery.
The steps involved in data mining when viewed as a process of knowledge discovery are as follows:
• Data cleaning, a process that removes or transforms noise and inconsistent data
• Data integration, where multiple data sources may be combined
• Data selection, where data relevant to the analysis task are retrieved from the database
• Data transformation, where data are transformed or consolidated into forms appropriate for mining
• Data mining, an essential process where intelligent and efficient methods are applied in order to extract patterns
• Pattern evaluation, a process that identifies the truly interesting patterns representing knowledge based on some interestingness measures
• Knowledge presentation, where visualization and knowledge representation techniques are used to present the mined knowledge to the user
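The seven steps listed above can be illustrated with a small Python sketch. It is only a toy under stated assumptions: the purchase records, the function names, and the 0.5 support threshold are all hypothetical, and counting co-occurring item pairs stands in for whatever mining method a real system would use.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw sources: (customer, item) purchase records from two stores.
store_a = [("c1", "milk"), ("c1", "bread"), ("c2", "milk"), ("c2", None)]
store_b = [("c3", "Milk "), ("c3", "bread"), ("c4", "bread"), ("c4", "milk")]

def clean(records):
    # Data cleaning: drop records with a missing item, normalize item names.
    return [(c, i.strip().lower()) for c, i in records if i]

def integrate(*sources):
    # Data integration: combine several sources into one list.
    return [r for s in sources for r in s]

def select(records, items_of_interest):
    # Data selection: keep only records relevant to the analysis task.
    return [r for r in records if r[1] in items_of_interest]

def transform(records):
    # Data transformation: consolidate records into one item set per customer.
    baskets = {}
    for customer, item in records:
        baskets.setdefault(customer, set()).add(item)
    return baskets

def mine(baskets):
    # Data mining: count how often each pair of items is bought together.
    counts = Counter()
    for items in baskets.values():
        counts.update(combinations(sorted(items), 2))
    return counts

def evaluate(counts, n_baskets, min_support):
    # Pattern evaluation: keep pairs whose support meets the threshold.
    return {p: c / n_baskets for p, c in counts.items()
            if c / n_baskets >= min_support}

def present(patterns):
    # Knowledge presentation: format the surviving patterns as readable lines.
    return [f"{a} & {b}: support {s:.2f}" for (a, b), s in sorted(patterns.items())]

records = integrate(clean(store_a), clean(store_b))
baskets = transform(select(records, {"milk", "bread"}))
patterns = evaluate(mine(baskets), len(baskets), min_support=0.5)
report = present(patterns)
print(report)  # ['bread & milk: support 0.75']
```

Each function corresponds to one bullet, so the call chain at the bottom reads in the same order as the list.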
1.2. Present an example where data mining is crucial to the success of a business. What data mining functions does this business need? Can they be performed alternatively by data query processing or simple statistical analysis?
Answer:
A department store, for example, can use data mining to assist with its target marketing mail campaign. Using data mining functions such as association, the store can use the mined strong association rules to determine which products bought by one group of customers are likely to lead to the buying of certain other products. With this information, the store can then mail marketing materials only to those kinds of customers who exhibit a high likelihood of purchasing additional products. Data query processing is used for data or information retrieval and does not have the means for finding association rules. Similarly, simple statistical analysis cannot handle large amounts of data such as those of customer records in a department store.
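To make "strong association rules" concrete, the following Python sketch computes support and confidence for single-item rules A -> B over a handful of transactions. The transactions, the function names, and both thresholds are assumptions chosen for illustration. A real store would run a scalable algorithm over far more data rather than this brute-force enumeration.

```python
from itertools import permutations

# Hypothetical transactions; each set is one customer's purchases.
transactions = [
    {"shirt", "tie"},
    {"shirt", "tie", "belt"},
    {"shirt", "belt"},
    {"tie"},
    {"shirt", "tie"},
]

def support(itemset):
    # Fraction of transactions that contain every item in the itemset.
    return sum(itemset <= t for t in transactions) / len(transactions)

def strong_rules(min_support, min_confidence):
    # Enumerate single-item rules A -> B and keep those meeting both thresholds.
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in permutations(items, 2):
        s = support({a, b})
        if s == 0:
            continue
        c = s / support({a})  # confidence = support(A and B) / support(A)
        if s >= min_support and c >= min_confidence:
            rules.append((a, b, s, c))
    return rules

rules = strong_rules(min_support=0.3, min_confidence=0.7)
for a, b, s, c in rules:
    print(f"{a} -> {b}: support {s:.2f}, confidence {c:.2f}")
```

With these toy data the surviving rules are belt -> shirt, shirt -> tie, and tie -> shirt. A retrieval query can list the transactions that contain a given item, but the candidate rules themselves have to be generated and scored, which is the part the mining function supplies.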
1.3. Suppose your task as a software engineer at Big-University is to design a data mining system to examine their university course database, which contains the following information: the name, address, and status (e.g., undergraduate or graduate) of each student, the courses taken, and their cumulative grade point average (GPA). Describe the architecture you would choose. What is the purpose of each component of this architecture?
Answer:
A data mining architecture that can be used for this application would consist of the following major components:
• A database, data warehouse, or other information repository, which consists of the set of databases, data warehouses, spreadsheets, or other kinds of information repositories containing the student and course information.
• A database or data warehouse server, which fetches the relevant data based on the users’ data mining requests.
• A knowledge base that contains the domain knowledge used to guide the search or to evaluate the interestingness of resulting patterns. For example, the knowledge base may contain concept hierarchies and metadata (e.g., describing data from multiple heterogeneous sources).
• A data mining engine, which consists of a set of functional modules for tasks such as characterization, association, classification, cluster analysis, and evolution and deviation analysis.
• A pattern evaluation module that works in tandem with the data mining modules by employing interestingness measures to help focus the search towards interesting patterns.
• A graphical user interface that provides the user with an interactive approach to the data mining system.
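One way to picture how these components fit together is the minimal Python sketch below. Everything in it is hypothetical: the student records, the GPA bands used as a concept hierarchy, and the function names are assumptions for illustration. Plain text output also stands in for a graphical interface.

```python
from statistics import mean

# Information repository: hypothetical student and course records.
REPOSITORY = [
    {"name": "Ann", "status": "undergraduate", "courses": ["CS101"], "gpa": 3.6},
    {"name": "Bob", "status": "undergraduate", "courses": ["CS101", "MA201"], "gpa": 2.8},
    {"name": "Cho", "status": "graduate", "courses": ["CS501"], "gpa": 3.9},
    {"name": "Dee", "status": "graduate", "courses": ["CS501", "CS502"], "gpa": 3.7},
]

def gpa_band(gpa):
    # Knowledge base: an assumed concept hierarchy mapping a GPA to a band.
    return "high" if gpa >= 3.5 else "medium" if gpa >= 3.0 else "low"

def server_fetch(fields):
    # Database server: return only the fields relevant to the request.
    return [{f: row[f] for f in fields} for row in REPOSITORY]

def mining_engine(rows):
    # Mining engine (a simple summarization module): mean GPA per status.
    groups = {}
    for row in rows:
        groups.setdefault(row["status"], []).append(row["gpa"])
    return {status: mean(gpas) for status, gpas in groups.items()}

def pattern_evaluation(summary, band="high"):
    # Pattern evaluation: keep only groups whose mean GPA falls in the band.
    return {s: g for s, g in summary.items() if gpa_band(g) == band}

def user_interface(request_fields):
    # User interface: run a request end to end and format the result as text.
    patterns = pattern_evaluation(mining_engine(server_fetch(request_fields)))
    return [f"{s}: mean GPA {g:.2f}" for s, g in sorted(patterns.items())]

output = user_interface(["status", "gpa"])
print(output)  # ['graduate: mean GPA 3.80']
```

The request flows from the interface through the server to the repository, then back through the mining engine and the evaluation step, which consults the knowledge base to decide what counts as interesting.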