1
John Hopcroft Ravi
b b
ndran Kannan
b
Versionb 11/4/2014
Theseb notesb areb ab firstb draftb ofb ab bookb beingb writtenb byb Hopcroftb andb Kannanb andb i
nbmanybplacesbarebincomplete.b However,bthebnotesbarebinbgoodbenoughbshapebtobprepareblect
uresb forb ab modernb theoreticalb courseb inb computerb science.b Pleaseb dob notb putb solutionsbtobe
xercisesbonlinebasbitbisbimportantbforbstudentsbtobworkboutbsolutionsbforbthemselvesb ratherbth
anbcopybthembfrombthebinternet.
ThanksbJE
H
1
Copyrightb 2011.b Allb rightsb reserved
1
,Contents
1 Introduction 7
2 High-Dimensionalb Space 10
2.1 Propertiesb ofb High-Dimensionalb Space ....................................................................12
2.2 Theb Lawb ofb Largeb Numbers ...................................................................................... 13
2.3 Theb High-Dimensionalb Sphere .................................................................................15
2.3.1 Theb Sphereb andb theb Cubeb inb Highb Dimensions ..........................................16
2.3.2 Volumeb andb Surfaceb Areab ofb theb Unitb Sphere ............................................17
2.3.3 Theb Volumeb isb Nearb theb Equator .................................................................20
2.3.4 Theb Volumeb isb inb ab Narrowb Annulus ...........................................................23
2.3.5 Theb Surfaceb Areab isb Nearb theb Equator ........................................................ 24
2.4 Volumesb ofb Otherb Solids............................................................................................26
2.5 Generatingb Pointsb Uniformlyb atb Randomb onb theb Surfaceb ofb ab Sphere .................27
2.6 Gaussiansb inb Highb Dimension ................................................................................... 27
2.7 Boundsb onb Tailb Probability........................................................................................ 33
2.8 Applicationsb ofb theb tailb bound .................................................................................35
2.9 Randomb Projectionb andb Johnson-Lindenstraussb Theorem ....................................38
2.10 Bibliographicb Notes...................................................................................................41
2.11 Exercises ..................................................................................................................... 42
3 Best-FitbSubspacesbandbSingularbValuebDecompositionb(SVD) 52
3.1 Singularb Vectors.........................................................................................................53
3.2 Singularb Valueb Decompositionb (SVD) .......................................................................56
3.3 Bestb Rankb kb Approximations ....................................................................................58
3.4 Leftb Singularb Vectors ................................................................................................ 60
3.5 Powerb Methodb forb Computingb theb Singularb Valueb Decomposition ....................... 62
3.6 Applicationsb ofb Singularb Valueb Decomposition .......................................................64
3.6.1 Principalb Componentb Analysis .....................................................................64
3.6.2 Clusteringb ab Mixtureb ofb Sphericalb Gaussians..............................................65
3.6.3 Spectralb Decomposition ................................................................................71
3.6.4 Singularb Vectorsb andb Rankingb Documents .................................................71
3.6.5 Anb Applicationb ofb SVDb tob ab Discreteb Optimizationb Problem ..................72
3.7 Singularb Vectorsb andb Eigenvectors ...........................................................................75
3.8 Bibliographicb Notes...................................................................................................76
3.9 Exercises ..................................................................................................................... 77
4 Randomb Graphs 85
4.1 Theb G(n,bp)b Model .................................................................................................. 85
4.1.1 Degreeb Distribution........................................................................................ 86
4.1.2 Existenceb ofb Trianglesb inb G(n,bd/n) ............................................................. 91
4.2 Phaseb Transitions .......................................................................................................93
4.3 Theb Giantb Component ............................................................................................ 101
2
, 4.4 Branchingb Processes ................................................................................................ 109
4.5 Cyclesb andb Fullb Connectivity................................................................................... 115
4.5.1 Emergenceb ofb Cycles ................................................................................... 115
4.5.2 Fullb Connectivity .......................................................................................... 116
4.5.3 Thresholdb forb O(lnbn)b Diameter ................................................................ 118
4.6 Phaseb Transitionsb forb Increasingb Properties.......................................................... 119
4.7 Phaseb Transitionsb forb CNF-sat ............................................................................... 121
4.8 Nonuniformb andb Growthb Modelsb ofb Randomb Graphs .......................................... 126
4.8.1 Nonuniformb Models .................................................................................... 126
4.8.2 GiantbComponentbinbRandombGraphsbwithbGivenbDegreebDistribution127
4.9 Growthb Models ........................................................................................................ 128
4.9.1 Growthb Modelb Withoutb Preferentialb Attachment .................................... 128
4.9.2 Growthb Modelb Withb Preferentialb Attachment.......................................... 135
4.10 Smallb Worldb Graphs ................................................................................................ 136
4.11 Bibliographicb Notes.................................................................................................141
4.12 Exercises ................................................................................................................... 142
5 Randomb Walksb andb Markovb Chains 153
5.1 Stationaryb Distribution ............................................................................................ 156
5.2 Electricalb Networksb andb Randomb Walks ............................................................... 158
5.3 Randomb Walksb onb Undirectedb Graphsb withb Unitb Edgeb Weights........................ 162
5.4 Randomb Walksb inb Euclideanb Space ....................................................................... 169
5.5 Theb Webb asb ab Markovb Chain .................................................................................. 173
5.6 Markovb Chainb Monteb Carlo .................................................................................... 177
5.6.1 Metropolis-Hastingb Algorithm .................................................................... 178
5.6.2 Gibbsb Sampling............................................................................................ 180
5.7 Areasb andb Volumes.................................................................................................. 182
5.8 Convergenceb ofb Randomb Walksb onb Undirectedb Graphs ....................................... 183
5.8.1 Usingb Normalizedb Conductanceb tob Proveb Convergence .......................... 188
5.9 Bibliographicb Notes.................................................................................................191
5.10 Exercises ................................................................................................................... 192
6 Learningb andb VC-dimension 202
6.1 Learning .................................................................................................................... 202
6.2 Linearb Separators,b theb Perceptronb Algorithm,b andb Margins ............................... 204
6.3 Nonlinearb Separators,b Supportb Vectorb Machines,b andb Kernels ........................... 209
6.4 Strongb andb Weakb Learningb -b Boosting................................................................... 214
6.5 Numberb ofb Examplesb Neededb forb Prediction:b VC-Dimension .............................. 216
6.6 Vapnik-Chervonenkisb orb VC-Dimension................................................................. 219
6.6.1 Examplesb ofb Setb Systemsb andb Theirb VC-Dimension ................................ 220
6.6.2 Theb Shatterb Function .................................................................................. 223
6.6.3 Shatterb Functionb forb Setb Systemsb ofb Boundedb VC-Dimension ................ 224
6.6.4 Intersectionb Systems ................................................................................... 226
3
, 6.7 Theb VCb Theorem ................................................................................................... 226
6.8 Simpleb Learning .......................................................................................................229
6.9 Bibliographicb Notes ................................................................................................ 230
6.10 Exercises ................................................................................................................... 231
7 Algorithmsb forb Massiveb Datab Problems 238
7.1 Frequencyb Momentsb ofb Datab Streams....................................................................238
7.1.1 Numberb ofb Distinctb Elementsb inb ab Datab Stream ..................................... 239
7.1.2 Countingb theb Numberb ofb Occurrencesb ofb ab Givenb Element. .................... 243
7.1.3 CountingbFrequentbElements ..................................................................... 243
7.1.4 Theb Secondb Moment .................................................................................. 245
7.2 Matrixb Algorithmsb Usingb Sampling .......................................................................249
7.2.1 Matrixb Multiplicationb Usingb Sampling ..................................................... 249
7.2.2 Sketchb ofb ab Largeb Matrix............................................................................251
7.3 Sketchesb ofb Documents............................................................................................ 254
7.4 Exercises ................................................................................................................... 257
8 Clustering 261
8.1 Someb Clusteringb Examples ..................................................................................... 261
8.2 Ab k-meansb Clusteringb Algorithm ...........................................................................264
8.3 Ab Greedyb Algorithmb forb k-Centerb Criterionb Clustering ...................................... 266
8.4 Spectralb Clustering .................................................................................................. 267
8.5 Recursiveb Clusteringb Basedb onb Sparseb Cuts ......................................................... 273
8.6 Kernelb Methods ....................................................................................................... 275
8.7 Agglomerativeb Clustering ....................................................................................... 276
8.8 Denseb Submatricesb andb Communities ....................................................................279
8.9 Flowb Methods ..........................................................................................................282
8.10 Findingb ab Localb Clusterb Withoutb Examiningb theb Wholeb Graph ........................ 284
8.11 Axiomsb forb Clustering ............................................................................................. 290
8.11.1 Anb Impossibilityb Result .............................................................................. 290
8.11.2 Ab Satisfiableb Setb ofb Axioms .......................................................................296
8.12 Exercises ................................................................................................................... 298
9 Topicb Models,b Hiddenb Markovb Process,b Graphicalb Models,b andb Beliefb Pro
pagation 302
9.1 Topicb Models ...........................................................................................................302
9.2 Hiddenb Markovb Model ............................................................................................ 306
9.3 Graphicalb Models,b andb Beliefb Propagation ........................................................... 311
9.4 Bayesianb orb Beliefb Networks ................................................................................... 312
9.5 Markovb Randomb Fields ........................................................................................... 313
9.6 Factorb Graphs ..........................................................................................................314
9.7 TreebAlgorithms .......................................................................................................315
9.8 Messageb Passingb inb generalb Graphs........................................................................316
9.9 Graphsb withb ab Singleb Cycle .................................................................................... 318
4