simpleR – Using R for Introductory Statistics
John Verzani
Preface
These notes are an introduction to using the statistical software package R for an introductory statistics course. They are meant to accompany an introductory statistics book such as Kitchens' “Exploring Statistics”. The goal is not to show all the features of R, or to replace a standard textbook, but rather to be used with a textbook to illustrate the features of R that can be learned in a one-semester introductory statistics course.
These notes were written to take advantage of R version 1.5.0 or later. For pedagogical reasons the equals sign, =, is used as the assignment operator instead of the traditional arrow combination <-. Assignment with = was added to R in version 1.4.0. If only an older version is available, the reader will have to make the minor adjustment of using <- instead.
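The short R sketch below is not from the original notes; it is added to illustrate the adjustment described above. Assuming a reasonably recent R session, both assignment styles bind the same object, so examples written with = translate mechanically to <-. The one place the two behave differently is inside a function call, where = names an argument rather than assigning a variable.

```r
# Both forms assign the vector c(1, 2, 3) to a name.
x = c(1, 2, 3)     # equals sign, the style used in these notes (R >= 1.4.0)
y <- c(1, 2, 3)    # traditional arrow, works in every version of R
identical(x, y)    # TRUE: the two operators produce the same object

# Inside a function call, = matches a named argument instead of assigning.
# Here x refers to the argument of mean(), not to our variable x.
m = mean(x = c(1, 2, 100))
m                  # 34.33333
x                  # unchanged: 1 2 3
```

When converting to the arrow style, only top-level assignments such as `x = ...` should become `x <- ...`; argument names inside calls such as `mean(x = ...)` keep the equals sign.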
There are several references to data and functions in this text that need to be installed prior to their use. Installing the data is easy, but the instructions vary depending on your system. Windows users need to download the “zip” file and then install it from the “packages” menu. In UNIX, one uses the command R CMD INSTALL packagename.tar.gz. Some of the datasets are borrowed from other authors, notably Kitchens. Credit is given in the help files for the datasets. This material is available as an R package from:
http://www.math.csi.cuny.edu/Statistics/R/simpleR/Simple_0.4.zip for Windows users.
http://www.math.csi.cuny.edu/Statistics/R/simpleR/Simple_0.4.tar.gz for UNIX users.
If necessary, the file can be sent by email. As well, the individual data sets can be found online in the directory
http://www.math.csi.cuny.edu/Statistics/R/simpleR/Simple.
These notes are version 0.4 and were last generated on August 22, 2002. Before printing them, you should check for the most recent version, available from the CSI Math department (http://www.math.csi.cuny.edu/Statistics/R/simpleR).
Copyright © John Verzani (), 2001-2. All rights reserved.
Contents
Introduction
  What is R
  A note on notation
Data
  Starting R
  Entering data with c
  Data is a vector
  Problems
Univariate Data
  Categorical data
  Numerical data
  Problems
Bivariate Data
  Handling bivariate categorical data
  Handling bivariate data: categorical vs. numerical
  Bivariate data: numerical vs. numerical
  Linear regression
  Problems
Multivariate Data
  Storing multivariate data in data frames
  Accessing data in data frames
  Manipulating data frames: stack and unstack
  Using R's model formula notation
  Ways to view multivariate data
    The lattice package
  Problems
Random Data
  Random number generators in R – the “r” functions
  Problems
Simulations
  The central limit theorem
  Using simple.sim and functions
  Problems
Exploratory Data Analysis
  Our toolbox
  Examples
  Problems
Confidence Interval Estimation
  Population proportion theory
  Proportion test
  The z-test
  The t-test
  Confidence interval for the median
  Problems
Hypothesis Testing
  Testing a population parameter
  Testing a mean
  Tests for the median
  Problems
Two-sample tests
  Two-sample tests of proportion
  Two-sample t-tests
  Resistant two-sample tests
  Problems
Chi Square Tests
  The chi-squared distribution
  Chi-squared goodness of fit tests
  Chi-squared tests of independence
  Chi-squared tests for homogeneity
  Problems
Regression Analysis
  Simple linear regression model
  Testing the assumptions of the model
  Statistical inference
  Problems
Multiple Linear Regression
  The model
  Problems
Analysis of Variance
  One-way analysis of variance
  Problems
Appendix: Installing R
Appendix: External Packages
Appendix: A sample R session
  A sample session involving regression
  t-tests
  A simulation example
Appendix: What happens when R starts?
Appendix: Using Functions
  The basic template
  For loops
  Conditional expressions
Appendix: Entering Data into R
  Using c
  Using scan
  Using scan with a file
  Editing your data
  Reading in tables of data
  Fixed-width fields
  Spreadsheet data
  XML, URLs
  “Foreign” formats
Appendix: Teaching Tricks
Appendix: Sources of help, documentation