John Verzani
Preface
These notes are an introduction to using the statistical software package R for an introductory
statistics course. They are meant to accompany an introductory statistics book such as Kitchens'
“Exploring Statistics”. The goals are not to show all the features of R, or to replace a standard
textbook, but rather to be used with a textbook to illustrate the features of R that can be learned
in a one-semester, introductory statistics course.
These notes were written to take advantage of R version 1.5.0 or later. For pedagogical reasons the
equals sign, =, is used as the assignment operator rather than the traditional arrow combination <-.
The = assignment operator was added to R in version 1.4.0. If only an older version is available,
the reader will have to make the minor adjustment of using <- in place of =.
There are several references to data and functions in this text that need to be installed prior to
their use. Installing the data is easy, but the instructions vary depending on your system. Windows
users need to download the “zip” file and then install it from the “packages” menu. On UNIX, one
uses the command R CMD INSTALL packagename.tar.gz. Some of the datasets are borrowed from other
authors, notably Kitchens. Credit is given in the help files for the datasets. This material is
available as an R package from:
http://www.math.csi.cuny.edu/Statistics/R/simpleR/Simple_0.4.zip for Windows users.
http://www.math.csi.cuny.edu/Statistics/R/simpleR/Simple_0.4.tar.gz for UNIX users.
If necessary, the file can be sent by email. The individual data sets can also be found online in
the directory http://www.math.csi.cuny.edu/Statistics/R/simpleR/Simple.
This is version 0.4 of these notes, which were last generated on August 22, 2002. Before printing
these notes, you should check for the most recent version available from the CSI Math department
(http://www.math.csi.cuny.edu/Statistics/R/simpleR).
Copyright © John Verzani, 2001-2. All rights reserved.
Contents
Introduction
    What is R
    A note on notation
Data
    Starting R
    Entering data with c
    Data is a vector
    Problems
Univariate Data
    Categorical data
    Numerical data
    Problems
Bivariate Data
    Handling bivariate categorical data
    Handling bivariate data: categorical vs. numerical
    Bivariate data: numerical vs. numerical
    Linear regression
    Problems
Multivariate Data
    Storing multivariate data in data frames
    Accessing data in data frames
    Manipulating data frames: stack and unstack
    Using R’s model formula notation
    Ways to view multivariate data
    The lattice package
    Problems
Random Data
    Random number generators in R: the “r” functions
    Problems
Simulations
    The central limit theorem
    Using simple.sim and functions
    Problems
Exploratory Data Analysis
    Our toolbox
    Examples
    Problems
Confidence Interval Estimation
    Population proportion theory
    Proportion test
    The z-test
    The t-test
    Confidence interval for the median
    Problems
Hypothesis Testing
    Testing a population parameter
    Testing a mean
    Tests for the median
    Problems
Two-sample tests
    Two-sample tests of proportion
    Two-sample t-tests
    Resistant two-sample tests
    Problems
Chi Square Tests
    The chi-squared distribution
    Chi-squared goodness of fit tests
    Chi-squared tests of independence
    Chi-squared tests for homogeneity
    Problems
Regression Analysis
    Simple linear regression model
    Testing the assumptions of the model
    Statistical inference
    Problems
Multiple Linear Regression
    The model
    Problems
Analysis of Variance
    One-way analysis of variance
    Problems
Appendix: Installing R
Appendix: External Packages
Appendix: A sample R session
    A sample session involving regression
    t-tests
    A simulation example
Appendix: What happens when R starts?
Appendix: Using Functions
    The basic template
    For loops
    Conditional expressions
Appendix: Entering Data into R
    Using c
    Using scan
    Using scan with a file
    Editing your data
    Reading in tables of data
    Fixed-width fields
    Spreadsheet data
    XML, URLs
    “Foreign” formats
Appendix: Teaching Tricks
Appendix: Sources of help, documentation