Chapter 14
Analysis of Categorical Data

14.1 A Description of the Experiment
14.2 The Chi-Square Test
14.3 A Test of a Hypothesis Concerning Specified Cell Probabilities: A Goodness-of-Fit Test
14.4 Contingency Tables
14.5 r × c Tables with Fixed Row or Column Totals
14.6 Other Applications
14.7 Summary and Concluding Remarks
References and Further Readings
14.1 A Description of the Experiment
Many experiments result in measurements that are qualitative or categorical rather than quantitative, like many of the measurements discussed in previous chapters. In these instances, a quality or characteristic is identified for each experimental unit. Data associated with such measurements can be summarized by providing the count of the number of measurements that fall into each of the distinct categories associated with the variable. For example,
• Employees can be classified into one of five income brackets.
• Mice might react in one of three ways when subjected to a stimulus.
• Motor vehicles might fall into one of four vehicle types.
• Paintings could be classified into one of $k$ categories according to style and period.
• The quality of surgical incisions could be most meaningfully identified as excellent, very good, good, fair, or poor.
• Manufactured items are acceptable, seconds, or rejects.
All the preceding examples exhibit, to a reasonable degree of approximation, the following characteristics, which define a multinomial experiment (see Section 5.9):
1. The experiment consists of $n$ identical trials.
2. The outcome of each trial falls into exactly one of $k$ distinct categories or cells.
3. The probability that the outcome of a single trial will fall in a particular cell, cell $i$, is $p_i$, where $i = 1, 2, \ldots, k$, and remains the same from trial to trial. Notice that
$$p_1 + p_2 + p_3 + \cdots + p_k = 1.$$
4. The trials are independent.
5. We are interested in $n_1, n_2, n_3, \ldots, n_k$, where $n_i$ for $i = 1, 2, \ldots, k$ is equal to the number of trials for which the outcome falls into cell $i$. Notice that
$$n_1 + n_2 + n_3 + \cdots + n_k = n.$$
This experiment is analogous to tossing $n$ balls at $k$ boxes, where each ball must fall into exactly one of the boxes. The probability that a ball will fall into a box varies from box to box but remains the same for each box in repeated tosses. Finally, the balls are tossed in such a way that the trials are independent. At the conclusion of the experiment, we observe $n_1$ balls in the first box, $n_2$ in the second, ..., and $n_k$ in the $k$th. The total number of balls is $n = n_1 + n_2 + n_3 + \cdots + n_k$.
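The balls-and-boxes description can be mimicked with a short simulation. The sketch below is a hypothetical illustration (the function name `toss_balls` and the use of Python's standard-library `random` module are our own choices, not part of the text): each of the $n$ independent trials drops one ball into cell $i$ with the fixed probability $p_i$, and the function returns the counts $n_1, \ldots, n_k$.

```python
import random


def toss_balls(n, probs, seed=None):
    """Simulate one multinomial experiment and return the cell counts.

    Each of the n independent trials drops one ball into cell i with
    probability probs[i]; the probabilities stay fixed from trial to trial.
    """
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("cell probabilities must sum to 1")
    rng = random.Random(seed)
    num_cells = len(probs)
    # Draw n cell labels independently, with the same weights on every draw.
    cells = rng.choices(range(num_cells), weights=probs, k=n)
    counts = [0] * num_cells
    for cell in cells:
        counts[cell] += 1
    return counts


counts = toss_balls(100, [0.1, 0.2, 0.3, 0.4], seed=1)
print(counts, sum(counts))  # the counts n_1, ..., n_4 always add up to n = 100
```

Rerunning with different seeds gives different counts, but every run satisfies $n_1 + \cdots + n_k = n$.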
Notice the similarity between the binomial and the multinomial experiments and, in particular, that the binomial experiment represents the special case of the multinomial experiment when $k = 2$. The two cell probabilities, $p$ and $q = 1 - p$, of the binomial experiment are replaced by the $k$ cell probabilities, $p_1, p_2, \ldots, p_k$, of the multinomial experiment. The objective of this chapter is to make inferences about the cell probabilities $p_1, p_2, \ldots, p_k$. The inferences will be expressed in terms of statistical tests of hypotheses concerning the specific numerical values of the cell probabilities or their relationship one to another.
Because the calculation of multinomial probabilities is somewhat cumbersome, it would be difficult to calculate the exact significance levels (probabilities of type I errors) for hypotheses regarding the values of $p_1, p_2, \ldots, p_k$. Fortunately, we have been relieved of this chore by the British statistician Karl Pearson, who proposed a very useful test statistic for testing hypotheses concerning $p_1, p_2, \ldots, p_k$ and gave the approximate sampling distribution of this statistic. We will outline the construction of Pearson's test statistic in the following section.
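To see where the tedium comes from, recall the multinomial probability function of Section 5.9: the probability of observing counts $n_1, \ldots, n_k$ is $\frac{n!}{n_1!\,n_2!\cdots n_k!}\,p_1^{n_1} p_2^{n_2}\cdots p_k^{n_k}$. A minimal Python sketch of that formula follows (the name `multinomial_pmf` is hypothetical). An exact significance level would require summing such terms over every outcome at least as extreme as the one observed, and the number of outcomes grows rapidly with $n$ and $k$.

```python
import math


def multinomial_pmf(counts, probs):
    """P(n_1 = counts[0], ..., n_k = counts[k-1]) for a multinomial experiment."""
    n = sum(counts)
    # Multinomial coefficient n! / (n_1! n_2! ... n_k!), kept exact with integers;
    # every partial quotient is itself an integer, so // loses nothing.
    coef = math.factorial(n)
    for c in counts:
        coef //= math.factorial(c)
    return coef * math.prod(p ** c for p, c in zip(probs, counts))


print(multinomial_pmf([2, 1, 1], [0.5, 0.25, 0.25]))  # 12 * 0.015625 = 0.1875
```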
14.2 The Chi-Square Test
Suppose that $n = 100$ balls were tossed at the cells (boxes) and that we knew that $p_1$ was equal to .1. How many balls would be expected to fall into cell 1? Referring to Section 5.9, recall that $n_1$ has a (marginal) binomial distribution with parameters $n$ and $p_1$, and that
$$E(n_1) = np_1 = (100)(.1) = 10.$$
In like manner, each of the $n_i$'s has a binomial distribution with parameters $n$ and $p_i$, and the expected number falling into cell $i$ is
$$E(n_i) = np_i, \qquad i = 1, 2, \ldots, k.$$
Now suppose that we hypothesize values for $p_1, p_2, \ldots, p_k$ and calculate the expected value for each cell. Certainly if our hypothesis is true, the cell counts $n_i$ should not deviate greatly from their expected values $np_i$ for $i = 1, 2, \ldots, k$. Hence, it would seem intuitively reasonable to use a test statistic involving the $k$ deviations,
$$n_i - E(n_i) = n_i - np_i, \qquad \text{for } i = 1, 2, \ldots, k.$$
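The expected counts and the $k$ deviations are simple to compute. In this hypothetical sketch (function names are our own), $n$ is taken as the sum of the observed counts. One property worth noticing: the deviations always add to zero, because $\sum_i (n_i - np_i) = n - n\sum_i p_i = 0$, which is a first glimpse of the linear restriction $\sum_i p_i = 1$ that reappears when degrees of freedom are counted.

```python
def expected_counts(n, probs):
    """E(n_i) = n * p_i for each cell i."""
    return [n * p for p in probs]


def deviations(counts, probs):
    """The k deviations n_i - E(n_i) = n_i - n p_i, with n = sum of the counts."""
    n = sum(counts)
    return [c - e for c, e in zip(counts, expected_counts(n, probs))]


probs = [0.1, 0.2, 0.3, 0.4]
print(expected_counts(100, probs))        # roughly [10, 20, 30, 40]
print(deviations([12, 18, 33, 37], probs))  # these sum to 0 up to rounding
```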
In 1900 Karl Pearson proposed the following test statistic, which is a function of the squares of the deviations of the observed counts from their expected values, weighted by the reciprocals of their expected values:
$$X^2 = \sum_{i=1}^{k} \frac{[n_i - E(n_i)]^2}{E(n_i)} = \sum_{i=1}^{k} \frac{[n_i - np_i]^2}{np_i}.$$
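Pearson's statistic translates directly into a few lines of Python. This is a sketch (the name `pearson_x2` is hypothetical), applied here to the rat counts 23, 36, 31 with hypothesized probabilities of 1/3 each that Example 14.1 uses; there each expected count is 30, so $X^2 = (49 + 36 + 1)/30$.

```python
def pearson_x2(counts, probs):
    """Pearson's statistic: sum over cells of (n_i - n p_i)^2 / (n p_i)."""
    n = sum(counts)
    x2 = 0.0
    for n_i, p_i in zip(counts, probs):
        expected = n * p_i  # E(n_i) under the hypothesized p_i
        x2 += (n_i - expected) ** 2 / expected
    return x2


print(pearson_x2([23, 36, 31], [1 / 3, 1 / 3, 1 / 3]))  # approximately 2.867
```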
Although the mathematical proof is beyond the scope of this text, it can be shown that when $n$ is large, $X^2$ has an approximate chi-square ($\chi^2$) probability distribution. We can easily demonstrate this result for the case $k = 2$, as follows. If $k = 2$, then $n_2 = n - n_1$ and $p_1 + p_2 = 1$. Thus,
$$
\begin{aligned}
X^2 &= \sum_{i=1}^{2} \frac{[n_i - E(n_i)]^2}{E(n_i)} = \frac{(n_1 - np_1)^2}{np_1} + \frac{(n_2 - np_2)^2}{np_2} \\
&= \frac{(n_1 - np_1)^2}{np_1} + \frac{[(n - n_1) - n(1 - p_1)]^2}{n(1 - p_1)} \\
&= \frac{(n_1 - np_1)^2}{np_1} + \frac{(-n_1 + np_1)^2}{n(1 - p_1)} \\
&= (n_1 - np_1)^2 \left[\frac{1}{np_1} + \frac{1}{n(1 - p_1)}\right] = \frac{(n_1 - np_1)^2}{np_1(1 - p_1)}.
\end{aligned}
$$
We have seen (Section 7.5) that for large $n$
$$\frac{n_1 - np_1}{\sqrt{np_1(1 - p_1)}}$$
has approximately a standard normal distribution. Since the square of a standard normal random variable has a $\chi^2$ distribution (see Example 6.11), for $k = 2$ and large $n$, $X^2$ has an approximate $\chi^2$ distribution with 1 degree of freedom (df).
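The algebra for $k = 2$ can be checked numerically: Pearson's $X^2$ computed from both cells should equal the square of the standardized binomial count. The sketch below (both function names are hypothetical) evaluates the two sides; with $n_1 = 40$, $n = 100$, $p_1 = .3$ each side is $100/30 + 100/70 = 100/21 \approx 4.762$.

```python
import math


def x2_two_cells(n1, n, p1):
    """Pearson's X^2 written out for k = 2 cells, with n2 = n - n1 and p2 = 1 - p1."""
    n2, p2 = n - n1, 1 - p1
    return (n1 - n * p1) ** 2 / (n * p1) + (n2 - n * p2) ** 2 / (n * p2)


def z_score(n1, n, p1):
    """Standardized binomial count (n1 - n p1) / sqrt(n p1 (1 - p1))."""
    return (n1 - n * p1) / math.sqrt(n * p1 * (1 - p1))


# The derivation says X^2 = z^2 exactly; the two printed values agree up to rounding.
print(x2_two_cells(40, 100, 0.3), z_score(40, 100, 0.3) ** 2)
```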
Experience has shown that the cell counts $n_i$ should not be too small if the $\chi^2$ distribution is to provide an adequate approximation to the distribution of $X^2$. As a rule of thumb, we will require that all expected cell counts be at least five, although Cochran (1952) has noted that this value can be as low as one for some situations.
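The rule of thumb amounts to a one-line check on the expected counts $np_i$. In this hypothetical helper the threshold is a parameter `minimum` (default 5); lowering it toward 1, in the spirit of Cochran's remark, is left to the analyst's judgment, since the text does not say which situations permit it.

```python
def expected_counts_adequate(n, probs, minimum=5.0):
    """Check the rule of thumb that every expected count E(n_i) = n * p_i
    is at least `minimum` (5 by default)."""
    return all(n * p >= minimum for p in probs)


print(expected_counts_adequate(90, [1 / 3, 1 / 3, 1 / 3]))  # True: each E(n_i) is 30
print(expected_counts_adequate(40, [0.05, 0.45, 0.50]))     # False: E(n_1) is only 2
```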
You will recall the use of the $\chi^2$ probability distribution for testing a hypothesis concerning a population variance $\sigma^2$ in Section 10.9. In particular, we have seen that the shape of the $\chi^2$ distribution and the associated quantiles and tail areas differ considerably depending on the number of degrees of freedom (see Table 6, Appendix 3). Therefore, if we want to use $X^2$ as a test statistic, we must know the number of degrees of freedom associated with the approximating $\chi^2$ distribution and whether to use a one-tailed or two-tailed test in locating the rejection region for the test. The latter problem may be solved directly. Because large differences between the observed and expected cell counts contradict the null hypothesis, we will reject the null hypothesis when $X^2$ is large and employ an upper-tailed statistical test.
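The text locates the rejection region with Table 6 of Appendix 3. As a substitute for table lookup, the sketch below computes the upper-tail area $P(\chi^2 > x)$ from closed forms that hold for integer degrees of freedom: $Q(x;1) = \operatorname{erfc}(\sqrt{x/2})$, $Q(x;2) = e^{-x/2}$, and the step $Q(x;\nu+2) = Q(x;\nu) + (x/2)^{\nu/2}e^{-x/2}/\Gamma(\nu/2+1)$. The function name `chi2_upper_tail` is hypothetical; it uses only `math.erfc`, `math.exp`, and `math.gamma` from Python's standard library. Rejecting $H_0$ when this tail area falls below $\alpha$ is equivalent to rejecting when $X^2$ exceeds the tabulated $\chi^2_\alpha$.

```python
import math


def chi2_upper_tail(x, df):
    """P(chi-square with df degrees of freedom > x), for integer df >= 1.

    Starts from Q(x; 1) = erfc(sqrt(x/2)) or Q(x; 2) = exp(-x/2) and climbs
    two degrees of freedom at a time with
    Q(x; v + 2) = Q(x; v) + (x/2)**(v/2) * exp(-x/2) / Gamma(v/2 + 1).
    """
    if df < 1 or x < 0:
        raise ValueError("need integer df >= 1 and x >= 0")
    half = x / 2.0
    if df % 2 == 1:
        q, v = math.erfc(math.sqrt(half)), 1
    else:
        q, v = math.exp(-half), 2
    while v < df:
        q += half ** (v / 2) * math.exp(-half) / math.gamma(v / 2 + 1)
        v += 2
    return q


# Upper-tailed test: reject H0 at level alpha when the tail area beyond X^2 is below alpha.
x2, df, alpha = 2.867, 2, 0.05
p_value = chi2_upper_tail(x2, df)
print(p_value, p_value < alpha)
```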
The determination of the appropriate number of degrees of freedom to be employed for the test can be a little tricky and therefore will be specified for the physical applications described in the following sections. In addition, we will state the principle involved (which is fundamental to the mathematical proof of the approximation) so that you will understand why the number of degrees of freedom changes with various applications. This principle states that the appropriate number of degrees of freedom will equal the number of cells, $k$, less 1 df for each independent linear restriction placed on the cell probabilities. For example, one linear restriction is always present because the sum of the cell probabilities must equal 1; that is,
$$p_1 + p_2 + p_3 + \cdots + p_k = 1.$$
Other restrictions will be introduced for some applications because of the necessity for estimating unknown parameters required in the calculation of the expected cell frequencies or because of the method used to collect the sample. When unknown parameters must be estimated in order to compute $X^2$, a maximum-likelihood estimator (MLE) should be employed. The number of degrees of freedom for the approximating $\chi^2$ distribution is reduced by 1 for each parameter estimated. These cases will arise as we consider various practical examples.
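The bookkeeping in this principle can be written as a small helper. This is an illustrative sketch (the function and parameter names are hypothetical): start from $k$ cells, subtract 1 for $\sum_i p_i = 1$, then 1 for each parameter estimated by maximum likelihood and 1 for each other independent linear restriction.

```python
def chi2_degrees_of_freedom(num_cells, num_estimated=0, other_restrictions=0):
    """Degrees of freedom for the approximating chi-square distribution.

    k cells, minus 1 because the cell probabilities sum to 1, minus 1 for
    each parameter estimated (by MLE), minus 1 for each other independent
    linear restriction on the cell probabilities.
    """
    df = num_cells - 1 - num_estimated - other_restrictions
    if df < 1:
        # Guard added for this sketch: a chi-square reference needs df >= 1.
        raise ValueError("no degrees of freedom remain for the test")
    return df


print(chi2_degrees_of_freedom(3))                   # 2: k - 1 with nothing estimated
print(chi2_degrees_of_freedom(6, num_estimated=1))  # 4
```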
14.3 A Test of a Hypothesis Concerning Specified Cell Probabilities: A Goodness-of-Fit Test
The simplest hypothesis concerning the cell probabilities is one that specifies numerical values for each. In this case, we are testing $H_0: p_1 = p_{1,0}, p_2 = p_{2,0}, \ldots, p_k = p_{k,0}$, where $p_{i,0}$ denotes a specified value for $p_i$. The alternative is the general one that states that at least one of the equalities does not hold. Because the only restriction on the cell probabilities is that $\sum_{i=1}^{k} p_i = 1$, the $X^2$ test statistic has approximately a $\chi^2$ distribution with $k - 1$ df.
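Putting the pieces together, a goodness-of-fit test for fully specified cell probabilities computes $X^2$, uses $k - 1$ df, and rejects when $X^2$ exceeds the upper-tail critical value. In this hypothetical sketch (`goodness_of_fit` is our own name), the caller supplies the critical value, for instance $\chi^2_{.05} = 5.991$ for 2 df read from a chi-square table; the usage line applies it to the rat counts that Example 14.1 reports.

```python
def goodness_of_fit(counts, null_probs, critical_value):
    """Test H0: p_i = p_{i,0} for every cell against the alternative that some differ.

    `critical_value` is the upper-tail chi-square quantile for k - 1 df at the
    chosen alpha (for example, read from a chi-square table).
    Returns (X^2, df, reject).
    """
    if abs(sum(null_probs) - 1.0) > 1e-9:
        raise ValueError("hypothesized probabilities must sum to 1")
    n = sum(counts)
    x2 = sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, null_probs))
    df = len(counts) - 1  # the only restriction is that the p_i sum to 1
    return x2, df, x2 > critical_value


# Rat data of Example 14.1, tested at alpha = .05 with chi-square_.05 = 5.991 for 2 df.
print(goodness_of_fit([23, 36, 31], [1 / 3, 1 / 3, 1 / 3], critical_value=5.991))
```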
EXAMPLE 14.1 A group of rats, one by one, proceeds down a ramp to one of three doors. We wish to test the hypothesis that the rats have no preference concerning the choice of a door. Thus, the appropriate null hypothesis is
$$H_0: p_1 = p_2 = p_3 = \frac{1}{3},$$
where $p_i$ is the probability that a rat will choose door $i$, for $i = 1, 2$, or $3$.
Suppose that the rats were sent down the ramp $n = 90$ times and that the three observed cell frequencies were $n_1 = 23$, $n_2 = 36$, and $n_3 = 31$. The expected cell frequencies are the same for each cell: $E(n_i) = np_i = (90)(1/3) = 30$. The observed