1.1 INTRODUCTION
BASIC CONCEPTS
● Statistical Inquiry: designed research that provides information needed to solve a research
problem
○ “Info”: processed data that also has to be connected to other data
● Population, Elements, Sample:
○ Population: collection of all elements under consideration in a statistical inquiry
■ Totality; not necessarily people or a population count
■ Whenever we identify the population, we ALWAYS use the words “collection of”,
“set of”, or any similar terms [otherwise, we are just referring to the elements]
○ Elements: the units whose characteristics will be observed and measured by the
researchers in order to answer the research problem
■ Can be individuals, objects, animals, geographic areas etc.
○ Sample: a subset of the population that we actually examine in order to gather information;
also a collection, like the population
● Variable: properties, characteristics, attributes of a physical or abstract system (e.g. person,
object, event, time period)
○ Can take on different values/amounts (determined by the measurement process)
○ Measurements are taken on elements = variables are properties of the elements
■ Not taken from the population/sample as a whole, but from each individual element
○ Qualitative vs Quantitative:
■ Qualitative: yield categorical or descriptive response
■ Quantitative: take on numerical values to represent amount or quantity
■ Remarks on quanti and quali variables:
● Some variables are still quali even if numerical in nature e.g. date
● If it has a unit, it’s often quanti e.g. hrs, km/h
○ Add or subtract the values → if interpretable, then it has a fixed unit =
quantitative
○ Discrete vs Continuous: usually for quantitative/numerical variables
■ Discrete: can assume finite, or at least countably infinite # of values
● Can be a result of classifying or counting bc NUMERIC
■ Continuous: can assume infinitely many values that can be stated using an
interval, fractions or decimals (e.g temperature, speed)
● Observation and Data:
○ Observation: a realized value of a variable for a particular element
○ Data: collection of observations of a variable for the elements of a sample
● Summary Measure: single numeric figure - describes a particular feature of the whole collection
○ Describes the collection of elements (vs variables, which describe individual elements only)
● Parameter and Statistic (summary measures):
○ Parameter: describes a specific characteristic of the population = P or POI, computed using
population data ONLY
■ $P = \frac{\text{number of elements possessing a certain characteristic}}{\text{number of elements in the collection (POPULATION)}}$
→ PROPORTION
○ Statistic: describes a specific characteristic of the sample = computed using sample data
● Census vs Sampling:
○ Census: complete enumeration; measure the VOI/s from all the elements of the population
○ Sampling: measure VOI/s from elements belonging in a sample
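A minimal Python sketch of the parameter vs statistic distinction above. The population of 1,000 elements, the 300 elements possessing the characteristic, the sample size of 100, and the names `proportion`, `P`, and `p_hat` are all illustrative assumptions, not values from these notes.

```python
import random

def proportion(flags):
    """Share of elements that possess the characteristic (True values)."""
    return sum(flags) / len(flags)

# Hypothetical population: 1,000 elements, exactly 300 possess the characteristic.
population = [True] * 300 + [False] * 700

# Parameter: computed from ALL elements of the population (what a census gives).
P = proportion(population)

# Statistic: computed from a sample only, so it changes from sample to sample.
rng = random.Random(0)                 # fixed seed so the run is repeatable
sample = rng.sample(population, 100)   # sampling without replacement
p_hat = proportion(sample)

print(f"parameter P     = {P:.3f}")
print(f"statistic p_hat = {p_hat:.3f}")
```

Changing the seed changes `p_hat` but never `P`, which is the reason a statistic can only estimate a parameter.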
FIELDS OF STATISTICS (APPLIED: DESCRIPTIVE AND INFERENTIAL)
● Descriptive: describe the collected data at hand (not necessarily a sample)
○ = so conclusions are only about the data at hand
○ Through: graphical measures, tables, summary measures
● Inferential: make predictions or inference to generalize for a larger set of data
○ Usually involves use of a statistic to estimate a parameter:
■ Point and interval estimation, hypothesis testing, regression analysis
○ Conclusions made under conditions of uncertainty bc we only use partial info [“use
statistic to estimate…”]; conclusions are subject to some error
■ = we need probability theory to know the possible errors
○ Ex.: election polls
1.2 LEVELS OF MEASUREMENT
● Measurement: process of determining the value/label of the variable based on what has been
observed
○ “Label” bc this extends to non-numerical values (e.g. civil status, educ attainment based on
intl or local scales)
● Measurement level (of a data): determines arithmetic and statistical procedures that can be
applied on them
○ Nominal < Ordinal < Interval < Ratio
○ Need to know to help us in interpreting the value that the variable takes on
○ To help us choose appropriate statistical tool to analyze data
○ Properties:
■ The numbers in the measurement system are used to classify an element into
distinct categories (which are non-overlapping and exhaustive).
● 2 observations w/ the same value must belong in same category; if different =
shouldn’t be same
● All should belong to 1 category
■ The system arranges the categories according to magnitude.
● Smaller assigned number = less of the traits/characteristics
■ Has a fixed unit of measurement representing a set size throughout the scale.
● 1-unit difference = same interpretation wherever it happens in the scale
■ The system has an absolute zero.
● Complete absence of the characteristic
■ Admissible operations per property:
Property   Interpretable relationships   Admissible operations
           or operations
1st        Equality                      Group together observations in the data w/ same values
                                         + count how many belong in same category
2nd        Greater than or less than     Arrange observations in data accdg to magnitude
3rd        Difference or sum             Sum up all or get difference between 2 observations
4th        Ratio                         Compute for ratio of 2 observations
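The property table above can be encoded as a small lookup, sketched here in Python. It assumes each level has the first k properties in the order listed (nominal 1, ordinal 2, interval 3, ratio 4), as stated in the level descriptions in these notes. The names `PROPERTIES`, `LEVELS`, `interpretable`, and `is_admissible` are hypothetical.

```python
# Properties in the order listed above; each level has the first k of them.
PROPERTIES = ["equality", "greater/less than", "difference or sum", "ratio"]
LEVELS = {"nominal": 1, "ordinal": 2, "interval": 3, "ratio": 4}

def interpretable(level):
    """Relationships/operations that are interpretable at the given level."""
    return PROPERTIES[:LEVELS[level]]

def is_admissible(level, operation):
    """True if the operation is interpretable for data at that level."""
    return operation in interpretable(level)

for level in LEVELS:
    print(f"{level:8s} -> {interpretable(level)}")

# Example check: ratios are not interpretable for interval-level data.
print(is_admissible("interval", "ratio"))
```

A lookup like this is only a reminder of which operations make sense; it does not inspect the data itself.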
RATIO LEVEL (STRONGEST)
● Has all 4 properties:
○ The numbers in the measurement system are used to classify an element into distinct
categories. These categories are non-overlapping and exhaustive.
○ The system arranges the categories according to magnitude.
○ The system has a fixed unit of measurement representing a set size throughout the scale.
○ The system has an absolute zero
● Has all 4 properties = strongest = can do any arithmetic operation
INTERVAL LEVEL
● Only 3 properties: no true 0 = ratios of measurements taken using the scale aren’t interpretable
○ You can't say "20°C is twice as hot as 10°C" because 0°C doesn’t mean "no heat." The
ratios aren't meaningful.
● Example 1: temp in Celsius (centigrade)
○ 1st: 100 ℃ is not the same as 50 ℃, but if two objects are both 100 ℃, then we say that
they have the same temperature
○ 2nd: 100 ℃ is warmer than 50 ℃
○ 3rd: When you measure the temperature of water in centigrade, the distance between 92 ℃
and 94 ℃, is the same as between 96 ℃ and 98 ℃.
○ 4th (not satisfied): At 0 ℃, water freezes, but we cannot say that “the ice has no temperature” or
that there is an absence of warmth. Also, 100 ℃ is not twice as hot as 50 ℃.
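A short Python sketch of why differences are interpretable on an interval scale but ratios are not. It re-expresses the Celsius readings from the example in Fahrenheit using the standard conversion; the function name `c_to_f` and the variable names are illustrative.

```python
def c_to_f(celsius):
    """Standard Celsius-to-Fahrenheit conversion."""
    return celsius * 9 / 5 + 32

# 3rd property: equal gaps stay equal after changing the unit.
gap_low = c_to_f(94) - c_to_f(92)
gap_high = c_to_f(98) - c_to_f(96)
print("gaps in Fahrenheit:", gap_low, gap_high)

# No 4th property: the ratio depends on the unit chosen,
# so "100 is twice 50" is an artifact of the Celsius scale.
ratio_c = 100 / 50
ratio_f = c_to_f(100) / c_to_f(50)
print("ratio in Celsius:   ", ratio_c)
print("ratio in Fahrenheit:", ratio_f)
```

Because the same two temperatures give different ratios in different units, the ratio says nothing about the temperatures themselves.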
ORDINAL LEVEL
● Has only 2 properties
● Data in ordinal level = categorical values (given to measurements) that can be ordered by
magnitude or some natural order
○ The numbers assigned to the categories are arbitrary; only their order matters.
● Lacks the 3rd property: no exact or fixed interval measurements between 2 values
○ Data can’t describe the degree of difference between values.
○ = differences and ratios of measurements taken using the scale aren’t interpretable
● How to analyze ordinal data?
○ Mostly through counts only
○ Can visualize using graphs
● Examples: Likert scale: rating scale that assess opinions, attitudes, behaviors as numbers
○ Very unsatisfied (1), unsatisfied (2), neutral (3), satisfied (4), very satisfied (5)
■ 1 for Anna, 4 for Bruno → Bruno is more satisfied, but can you say that Bruno is 3 units
more satisfied? No bc: (1) no fixed unit throughout the scale; (2) the differences are
subject to interpretation; (3) the gap between 1 and 3 may be very small while the gap
between 3 and 4 may be very big [it really depends]
NOMINAL LEVEL (WEAKEST)
● Has only 1 property (1st)
○ = weakest = only a few arithmetic and statistical methods can be applied
● Classifies and labels variables qualitatively; divides them into named groups without any
quantitative meaning
○ Categories are distinct, mutually exclusive, and exhaustive.
○ Numbers or symbols can be used to classify but those don’t retain any quantitative meaning
● Categorical data: observations measured using categorical or nominal level
● We can only count the # of observations per category + compute for proportions and percentages.
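The counting and proportion step for nominal data can be sketched in Python as follows. The observations are made up for illustration (civil status is used only because it appears earlier in these notes as an example of a label), and `counts` and `proportions` are hypothetical names.

```python
from collections import Counter

# Hypothetical observations of a nominal variable (civil status).
observations = ["single", "married", "single", "widowed", "married", "single"]

# 1st property only: group equal values together and count them.
counts = Counter(observations)
n = len(observations)
proportions = {category: count / n for category, count in counts.items()}

for category, count in counts.most_common():
    percent = 100 * proportions[category]
    print(f"{category}: count={count}, percent={percent:.1f}%")
```

Sorting the categories or averaging numeric codes assigned to them would not be meaningful here, since nominal labels carry no magnitude.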
CHAPTER 2
2.1 DATA COLLECTION METHODS
● As researchers, you need access to data to be able to back up your findings.
USE OF DOCUMENTED DATA
● You’re not always required to collect original data.
● Can obtain documented data from previous studies of individuals or of private, gov’t, or
non-government organizations
● Can be in published or written reports, unpublished documents, periodicals etc.
● Classifications of data based on source:
○ Primary: data documented by the primary source (data collectors themselves)
■ Advantages:
● Primary source often provides vital info crucial in assessing the
applicability and accuracy of collected data [i.e. terms defined, statistical units
used in survey, methodology, questionnaire, discussion of
sampling/experimental design]
● Usually more comprehensive
○ Secondary: data documented by a secondary source; documented by individual/agency etc.
other than the data collectors
■ “Originally collected by…”
■ Disadvantages:
● Already filtered to address their purpose
● May contain mistakes due to errors made in transcription
● Collection procedure may not be directly available (to determine if sample is
representative of the population)
● Advantages of documented data:
○ Quick implementation; no need to conceptualize collection procedures (i.e. just request the data)
○ Just clean the data, then proceed to analysis directly
● Disadvantages of documented data:
○ May contain errors that are out of the researcher’s control
OBSERVATION METHOD
● Collecting data on the phenomenon of interest by recording details [using senses] [while it
actually happens]
● Requires that element is in its natural setting; there should be no human interference to make sure
that data gathered is realistic
● Structured vs Unstructured:
○ Structured: measurement materials are the same all throughout the data collection process
○ Unstructured: measurement procedure can change at any point of the process
■ Subjective
■ With observation, the true value is sometimes not captured (e.g. the subject feels awkward)
● Advantages:
○ It is practical to use when elements cannot verbalize their answers because they cannot
speak (e.g., studies on animal behavior - we can’t administer questionnaire to animals).
○ More successful than surveys in collecting data on behavior that respondents can easily
forget or are ashamed of
○ More successful than experiments in collecting realistic data in the natural setting
■ Although designs where elements are aware of the presence of an observer may
also fail to achieve this (e.g. if awkward, conscious)
● Disadvantages:
○ Sometimes need to wait for a long time for the phenomenon of interest to occur
○ Usually, data is based on the subjective perceptions and interpretations of the researcher on
the event under study = statistical techniques may be inappropriate to use
■ But still possible to collect objective quantitative data using the observation method
■ Issue bc of reproducibility
○ Use is limited to collecting data that can be observed
○ May be difficult to penetrate certain environments (e.g. certain beliefs, opinions)
○ Can’t be used to establish cause-and-effect bc no attempt to control extraneous factors
SURVEY METHOD
● Method of collecting data by asking respondents questions
○ Census: the process for obtaining information for the whole population
■ Don’t need to make inferences bc we have the values for our parameter
■ Feasible if the population is not too large
○ Sample survey: data come from a sample of people selected from a well-defined population
○ Respondents: elements; people who answer the questions in a survey
■ Success of survey as a data collection method relies on their honesty + capability to
give truthful answers
○ Questionnaire: contains all the questions asked in a survey = measurement tool
■ Leading questions: questions that push the respondent toward a desired answer
● Self-administered vs Personal Interview:
○ Self-administered: respondents fill out the questionnaire on their own
■ So keep questions as simple as possible
■ Online surveys: results are responsive and fast, but can be expensive
○ Personal interview: interviewers personally ask and record answers
■ Can be expensive (tokens etc.)
■ Telephone interview: shouldn’t take >10 mins
● Consider the ff. when choosing a method:
○ Ability to secure the type of data, cost, speed, accuracy of data obtained/quality of response,
response rate, geographic flexibility, availability of good interviewers and field supervisors,
population coverage
EXPERIMENT
● Method of collecting data where there is direct human intervention on the conditions that may
affect the value of variable of interest
○ Intervene through:
■ Using a randomization mechanism in assigning the treatments
● = effects of EXTV that experimenters couldn’t control are expected to cancel
each other out
■ Controlling the identified extraneous variables
■ = researcher can isolate the effects of EXPV on RV + clarify the direction and
strength of their relationship
● Best for establishing cause-and-effect
● Important terms:
○ Explanatory variables (EXPV, independent): variables in the study whose values are
believed to have an effect on the value of the response variable [RV: dependent]
○ Treatments or factor levels: values/categories of the EXPV being considered in the study
○ Extraneous variables (EXTV): variables that may have an effect on the RV but their effects
are not of interest in the study
● Disadvantages:
○ Not always feasible to randomize the assignment of treatments
○ Difficult to assess the reliability of inferences about a well-defined population if experimental
units weren't selected thru a randomization mechanism
○ Results may be different when applied to the natural setting (since we perform experiments
in a controlled environment)
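The randomization mechanism for assigning treatments, mentioned under EXPERIMENT above, can be sketched in Python. This assumes a simple design with (near-)equal group sizes; the function `assign_treatments`, the unit labels, and the treatment names are hypothetical, and real designs may add blocking or other restrictions.

```python
import random

def assign_treatments(units, treatments, seed=None):
    """Randomly assign each unit to a treatment, with near-equal group sizes."""
    rng = random.Random(seed)
    shuffled = list(units)      # copy so the caller's list is left untouched
    rng.shuffle(shuffled)       # chance, not the researcher, decides the order
    # Deal shuffled units round-robin so group sizes differ by at most one.
    return {unit: treatments[i % len(treatments)]
            for i, unit in enumerate(shuffled)}

units = [f"unit_{i}" for i in range(1, 9)]
assignment = assign_treatments(units, ["treatment_A", "treatment_B"], seed=42)

for unit in units:
    print(unit, "->", assignment[unit])
```

Since chance alone decides which unit receives which treatment, uncontrolled extraneous variables are expected to be spread across the groups rather than concentrated in one.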
COMPARISON OF SURVEY, EXPERIMENT, OBSERVATION
                                      Data Collection Method
Aspect                                Survey               Experiment            Observation
Assessing the reliability of          Generally possible   Sometimes difficult   Oftentimes difficult
generalizations about a
well-defined population
Ability to establish                  Poor                 Superior              Poor
cause-and-effect
Realism of data                       Realistic            Least realistic       Most realistic
● May be combined in a single study as long as they are aligned with the research problem and objectives
OTHER METHODS OF DATA COLLECTION
● Registration:
○ From other agencies/orgs thru process of registration, as required by a law, regulation or
usual custom
○ Not always complete (e.g. eligible voters)
○ Ex.: birth/death/marriage data from civil registry documents of PSA, registered vehicles @ LTO,
registered students @ University Registrar
● Focus group discussions (FGDs): a selected group of people discusses a given topic or issue
in-depth, facilitated by a professional, external moderator
● Computer simulation:
○ Useful for developing new theories for stats
○ Uses statistical model that computes for values of VOI by incorporating the use of a
randomization mechanism
● Use of internal data:
○ Data generated from the operation and administration of researcher's company
○ Possible by-products of administrative management functions of company
○ Ex: personnel records, financial statements, inventory reports, payroll
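For the computer simulation method listed above, here is a minimal Python sketch of a statistical model that generates values of a VOI through a randomization mechanism. It assumes a yes/no characteristic with a made-up true proportion of 0.3, samples of 50 elements, and 1,000 simulated samples; the function name `simulate_sample_proportions` is hypothetical.

```python
import random

def simulate_sample_proportions(true_p, sample_size, n_runs, seed=0):
    """Simulate n_runs samples and return the sample proportion of each."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_runs):
        # Each simulated element has the characteristic with probability true_p.
        hits = sum(rng.random() < true_p for _ in range(sample_size))
        results.append(hits / sample_size)
    return results

p_hats = simulate_sample_proportions(true_p=0.3, sample_size=50, n_runs=1000)
mean_p_hat = sum(p_hats) / len(p_hats)

print(f"average simulated proportion: {mean_p_hat:.3f}")
print(f"smallest: {min(p_hats):.2f}, largest: {max(p_hats):.2f}")
```

The spread between the smallest and largest simulated proportions gives a rough picture of how much a statistic can vary from sample to sample.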