C207 OA QUESTIONS WITH CORRECT ANSWERS
Simple indexing -Ans>>Common analytic measure to improve performance. Compares current data
with data during a base period.
(Price / Price during "Base Period") x 100
e.g. a Big Mac cost $1.60 in 1968, which is the base period. What is the index for 2014 if the price was then $4.80?
(4.80 / 1.60) x 100 = 300 (the price is 3 times the base-period price)
Used to identify price fluctuations of supplies, materials, products, etc.
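Here is a minimal Python sketch of the simple index formula, using the Big Mac figures from this card. The function name simple_index is only illustrative.

```python
def simple_index(price: float, base_price: float) -> float:
    """Simple index = (price / base-period price) x 100."""
    if base_price == 0:
        raise ValueError("base-period price must be non-zero")
    return price / base_price * 100


# Big Mac example: base period 1968 at 1.60, 2014 at 4.80
index_2014 = round(simple_index(4.80, 1.60), 2)
print(index_2014)  # 300.0
```

An index of 100 always means "same as the base period". Values above 100 show a price increase and values below 100 show a decrease.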
Weighted Index -Ans>>assigns a weight to each item so that items of greater significance (e.g. items bought in larger quantities) have more influence on the index.
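One common way to weight an index is a weighted aggregate index. You multiply each item's price by its weight, add the results up, and compare the current-period total with the base-period total. The sketch below assumes that formula and uses quantity purchased as the weight. The basket items and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Item:
    name: str
    base_price: float     # price in the base period
    current_price: float  # price in the period being indexed
    weight: float         # assumed weight, e.g. quantity purchased


def weighted_index(items: list[Item]) -> float:
    """(sum of current price x weight) / (sum of base price x weight) x 100."""
    current_total = sum(i.current_price * i.weight for i in items)
    base_total = sum(i.base_price * i.weight for i in items)
    return current_total / base_total * 100


def unweighted_index(items: list[Item]) -> float:
    """Same comparison, with every item counted equally."""
    return sum(i.current_price for i in items) / sum(i.base_price for i in items) * 100


# Hypothetical basket: flour is bought in bulk, so its price change dominates.
basket = [
    Item("flour", base_price=2.0, current_price=3.0, weight=10),
    Item("spice", base_price=5.0, current_price=5.0, weight=2),
]
print(round(weighted_index(basket), 2))    # 133.33
print(round(unweighted_index(basket), 2))  # 114.29
```

The weighted figure is higher because the item whose price rose (flour) carries most of the weight.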
Reasons for including analytics in decision-making -Ans>>decreased cost of data storage
increased processing power
Descriptive Analytics -Ans>>using current and past data for strictly descriptive purposes.
e.g. car price data shows a 2% increase over the prior year
a manager wants to know why sales spiked during the prior quarter
Predictive / Inferential Analytics -Ans>>using current and past data to predict/estimate the future.
e.g. based on the past 10 years of data for car prices, we predict an increase of 1.5% over the
upcoming year.
Prescriptive Analytics -Ans>>using past data to PREDICT or ESTIMATE the future in order to optimize
operations
includes experimental design and optimization to aid in DECISION-MAKING (MANAGERIAL DECISIONS).
e.g. based on past data, sales prices for electric cars could increase by 5% if we increased charging
stations by 7%
Big data -Ans>>Data so big that it's difficult to process using traditional methods.
Stored in a Data Warehouse.
Mined to identify patterns and trends
Primary purpose is to encourage buying behavior.
Enables products to be more tailored to customer base.
Improves decision-making.
Supports development of next generation products/services.
Watch for keywords in test options, e.g. company TOTAL sales (just one number) vs. all sales invoices.
Structured / Quantitative Data -Ans>>Data follows pre-defined formats.
e.g. multiple choice answers, addresses, names, stock tickers
Unstructured / Qualitative Data -Ans>>Data doesn't follow pre-defined formats. Usually gets
structured by a "theme analysis"
e.g. blocks of freeform text, audio, video
Continuous Data -Ans>>Data that can take any value (within a set range)
e.g. 3.14159, -189,115.2
a thermometer reads 66.5 degrees
Interval Data (data measuring levels) -Ans>>data is ordered and equally spaced, and "0" doesn't
mean absence of the quantity, just another data point
a type of continuous data
e.g. dates, times of day, temperature in degrees (F or C)
Ratio Data (data measuring levels) -Ans>>"0" means a true absence of the quantity, not just another data point
a type of continuous data
e.g. money, height, weight
Discrete Data -Ans>>Data that can only take on whole values and has clear boundaries
e.g. 4, 7, 8 in a preset range of 1-100
Ordinal data (data measuring levels) -Ans>>data is ordered by rank or quality
a type of discrete data
e.g. in black belt data, level "3" ranks higher than level "1"
gold, silver, and bronze medals
Nominal / Categorical Data (data measuring levels) -Ans>>data is assigned a category/label for
identification and grouping purposes
a type of discrete data
e.g. males are assigned "0" and females "1"
potential quality errors: categories can be misspelled
Attribute Data -Ans>>Data that shows whether a result meets a requirement or not (yes/no, pass/fail).
Davenport-Kim Three-Stage Model -Ans>>1. Frame the problem - recognize problem and review
previous findings.
2. Solve the problem - modeling, collection, analysis
3. Communicate results - tailor to audience, use visuals, show results.
Reliability of Data -Ans>>data that is consistent (but not necessarily accurate)
e.g. a thermometer reads 20, 21, 21, 20, 20, 19, 19, 21, 20, 19
a test given to a student consistently shows similar scores
Validity of Data -Ans>>data that is accurate
requires the sample to be of adequate size and randomly selected.
e.g. a thermometer consistently reads 20-25 F, but the water isn't even frozen (not valid)
Data Error Types -Ans>>Omission - data being left out, missed, or forgotten. Sorting in a spreadsheet can
help to identify it.
Out of Range - data that doesn't fit the expected, viable range. Sorting in a spreadsheet can help to
identify outliers.
Entry/input errors - typos, miscommunications, illegible handwriting
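The omission and out-of-range checks can be automated. This Python sketch flags missing readings (stored as None) and readings outside an expected range. It then sorts the remaining values, which pushes outliers to the ends the same way a spreadsheet sort does. The tire pressure readings and the 20-50 psi range are hypothetical.

```python
from typing import Optional


def find_data_errors(readings: list[Optional[float]], low: float, high: float):
    """Return (indexes of omitted readings, indexes of out-of-range readings)."""
    omitted = [i for i, r in enumerate(readings) if r is None]
    out_of_range = [
        i for i, r in enumerate(readings)
        if r is not None and not (low <= r <= high)
    ]
    return omitted, out_of_range


readings = [32.0, 33.5, None, 31.8, 95.0, 32.2]  # hypothetical psi values
omitted, out_of_range = find_data_errors(readings, low=20.0, high=50.0)
print(omitted)       # [2]
print(out_of_range)  # [4]

# Sorting, as in a spreadsheet, pushes the outlier to the end of the list.
print(sorted(r for r in readings if r is not None))  # [31.8, 32.0, 32.2, 33.5, 95.0]
```

Entry/input errors usually cannot be caught this way unless the typo also happens to produce a missing or out-of-range value.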
Systematic Error -Ans>>error caused by a flaw in the system or instrument; it repeats consistently and keeps causing errors until fixed
e.g. a tire pressure sensor breaks and stops functioning, resulting in omission errors until it is fixed
a scale is calibrated before use in order to reduce systematic error
Random Error / Unpredictable Error -Ans>>error that does not repeat consistently because it isn't caused by a system flaw,
and therefore doesn't need a fix/adjustment. aka it's "self-fixing".
minimize its effects by increasing sample size
e.g. a tire pressure sensor records an outlier / out-of-range reading caused by going over a speed bump at high
speed
True Score Theory -Ans>>Observed Score (raw data score) = True Score + Random Error + Systematic Error
in the absence of systematic error, Observed Score = True Score + Random Error
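A small simulation can illustrate the true score equation together with the systematic and random error cards. Random error averages out as the sample grows, but systematic error shifts every reading and does not average out. This is a Python sketch with assumed values: a true temperature of 20.0, a +2.0 calibration offset, and random noise with standard deviation 1.0. It does not use real measurement data.

```python
import random
import statistics

TRUE_SCORE = 20.0  # assumed true value, e.g. a temperature


def observe(systematic_error: float, rng: random.Random) -> float:
    """Observed score = true score + random error + systematic error."""
    random_error = rng.gauss(0.0, 1.0)  # unpredictable, averages to zero
    return TRUE_SCORE + random_error + systematic_error


def mean_observed(n: int, systematic_error: float, seed: int = 42) -> float:
    """Average of n simulated readings (fixed seed so runs are repeatable)."""
    rng = random.Random(seed)
    return statistics.mean(observe(systematic_error, rng) for _ in range(n))


for n in (5, 10_000):
    print(n, round(mean_observed(n, 0.0), 2), round(mean_observed(n, 2.0), 2))
# With a large n, the first average lands near 20.0 (random error cancels out),
# while the second stays near 22.0 (the systematic offset never cancels out).
```

A bigger sample reduces the effect of random error, but only recalibration removes the systematic offset.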
Measurement Bias -Ans>>data doesn't represent the study group because of:
1. sample isn't random enough
2. sample isn't big enough
3. sample wasn't inclusive enough, or was too inclusive, to represent study group
e.g. a survey on favorite foods was sent to all renters in a city (it didn't include homeowners, so it's not a
"Truly Representative Sample")
Conscious Bias -Ans>>the subject is biased towards a certain result because he believes it will benefit
him in some way
Information Bias -Ans>>response bias - people give different answers when the response isn't
anonymous and confidential
e.g. a boss surveys his own employees to see if they are satisfied with his performance
conscious bias - questions are deliberately leading or persuading the subject toward a certain answer
e.g. a survey question reads: "Don't you think it would be better if the gov't provided free
contraception?"
Data Management -Ans>>cleaning and organizing data
Quality Control in data -Ans>>reducing and minimizing data errors
clean and organize data
reduce amount of incomplete data
Two Major Issues with Research Standards -Ans>>Agreement on best practices
Ethics
Misuse of Statistics -Ans>>all of the biases above. Additionally:
1. Assuming that correlation equals causation (watch for words like CONCLUDES, DETERMINES, ASSERTS).
2. Lack of blinding.
3. Faulty operationalization - unclear testing model, undefined terms, not coded/categorized.
Probability Theory -Ans>>informs decision-makers by quantifying risk
Probability of the complement -Ans>>the probability that the event does NOT happen: P(not A) = 1 - P(A).
If only 2 outcomes are possible, they are the event and its complement.
e.g. if the probability is 2/3, then the probability of the complement is 1/3
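The complement rule is a single subtraction. This Python sketch uses the fractions module so the 2/3 example stays exact. The function name is illustrative.

```python
from fractions import Fraction


def complement(p: Fraction) -> Fraction:
    """P(not A) = 1 - P(A)."""
    if not 0 <= p <= 1:
        raise ValueError("a probability must be between 0 and 1")
    return 1 - p


print(complement(Fraction(2, 3)))  # 1/3
```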
Intersection Probability -Ans>>use the "Multiplication Principle" (multiply the probabilities): P(x AND y) = P(x) x P(y) for independent events; in general, P(x AND y) = P(x) x P(y GIVEN x)
p of x AND y
p of ALL the following
p of BOTH x and y
p of X GIVEN b
p of X WHEN b
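This Python sketch applies the multiplication principle to two textbook cases. These are assumed examples and do not come from the card. The first is a coin flip and a die roll, which are independent. The second is drawing two aces without replacement, where the second probability depends on the first draw. A brute-force count over every ordered pair of cards confirms the second result.

```python
from fractions import Fraction
from itertools import permutations


def p_and(p_x: Fraction, p_y_given_x: Fraction) -> Fraction:
    """P(x AND y) = P(x) x P(y GIVEN x); for independent events P(y GIVEN x) = P(y)."""
    return p_x * p_y_given_x


# Independent: heads on a coin AND a 6 on a die
print(p_and(Fraction(1, 2), Fraction(1, 6)))  # 1/12

# Dependent: first card is an ace (4/52) AND second is an ace given the first was (3/51)
two_aces = p_and(Fraction(4, 52), Fraction(3, 51))
print(two_aces)  # 1/221

# Brute-force check: count every ordered pair of two different cards
deck = ["ace"] * 4 + ["other"] * 48
pairs = list(permutations(range(len(deck)), 2))
hits = sum(1 for i, j in pairs if deck[i] == "ace" and deck[j] == "ace")
print(Fraction(hits, len(pairs)) == two_aces)  # True
```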
Union Probability -Ans>>p of x OR y
p of EITHER x or y
AT LEAST
ANY of the following
add the probabilities (and subtract P(x AND y) if the events can happen together)
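This Python sketch works through a single die roll (an assumed example). It computes P(even OR greater than 4) with the addition rule, subtracting the overlap, and checks the result against a direct count of outcomes. It also solves an "AT LEAST" question by using the complement.

```python
from fractions import Fraction

OUTCOMES = range(1, 7)  # faces of one fair die


def prob(event) -> Fraction:
    """Probability of an event, given as a predicate over die faces."""
    return Fraction(sum(1 for o in OUTCOMES if event(o)), len(OUTCOMES))


def is_even(o: int) -> bool:
    return o % 2 == 0


def is_big(o: int) -> bool:
    return o > 4


# Addition rule: P(x OR y) = P(x) + P(y) - P(x AND y)
by_rule = prob(is_even) + prob(is_big) - prob(lambda o: is_even(o) and is_big(o))
by_count = prob(lambda o: is_even(o) or is_big(o))
print(by_rule, by_count)  # 2/3 2/3

# AT LEAST one 6 in two rolls = 1 - P(no 6 in either roll)
print(1 - Fraction(5, 6) ** 2)  # 11/36
```

If the overlap were not subtracted, the face 6 would be counted twice and the answer would be 5/6 instead of 2/3.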
Combination Probability -Ans>>rule / formula for determining how many POTENTIAL / POSSIBLE
OUTCOMES there are when choosing r items from n and order doesn't matter: nCr = n! / (r! x (n - r)!)
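The combination formula can be checked in Python three ways: directly from factorials, with the standard library's math.comb (Python 3.8+), and by listing every selection with itertools.combinations. The scenario of choosing 3 of 10 employees for a committee is hypothetical.

```python
import math
from itertools import combinations


def n_choose_r(n: int, r: int) -> int:
    """nCr = n! / (r! (n - r)!): ways to pick r of n items when order doesn't matter."""
    return math.factorial(n) // (math.factorial(r) * math.factorial(n - r))


committees = n_choose_r(10, 3)
print(committees)  # 120
print(committees == math.comb(10, 3) == len(list(combinations(range(10), 3))))  # True

# Probability that one particular group of 3 is the one chosen at random
print(f"{1 / committees:.4f}")  # 0.0083
```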
Bayes' Theorem (probability) -Ans>>rule to calculate conditional probability
GIVEN THAT
"If A is the case, then what is P(B)?"
or
"Given event A, what is the probability of event B?"
P(B|A) = P(A|B) x P(B) / P(A)
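This is a worked Bayes example in Python, with P(A) expanded using the law of total probability. The numbers are hypothetical. 2% of parts are defective (B). An inspection flags (A) 95% of defective parts, but it also flags 10% of good parts. Given that a part is flagged, how likely is it to be defective?

```python
from fractions import Fraction


def bayes(p_a_given_b: Fraction, p_b: Fraction, p_a_given_not_b: Fraction) -> Fraction:
    """P(B|A) = P(A|B) x P(B) / P(A), where P(A) = P(A|B)P(B) + P(A|not B)P(not B)."""
    p_a = p_a_given_b * p_b + p_a_given_not_b * (1 - p_b)
    return p_a_given_b * p_b / p_a


posterior = bayes(
    p_a_given_b=Fraction(95, 100),      # flagged, given defective
    p_b=Fraction(2, 100),               # defective (prior)
    p_a_given_not_b=Fraction(10, 100),  # flagged, given not defective
)
print(posterior, round(float(posterior), 3))  # 19/117 0.162
```

Even though the inspection catches 95% of defects, only about 16% of flagged parts are actually defective. This is because defects are rare (a low P(B)).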
Use median when -Ans>>the data has outliers or is skewed; otherwise you can use the mean
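Here is a quick Python illustration using hypothetical salary data (in thousands), where one executive salary is an outlier. The outlier pulls the mean far above the typical value, while the median stays close to it.

```python
import statistics

salaries = [40, 42, 45, 47, 50, 400]  # hypothetical, in thousands; 400 is an outlier

mean = statistics.mean(salaries)
median = statistics.median(salaries)
print(f"mean={mean:.1f} median={median:.1f}")  # mean=104.0 median=46.0
```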