2026/2027 | Updated Verified Questions and Answers | Pass
Guaranteed - A+ Graded
Section 1: Data Analytics Lifecycle & Business Intelligence (Questions 1-25)
Q1: A retail company wants to understand why its online sales dropped in Q3. They
have web analytics, customer service logs, and social media sentiment data. Which
phase of the data analytics lifecycle involves defining the specific business question and
identifying the stakeholders who will use the findings?
A. Data preparation
B. Problem definition [CORRECT]
C. Model evaluation
D. Data acquisition
Correct Answer: B
Rationale: The problem definition phase establishes the business question (why Q3
sales dropped), identifies stakeholders (marketing, operations), and sets success
criteria before any data is collected or analyzed. Data preparation involves cleaning,
data acquisition involves gathering the sources, and model evaluation occurs after
algorithms are run. WGU C207 Exam Tip: Always start with problem definition; WGU
heavily tests whether you know that jumping into data acquisition before defining the
problem leads to wasted resources.
Q2: An analyst merges customer data from a CRM system (structured) with Twitter
posts (unstructured) regarding a recent product launch. What is this process primarily
an example of?
A. Data normalization
B. Data integration [CORRECT]
C. Data encoding
D. Data scaling
Correct Answer: B
Rationale: Data integration involves combining data from multiple heterogeneous
sources, such as structured CRM data and unstructured text data from social media,
into a unified view for analysis. Normalization and scaling adjust numeric ranges, while
encoding converts categorical text to numbers. WGU C207 Exam Tip: Differentiate
integration (merging sources) from transformation (changing formats/structures).
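The integration step described above can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: the records, the `handle` join key, the keyword lists, and the `sentiment_score` and `integrate` helpers are all hypothetical, and the keyword count stands in for real text analysis.

```python
import re

# Hypothetical structured CRM records.
crm_rows = [
    {"customer_id": 1, "handle": "@ana", "segment": "premium"},
    {"customer_id": 2, "handle": "@raj", "segment": "standard"},
]

# Hypothetical unstructured posts about the product launch.
posts = [
    {"handle": "@ana", "text": "Love the new product, great launch!"},
    {"handle": "@raj", "text": "Shipping was slow and the app is bad."},
    {"handle": "@ana", "text": "Battery life is great."},
]

POSITIVE = {"love", "great"}
NEGATIVE = {"slow", "bad"}


def sentiment_score(text):
    """Crude keyword score: +1 per positive word, -1 per negative word."""
    words = re.findall(r"[a-z]+", text.lower())
    return sum((w in POSITIVE) - (w in NEGATIVE) for w in words)


def integrate(crm_rows, posts):
    """Combine both sources into one unified record per customer."""
    totals = {}
    for post in posts:
        handle = post["handle"]
        totals[handle] = totals.get(handle, 0) + sentiment_score(post["text"])
    return [{**row, "sentiment": totals.get(row["handle"], 0)} for row in crm_rows]


unified = integrate(crm_rows, posts)
```

Each row in `unified` keeps its CRM fields and gains a value derived from the text source, which is the "unified view" the rationale refers to.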
Q3: A data scientist is preparing a dataset for linear regression. They discover that one
feature measures annual income in dollars (ranging from $20,000 to $200,000) and
another measures age in years (18 to 80). To ensure the algorithm treats both features
equally, which technique should the data scientist apply?
A. Dummy variable encoding
B. Outlier removal
C. Normalization or standardization [CORRECT]
D. Deduplication
Correct Answer: C
Rationale: Normalization (scaling to a 0-1 range) or standardization (scaling to a mean
of 0 and standard deviation of 1) ensures that features with larger numeric ranges
(income) do not dominate the algorithm over features with smaller ranges (age).
Dummy encoding is for categorical variables, and deduplication removes duplicate
rows. WGU C207 Exam Tip: WGU frequently tests the specific reason for
scaling—preventing features with larger magnitudes from dominating distance-based or
gradient-based algorithms.
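The two techniques in the rationale can be compared side by side. The sketch below uses made-up income and age values within the ranges given in the question; `min_max` and `z_score` are illustrative helper names, and the population standard deviation is an assumption (some tools use the sample version).

```python
from statistics import mean, pstdev

incomes = [20_000, 65_000, 110_000, 200_000]  # dollars (illustrative values)
ages = [18, 35, 52, 80]                       # years (illustrative values)


def min_max(values):
    """Normalization: rescale values linearly onto the 0-1 range."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Standardization: shift to mean 0 and scale to standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


income_scaled = min_max(incomes)
age_scaled = min_max(ages)
age_standardized = z_score(ages)
```

After `min_max`, both features span 0 to 1, so a one-unit difference in income no longer outweighs a one-unit difference in age purely because of the units.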
Q4: A marketing team uses Tableau to create a daily summary of website traffic,
conversion rates, and ad spend. This tool allows users to drag-and-drop fields to build
visuals without writing SQL code. Which concept does this best describe?
A. Ad-hoc querying
B. Scheduled reporting
C. Self-service analytics [CORRECT]
D. Operational dashboards
Correct Answer: C
Rationale: Self-service analytics enables business users to generate insights and build
reports autonomously using drag-and-drop BI tools like Tableau or Power BI, without
requiring IT or SQL knowledge. Operational dashboards track real-time metrics, while
scheduled reports are automated. WGU C207 Exam Tip: Recognize "drag-and-drop"
and "without writing code" as textbook definitions of self-service analytics and data
democratization.
Q5: A company tracks the percentage of defective products manufactured each day.
This metric tells them what has already happened. Which type of indicator is this?
A. Leading indicator
B. Lagging indicator [CORRECT]
C. Predictive indicator
D. Diagnostic indicator
Correct Answer: B
Rationale: Lagging indicators measure outcomes after they have occurred, such as
yesterday's defect rate or last quarter's revenue. Leading indicators, like employee
training hours, attempt to predict future performance. WGU C207 Exam Tip: WGU loves
to test the difference between lagging (historical results) and leading (future predictors)
indicators in a Balanced Scorecard context.
Q6: In a data warehouse, a central fact table contains the daily sales revenue, quantity
sold, and discount amount. Surrounding it are dimension tables for Date, Product,
Customer, and Store. Which schema design is being used?
A. Snowflake schema
B. Star schema [CORRECT]
C. Flat table
D. Normalized transactional schema
Correct Answer: B
Rationale: A star schema consists of a single central fact table surrounded by
denormalized dimension tables, resembling a star. A snowflake schema normalizes the
dimension tables into multiple related tables. WGU C207 Exam Tip: If the question
mentions a single fact table connected directly to descriptive dimensions without
sub-dimensions, choose Star Schema.
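A small star schema can be built with Python's built-in `sqlite3` module. This is a sketch under simplifying assumptions: only two of the four dimensions from the question are modeled, and the table names, column names, and rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Denormalized dimension tables: descriptive attributes live in one table each.
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_store   (store_id INTEGER PRIMARY KEY, city TEXT, region TEXT);

-- Central fact table: numeric measures plus a key into each dimension.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    store_id   INTEGER REFERENCES dim_store(store_id),
    revenue    REAL,
    quantity   INTEGER
);

INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware'), (2, 'Gadget', 'Hardware');
INSERT INTO dim_store   VALUES (10, 'Austin', 'South'), (20, 'Boston', 'Northeast');
INSERT INTO fact_sales  VALUES (1, 10, 100.0, 4), (2, 10, 50.0, 1), (1, 20, 75.0, 3);
""")

# One join from the fact table to a dimension answers "revenue by region".
revenue_by_region = dict(conn.execute("""
    SELECT s.region, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_store AS s ON s.store_id = f.store_id
    GROUP BY s.region
""").fetchall())
```

In a snowflake design, `region` would move into its own table referenced by `dim_store`, and the same query would need an extra join.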
Q7: An analyst needs to view total sales by region, then drill down to view sales by state
within a specific region, and finally drill down to individual store sales. Which OLAP
operation is being performed?
A. Slicing
B. Dicing
C. Roll-up
D. Drill-down [CORRECT]
Correct Answer: D
Rationale: Drill-down navigates from a less detailed data level (region) to a more
detailed level (state, then store). Roll-up is the exact opposite (aggregating detail to
higher levels). Slicing fixes a single value on one dimension, and dicing selects a
sub-cube by filtering on two or more dimensions. WGU C207 Exam Tip: Visualize the
hierarchy: going down a hierarchy (Region -> State -> Store) is always drill-down.
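The Region -> State -> Store navigation can be imitated with plain Python aggregation. The sales rows and the `aggregate` helper below are hypothetical; a real OLAP engine would precompute or index these totals rather than rescan the rows.

```python
from collections import defaultdict

# (region, state, store, sales) rows; values are illustrative only.
sales = [
    ("West", "CA", "Store 1", 120),
    ("West", "CA", "Store 2", 80),
    ("West", "OR", "Store 3", 50),
    ("East", "NY", "Store 4", 200),
]


def aggregate(rows, depth, parent=()):
    """Total sales grouped by the first `depth` hierarchy levels,
    restricted to rows whose leading keys match `parent`."""
    totals = defaultdict(int)
    for *keys, amount in rows:
        if tuple(keys[:len(parent)]) == tuple(parent):
            totals[tuple(keys[:depth])] += amount
    return dict(totals)


by_region = aggregate(sales, 1)                           # top level
west_by_state = aggregate(sales, 2, parent=("West",))     # drill down into West
ca_by_store = aggregate(sales, 3, parent=("West", "CA"))  # drill down into CA
```

Reading the three results from bottom to top is a roll-up: store totals sum to the state total, and state totals sum to the region total.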
Q8: A hospital generates continuous real-time data from ICU heart monitors that must
be analyzed immediately to alert nurses of patient distress. Which characteristic of Big
Data does this primarily represent?
A. Volume
B. Velocity [CORRECT]
C. Variety
D. Veracity
Correct Answer: B
Rationale: Velocity refers to the speed at which data is generated and the requirement
for real-time or near-real-time processing. Volume is the amount of data, variety is the
different types, and veracity is the reliability/uncertainty. WGU C207 Exam Tip:
Keywords like "streaming," "real-time," and "immediate alerts" always point to Velocity.
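The velocity requirement, acting on each reading as it arrives instead of waiting for a batch, can be sketched with a Python generator. The `alerts` function, the bed identifiers, and the 40/140 bpm thresholds are placeholders chosen for illustration and are not clinical guidance.

```python
def alerts(readings, low=40, high=140):
    """Yield an alert as soon as a reading falls outside [low, high].

    `readings` can be any iterable, including an unbounded stream,
    because each item is checked as it arrives.
    """
    for patient_id, bpm in readings:
        if bpm < low or bpm > high:
            yield (patient_id, bpm)


# Simulated stream of (patient_id, heart_rate) readings.
stream = iter([("bed-1", 72), ("bed-2", 155), ("bed-1", 38), ("bed-3", 90)])
triggered = list(alerts(stream))
```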
Q9: A social media platform needs to store user profile data that varies wildly from user
to user (e.g., some have links to 5 websites, others have none; some list 3 phone
numbers, others list 0). Which NoSQL database type is best suited for this highly
variable, document-based data?
A. Column-family (e.g., Cassandra)
B. Graph (e.g., Neo4j)
C. Key-value (e.g., Redis)
D. Document (e.g., MongoDB) [CORRECT]
Correct Answer: D
Rationale: Document databases like MongoDB store data in flexible, schema-less
JSON-like documents, making them ideal for data with highly variable attributes per
record. Column-family is for high-write throughput analytics, graph is for relationship
mapping, and key-value is for simple lookups. WGU C207 Exam Tip: Match "variable
schema" or "JSON-like" to Document databases.
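The "variable attributes per record" idea can be shown without a database by treating each profile as a JSON document. The profiles, field names, and `phone_count` helper below are invented for illustration; this sketch uses only the standard `json` module and does not use MongoDB's API.

```python
import json

# Each document carries only the fields that user actually has.
profiles = [
    {"_id": 1, "name": "Ana", "websites": ["a.example", "b.example"], "phones": ["555-0100"]},
    {"_id": 2, "name": "Raj"},
    {"_id": 3, "name": "Mei", "phones": ["555-0101", "555-0102", "555-0103"]},
]

# Serialize to JSON text, roughly how a document store would hold them.
stored = [json.dumps(p) for p in profiles]


def phone_count(raw):
    """Read a stored document and tolerate a missing 'phones' field."""
    doc = json.loads(raw)
    return len(doc.get("phones", []))


counts = [phone_count(raw) for raw in stored]
```

A relational table would need nullable columns or separate child tables for `websites` and `phones`, while the document form simply omits what a user does not have.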
Q10: Which Hadoop ecosystem component is specifically designed to process large
datasets in-memory, making it significantly faster than traditional MapReduce for
iterative machine learning algorithms?
A. Hive
B. HDFS
C. Spark [CORRECT]
D. Kafka
Correct Answer: C
Rationale: Apache Spark performs in-memory processing, which drastically reduces the
disk I/O overhead of MapReduce, making it ideal for iterative tasks like machine
learning. Hive provides a SQL interface, HDFS is the storage layer, and Kafka handles
streaming data ingestion. WGU C207 Exam Tip: When the question emphasizes
"in-memory" or "faster than MapReduce," the answer is always Spark.
Q11: A CEO bases a major strategic decision on a "gut feeling" rather than consulting
the newly built executive dashboard. What barrier to a data-driven culture does this
represent?
A. Data literacy deficit
B. Resistance to change [CORRECT]
C. Lack of self-service tools
D. Poor data quality
Correct Answer: B