GCP PROFESSIONAL DATA ENGINEER CERTIFICATION EXAM
NEWEST 2025/2026 WITH COMPLETE QUESTIONS AND
CORRECT ANSWERS
An IoT startup has hired you to review their Cloud Bigtable design. The database
stores data generated by over 100,000 sensors that send data every 60 seconds.
Each row contains all the data for one sensor sent during an hour. Hours always
start at the top of the hour. The row-key is the sensor ID concatenated to the hour
of the day followed by the date. What change, if any, would you recommend to
this design?
A. Use one row per sensor and 60-second datasets instead of storing multiple
datasets in a single row.
B. Start the row-key with the day and hour instead of the sensor ID.
C. Allow hours to start at any arbitrary time to accommodate differences in sensor
clocks.
D. No change is recommended. - ANSWER-A. The correct answer is A. This table
should be designed as a tall and narrow one with a single dataset in each row.
Option B is incorrect because starting a row-key with the date and hour will lead
to hotspotting. Option C is incorrect, since changing the start time will not change
the parts of the design that make querying by ranges more difficult than they
need to be. Option D is incorrect; the structure of rows should be changed from
wide to narrow.
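
The tall-and-narrow layout recommended in answer A can be sketched as a row-key builder that produces one key per sensor reading. This is only an illustration under stated assumptions, not the startup's actual schema: the `#` delimiter, the zero-padded epoch-seconds timestamp, the sample sensor ID, and the function name `make_row_key` are all hypothetical.

```python
from datetime import datetime, timezone


def make_row_key(sensor_id: str, reading_time: datetime) -> bytes:
    """Build a row-key for a single 60-second reading (hypothetical format)."""
    # The sensor ID comes first, so concurrent writes from many sensors
    # land in different parts of the keyspace instead of one hot range.
    # Zero-padded epoch seconds sort lexicographically in time order,
    # which keeps one sensor's readings contiguous for range scans.
    epoch_seconds = int(reading_time.timestamp())
    return f"{sensor_id}#{epoch_seconds:010d}".encode("utf-8")


# Two consecutive readings from the same (made-up) sensor.
key_a = make_row_key("sensor-00042", datetime(2025, 1, 1, 0, 0, tzinfo=timezone.utc))
key_b = make_row_key("sensor-00042", datetime(2025, 1, 1, 0, 1, tzinfo=timezone.utc))
```

With keys of this shape, all readings for one sensor over a time window form a single contiguous key range, while a key that starts with the date and hour (option B) would send every sensor's current writes to the same range.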
Your company has a Cloud Bigtable database that requires strong consistency, but
it also requires high availability. You have implemented Cloud Bigtable replication
and specified single-cluster routing in the app profile for the database. Some users
have noted that they occasionally receive query results inconsistent with what
they should have received. The problem seems to correct itself within a minute.
What could be the cause of this problem?
A. Secondary indexes are being updated during the query and return incorrect
results when a secondary index is not fully updated.
B. You have not specified an app configuration file that includes single-cluster
routing and use of replicas only for failover.
C. Tablets are being moved between nodes, which can cause inconsistent query
results.
D. The row-key is not properly designed. - ANSWER-B. The correct answer is B. The
only way to achieve strong consistency in Cloud Bigtable is by having all reads
routed from a single cluster and using the other replicas only for failover. Option A
is incorrect; Cloud Bigtable does not have secondary indexes. Option C
is incorrect; moving tablets does not impact read consistency. Option D is
incorrect; a poor row-key design can impact performance but not consistency.
You have been tasked with migrating a MongoDB database to Cloud Spanner.
MongoDB is a document database, similar to Cloud Firestore. You would like to
maintain some of the document organization of the MongoDB design. What data
type, available in Cloud Spanner, would you use to define a column that can hold a
document-like structure?
A. Array
B. String
C. STRUCT
D. JSON - ANSWER-D. The correct answer is D; Cloud Spanner supports a JSON
column type, which can hold a document-like structure whose nested fields have
different data types. Option A is incorrect; all elements of an array are of the
same data type, but items in a document may consist of different data types.
Option B is incorrect because although a document could be represented in a
string, a string does not provide field-level access to the data. Option C is
incorrect; in Cloud Spanner a STRUCT can be used in queries, but it is not allowed
as a column type.
An application using a Cloud Spanner database has several queries that are taking
longer to execute than the users would like. You review the queries and notice
that they all involve joining three or more tables that are all related hierarchically.
What feature of Cloud Spanner would you try in order to improve the query
performance?
A. Replicated clusters
B. Interleaved tables
C. STORING clause
D. Execution plans - ANSWER-B. The correct answer is B. Since the problematic
queries involved joins of hierarchically related tables, interleaving the data of the
tables could improve join performance. Option A is incorrect; Cloud Bigtable, not
Cloud Spanner, uses replicated clusters. Option C is incorrect; the STORING clause
is used to create indexes that can answer queries using just the index, and that
would not address the join performance problem. Option D is incorrect; an
execution plan might help you understand the cause of a performance problem,
but it would not on its own improve query performance.
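
The effect of interleaving in answer B can be illustrated with a toy model of row ordering. This is a simplified sketch, not Cloud Spanner's storage engine: the `Customers` and `Orders` tables, their keys, and the `interleave` function are hypothetical. The point it shows is that when a child row's key begins with its parent's key, sorting by key places each parent next to its children, so a join reads neighboring rows.

```python
def interleave(parent_rows: dict, child_rows: dict) -> list:
    """Return (key, table, value) tuples ordered as an interleaved layout would store them."""
    tagged = [(key, "Customers", value) for key, value in parent_rows.items()]
    tagged += [(key, "Orders", value) for key, value in child_rows.items()]
    # A child key (customer_id, order_id) shares its prefix with the parent
    # key (customer_id,), so sorting by key groups each parent with its children.
    return sorted(tagged, key=lambda row: row[0])


customers = {(1,): "Ada", (2,): "Grace"}
orders = {(2, 200): "keyboard", (1, 100): "laptop", (1, 101): "monitor"}

layout = interleave(customers, orders)
layout_keys = [key for key, _table, _value in layout]
```

In this model each customer row is immediately followed by that customer's orders, which is the locality that makes hierarchical joins cheaper.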
A Cloud Spanner database is using a natural key as the primary key for a large
table. The natural key is the preferred key by users because the values are easy to
relate to other data. Database administrators notice that these keys are
causing hotspots on Cloud Spanner
nodes and are adversely affecting performance. What would you recommend in
order to improve performance?
A. Keep the data of the natural key in the table but use a hash of the natural key
as the primary key
B. Keep the natural key and let Cloud Spanner create more splits to improve
performance
C. Use interleaved tables
D. Use more secondary indexes - ANSWER-A. The correct answer is A. By using a
hash of the natural key, you will avoid hotspotting, and keeping the natural key
data in the table will make it available to users. Option B is incorrect because
Cloud Spanner automatically creates splits based on load, and if the
database performance is adversely affected, then splitting is no longer sufficient to
address the problem. Option C is incorrect; interleaved tables reduce the number
of I/O operations performed when retrieving related data. Option D is incorrect;
adding more secondary indexes will not change the hotspotting pattern of the
primary index.
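
The approach in answer A can be sketched as follows. This is a minimal example under stated assumptions: the choice of SHA-256, the 16-character truncation, the column names `order_hash` and `order_number`, and the sample order numbers are all hypothetical and are not prescribed by the question.

```python
import hashlib


def hashed_primary_key(natural_key: str) -> str:
    """Derive a well-distributed key from a natural key (hypothetical scheme)."""
    # Hashing scatters sequential or clustered natural keys across the
    # keyspace, so adjacent values no longer map to the same range.
    digest = hashlib.sha256(natural_key.encode("utf-8")).hexdigest()
    return digest[:16]


def build_row(natural_key: str) -> dict:
    # The natural key stays in the row as an ordinary column,
    # so users can still read it and relate it to other data.
    return {"order_hash": hashed_primary_key(natural_key), "order_number": natural_key}


row_1 = build_row("ORD-2025-000123")
row_2 = build_row("ORD-2025-000124")
```

The hash is deterministic, so a lookup by natural key can recompute the primary key; the trade-off is that range scans in natural-key order are no longer served by the primary key.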
You are using a UUID as the primary key in a Cloud Spanner database. You have
noticed hotspotting that you did not anticipate. What could be the cause?
A. You have too many secondary indexes.
B. You have too few secondary indexes.