GCP PROFESSIONAL DATA ENGINEER CERTIFICATION
QUESTIONS AND CORRECT VERIFIED ANSWERS
2026/2027
A developer is planning a mobile application for your company's customers to
use to track information about their accounts. The developer is asking for your
advice on storage technologies. In one case, the developer explains that they
want to write messages each time a significant event occurs, such as the client
opening, viewing, or deleting an account. This data is collected for compliance
reasons, and the developer wants to minimize administrative overhead. What
system would you recommend for storing this data?
A. Cloud SQL using MySQL
B. Cloud SQL using PostgreSQL
C. Cloud Datastore
D. Stackdriver Logging
D. The correct answer is D. Stackdriver Logging (since renamed Cloud Logging) is the best option because it is a
managed service designed for storing logging data. Neither Option A nor B is as
good a fit because the developer would have to design and maintain a relational
data model and user interface to view and manage log data. Option C, Cloud
Datastore, would not require a fixed data model, but it would still require the
developer to create and maintain a user interface to manage log events.
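To make the low-overhead argument concrete, here is a minimal sketch of how the application might emit one compliance event. It assumes the code runs in an environment whose logging agent forwards standard output to Stackdriver Logging and parses JSON lines as structured entries (Cloud Run and GKE behave this way). The function name audit_event and the fields event and account_id are hypothetical and do not come from the question.

```python
import json
import sys
from datetime import datetime, timezone


def audit_event(event: str, account_id: str, stream=sys.stdout) -> str:
    """Write one structured audit record as a single JSON line and return it."""
    record = {
        "severity": "NOTICE",          # severity level read by the logging agent
        "message": f"account {event}",
        "event": event,                # hypothetical payload field
        "account_id": account_id,      # hypothetical payload field
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }
    line = json.dumps(record)
    stream.write(line + "\n")
    return line


if __name__ == "__main__":
    audit_event("opened", "acct-123")
```

The application only writes lines; storage, retention, and the viewing interface are left to the managed service, which is the point of option D.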
You are responsible for developing an ingestion mechanism for a large number
of IoT sensors. The ingestion service should accept data up to 10 minutes late.
The service should also perform some transformations before writing the data
to a database. Which of the managed services would be the best option for
managing late arriving data and performing transformations?
A. Cloud Dataproc
B. Cloud Dataflow
C. Cloud Dataprep
D. Cloud SQL
B. The correct answer is B. Cloud Dataflow is a stream and batch processing
service that is used for transforming data and processing streaming data. Option
A, Cloud Dataproc, is a managed Hadoop and Spark service and not as well suited
as Cloud Dataflow for the kind of stream processing specified. Option C, Cloud
Dataprep, is an interactive tool for exploring and preparing data sets for analysis.
Option D, Cloud SQL, is a relational database service, so it may be used to store
data, but it is not a service specifically for ingesting and transforming data before
writing to a database.
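The late-data requirement can be pictured with a small plain-Python model. This is not the Dataflow or Apache Beam API; in Beam the same idea is expressed through windowing with an allowed-lateness setting. The 60-second window, the Reading type, and the Celsius-to-Fahrenheit transformation are assumptions chosen only for illustration.

```python
from dataclasses import dataclass

WINDOW_SECONDS = 60             # assumed fixed window size
ALLOWED_LATENESS_SECONDS = 600  # the 10 minutes from the question


@dataclass
class Reading:
    sensor_id: str
    event_time: int  # seconds since epoch, when the sensor took the reading
    value: float     # assumed to be degrees Celsius


def window_end(event_time: int) -> int:
    """End (exclusive) of the fixed window that contains event_time."""
    return (event_time // WINDOW_SECONDS + 1) * WINDOW_SECONDS


def accept(reading: Reading, watermark: int) -> bool:
    """Keep a reading unless its window closed more than the allowed lateness ago."""
    return watermark <= window_end(reading.event_time) + ALLOWED_LATENESS_SECONDS


def transform(reading: Reading) -> dict:
    """Example transformation applied before the record is written to a database."""
    return {
        "sensor_id": reading.sensor_id,
        "event_time": reading.event_time,
        "fahrenheit": reading.value * 9 / 5 + 32,
    }
```

A reading whose window ended at second 60 is still accepted while the watermark is at or below 660 and is dropped afterwards. A stream processing service handles this bookkeeping for you, which is why option B fits.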
A team of analysts has collected several CSV datasets with a total size of 50 GB.
They plan to store the datasets in GCP and use Compute Engine instances to run
RStudio, an interactive stats application. Data will be loaded into RStudio using
an RStudio data loading tool. Which of the following is the most appropriate
GCP storage service for the datasets?
A. Cloud Storage
B. Cloud Datastore
C. MongoDB
D. Bigtable
A. The correct answer is A, Cloud Storage, because the data in the files is treated
as an atomic unit of data that is loaded into RStudio. Options B and C are
incorrect because those are document databases and there is no requirement for
storing the data in semistructured format with support for fully indexed querying.
Also, MongoDB is not a GCP service. Option D is incorrect because, although you
could load CSV data into a Bigtable table, the volume of data is not sufficient to
warrant using Bigtable.
A team of analysts has collected several terabytes of telemetry data in CSV
datasets. They plan to store the datasets in GCP and query and analyze the data
using SQL. Which of the following is the most appropriate GCP storage service
for the datasets?
A. Cloud SQL
B. Cloud Spanner
C. BigQuery
D. Bigtable
C. The correct answer is C, BigQuery, which is a managed analytical database
service that supports SQL and scales to petabyte volumes of data. Options A and B
are incorrect because both are used for transaction processing applications, not
analytics. Option D is incorrect because Bigtable is a wide-column NoSQL database built for low-latency, key-based access, not ad hoc SQL analytics.
You have been hired to consult with a startup that is developing software for
self-driving vehicles. The company's product uses machine learning to predict
the trajectory of persons and vehicles. Currently, the software is being
developed using 20 vehicles, all located in the same city. IoT data is sent from
vehicles every 60 seconds to a MySQL database running on a Compute Engine
instance using an n2-standard-8 machine type with 8 vCPUs and 32 GB of
memory. The startup wants to review their architecture and make any
necessary changes to support tens of thousands of self-driving vehicles, all
transmitting IoT data every second. The vehicles will be located across North
America and Europe. Approximately 4 KB of data is sent in each transmission.
What changes to the architecture would you recommend?
A. None. The current architecture is well suited to the use case.
B. Replace Cloud SQL with Cloud Spanner.
C. Replace Cloud SQL with Bigtable.
D. Replace Cloud SQL with Cloud Datastore.
C. The correct answer is C. Bigtable is the best storage service for IoT data,
especially when a large number of devices will be sending data at short intervals.
Option A is incorrect, because Cloud SQL is designed for transaction processing at
a regional level. Option B is incorrect because Cloud Spanner is designed for
transaction processing, and although it scales to global levels, it is not the best
option for IoT data. Option D is incorrect because there is no need for indexed,
semi-structured data.
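A quick back-of-the-envelope calculation shows why the write load drives this answer. The sketch below uses the figures from the question (20 vehicles every 60 seconds today, about 4 KB per transmission) and assumes 50,000 vehicles as a stand-in for "tens of thousands"; that number and the function name ingest_rate are illustrative only.

```python
PAYLOAD_BYTES = 4 * 1024  # about 4 KB per transmission, from the question


def ingest_rate(vehicles: int, interval_seconds: float) -> tuple[float, float]:
    """Return (writes per second, bytes per second) for a fleet."""
    writes_per_second = vehicles / interval_seconds
    return writes_per_second, writes_per_second * PAYLOAD_BYTES


# Current development fleet: 20 vehicles, one transmission per minute.
current = ingest_rate(vehicles=20, interval_seconds=60)

# Assumed production fleet: 50,000 vehicles, one transmission per second.
target = ingest_rate(vehicles=50_000, interval_seconds=1)

if __name__ == "__main__":
    print(f"current: {current[0]:.2f} writes/s, {current[1] / 1024:.1f} KiB/s")
    print(f"target:  {target[0]:,.0f} writes/s, {target[1] / 1024**2:.1f} MiB/s")
    print(f"growth:  {target[0] / current[0]:,.0f}x")
```

Under that assumption the sustained write rate grows from about one write every three seconds to 50,000 writes per second, a factor of roughly 150,000, which is the kind of high-volume, low-latency write pattern Bigtable is designed for.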
As a member of a team of game developers, you have been tasked with devising
a way to track players' possessions. Possessions may be purchased from a
catalog, traded with other players, or awarded for game activities. Possessions
are categorized as clothing, tools, books, and coins. Players may have any
number of possessions of any type. Players can search for other players who
have particular possession types to facilitate trading. The game designer has
informed you that there will likely be new types of possessions and ways to
acquire them in the future. What kind of a data store would you recommend
using?
A. Transactional database
B. Wide-column database
C. Document database
D. Analytic database
C. The correct answer is C because the requirements call for a semi-structured
schema. You will need to search players' possessions and not just look them up
using a single key because of the requirement for facilitating trading. Option A is
not correct. Transactional databases have fixed schemas, and this use case calls
for a semi-structured schema. Option B is incorrect because wide-column databases
index only the row key, and searching by possession type needs indexed lookup on
other attributes. Option D is incorrect. Analytical
databases are structured data stores.
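The two requirements behind this answer, a flexible schema and search by attribute, can be sketched with plain Python dictionaries standing in for documents. The player names, item fields, and the extra "pets" type are made up for illustration, and the in-memory index only imitates what a document database maintains for you.

```python
from collections import defaultdict

# Each player is one document; possessions carry different fields per type.
players = [
    {"player": "ana", "possessions": [
        {"type": "clothing", "name": "cloak", "color": "green"},
        {"type": "coins", "amount": 120},
    ]},
    {"player": "raj", "possessions": [
        {"type": "tools", "name": "pickaxe", "durability": 80},
        # A type added later needs no schema change.
        {"type": "pets", "name": "owl", "acquired_by": "quest"},
    ]},
]


def build_type_index(docs):
    """Map possession type -> set of players that own at least one such item."""
    index = defaultdict(set)
    for doc in docs:
        for item in doc["possessions"]:
            index[item["type"]].add(doc["player"])
    return index


index = build_type_index(players)


def players_with(possession_type):
    """Look up players by possession type without scanning every document."""
    return sorted(index.get(possession_type, set()))
```

Calling players_with("coins") finds the one player holding coins, and the later-added "pets" type is searchable without any migration, which is the behavior the trading feature needs.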
The CTO of your company wants to reduce the cost of running an HBase and
Hadoop cluster on premises. Only one HBase application is run on the cluster.
The cluster currently supports 10 TB of data, but it is expected to double in the
next six months. Which of the following managed services would you recommend
to replace the on-premises cluster in order to minimize migration and ongoing
operational costs?
A. Cloud Bigtable using the HBase API
B. Cloud Dataflow using the HBase API
C. Cloud Spanner
D. Cloud Datastore
A. The correct answer is A. Cloud Bigtable using the HBase API would minimize
migration efforts, and since Bigtable is a managed service, it would help reduce
operational costs. Option B is incorrect. Cloud Dataflow is a stream and batch
processing service, not a database. Options C and D are incorrect. Cloud Spanner is
a relational database and Cloud Datastore is a document database; neither matches
HBase, which is a wide-column NoSQL database, and migrating to a different data
model would incur unnecessary costs.
A genomics research institute is developing a platform for analyzing data
related to genetic diseases. The genomics data is in a specialized format known