GCP PROFESSIONAL DATA ENGINEER Certification Exam Questions and
Correct Answers A+ 2025
You set up a streaming data insert into a Redis cluster via a Kafka cluster. Both clusters are
running on Compute Engine instances. You need to encrypt data at rest with encryption keys
that you can create, rotate, and destroy as needed. What should you do?
A. Create a dedicated service account and use encryption at rest to reference your data stored in
your Compute Engine cluster instances as part of your API service calls.
B. Create encryption keys in Cloud Key Management Service. Use those keys to encrypt your
data in all of the Compute Engine cluster instances.
C. Create encryption keys locally. Upload your encryption keys to Cloud Key Management
Service. Use those keys to encrypt your data in all of the Compute Engine cluster instances.
D. Create encryption keys in Cloud Key Management Service. Reference those keys in your API
service calls when accessing the data in your Compute Engine cluster instances.
- Correct Answer: B
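How answer B looks in practice, as a minimal sketch with the google-cloud-kms client (the project ID, location, and key names are placeholders): keys are created and rotated in Cloud KMS, and the applications on the Compute Engine instances use them to encrypt data before it lands on disk.

```python
import time
from google.cloud import kms

client = kms.KeyManagementServiceClient()
parent = client.common_location_path("my-project", "us-central1")  # placeholders

# Key ring to hold the pipeline's keys.
key_ring = client.create_key_ring(
    request={"parent": parent, "key_ring_id": "pipeline-keys", "key_ring": {}}
)

# Symmetric key that rotates automatically every 90 days.
crypto_key = client.create_crypto_key(
    request={
        "parent": key_ring.name,
        "crypto_key_id": "cluster-data-key",
        "crypto_key": {
            "purpose": kms.CryptoKey.CryptoKeyPurpose.ENCRYPT_DECRYPT,
            "rotation_period": {"seconds": 60 * 60 * 24 * 90},
            "next_rotation_time": {"seconds": int(time.time()) + 60 * 60 * 24},
        },
    }
)

# Encrypt a record before writing it to the instances' disks.
ciphertext = client.encrypt(
    request={"name": crypto_key.name, "plaintext": b"record to store at rest"}
).ciphertext
```

Because the key material lives in Cloud KMS, destroying a key version later renders data encrypted under it unrecoverable, which is exactly the create/rotate/destroy control the question asks for.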
You are selecting services to write and transform JSON messages from Cloud Pub/Sub to
BigQuery for a data pipeline on Google Cloud. You want to minimize service costs. You also want
to monitor and accommodate input data volume that will vary in size with minimal manual
intervention. What should you do?
A. Use Cloud Dataproc to run your transformations. Monitor CPU utilization for the cluster.
Resize the number of worker nodes in your cluster via the command line.
B. Use Cloud Dataproc to run your transformations. Use the diagnose command to generate an
operational output archive. Locate the bottleneck and adjust cluster resources.
C. Use Cloud Dataflow to run your transformations. Monitor the job system lag with Stackdriver.
Use the default autoscaling setting for worker instances.
D. Use Cloud Dataflow to run your transformations. Monitor the total execution time for a
sampling of jobs. Configure the job to use non-default Compute Engine machine types when needed.
- Correct Answer: C
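A minimal Beam sketch of answer C (project, subscription, bucket, and table names are placeholders): a streaming Dataflow job from Pub/Sub to BigQuery. Leaving the worker options at their defaults keeps Dataflow's throughput-based autoscaling in charge of sizing, so varying input volume needs no manual resizing.

```python
import json
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# No num_workers or autoscaling_algorithm overrides: the default
# throughput-based autoscaling sizes the worker pool automatically.
options = PipelineOptions(
    streaming=True,
    runner="DataflowRunner",
    project="my-project",                # placeholder
    region="us-central1",
    temp_location="gs://my-bucket/tmp",  # placeholder
)

with beam.Pipeline(options=options) as p:
    (
        p
        | "ReadJson" >> beam.io.ReadFromPubSub(
            subscription="projects/my-project/subscriptions/events"  # placeholder
        )
        | "Parse" >> beam.Map(json.loads)
        | "Shape" >> beam.Map(
            lambda msg: {"user_id": msg["user_id"], "event": msg["event"]}
        )
        | "Write" >> beam.io.WriteToBigQuery(
            "my-project:pipeline.events",  # placeholder
            schema="user_id:STRING,event:STRING",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
        )
    )
```

The job's system lag is then monitored from the Dataflow job page or the corresponding Stackdriver (Cloud Monitoring) metric.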
You are designing storage for very large text files for a data pipeline on Google Cloud. You want
to support ANSI SQL queries. You also want to support compression and parallel load from the
input locations using Google recommended practices. What should you do?
A. Transform text files to compressed Avro using Cloud Dataflow. Use BigQuery for storage and
query.
B. Transform text files to compressed Avro using Cloud Dataflow. Use Cloud Storage and
BigQuery permanently linked tables for the query.
C. Compress text files to gzip using the Grid Computing Tools. Use BigQuery for storage and
query.
D. Compress text files to gzip using the Grid Computing Tools. Use Cloud Storage, and then
import into Cloud Bigtable for query.
- Correct Answer: A
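Answer A in sketch form (bucket paths and the one-field record schema are invented for illustration): Dataflow rewrites the text as block-compressed Avro, a splittable, self-describing format that BigQuery loads in parallel.

```python
import apache_beam as beam
from apache_beam.io.avroio import WriteToAvro

# Toy schema; a real pipeline would parse each line into typed fields.
SCHEMA = {
    "type": "record",
    "name": "TextLine",
    "fields": [{"name": "text", "type": "string"}],
}

with beam.Pipeline() as p:
    (
        p
        | "Read" >> beam.io.ReadFromText("gs://my-bucket/raw/*.txt")  # placeholder
        | "ToRecord" >> beam.Map(lambda line: {"text": line})
        | "WriteAvro" >> WriteToAvro(
            "gs://my-bucket/avro/part",  # placeholder output prefix
            schema=SCHEMA,
            codec="deflate",             # compressed Avro blocks
            file_name_suffix=".avro",
        )
    )
```

The resulting shards load into BigQuery in a single parallel job, for example: bq load --source_format=AVRO mydataset.mytable 'gs://my-bucket/avro/*.avro'.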
You are developing an application that uses a recommendation engine on Google Cloud. Your
solution should display new videos to customers based on past views. Your solution needs to
generate labels for the entities in videos that the customer has viewed. Your design must be able
to provide very fast filtering suggestions based on data from other customer preferences on
several TB of data. What should you do?
A. Build and train a complex classification model with Spark MLlib to generate labels and filter
the results. Deploy the models using Cloud Dataproc. Call the model from your application.
B. Build and train a classification model with Spark MLlib to generate labels. Build and train a
second classification model with Spark MLlib to filter results to match customer preferences.
Deploy the models using Cloud Dataproc. Call the models from your application.
C. Build an application that calls the Cloud Video Intelligence API to generate labels. Store the
labels in Cloud Bigtable, and filter the predicted labels to match the customer's viewing history
to generate preferences.
- Correct Answer: C
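The label-generation half of answer C, sketched with the google-cloud-videointelligence client (the input URI is a placeholder). The returned entity labels would then be stored with customer-preference data for the fast-filtering step.

```python
from google.cloud import videointelligence

client = videointelligence.VideoIntelligenceServiceClient()

operation = client.annotate_video(
    request={
        "input_uri": "gs://my-bucket/videos/clip.mp4",  # placeholder
        "features": [videointelligence.Feature.LABEL_DETECTION],
    }
)
result = operation.result(timeout=300)  # annotate_video is long-running

# Entity labels detected across the video's segments.
for annotation in result.annotation_results[0].segment_label_annotations:
    print(annotation.entity.description)
```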
Your infrastructure includes a set of YouTube channels. You have been tasked with creating a
process for sending the YouTube channel data to Google Cloud for analysis. You want to design a
solution that allows your world-wide marketing teams to perform ANSI SQL and other types of
analysis on up-to-date YouTube channels log data. How should you set up the log data transfer
into Google Cloud?
A. Use Storage Transfer Service to transfer the offsite backup files to a Cloud Storage Multi-
Regional storage bucket as a final destination.
B. Use Storage Transfer Service to transfer the offsite backup files to a Cloud Storage Regional
bucket as a final destination.
C. Use BigQuery Data Transfer Service to transfer the offsite backup files to a Cloud Storage
Multi-Regional storage bucket as a final destination.
D. Use BigQuery Data Transfer Service to transfer the offsite backup files to a Cloud Storage
Regional storage bucket as a final destination.
- Correct Answer: A
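A sketch of answer A with the google-cloud-storage-transfer client; the project, source and sink buckets, and schedule are placeholders, and a Cloud Storage source stands in for wherever the files actually live (S3, HTTP lists, and other sources are also supported).

```python
from google.cloud import storage_transfer

client = storage_transfer.StorageTransferServiceClient()

transfer_job = client.create_transfer_job(
    {
        "transfer_job": {
            "project_id": "my-project",  # placeholder
            "description": "Daily YouTube channel log sync",
            "status": storage_transfer.TransferJob.Status.ENABLED,
            "schedule": {
                "schedule_start_date": {"year": 2025, "month": 1, "day": 1},
            },
            "transfer_spec": {
                "gcs_data_source": {"bucket_name": "offsite-backups"},      # placeholder
                "gcs_data_sink": {"bucket_name": "yt-logs-multiregional"},  # placeholder
            },
        }
    }
)
print(transfer_job.name)
```

A Multi-Regional sink bucket keeps the data close to worldwide marketing teams, and BigQuery tables defined over the bucket then provide the ANSI SQL access the question calls for.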
You are developing an application on Google Cloud that will automatically generate subject
labels for users' blog posts. You are under competitive pressure to add this feature quickly, and
you have no additional developer resources. No one on your team has experience with machine
learning. What should you do?
A. Call the Cloud Natural Language API from your application. Process the generated Entity
Analysis as labels.
B. Call the Cloud Natural Language API from your application. Process the generated Sentiment
Analysis as labels.
C. Build and train a text classification model using TensorFlow. Deploy the model using Cloud
Machine Learning Engine. Call the model from your application and process the results as labels.
D. Build and train a text classification model using TensorFlow. Deploy the model using a
Kubernetes Engine cluster. Call the model from your application and process the results as
labels.
- Correct Answer: A
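Answer A as a short sketch with the google-cloud-language client (the post text and salience cutoff are stand-ins): entity analysis requires no model training, so it ships quickly with no ML expertise on the team.

```python
from google.cloud import language_v1

client = language_v1.LanguageServiceClient()

document = {
    "content": "Serverless data pipelines on Google Cloud with Dataflow.",  # stand-in post
    "type_": language_v1.Document.Type.PLAIN_TEXT,
}
response = client.analyze_entities(request={"document": document})

# Keep the more salient entities as the post's subject labels.
labels = [entity.name for entity in response.entities if entity.salience >= 0.05]
print(labels)
```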
You are designing storage for 20 TB of text files as part of deploying a data pipeline on Google
Cloud. Your input data is in CSV format. You want to minimize the cost of querying aggregate
values for multiple users who will query the data in Cloud Storage with multiple engines. Which
storage service and schema design should you use?
A. Use Cloud Bigtable for storage. Install the HBase shell on a Compute Engine instance to query
the Cloud Bigtable data.
B. Use Cloud Bigtable for storage. Link as permanent tables in BigQuery for query.
C. Use Cloud Storage for storage. Link as permanent tables in BigQuery for query.
D. Use Cloud Storage for storage. Link as temporary tables in BigQuery for query.
- Correct Answer: C
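A sketch of answer C with the google-cloud-bigquery client (project, dataset, bucket, and column names are placeholders): the CSVs stay in Cloud Storage, where other engines can read them, while a permanent external table definition lets BigQuery users query them with standard SQL.

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder

# Permanent (saved) table definition backed by the CSV files in Cloud Storage.
external_config = bigquery.ExternalConfig("CSV")
external_config.source_uris = ["gs://my-bucket/text/*.csv"]  # placeholder
external_config.autodetect = True  # column names depend on the detected schema

table = bigquery.Table("my-project.analytics.text_files")  # placeholder
table.external_data_configuration = external_config
client.create_table(table, exists_ok=True)

# Any authorized user can now run aggregate queries against the files.
query = """
    SELECT category, COUNT(*) AS n  -- placeholder column
    FROM `my-project.analytics.text_files`
    GROUP BY category
"""
for row in client.query(query).result():
    print(row)
```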
Your company built a TensorFlow neural-network model with a large number of neurons and
layers. The model fits well for the training data. However, when tested against new data, it
performs poorly. What method can you employ to address this?
A. Threading
B. Serialization
C. Dropout Methods
D. Dimensionality Reduction
- Correct Answer: C
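What answer C looks like in a Keras model, as a minimal illustration (layer sizes are arbitrary): each Dropout layer randomly zeroes a fraction of activations during training, a standard regularization that curbs the overfitting the question describes.

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(512, activation="relu", input_shape=(100,)),  # arbitrary sizes
    tf.keras.layers.Dropout(0.5),  # drop 50% of activations at train time only
    tf.keras.layers.Dense(512, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
```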
You need to design a pipeline that can ingest batch data from your organization's application
metrics as well as your user database, then join the data using a common key before outputting
to BigQuery. What is the most efficient way to go about this?
A. Create a Cloud Dataflow pipeline and join the two PCollections using a Combine transform on
the common key.
B. Ingest both datasets separately into BigQuery and create a new final table using an SQL JOIN.
C. Write a Cloud Function script that can perform the join on a per-record basis. Stream both
datasets through Cloud Pub/Sub triggering the Cloud Function.
D. Create a Cloud Dataflow pipeline and join the two PCollections using CoGroupByKey transform
on the common key.
- Correct Answer: D
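A runnable toy of answer D (two in-memory inputs stand in for the metrics and user-database reads): CoGroupByKey joins both keyed PCollections in a single shuffle, which is the idiomatic Beam join; a Combine transform aggregates values within one collection and is the wrong tool here.

```python
import apache_beam as beam

with beam.Pipeline() as p:
    metrics = p | "Metrics" >> beam.Create(
        [("u1", {"requests": 120}), ("u2", {"requests": 45})]
    )
    users = p | "Users" >> beam.Create(
        [("u1", {"name": "Ada"}), ("u2", {"name": "Lin"})]
    )

    (
        {"metrics": metrics, "users": users}
        | "Join" >> beam.CoGroupByKey()
        | "Merge" >> beam.MapTuple(
            lambda user_id, grouped: {
                "user_id": user_id,
                **next(iter(grouped["users"]), {}),
                **next(iter(grouped["metrics"]), {}),
            }
        )
        # The real pipeline would end with beam.io.WriteToBigQuery(...).
        | "Print" >> beam.Map(print)
    )
```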