Exam (elaborations)

GCP Professional Data Engineer Certification Exam Questions With 100% Pass

Pages

107

Grade

A+

Uploaded on

29-08-2025

Written in

2025/2026

/.A group of climate scientists is collecting weather data every minute from 10,000
sensors across the globe. Data often arrives near the beginning of a minute, and almost
all data arrives within the first 30 seconds of a minute. The data ingestion process is
losing some data because servers cannot ingest the data as fast as it is arriving. The
scientists have scaled up the number of servers in their managed instance group, but that
has not completely eliminated the problem. They do not wish to increase the maximum size
of the managed instance group. What else can the scientists do to prevent data loss?

A. Write data to a Cloud Dataflow stream
B. Write data to a Cloud Pub/Sub topic
C. Write data to a Cloud SQL table
D. Write data to Cloud Dataprep

Answer: B. Write data to a Cloud Pub/Sub topic, which can scale automatically to existing
workloads. The ingestion process can read data from the topic and then process it. Some
data will likely accumulate early in every minute, but the ingestion process can catch up
later in the minute after new data stops arriving. Option A is incorrect; Cloud Dataflow
is a batch and stream processing service, not a message queue for buffering data. Option
C is incorrect; Cloud SQL is not designed to scale for ingestion as needed in this
example. Option D is incorrect; Cloud Dataprep is a tool for cleaning and preparing
datasets for analysis.

/.A software developer asks your advice about storing data. The developer has hundreds of
thousands of 1 KB JSON objects that need to be accessed in sub-millisecond times if
possible. All objects are referenced by a key. There is no need to look up values by the
contents of the JSON structure. What kind of NoSQL database would you recommend?

A. Key-value database
B. Analytical database
C. Wide-column database
D. Graph database

Answer: A. This is a good use case for key-value databases because the value is looked up
by key only and the value is a JSON structure. Option B is incorrect. Analytical
databases are not a type of NoSQL database. Option C is not a good option because
wide-column databases work well with larger databases, typically in the terabyte range.
Option D is incorrect because the data is not modeled as nodes and links, such as a
network model.

/.A software developer asks your advice about storing data. The developer has hundreds of
thousands of 10 KB JSON objects that need to be searchable by most attributes in the JSON
structure. What kind of NoSQL database would you recommend?

A. Key-value database
B. Analytical database
C. Wide-column database


Institution

GCP Professional Data Engineer Certification

Course

GCP Professional Data Engineer Certification

Content preview

GCP Professional Data Engineer
Certification Exam Questions With
100% Pass

/. A developer is planning a mobile application for your company's customers to use to
track information about their accounts. The developer is asking for your advice on
storage technologies. In one case, the developer explains that they want to write
messages each time a significant event occurs, such as the client opening, viewing, or
deleting an account. This data is collected for compliance reasons, and the developer
wants to minimize administrative overhead. What system would you recommend for
storing this data?

A. Cloud SQL using MySQL
B. Cloud SQL using PostgreSQL
C. Cloud Datastore
D. Stackdriver Logging

Answer: D. Stackdriver Logging (now called Cloud Logging) is the best option because it
is a managed service designed for storing logging data. Neither Option A nor B is as good
a fit because the developer would have to design and maintain a relational data model and
a user interface to view and manage log data. Option C, Cloud Datastore, would not
require a fixed data model, but it would still require the developer to create and
maintain a user interface to manage log events.

/.You are responsible for developing an ingestion mechanism for a large number of IoT
sensors. The ingestion service should accept data up to 10 minutes late. The service
should also perform some transformations before writing the data to a database. Which
of the managed services would be the best option for managing late arriving data and
performing transformations?

A. Cloud Dataproc
B. Cloud Dataflow
C. Cloud Dataprep
D. Cloud SQL

Answer: B. Cloud Dataflow is a stream and batch processing service that is used for
transforming data and processing streaming data. Its windowing model lets a pipeline
accept late-arriving data within a configured allowed lateness. Option A, Cloud Dataproc,
is a managed Hadoop and Spark service and not as well suited as Cloud Dataflow for the
kind of stream processing specified. Option C, Cloud Dataprep, is an interactive tool for
exploring and preparing data sets for analysis. Option D, Cloud SQL, is a relational
database service, so it may be used to store data, but it is not a service specifically
for ingesting and transforming data before writing to a database.
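
To make the late-data requirement concrete, here is a minimal pure-Python sketch of the idea behind event-time windows with a lateness bound. It is not the Dataflow or Apache Beam API. The names `WINDOW_SECONDS`, `ALLOWED_LATENESS_SECONDS`, `window_start`, and `ingest` are hypothetical, the one-minute window is an assumption, and arrival time stands in for the watermark a real pipeline would track. Only the 10-minute bound comes from the question.

```python
from collections import defaultdict

WINDOW_SECONDS = 60             # assumed window size (not stated in the question)
ALLOWED_LATENESS_SECONDS = 600  # "up to 10 minutes late", from the question


def window_start(event_time: int) -> int:
    """Return the start of the fixed window that contains event_time."""
    return event_time - (event_time % WINDOW_SECONDS)


def ingest(events):
    """Group (event_time, arrival_time, value) tuples into fixed windows.

    An event is kept if it arrives no later than ALLOWED_LATENESS_SECONDS
    after the end of its window; otherwise it is dropped.
    """
    windows = defaultdict(list)
    dropped = []
    for event_time, arrival_time, value in events:
        start = window_start(event_time)
        deadline = start + WINDOW_SECONDS + ALLOWED_LATENESS_SECONDS
        if arrival_time <= deadline:
            windows[start].append(value)
        else:
            dropped.append(value)
    return dict(windows), dropped


# Times are seconds since an arbitrary origin (illustrative data only).
readings = [
    (5, 6, "on time"),         # window [0, 60), arrives almost immediately
    (30, 500, "late but ok"),  # same window, arrives before 60 + 600 = 660
    (40, 700, "too late"),     # same window, arrives after the deadline
    (65, 70, "next window"),   # window [60, 120)
]
windows, dropped = ingest(readings)
```

In this sketch the reading that arrives at second 700 is dropped, while the one that arrives at second 500 is still added to its original window.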

/.A team of analysts has collected several CSV datasets with a total size of 50 GB. They
plan to store the datasets in GCP and use Compute Engine instances to run RStudio,
an interactive stats application. Data will be loaded into RStudio using an RStudio data
loading tool. Which of the following is the most appropriate GCP storage service for the
datasets?

A. Cloud Storage
B. Cloud Datastore
C. MongoDB
D. Bigtable

Answer: A. Cloud Storage is the most appropriate service because the data in the files is
treated as an atomic unit of data that is loaded into RStudio. Options B and C are
incorrect because those are document databases and there is no requirement for storing
the data in semi-structured format with support for fully indexed querying. Also, MongoDB
is not a GCP service. Option D is incorrect because, although you could load CSV data
into a Bigtable table, the volume of data is not sufficient to warrant using Bigtable.

/.A team of analysts has collected several terabytes of telemetry data in CSV datasets.
They plan to store the datasets in GCP and query and analyze the data using SQL.
Which of the following is the most appropriate GCP storage service for the datasets?

A. Cloud SQL
B. Cloud Spanner
C. BigQuery
D. Bigtable

Answer: C. BigQuery is a managed analytical database service that supports SQL and scales
to petabyte volumes of data. Options A and B are incorrect because both are used for
transaction processing applications, not analytics. Option D is incorrect because
Bigtable is a wide-column NoSQL database designed for key-based access rather than ad hoc
SQL analytics.

/.You have been hired to consult with a startup that is developing software for
self-driving vehicles. The company's product uses machine learning to predict the trajectory
of persons and vehicles. Currently, the software is being developed using 20 vehicles,
all located in the same city. IoT data is sent from vehicles every 60 seconds to a MySQL
database running on a Compute Engine instance using an n2-standard-8 machine type
with 8 vCPUs and 16 GB of memory. The startup wants to review their architecture and
make any necessary changes to support tens of thousands of self-driving vehicles, all
transmitting IoT data every second. The vehicles will be located across North America
and Europe. Approximately 4 KB of data is sent in each transmission. What changes to
the architecture would you recommend?

A. None. The current architecture is well suited to the use case.
B. Replace Cloud SQL with Cloud Spanner.
C. Replace Cl

Answer: C. Bigtable is the best storage service for IoT data, especially when a large
number of devices will be sending data at short intervals. Option A is incorrect because
Cloud SQL is designed for transaction processing at a regional level. Option B is
incorrect because Cloud Spanner is designed for transaction processing, and although it
scales to global levels, it is not the best option for IoT data. Option D is incorrect
because there is no need for indexed, semi-structured data.
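
A quick back-of-the-envelope calculation shows how far the target workload is from the current one. This is a sketch under stated assumptions: 10,000 vehicles is used as the lower bound of "tens of thousands", a kilobyte is taken as 1,000 bytes for round numbers, and the function name `ingest_rate` is hypothetical. The vehicle counts, intervals, and 4 KB payload come from the question.

```python
KB = 1000  # bytes; decimal kilobytes assumed for round numbers


def ingest_rate(vehicles: int, payload_kb: float, interval_s: float) -> float:
    """Return the sustained ingest rate in bytes per second."""
    return vehicles * payload_kb * KB / interval_s


# Current setup: 20 vehicles, 4 KB every 60 seconds.
current = ingest_rate(vehicles=20, payload_kb=4, interval_s=60)

# Target setup: assumed 10,000 vehicles (lower bound), 4 KB every second.
target = ingest_rate(vehicles=10_000, payload_kb=4, interval_s=1)

target_mb_per_s = target / 1e6
target_tb_per_day = target * 86_400 / 1e12
growth_factor = target / current

print(f"current: {current:,.0f} B/s")
print(f"target:  {target_mb_per_s:,.0f} MB/s")
print(f"per day: {target_tb_per_day:.2f} TB")
print(f"growth:  {growth_factor:,.0f}x")
```

Under these assumptions the write rate grows by roughly four orders of magnitude and accumulates terabytes per day, which is the kind of high-volume, short-interval write pattern the answer associates with Bigtable.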

/.As a member of a team of game developers, you have been tasked with devising a
way to track players' possessions. Possessions may be purchased from a catalog,
traded with other players, or awarded for game activities. Possessions are categorized
as clothing, tools, books, and coins. Players may have any number of possessions of
any type. Players can search for other players who have particular possession types to
facilitate trading. The game designer has informed you that there will likely be new types
of possessions and ways to acquire them in the future. What kind of a data store would
you recommend using?

A. Transactional database
B. Wide-column database
C. Document database
D. Analytic database

Answer: C. The requirements call for a semi-structured schema. You will need to search
players' possessions and not just look them up using a single key because of the
requirement for facilitating trading. Option A is not correct. Transactional databases
have fixed schemas, and this use case calls for a semi-structured schema. Option B is
incorrect because a wide-column database does not support indexed lookup, which is needed
for searching. Option D is incorrect. Analytical databases are structured data stores.
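
The two properties the answer relies on, a flexible schema and search by attribute, can be illustrated with plain Python dictionaries. This is only a sketch of the document model, not the API of any particular database. The player records, the `pet` possession type, and the function `build_type_index` are hypothetical.

```python
from collections import defaultdict

# Each player is one document; possessions can carry different fields per type.
players = {
    "p1": {"name": "Ada", "possessions": [
        {"type": "tool", "item": "hammer"},
        {"type": "coin", "amount": 25},
    ]},
    "p2": {"name": "Lin", "possessions": [
        {"type": "book", "title": "Atlas", "pages": 120},
    ]},
    "p3": {"name": "Sam", "possessions": [
        {"type": "book", "title": "Codex"},
        # A new possession type added later; no schema migration is needed.
        {"type": "pet", "species": "owl"},
    ]},
}


def build_type_index(docs):
    """Build a secondary index from possession type to the set of player IDs."""
    index = defaultdict(set)
    for player_id, doc in docs.items():
        for possession in doc.get("possessions", []):
            index[possession["type"]].add(player_id)
    return index


type_index = build_type_index(players)
book_owners = sorted(type_index["book"])  # players to approach for a book trade
pet_owners = sorted(type_index["pet"])    # works for the newly added type too
```

A document database maintains indexes like `type_index` for you, which is what makes the trading search practical without knowing each player's key in advance.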

/.The CTO of your company wants to reduce the cost of running an HBase and Hadoop
cluster on premises. Only one HBase application is run on the cluster. The cluster
currently supports 10 TB of data, but it is expected to double in the next six months.
Which of the following managed services would you recommend to replace the on-premises
cluster in order to minimize migration and ongoing operational costs?

A. Cloud Bigtable using the HBase API
B. Cloud Dataflow using the HBase API
C. Cloud Spanner
D. Cloud Datastore

Answer: A. Cloud Bigtable using the HBase API would minimize migration efforts, and since
Bigtable is a managed service, it would help reduce operational costs. Option B is
incorrect. Cloud Dataflow is a stream and batch processing service, not a database.
Options C and D are incorrect. Neither Cloud Spanner, a relational database, nor Cloud
Datastore, a document database, is a natural fit for an HBase application, which uses a
wide-column NoSQL data model, and migrating to a different data model would incur
unnecessary costs.

/.A genomics research institute is developing a platform for analyzing data related to
genetic diseases. The genomics data is in a specialized format known as FASTQ, which
stores nucleotide sequences and quality scores in a text format. Files may be up to 400
GB and are uploaded in batches. Once the files finish uploading, an analysis pipeline
runs, reads the data in the FASTQ file, and outputs data to a database. What storage
system is a good option for storing the uploaded FASTQ data?

A. Cloud Bigtable
B. Cloud Datastore
C. Cloud Storage
D. Cloud Spanner

Answer: C. The FASTQ files are treated as unstructured data because their internal format
is not used to organize storage structures. Also, 400 GB is large enough that it is not
efficient to store them as objects in a database. Options A and B are incorrect because a
NoSQL database is not needed for the given requirements. Similarly, there is no need to
store the data in a structured database like Cloud Spanner, so Option D is incorrect.

/.A genomics research institute is developing a platform for analyzing data related to
genetic diseases. The genomics data is in a specialized format known as FASTQ, which
stores nucleotide sequences and quality scores in a text format. Once the files finish
uploading, an analysis pipeline runs, reads the data in the FASTQ file, and outputs data
to a database. The output is in tabular structure, the data is queried using SQL, and
typically queries retrieve only a small number of columns but many rows. What
database would you recommend for storing the output of the workflow?

A. Cloud Bigtable
B. Cloud Datastore
C. Cloud Storage
D. BigQuery

Answer: D. The output is structured, will be queried with SQL, and will retrieve a large
number of rows but few columns, making this a good use case for columnar storage, which
BigQuery uses. Options A and B are not good options because neither database is designed
for SQL analytics over tabular data. Option C is incorrect because Cloud Storage is used
for unstructured data and does not support querying the contents of objects.
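
The reason columnar storage suits "few columns, many rows" queries can be shown with a small sketch. This illustrates the general idea only; BigQuery's actual storage format is more involved. The column names, sample values, and the function `to_columns` are hypothetical.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"sample_id": "s1", "gene": "BRCA1", "quality": 37, "notes": "run A"},
    {"sample_id": "s2", "gene": "TP53", "quality": 35, "notes": "run A"},
    {"sample_id": "s3", "gene": "BRCA1", "quality": 40, "notes": "run B"},
    {"sample_id": "s4", "gene": "EGFR", "quality": 32, "notes": "run B"},
]


def to_columns(records):
    """Pivot row records into one list per column (assumes identical keys)."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# A query such as "average quality" needs only one column.
mean_quality = sum(columns["quality"]) / len(columns["quality"])

# Values read to answer it: a row store reads every field of every row,
# while a column store reads only the requested column.
values_read_row_layout = sum(len(record) for record in rows)
values_read_column_layout = len(columns["quality"])
```

With four fields per row, the row layout reads four times as many values as the column layout for this query, and the gap widens as tables gain more columns.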

/.You are developing a new application and will be storing semi-structured data that will
only be accessed by a single key. The total volume of data will be at least 40 TB. What
GCP database service would you use?

A. BigQuery
B. Bigtable
C. Cloud Spanner
D. Cloud SQL

Answer: B. Bigtable is a wide-column NoSQL database that supports semi-structured data
and works well with datasets over 1 TB. Options A, C, and D are incorrect because they
are all used for structured data. Option D is also incorrect because Cloud SQL does not
currently scale to 40 TB in a single database.


