GCP Professional Data Engineer Questions with Detailed
Verified Answers
1. You are developing an application that will only recognize and tag specific business-to-business
product logos in images. You do not have an extensive background working with machine learning
models, but you need to get your application working. What is the current best method to accomplish
this task? Ans: ✓ ✓ ✓ a. Use the AutoML Vision service to train a custom model using the Vision
API
i. The AutoML services allow you to train custom image models (and other model types) using
Google's pre-trained APIs as a base. Training a custom model also works on AI Platform, but the
AutoML route requires less manual model overhead.
2. Your organization is streaming telemetry data into BigQuery for long-term storage (2 years) and
analysis, at the rate of about 100 million records per day. They need to be able to run queries
against certain time periods of data without incurring the costs of querying all available records.
What is the preferred method for doing so? Ans: ✓ ✓ ✓ a. Partition a single table by day, and run
queries against individual partitions.
i. Partitioning a single table by date lets you maintain one table while still running queries
against only a smaller portion of it. While using wildcards across multiple tables (one for each day)
technically works, partitioning a single table is best practice.
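As a rough illustration of the answer above, the sketch below builds the BigQuery SQL for a day-partitioned table and for a query restricted to a date range. The table name `telemetry.events`, the column names, and both helper functions are hypothetical and only serve the example; the code produces SQL strings and does not call BigQuery.

```python
from datetime import date

# Hypothetical table and column names, used only for illustration.
TABLE = "telemetry.events"
PARTITION_COLUMN = "event_ts"


def create_table_ddl(table: str = TABLE, column: str = PARTITION_COLUMN) -> str:
    """Return DDL for a single table partitioned by day on a TIMESTAMP column."""
    return (
        f"CREATE TABLE `{table}` (\n"
        f"  device_id STRING,\n"
        f"  {column} TIMESTAMP,\n"
        f"  reading FLOAT64\n"
        f")\n"
        f"PARTITION BY DATE({column})"
    )


def range_query(start: date, end: date,
                table: str = TABLE, column: str = PARTITION_COLUMN) -> str:
    """Return a query over the half-open date range [start, end).

    Filtering on the partitioning column with constant bounds lets
    BigQuery scan only the matching daily partitions.
    """
    return (
        f"SELECT device_id, AVG(reading) AS avg_reading\n"
        f"FROM `{table}`\n"
        f"WHERE {column} >= TIMESTAMP('{start.isoformat()}')\n"
        f"  AND {column} < TIMESTAMP('{end.isoformat()}')\n"
        f"GROUP BY device_id"
    )


if __name__ == "__main__":
    print(create_table_ddl())
    print(range_query(date(2025, 1, 1), date(2025, 1, 2)))
```

The cost saving comes from the filter on the partitioning column: a query without that filter would still scan every partition of the table.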
3. You are an administrator for several organizations in the same company. Each organization has
data in its own BigQuery table within a single project. For application access reasons, all of the
tables must remain in the same project. You think each organization should be able to view and run
queries against its own data without exposing the data of other organizations to unauthorized viewers.
What should you recommend? Ans: ✓ ✓ ✓ a. Create a separate dataset for each organization in
the same project. Place each organization's table in its own dataset. Restrict access to each
dataset to only that organization, so its members can view their own table but no one else's.
i. You can assign roles at the dataset level. Placing tables in different datasets allows you to limit
access per dataset.
© Get it right 2025 Getaway - Stuvia US All rights reserved
4. Your company is making the move to Google Cloud and has chosen to use a managed database
service to reduce overhead. Your existing database is used for a product catalog that provides real-
time inventory tracking for a retailer. Your database is 500 GB in size. The data is semi-structured
and does not need full atomicity. You are looking for a truly no-ops/serverless solution. What
storage option should you choose? Ans: ✓ ✓ ✓ a. Cloud Datastore
i. Datastore is perfect for semi-structured data less than 1TB in size. Product catalogs are a
recommended use case.
5. How can you set up your Dataproc environment to use BigQuery as an input and output source?
Ans: ✓ ✓ ✓ a. Install the BigQuery connector on your Dataproc cluster.
i. You can install the BigQuery connector on your cluster for direct programmatic read/write access
to BigQuery. Note that a Cloud Storage bucket is used as intermediate storage between the two
services, but you'll interact directly with BigQuery from Dataproc.
6. In AI Platform, what does the CUSTOM tier allow you to configure? Choose the best answer. Ans:
✓ ✓ ✓ a. Custom number of workers and parameter servers, and the machine type of the master server.
i. Correct. You can customize the number of workers and parameter servers, but the number of
masters is fixed at one.
7. You are building a data pipeline on Google Cloud. You need to prepare source data for a machine-
learning model. This involves quickly deduplicating rows from three input tables and also removing
outliers from data columns where you do not know the data distribution. What should you do?
Ans: ✓ ✓ ✓ a. Use Cloud Dataprep to preview the data distributions in sample source data table
columns. Click on each column name, click on each appropriate suggested transformation, and then
click Add to add each transformation to the Cloud Dataprep job.
i. Dataprep is the correct choice because the requirement is to prepare and clean source data. For
deduplication, using the suggested transformations is easier and quicker than writing a
recipe by hand, which is more work than needed.
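To make the two cleaning steps in this question concrete, here is a minimal standard-library sketch of row deduplication across several tables and outlier removal without assuming a distribution. It is not how Cloud Dataprep works internally; the interquartile-range rule, the function names, and the sample rows are all assumptions chosen for illustration.

```python
from statistics import quantiles


def deduplicate(*tables):
    """Concatenate tables (lists of tuples) and drop exact duplicate rows,
    keeping the first occurrence of each row."""
    seen = set()
    unique = []
    for table in tables:
        for row in table:
            if row not in seen:
                seen.add(row)
                unique.append(row)
    return unique


def remove_outliers(rows, column, k=1.5):
    """Drop rows whose value in `column` lies outside the Tukey fences
    [Q1 - k*IQR, Q3 + k*IQR]. This rule needs no distribution assumption."""
    values = [row[column] for row in rows]
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [row for row in rows if low <= row[column] <= high]


if __name__ == "__main__":
    # Hypothetical (id, reading) rows from three input tables.
    table_a = [(1, 10.0), (2, 11.0)]
    table_b = [(2, 11.0), (3, 12.0)]
    table_c = [(4, 9.0), (5, 500.0), (6, 10.5)]

    rows = deduplicate(table_a, table_b, table_c)
    cleaned = remove_outliers(rows, column=1)
    print(len(rows), len(cleaned))
```

In Dataprep the same outcome is reached by accepting suggested transformations in the UI rather than by writing code like this.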
8. As part of your backup plan, you create regular boot-disk snapshots of Compute Engine instances
that are running. You want to be able to restore these snapshots using the fewest possible steps for
replacement instances. What should you do? Ans: ✓ ✓ ✓ a. Use the snapshots to create
replacement instances as needed.
i. Snapshots let you recreate instances in the fewest steps.
9. You are setting up multiple MySQL databases on Compute Engine. You need to collect logs from
your MySQL applications for audit purposes. How should you approach this? Ans: ✓ ✓ ✓ a. Install