CERTIFICATION EXAM QUESTIONS WITH
CORRECT ANSWERS NEWLY MODIFIED TESTED
AND APPROVED!!!
The CTO of your company wants to reduce the cost of running an HBase and Hadoop cluster
on premises. Only one HBase application is run on the cluster. The cluster currently supports
10 TB of data, but it is expected to double in the next six months. Which of the
following managed services would you recommend to replace the on-premises cluster in
order to minimize migration and ongoing operational costs?
A. Cloud Bigtable using the HBase API
B. Cloud Dataflow using the HBase API
C. Cloud Spanner
D. Cloud Datastore
--CORRECT ANSWER-- A
The correct answer is A. Cloud Bigtable
using the HBase API would minimize migration efforts, and since Bigtable is a managed
service, it would help reduce operational costs. Option B is incorrect. Cloud Dataflow is a
stream and batch processing service, not a database. Options C and D are incorrect.
Cloud Spanner is a relational database and Cloud Datastore is a document database, so neither
matches HBase, which is a wide-column NoSQL database, and migrating to a different data
model would incur unnecessary costs.
A genomics research institute is developing a platform for analyzing data related to genetic
diseases. The genomics data is in a specialized format known as FASTQ, which stores
nucleotide sequences and quality scores in a text format. Files may be up to 400 GB and are
uploaded in batches. Once the files finish uploading, an analysis pipeline runs, reads the data
in the FASTQ file, and outputs data to a database. What storage system is a good option for
storing the uploaded FASTQ data?
A. Cloud Bigtable
B. Cloud Datastore
C. Cloud Storage
D. Cloud Spanner
--CORRECT ANSWER-- C
The correct answer is C because the FASTQ
files are unstructured since their internal format is not used to organize storage structures.
Also, 400 GB is large enough that it is not efficient to store them as objects in a database.
Options A and B are incorrect because a NoSQL database is not needed for the given
requirements. Similarly, there is no need to store the data in a structured database like Cloud
Spanner, so Option D is incorrect.
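To illustrate why the FASTQ files can be treated as opaque objects that a pipeline reads from start to finish, here is a minimal Python sketch of a streaming reader. It assumes the common four-line FASTQ record layout (an "@" header line, the sequence, a "+" separator, and a quality string). The names read_fastq, FastqRecord, and SAMPLE are hypothetical, and a real pipeline would read from the uploaded object rather than an in-memory string.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    read_id: str
    sequence: str
    quality: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records one at a time so a large file is never held in memory."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed FASTQ record")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ")
        yield FastqRecord(header[1:].strip(), sequence, quality)


# Hypothetical two-record sample standing in for an uploaded file.
SAMPLE = "@read1\nGATTACA\n+\nIIIIIII\n@read2\nACGT\n+\n!!!!\n"
records = list(read_fastq(io.StringIO(SAMPLE)))
```

Because the reader only ever moves forward through the text, the storage layer does not need to understand the file's internal structure, which is consistent with keeping the uploads in object storage.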
A genomics research institute is developing a platform for analyzing data related to genetic
diseases. The genomics data is in a specialized format known as FASTQ, which stores
nucleotide sequences and quality scores in a text format. Once the files finish uploading, an
analysis pipeline runs, reads the data in the FASTQ file, and outputs data to a database. The
output is in tabular structure, the data is queried using SQL, and typically queries retrieve
only a small number of columns but many rows. What database would you recommend for
storing the output of the workflow?
A. Cloud Bigtable
B. Cloud Datastore
C. Cloud Storage
D. BigQuery
--CORRECT ANSWER-- D
The correct answer is D because the output is
structured, will be queried with SQL, and will retrieve a large number of rows but few
columns, making this a good use case for columnar storage, which BigQuery uses. Options A
and B are not good options because neither database is designed for SQL-based analytical
queries. Option C is incorrect because Cloud Storage is used for unstructured data and does
not support querying the contents of objects.
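The columnar-storage argument can be sketched in a few lines of Python. This is only an illustration of the idea, not how BigQuery is implemented: the table contents, column names, and helper functions below are all hypothetical. It counts how many stored values a query touches when it needs one column from every row.

```python
from typing import Dict, List

# Hypothetical pipeline output: one dict per row.
rows: List[Dict[str, object]] = [
    {"sample_id": "s1", "gene": "BRCA1", "variant_count": 12, "mean_quality": 37.5},
    {"sample_id": "s2", "gene": "BRCA2", "variant_count": 7, "mean_quality": 35.1},
    {"sample_id": "s3", "gene": "TP53", "variant_count": 21, "mean_quality": 38.9},
]


def to_columns(table: List[Dict[str, object]]) -> Dict[str, List[object]]:
    """Pivot row-oriented records into one list per column."""
    columns: Dict[str, List[object]] = {}
    for row in table:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def values_read_row_store(table: List[Dict[str, object]]) -> int:
    """A row layout reads every field of every row, then discards the rest."""
    return sum(len(row) for row in table)


def values_read_column_store(columns: Dict[str, List[object]], wanted: List[str]) -> int:
    """A column layout reads only the columns the query names."""
    return sum(len(columns[name]) for name in wanted)


columns = to_columns(rows)
wanted = ["variant_count"]  # comparable to selecting a single column
row_cost = values_read_row_store(rows)                   # 3 rows * 4 fields
column_cost = values_read_column_store(columns, wanted)  # 3 values
total_variants = sum(columns["variant_count"])
```

With many rows and few requested columns, the gap between the two counts grows with the number of columns that the query does not need.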
You are developing a new application and will be storing semi-structured data that will only
be accessed by a single key. The total volume of data will be at least 40 TB. What GCP
database service would you use?
A. BigQuery
B. Bigtable
C. Cloud Spanner
D. Cloud SQL
--CORRECT ANSWER-- B
The correct answer is B. Bigtable is a wide-column
NoSQL database that supports semi-structured data and works well with datasets over
1 TB. Options A, C, and D are incorrect because they are all used for structured data. Option
D is also incorrect because Cloud SQL does not currently scale to 40 TB in a single database.
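As a rough picture of "semi-structured data accessed by a single key", the following Python sketch models a wide-column table as a mapping from row key to a sparse set of columns. It is an in-memory stand-in, not the Bigtable client API; the put and get helpers, the row keys, and the "family:qualifier" column names are all hypothetical.

```python
from typing import Dict, Optional

# row key -> {"family:qualifier": value}; rows need not share the same columns.
Row = Dict[str, str]
table: Dict[str, Row] = {}


def put(row_key: str, column: str, value: str) -> None:
    """Write one cell, creating the row on first use."""
    table.setdefault(row_key, {})[column] = value


def get(row_key: str) -> Optional[Row]:
    """Look up a whole row by its key, the only access path in this sketch."""
    return table.get(row_key)


put("device#0001", "meta:model", "T-100")
put("device#0001", "reading:temp_c", "21.4")
put("device#0002", "meta:model", "T-200")
put("device#0002", "meta:firmware", "1.3.0")  # column absent from device#0001

row = get("device#0002")
```

Each row carries only the columns it actually has, which is what makes the model a fit for semi-structured records, and every read goes through the row key.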
A developer is planning a mobile application for your company's customers to use to track
information about their accounts. The developer is asking for your advice on storage
technologies. In one case, the developer explains that they want to write messages each time
a significant event occurs, such as the client opening, viewing, or deleting an account. This
data is collected for compliance reasons, and the developer wants to minimize administrative
overhead. What system would you recommend for storing this data?
A. Cloud SQL using MySQL
B. Cloud SQL using PostgreSQL
C. Cloud Datastore
D. Stackdriver Logging
--CORRECT ANSWER-- D
The correct answer is D. Stackdriver
Logging is the best option because it is a managed service designed for storing logging data.
Neither Option A nor B is as good a fit because the developer would have to design and
maintain a relational data model and user interface to view and manage log data. Option C,
Cloud Datastore, would not require a fixed data model, but it would still require the
developer to create and maintain a user interface to manage log events.
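For a sense of what "writing a message each time a significant event occurs" might look like in application code, here is a minimal sketch using only Python's standard logging module. It writes one JSON object per event to an in-memory stream; the stream stands in for whatever sink a managed logging service would collect from. The logger name, the field names, and the record_event helper are assumptions made for illustration.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "severity": record.levelname,
            "message": record.getMessage(),
            "event": getattr(record, "event", None),
            "account_id": getattr(record, "account_id", None),
        }
        return json.dumps(payload, sort_keys=True)


stream = io.StringIO()  # stand-in for the sink a logging service would read
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())

audit_log = logging.getLogger("audit_sketch")
audit_log.setLevel(logging.INFO)
audit_log.addHandler(handler)
audit_log.propagate = False  # keep the sketch's output out of the root logger


def record_event(event: str, account_id: str) -> None:
    """Emit one structured entry for a compliance-relevant account event."""
    audit_log.info("account %s", event, extra={"event": event, "account_id": account_id})


record_event("opened", "acct-123")
record_event("deleted", "acct-123")
entries = [json.loads(line) for line in stream.getvalue().splitlines()]
```

The application only emits entries; storage, retention, and viewing are left to the logging service, which is the administrative overhead the question asks to minimize.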
You are responsible for developing an ingestion mechanism for a large number of IoT
sensors. The ingestion service should accept data up to 10 minutes late. The service should
also perform some transformations before writing the data to a database. Which of the
managed services would be the best option for managing late arriving data and performing
transformations?
A. Cloud Dataproc
B. Cloud Dataflow
C. Cloud Dataprep
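D. Cloud SQL
--CORRECT ANSWER-- B
The correct answer is B. Cloud Dataflow is a
stream and batch processing service that is used for transforming data and processing
streaming data; its Apache Beam programming model supports windowing with an allowed
lateness, which covers the requirement to accept data up to 10 minutes late. Option A,
Cloud Dataproc, is a managed Hadoop and Spark service and not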
as well suited as Cloud Dataflow for the kind of stream processing specified. Option C,
Cloud Dataprep, is an interactive tool for exploring and preparing data sets for analysis.
Option D, Cloud SQL, is a relational database service, so it may be used to store data, but it is
not a service specifically for ingesting and transforming data before writing to a database.
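The late-data requirement can be sketched as a toy ingestion loop in plain Python. This is not the Dataflow or Apache Beam API, and it simplifies by comparing each reading's arrival time to its window deadline, where a real streaming engine would track a watermark. The 60-second window size, the Reading fields, and the Celsius-to-Fahrenheit transformation are assumptions chosen for illustration; only the 10-minute lateness bound comes from the question.

```python
from typing import Dict, List, NamedTuple, Tuple

WINDOW_SECONDS = 60          # hypothetical fixed window size
ALLOWED_LATENESS = 10 * 60   # accept data up to 10 minutes late


class Reading(NamedTuple):
    sensor_id: str
    event_time: int    # seconds, when the sensor took the reading
    arrival_time: int  # seconds, when the service received it
    celsius: float


def window_start(event_time: int) -> int:
    """Assign a reading to the fixed window containing its event time."""
    return event_time - (event_time % WINDOW_SECONDS)


def ingest(readings: List[Reading]) -> Tuple[Dict[int, List[float]], List[Reading]]:
    """Return (transformed values grouped by window start, dropped readings)."""
    windows: Dict[int, List[float]] = {}
    dropped: List[Reading] = []
    for r in readings:
        start = window_start(r.event_time)
        deadline = start + WINDOW_SECONDS + ALLOWED_LATENESS
        if r.arrival_time > deadline:
            dropped.append(r)  # arrived after the window stopped accepting data
            continue
        fahrenheit = r.celsius * 9 / 5 + 32  # the transformation step
        windows.setdefault(start, []).append(fahrenheit)
    return windows, dropped


sample = [
    Reading("a", 5, 6, 20.0),     # on time
    Reading("a", 30, 600, 25.0),  # late, but before the 660 s deadline
    Reading("b", 50, 700, 30.0),  # past the deadline, dropped
    Reading("b", 65, 70, 100.0),  # on time, falls in the next window
]
windows, dropped = ingest(sample)
```

A managed stream processor takes care of this bookkeeping, along with scaling and state management, which is why it fits the question better than a storage service or a batch-oriented cluster.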
A team of analysts has collected several CSV datasets with a total size of 50 GB. They plan
to store the datasets in GCP and use Compute Engine instances to run RStudio, an interactive
stats application. Data will be loaded into RStudio using an RStudio data loading tool. Which
of the following is the most appropriate GCP storage service for the datasets?
A. Cloud Storage
B. Cloud Datastore
C. MongoDB
D. Bigtable
--CORRECT ANSWER-- A
The correct answer is A, Cloud Storage, because the
data in the files is treated as an atomic unit of data that is loaded into RStudio. Options B and
C are incorrect because those are document databases and there is no requirement for storing