ANSWERS | VERIFIED ANSWERS (100% CORRECT) | LATEST EXAM
UPDATE
1. You are developing an application that will only recognize and tag
specific business to business product logos in images. You do not have an
extensive background working with machine learning models, but need to
get your application working. What is the current best method to
accomplish this task?
a. Use the AutoML Vision service to train a custom model using the Vision
API
i. The newly added AutoML services let you train custom image models (and
other model types) using Google's pre-trained APIs as a base. Training a
custom model on AI Platform also works, but the AutoML route requires
less manual model overhead.
2. Your organization is streaming telemetry data into BigQuery for long-
term storage (2 years) and analysis, at the rate of about 100 million records
per day. They need to be able to run queries against certain time periods of
data without incurring the costs of querying all available records. What is
the preferred method for doing so?
a. Partition a single table by day, and run queries against individual
partitions.
i. Partitioning a single table by date allows you to maintain a single table,
but also be able to run queries on a smaller portion of it. While using
wildcards across multiple tables (one for each day) technically works,
partitioning a single table is best practice.
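As a rough illustration of why partitioning keeps costs down, the sketch below builds a query that touches only one daily partition. It assumes an ingestion-time partitioned table, where the `_PARTITIONTIME` pseudo-column is available; the table name `my_project.telemetry.events` and both helper functions are hypothetical.

```python
from datetime import date, timedelta


def build_partition_query(table: str, day: date) -> str:
    """Return SQL that filters on a single daily partition of `table`.

    Assumes an ingestion-time partitioned table, so the filter uses the
    _PARTITIONTIME pseudo-column. The table name is supplied by the caller.
    """
    next_day = day + timedelta(days=1)
    return (
        f"SELECT * FROM `{table}` "
        f"WHERE _PARTITIONTIME >= TIMESTAMP('{day.isoformat()}') "
        f"AND _PARTITIONTIME < TIMESTAMP('{next_day.isoformat()}')"
    )


def scanned_fraction(days_queried: int, days_stored: int) -> float:
    """Fraction of daily partitions a query touches (equal-sized days assumed)."""
    return days_queried / days_stored


# Hypothetical table name, used only for illustration.
query = build_partition_query("my_project.telemetry.events", date(2020, 1, 15))
print(query)

# With 2 years (about 730 days) retained, a one-day query reads roughly
# 1/730 of the stored records instead of all of them.
print(scanned_fraction(1, 730))
```

With about 100 million records per day, a one-day query under these assumptions reads about 100 million records rather than the roughly 73 billion held over two years.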
3. You are an administrator for several organizations in the same company.
Each organization has data in their own BigQuery table within a single
project. For application access reasons, all of the tables must remain in the
same project. You think each organization should be able to view and run
queries against their own data without exposing the data of organizations
to unauthorized viewers. What should you recommend?
a. Create a separate dataset for each organization in the same project.
Place each organization's table in its own dataset. Restrict access to
each dataset to only the organization that owns it, so that members can
view their own table but no one else's.
i. You can assign roles at the dataset level. Placing tables in different
datasets allows you to limit access per dataset.
4. Your company is making the move to Google Cloud and has chosen to
use a managed database service to reduce overhead. Your existing database
is used for a product catalog that provides real-time inventory tracking for
a retailer. Your database is 500 GB in size. The data is semi-structured and
does not need full atomicity. You are looking for a truly no-ops/serverless
solution. What storage option should you choose?
a. Cloud Datastore
i. Datastore is perfect for semi-structured data less than 1TB in size.
Product catalogs are a recommended use case.
5. How can you set up your Dataproc environment to use BigQuery as an
input and output source?
a. Install the BigQuery connector on your Dataproc cluster.
i. You can install the BigQuery connector to your cluster for direct
programmatic read/write access to BigQuery. Note that a Cloud Storage
bucket is used between the two services, but you'll interact directly with
BigQuery from Dataproc.
6. In AI Platform, what does the CUSTOM tier allow you to configure?
Choose the best answer.
a. Custom number of workers and parameter servers. Machine type of
master server
i. You can customize the number of workers and parameter servers, as well
as the machine type of the master, but the number of masters is fixed at
one.
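A minimal sketch of what a CUSTOM tier configuration might look like as a training input dictionary. The keys follow the AI Platform `TrainingInput` field names (`scaleTier`, `masterType`, `workerType`, `workerCount`, `parameterServerType`, `parameterServerCount`); the helper function and the machine types passed to it are illustrative assumptions, not recommendations.

```python
def custom_tier_training_input(
    master_type: str,
    worker_type: str,
    worker_count: int,
    parameter_server_type: str,
    parameter_server_count: int,
) -> dict:
    """Build a CUSTOM scale tier config (hypothetical helper).

    There is deliberately no "master count" argument: the number of
    masters is fixed at one, and only its machine type is configurable.
    """
    if worker_count < 0 or parameter_server_count < 0:
        raise ValueError("counts must be non-negative")
    return {
        "scaleTier": "CUSTOM",
        "masterType": master_type,
        "workerType": worker_type,
        "workerCount": worker_count,
        "parameterServerType": parameter_server_type,
        "parameterServerCount": parameter_server_count,
    }


# Example values only; pick machine types that fit your workload.
training_input = custom_tier_training_input(
    master_type="n1-highmem-8",
    worker_type="n1-standard-8",
    worker_count=4,
    parameter_server_type="n1-standard-8",
    parameter_server_count=2,
)
print(training_input)
```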
7. You are building a data pipeline on Google Cloud. You need to prepare
source data for a machine-learning model. This involves quickly
deduplicating rows from three input tables and also removing outliers from
data columns where you do not know the data distribution. What should
you do?
a. Use Cloud Dataprep to preview the data distributions in sample source
data table columns. Click on each column name, click on each appropriate
suggested transformation, and then click Add to add each transformation
to the Cloud Dataprep job.
i. Dataprep is the correct choice because the requirement is to prepare
and clean source data. For deduplication, applying the suggested
transformation is easier and quicker than writing a recipe by hand, which
is more work than needed.
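Dataprep performs these steps through its UI, so no code is required for the answer above. Purely to illustrate the two operations conceptually, the sketch below deduplicates rows and drops outliers with the interquartile range (IQR) rule, which makes no assumption about the data distribution. This is not Dataprep's implementation; the function names and the `k=1.5` threshold are assumptions for illustration.

```python
from statistics import quantiles


def deduplicate(rows):
    """Return rows with exact duplicates removed, keeping first occurrences."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def remove_outliers_iqr(values, k=1.5):
    """Keep values within [Q1 - k*IQR, Q3 + k*IQR].

    The IQR rule relies only on quartiles, so it does not require
    knowing the underlying distribution.
    """
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


rows = [("a", 1), ("b", 2), ("a", 1), ("c", 3)]
print(deduplicate(rows))

readings = [10, 11, 12, 13, 14, 15, 16, 17, 18, 1000]
print(remove_outliers_iqr(readings))
```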
8. As part of your backup plan, you create regular boot-disk snapshots of
Compute Engine instances that are running. You want to be able to restore
these snapshots using the fewest possible steps for replacement instances.
What should you do?
a. Use the snapshots to create replacement instances as needed.
i. A new instance can be created directly from a boot-disk snapshot, so
snapshots let you recreate instances in the fewest steps.
9. You are setting up multiple MySQL databases on Compute Engine. You
need to collect logs from your MySQL applications for audit purposes. How
should you approach this?
a. Install the Stackdriver Logging agent on your database instances and
configure the fluentd plugin to read and export your MySQL logs into
Stackdriver Logging.
i. The Stackdriver Logging agent requires the fluentd plugin to be
configured to read logs from your database application.
10. Which of these statements do not apply to preemptible worker nodes on
Cloud Dataproc? Choose two answers.
a. You must have a max of 2:1 ratio of preemptible to standard workers.
i. There is no ratio requirement, but be aware that preemptible workers
can be reclaimed at any time, and you will want a number of standard
workers that are always persistent.
b. Your cluster can be created with only preemptible workers
i. You must have at least one standard worker in a cluster.
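The two rules above can be sketched as a small validation check. This is a hypothetical helper, not part of any Dataproc API; the minimum of one standard worker is taken from the explanation above, and the function intentionally applies no ratio limit.

```python
def is_valid_worker_mix(standard_workers: int, preemptible_workers: int) -> bool:
    """Check a worker mix against the two rules stated in the text.

    Hypothetical helper for illustration only:
    - at least one standard (non-preemptible) worker is required;
    - there is no maximum ratio of preemptible to standard workers.
    """
    if standard_workers < 0 or preemptible_workers < 0:
        raise ValueError("worker counts must be non-negative")
    return standard_workers >= 1


print(is_valid_worker_mix(2, 10))  # 5:1 ratio is still accepted
print(is_valid_worker_mix(0, 4))   # preemptible-only cluster is rejected
```

Even though a high ratio passes this check, preemptible workers can be reclaimed at any time, so keeping enough standard workers for the job to make progress remains a practical concern.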
11. You need to deploy a TensorFlow machine-learning model to Google
Cloud. You want to maximize the speed and minimize the cost of model
prediction and deployment. What should you do?
a. Export your trained model to a SavedModel format. Deploy and run
your model on Cloud ML Engine.
i. This is the preferred method to fulfill the requirement to minimize costs.
12. Your organization needs to be able to reliably handle ever-increasing
amounts of streaming telemetry data, process it, and economically store
analyzed data. What services should they use for this task?