GCP Professional Data Engineer (A+ Guaranteed)
1. You are developing an application that will recognize and tag only specific business-to-business product logos in images. You do not have an extensive background in machine learning models, but you need to get your application working. What is the current best method to accomplish this task?
Correct answer:
   a. Use the AutoML Vision service to train a custom model using the Vision API.
      i. The AutoML services let you train custom image models (and other model types) using Google's pre-trained APIs as a base. Training a custom model also works on AI Platform, but the AutoML route requires less manual model overhead.

2. Your organization is streaming telemetry data into BigQuery for long-term storage (2 years) and analysis, at a rate of about 100 million records per day. They need to be able to run queries against certain time periods of data without incurring the cost of querying all available records. What is the preferred method for doing so?
Correct answer:
   a. Partition a single table by day, and run queries against individual partitions.
      i. Partitioning a single table by date lets you maintain one table while still running queries against a smaller portion of it. Using wildcards across multiple tables (one for each day) technically works, but partitioning a single table is the best practice.

3. You are an administrator for several organizations in the same company. Each organization has data in its own BigQuery table within a single project. For application access reasons, all of the tables must remain in the same project. Each organization should be able to view and run queries against its own data without exposing the data of other organizations to unauthorized viewers. What should you recommend?
Correct answer:
   a. Create a separate dataset for each organization in the same project. Place each organization's table in its own dataset. Restrict access to each dataset to only the organization that owns it, so that its members can view their own table but no one else's.
      i. You can assign roles at the dataset level. Placing tables in different datasets allows you to limit access per dataset.

4. Your company is making the move to Google Cloud and has chosen to use a managed database service to reduce overhead. Your existing database is used for a product catalog that provides real-time inventory tracking for a retailer. The database is 500 GB in size. The data is semi-structured and does not need full atomicity. You are looking for a truly no-ops/serverless solution. What storage option should you choose?
Correct answer:
   a. Cloud Datastore
      i. Datastore is well suited to semi-structured data under 1 TB in size. Product catalogs are a recommended use case.

5. How can you set up your Dataproc environment to use BigQuery as an input and output source?
Correct answer:
   a. Install the BigQuery connector on your Dataproc cluster.
      i. You can install the BigQuery connector on your cluster for direct programmatic read/write access to BigQuery. Note that a Cloud Storage bucket is used between the two services, but you interact directly with BigQuery from Dataproc.
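
The cost argument behind question 2 can be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes a table partitioned by day on a hypothetical DATE column named event_date, a hypothetical table name my_project.telemetry.events, and a two-year retention of 730 days (leap days ignored). It does not call BigQuery; it only counts the daily partitions a date-filtered query would touch and builds the corresponding Standard SQL text.

```python
from datetime import date

RECORDS_PER_DAY = 100_000_000   # from the question: about 100 million records per day
RETENTION_DAYS = 2 * 365        # assumption: two years of retention, ignoring leap days


def partitions_in_range(start: date, end: date) -> int:
    """Number of daily partitions covered by an inclusive date range."""
    if end < start:
        raise ValueError("end must not be before start")
    return (end - start).days + 1


def scanned_fraction(start: date, end: date) -> float:
    """Share of the full table that a partition-pruned query has to read."""
    return partitions_in_range(start, end) / RETENTION_DAYS


def build_query(table: str, start: date, end: date) -> str:
    """Build a Standard SQL query filtered on the hypothetical partitioning column event_date."""
    return (
        f"SELECT COUNT(*) AS n FROM `{table}` "
        f"WHERE event_date BETWEEN DATE '{start.isoformat()}' "
        f"AND DATE '{end.isoformat()}'"
    )


start, end = date(2023, 6, 1), date(2023, 6, 7)
days = partitions_in_range(start, end)
print(days, "partitions,", days * RECORDS_PER_DAY, "records")
print(f"{scanned_fraction(start, end):.2%} of the table")
print(build_query("my_project.telemetry.events", start, end))
```

Under these assumptions, a one-week query touches 7 of 730 daily partitions, or roughly 1% of the stored records, which is why filtering on the partitioning column keeps query costs down compared with scanning the whole table.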
Written for
- Institution: GCP Professional Data Engineer
- Course: GCP Professional Data Engineer
Document information
- Uploaded on: July 2, 2023
- Number of pages: 11
- Written in: 2022/2023
- Type: Exam (elaborations)
- Contains: Questions and answers