QUESTIONS AND ANSWERS GUARANTEE A+
✔✔Question 20
A junior data engineer has ingested a JSON file into a table raw_table with the following
schema:
cart_id STRING,
items ARRAY<item_id:STRING>
The junior data engineer would like to unnest the items column in raw_table to result in
a new table with the following schema:
cart_id STRING, item_id STRING
Which of the following commands should the junior data engineer run to complete this
task?
A. SELECT cart_id, filter(items) AS item_id FROM raw_table;
B. SELECT cart_id, flatten(items) AS item_id FROM raw_table;
C. SELECT cart_id, reduce(items) AS item_id FROM raw_table;
D. SELECT cart_id, explode(items) AS item_id FROM raw_table;
E. SELECT cart_id, slice(items) AS item_id FROM raw_table;
- ✔✔D. SELECT cart_id, explode(items) AS item_id FROM raw_table;
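Why D: explode emits one output row per array element and repeats the other selected columns alongside it. The block below is a minimal pure-Python sketch of that row-multiplying behavior, not Spark's implementation. The helper name explode_rows and the sample rows are hypothetical, and each array element is treated as a plain item_id string for simplicity.

```python
def explode_rows(rows):
    """Mimic SELECT cart_id, explode(items) AS item_id on (cart_id, items) tuples.

    Hypothetical helper for illustration only; it is not part of Spark.
    """
    exploded = []
    for cart_id, items in rows:
        # Like explode, a null or empty array contributes no output rows.
        for item_id in items or []:
            exploded.append((cart_id, item_id))
    return exploded


# Made-up sample rows standing in for raw_table.
raw_table = [("c1", ["i1", "i2"]), ("c2", ["i3"]), ("c3", [])]
result = explode_rows(raw_table)
for row in result:
    print(row)
```

Cart c3 disappears from the result because its array is empty, which mirrors how explode behaves in Spark SQL.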
✔✔Question 21
A data engineer has ingested a JSON file into a table raw_table with the following
schema:
transaction_id STRING,
payload ARRAY<customer_id:STRING, date:TIMESTAMP, store_id:STRING>
The data engineer wants to efficiently extract the date of each transaction into a table
with the following schema:
transaction_id STRING, date TIMESTAMP
Which of the following commands should the data engineer run to complete this task?
A. SELECT transaction_id, explode(payload) FROM raw_table;
B. SELECT transaction_id, payload.date FROM raw_table;
C. SELECT transaction_id, date FROM raw_table;
D. SELECT transaction_id, payload[date] FROM raw_table;
E. SELECT transaction_id, date from payload FROM raw_table;
- ✔✔B. SELECT transaction_id, payload.date FROM raw_table;
✔✔Question 22
A data analyst has provided a data engineering team with the following Spark SQL
query:
SELECT district, avg(sales)
FROM store_sales_20220101 GROUP BY district;
The data analyst would like the data engineering team to run this query every day. The
date at the end of the table name (20220101) should automatically be replaced with the
current date each time the query is run.
Which of the following approaches could be used by the data engineering team to
efficiently automate this process?
A. They could wrap the query using PySpark and use Python's string variable system to
automatically update the table name.
B. They could manually replace the date within the table name with the current day's
date.
C. They could request that the data analyst rewrites the query to be run less frequently.
D. They could replace the string-formatted date in the table with a timestamp-formatted
date.
E. They could pass the table
- ✔✔A. They could wrap the query using PySpark and use Python's string variable system to
automatically update the table name.
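One way answer A might look in practice: build the table name from the run date with an f-string, then hand the finished query text to Spark. This is a sketch under assumptions. The function name build_daily_query is hypothetical, and the final spark.sql call is shown only as a comment because it needs an active Spark session.

```python
from datetime import date


def build_daily_query(run_date):
    """Return the analyst's query with the table suffix set to run_date (YYYYMMDD).

    Hypothetical helper; the query text itself comes from the question.
    """
    table_name = f"store_sales_{run_date:%Y%m%d}"
    return f"SELECT district, avg(sales) FROM {table_name} GROUP BY district"


# In a scheduled job the current date would be used on each run.
query = build_daily_query(date.today())
print(query)

# With an active Spark session (assumed to be available as `spark`):
# spark.sql(query)
```

Passing the date in as an argument, rather than calling date.today() inside the function, keeps the helper easy to check against a fixed date.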
✔✔Question 23
A data engineer has ingested data from an external source into a PySpark DataFrame
raw_df. They need to briefly make this data available in SQL for a data analyst to
perform a quality assurance check on the data.
Which of the following commands should the data engineer run to make this data
available in SQL for only the remainder of the Spark session?
A. raw_df.createOrReplaceTempView("raw_df")
B. raw_df.createTable("raw_df")
C. raw_df.write.save("raw_df")
D. raw_df.saveAsTable("raw_df")
E. There is no way to share data between PySpark and SQL.
- ✔✔A. raw_df.createOrReplaceTempView("raw_df")
✔✔Question 24
A data engineer needs to dynamically create a table name string using three Python
variables: region, store, and year. An example of a table name is below when region =
"nyc", store = "100", and year = "2021":
nyc100_sales_2021
Which of the following commands should the data engineer use to construct the table
name in Python?
A. "{region}+{store}+_sales_+{year}"
B. f"{region}+{store}+_sales_+{year}"
C. "{region}{store}_sales_{year}"
D. f"{region}{store}_sales_{year}"
E. {region}+{store}+"_sales_"+{year}
- ✔✔D. f"{region}{store}_sales_{year}"
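The difference between the string options can be checked directly in plain Python. The short sketch below uses the example values from the question; the variable names option_b, option_c, and option_d are labels added here for illustration.

```python
region = "nyc"
store = "100"
year = "2021"

# Option D: the f prefix makes Python substitute the variables inside the braces.
option_d = f"{region}{store}_sales_{year}"

# Option C: without the f prefix the braces stay in the string as literal text.
option_c = "{region}{store}_sales_{year}"

# Option B: an f-string, but the + signs are ordinary characters inside the literal.
option_b = f"{region}+{store}+_sales_+{year}"

print(option_d)
print(option_c)
print(option_b)
```

Only option D yields the required name, because it both interpolates the variables and contains no stray + characters.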
✔✔Question 25
A data engineer has developed a code block to perform a streaming read on a data
source. The code block is below:
(spark
.read
.schema(schema)
.format("cloudFiles")
.option("cloudFiles.format", "json")
.load(dataSource)
)
The code block is returning an error.