Certified Data Engineer Professional
Terms in this set (54)
An upstream system has been configured to pass the date for a given batch of data to the Databricks Jobs API as a parameter. The notebook to be scheduled will use this parameter to load data with the following code:
df = spark.read.format("parquet").load(f"/mnt/source/{date}")
Which code block should be used to create the date Python variable used in the above code block?
A. date = spark.conf.get("date")
B. input_dict = input()
   date = input_dict["date"]
C. import sys
   date = sys.argv[1]
D. date = dbutils.notebooks.getParam("date")
E. dbutils.widgets.text("date", "null")
   date = dbutils.widgets.get("date")
Answer: E. dbutils.widgets.text("date", "null"); date = dbutils.widgets.get("date")
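Parameters passed to a notebook job (for example, through `notebook_params` in a Jobs API `run-now` request) surface inside the notebook as widget values, which is why option E works. Because `dbutils.widgets` only exists inside Databricks, the sketch below uses a hypothetical `FakeWidgets` class that models just the `text`/`get` behavior from option E; the date value is made up.

```python
from typing import Dict, Optional


class FakeWidgets:
    """Hypothetical local stand-in for dbutils.widgets (text/get only)."""

    def __init__(self, job_params: Optional[Dict[str, str]] = None) -> None:
        # Values supplied by the caller (e.g. a scheduled job) take precedence.
        self._values: Dict[str, str] = dict(job_params or {})

    def text(self, name: str, default_value: str) -> None:
        # Register the widget; keep any value the job already passed in.
        self._values.setdefault(name, default_value)

    def get(self, name: str) -> str:
        return self._values[name]


def source_path(widgets: FakeWidgets) -> str:
    # Same two calls as option E, then the path passed to spark.read...load(...).
    widgets.text("date", "null")
    date = widgets.get("date")
    return f"/mnt/source/{date}"


print(source_path(FakeWidgets({"date": "2024-01-15"})))  # /mnt/source/2024-01-15
print(source_path(FakeWidgets()))                        # /mnt/source/null
```

The "null" default only appears when the notebook runs without the job parameter, which makes a missing parameter easy to spot in the load path.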
The Databricks workspace administrator has configured interactive clusters for each
of the data engineering groups. To control costs, clusters are set to terminate after
30 minutes of inactivity. Each user should be able to execute workloads against their
assigned clusters at any time of the day.
Assuming users have been added to a workspace but not granted any permissions, which of the following describes the minimal permissions a user would need to start and attach to an already configured cluster?
A. "Can Manage" privileges on the required cluster
B. Workspace Admin privileges, cluster creation allowed, "Can Attach To" privileges on the required cluster
C. Cluster creation allowed, "Can Attach To" privileges on the required cluster
D. "Can Restart" privileges on the required cluster
E. Cluster creation allowed, "Can Restart" privileges on the required cluster
Answer: D. "Can Restart" privileges on the required cluster
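Cluster permission levels are cumulative: `CAN_ATTACH_TO` lets a user attach to a running cluster, `CAN_RESTART` adds starting, restarting, and terminating it, and `CAN_MANAGE` adds editing the configuration and managing permissions. Since these clusters auto-terminate, users must be able to start them, so `CAN_RESTART` is the minimum. The sketch below encodes that ordering (capability sets simplified) and builds a body in the shape used by the Permissions API for clusters; the user name is a placeholder.

```python
import json
from typing import List, Set, Tuple

# Simplified, cumulative capabilities per cluster permission level (lowest first).
LEVELS: List[Tuple[str, Set[str]]] = [
    ("CAN_ATTACH_TO", {"attach"}),
    ("CAN_RESTART", {"attach", "start", "restart", "terminate"}),
    ("CAN_MANAGE", {"attach", "start", "restart", "terminate", "edit", "manage_permissions"}),
]


def minimal_level(needed: Set[str]) -> str:
    """Return the lowest permission level whose capabilities cover `needed`."""
    for level, capabilities in LEVELS:
        if needed <= capabilities:
            return level
    raise ValueError(f"no cluster permission level covers {sorted(needed)}")


def permissions_body(user_name: str, level: str) -> str:
    # Request body for PATCH /api/2.0/permissions/clusters/{cluster_id}.
    return json.dumps(
        {"access_control_list": [{"user_name": user_name, "permission_level": level}]}
    )


level = minimal_level({"start", "attach"})
print(level)  # CAN_RESTART
print(permissions_body("engineer@example.com", level))  # placeholder user
```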
https://quizlet.com/867626210/certified-data-engineer-professional-flash-cards/ 1/20
10/14/25, 3:32 PM Certified Data Engineer Terms & Definitions Study Set Flashcards | Quizlet
When scheduling Structured Streaming jobs for production, which configuration automatically recovers from query failures and keeps costs low?
A. Cluster: New Job Cluster; Retries: Unlimited; Maximum Concurrent Runs: Unlimited
B. Cluster: New Job Cluster; Retries: None; Maximum Concurrent Runs: 1
C. Cluster: Existing All-Purpose Cluster; Retries: Unlimited; Maximum Concurrent Runs: 1
D. Cluster: New Job Cluster; Retries: Unlimited; Maximum Concurrent Runs: 1
E. Cluster: Existing All-Purpose Cluster; Retries: None; Maximum Concurrent Runs: 1
Answer: D. Cluster: New Job Cluster; Retries: Unlimited; Maximum Concurrent Runs: 1
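Expressed as a Jobs API 2.1 job specification, option D looks roughly like the dictionary below: a task-level `new_cluster` (a job cluster created for the run, billed at job-compute rates), `max_retries: -1` (retry indefinitely), and job-level `max_concurrent_runs: 1` so only one copy of the stream runs at a time. The job name, notebook path, Spark version, node type, and worker count are placeholders, not recommendations.

```python
import json

# Hypothetical job definition; names, paths, and cluster sizing are placeholders.
job_settings = {
    "name": "streaming-ingest",  # placeholder job name
    "max_concurrent_runs": 1,    # never run two copies of the stream at once
    "tasks": [
        {
            "task_key": "stream",
            "notebook_task": {"notebook_path": "/Repos/team/pipeline/stream"},  # placeholder
            "new_cluster": {     # job cluster created for this run
                "spark_version": "<spark-version>",  # placeholder
                "node_type_id": "<node-type>",       # placeholder
                "num_workers": 2,                    # placeholder sizing
            },
            "max_retries": -1,   # -1 means retry indefinitely after a failure
        }
    ],
}

print(json.dumps(job_settings, indent=2))
```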
The recent_sensor_recordings table contains an identifying sensor_id alongside the timestamp and temperature for the most recent 5 minutes of recordings.
The query is set to refresh each minute and always completes in less than 10 seconds. The alert is set to trigger when mean(temperature) > 120. Notifications are triggered to be sent at most every 1 minute.
If this alert raises notifications for 3 consecutive minutes and then stops, which statement must be true?
A. The total average temperature across all sensors exceeded 120 on three consecutive executions of the query
B. The recent_sensor_recordings table was unresponsive for three consecutive runs of the query
C. The source query failed to update properly for three consecutive minutes and then restarted
D. The maximum temperature recording for at least one sensor exceeded 120 on three consecutive executions of the query
E. The average temperature recordings for at least one sensor exceeded 120 on three consecutive executions of the query
Answer: E. The average temperature recordings for at least one sensor exceeded 120 on three consecutive executions of the query
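The query text is not included here, but answer E only follows if the alert evaluates a per-sensor average (for example, a query grouped by sensor_id); that grouping is an assumption. The plain-Python sketch below, with made-up readings from one execution, shows why the overall average (option A) need not exceed 120 even when one sensor's average does.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def mean_by_sensor(readings: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Average temperature per sensor_id, like a GROUP BY sensor_id query."""
    temps: Dict[str, List[float]] = defaultdict(list)
    for sensor_id, temperature in readings:
        temps[sensor_id].append(temperature)
    return {sensor_id: mean(values) for sensor_id, values in temps.items()}


def alert_fires(readings: Iterable[Tuple[str, float]], threshold: float = 120.0) -> bool:
    # Fires if any single sensor's mean temperature exceeds the threshold.
    return any(avg > threshold for avg in mean_by_sensor(readings).values())


# Made-up readings from one query execution (last 5 minutes).
readings = [("s1", 130.0), ("s1", 125.0), ("s2", 90.0), ("s2", 95.0)]
print(mean_by_sensor(readings))      # {'s1': 127.5, 's2': 92.5}
print(mean(t for _, t in readings))  # 110.0, overall mean stays below 120
print(alert_fires(readings))         # True
```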
A junior developer complains that the code in their notebook isn't producing the correct results in the development environment. A shared screenshot reveals that while they're using a notebook versioned with Databricks Repos, they're using a personal branch that contains old logic. The desired branch named dev-2.3.9 is not available from the branch selection dropdown.
Which approach will allow this developer to review the current logic for this notebook?
Answer: B. Use Repos to pull changes from the remote Git repository and select the dev-2.3.9 branch.
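The same fix can also be scripted: the Repos API update endpoint (PATCH /api/2.0/repos/{repo_id}) switches a repo to a given branch and updates it to that branch's latest commit. The sketch below only builds the request with the standard library and does not send it; the workspace URL, repo ID, and token are placeholders.

```python
import json
import urllib.request

# Placeholders: substitute your own workspace URL, repo ID, and token.
WORKSPACE_URL = "https://<workspace-host>"
REPO_ID = 123
TOKEN = "<personal-access-token>"


def checkout_branch_request(branch: str) -> urllib.request.Request:
    """Build (but do not send) a Repos API request that switches the repo to `branch`."""
    body = json.dumps({"branch": branch}).encode("utf-8")
    return urllib.request.Request(
        url=f"{WORKSPACE_URL}/api/2.0/repos/{REPO_ID}",
        data=body,
        method="PATCH",
        headers={"Authorization": f"Bearer {TOKEN}", "Content-Type": "application/json"},
    )


req = checkout_branch_request("dev-2.3.9")
print(req.get_method(), req.full_url)
# Sending it would be: urllib.request.urlopen(req)
```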
Which statement describes what will happen when the above code is executed?
A. The connection to the external table will fail; the string "REDACTED" will be printed.
B. An interactive input box will appear in the notebook; if the right password is provided, the connection will succeed and the encoded password will be saved to DBFS.
C. An interactive input box will appear in the notebook; if the right password is provided, the connection will succeed and the password will be printed in plain text.
D. The connection to the external table will succeed; the string value of password will be printed in plain text.
E. The connection to the external table will succeed; the string "REDACTED" will be printed.
Answer: E. The connection to the external table will succeed; the string "REDACTED" will be printed.
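Databricks notebooks redact secret values read with dbutils.secrets.get: printing one shows [REDACTED] in the cell output, while the real value is still passed to the connection. The snippet below is a hypothetical local imitation of that output filter (not Databricks' implementation) to make the behavior behind answer E concrete; the password and user name are made up.

```python
from typing import Iterable


def redact_output(text: str, secret_values: Iterable[str]) -> str:
    """Replace any literal secret value in printed output with [REDACTED]."""
    for secret in secret_values:
        if secret:  # skip empty strings, which would match everywhere
            text = text.replace(secret, "[REDACTED]")
    return text


password = "s3cr3t-pw"  # made-up stand-in for dbutils.secrets.get(scope=..., key=...)
connection_options = {"user": "etl_user", "password": password}  # real value still used

print(redact_output(password, [password]))                         # [REDACTED]
print(redact_output(f"pw={password}; user=etl_user", [password]))  # pw=[REDACTED]; user=etl_user
```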
The data science team would like predictions saved to a Delta Lake table with the ability to compare all predictions across time. Churn predictions will be made at most once per day.
Which code block accomplishes this task while minimizing potential compute costs?
A. preds.write.mode("append").saveAsTable("churn_preds")
B. preds.write.format("delta").save("/preds/churn_preds")
C. (preds.writeStream
    .outputMode("overwrite")
    .option("checkpointPath", "/_checkpoints/churn_preds")
    .start("/preds/churn_preds"))
D. (preds.write
    .format("delta")
    .mode("overwrite")
    .saveAsTable("churn_preds"))
E. (preds.writeStream
    .outputMode("append")
    .option("checkpointPath", "/_checkpoints/churn_preds")
    .table("churn_preds"))
Answer: A. preds.write.mode("append").saveAsTable("churn_preds")
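Answer A works because a once-per-day batch write in append mode adds each day's predictions as new rows (on Databricks, saveAsTable creates a Delta table by default) without keeping a stream running, while overwrite would discard earlier predictions. The toy in-memory model below, not Spark, illustrates the difference between the two modes; the rows and dates are made up.

```python
from typing import Dict, List

Row = Dict[str, object]


def write(table: List[Row], batch: List[Row], mode: str) -> List[Row]:
    """Toy model of batch write modes: 'append' keeps old rows, 'overwrite' replaces them."""
    if mode == "append":
        return table + batch
    if mode == "overwrite":
        return list(batch)
    raise ValueError(f"unsupported mode: {mode}")


day1 = [{"customer_id": 1, "churn_prob": 0.20, "pred_date": "2024-01-01"}]
day2 = [{"customer_id": 1, "churn_prob": 0.35, "pred_date": "2024-01-02"}]

appended: List[Row] = []
overwritten: List[Row] = []
for batch in (day1, day2):  # one batch per day, as in the question
    appended = write(appended, batch, "append")
    overwritten = write(overwritten, batch, "overwrite")

print(len(appended), len(overwritten))  # 2 1
```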