with complete solution
What is done with the control plane?
web application
notebooks
jobs
cluster management
What is done with the data plane?
spinning up clusters
data processing
accessing data sources
SQL: Create table with specific columns
CREATE TABLE students (id INT, name STRING)
SQL: Insert data
INSERT INTO students
VALUES (1, "Omar")
Does ACID guarantee inserting many records in a single transaction?
Yes
SQL: Update record
UPDATE students
SET value = value + 1
WHERE name LIKE "T%" // starts with T
SQL: Delete record
DELETE FROM students
WHERE value > 6
// Boolean is lowercase
// equals is single =
What is upsert?
Allows updates, inserts and other manipulations in a single command
SQL: Merge for CDC
MERGE INTO students b
USING updates u
ON b.id=u.id
WHEN MATCHED AND u.type = "update"
THEN UPDATE SET *
THEN INSERT *
WHEN NOT MATCHED AND
THEN DELETE
Which metastore does databricks use by default?
Hive
SQL: See details about Delta table including number of files
DESCRIBE DETAIL students
, What does OPTIMIZE do?
Replace existing data files by combining files and rewriting results
SQL: OPTIMIZE
OPTIMIZE students
ZORDER BY id
What are the configs for vacuuming?
SET spark.databricks.delta.retentionDurationCheck.enabled = false;
SET spark.databricks.delta.vacuum.logging.enabled = true;
What does ZORDER do?
speeds up data retrieval when filling on provided fields by colocating data
SQL: View history
DESCRIBE HISTORY students
SQL: Time travel
SELECT *
FROM students
VERSION AS OF 3
SQL: Restore table
RESTORE TABLE students
TO VERSION AS OF 8
What is the default time to vacuum files?
7 days
SQL: Vacuum
VACUUM students RETAIN 0 HOURS
What does DRY RUN do with VACUUM?
Displays all files that will be deleted
What does the LOCATION tag do?
If you set it, you can define a location for the data to reside, otherwise it saves in
/user/hive/warehouse/
SQL: How do you specify a command to run in a database
USE dbname;
CREATE OR REPLACE TABLE
SQL: How do you describe the table?
DESCRIBE TABLE EXTENDED
DESCRIBE EXTENDED
SQL: Create a view, temp view and global temp view
CREATE VIEW view AS SELECT * FROM table
CREATE TEMPORARY VIEW
CREATE GLOBAL TEMPORARY VIEW name AS SELECT * FROM table
SELECT * FROM global_temp.name
What's the difference between view, temp view and global temp view?
View - persists, associated to database
Temp view - temporary, not associated to a database, doesn't persist, session scoped,
single notebook