✔✔What are models (databricks) - ✔✔Although they are not, strictly speaking, data
assets, they can also be managed in Unity Catalog and reside at the lowest level in the
object hierarchy.
✔✔What are tables, views, and volumes (databricks) - ✔✔These are at the lowest level
in the data object hierarchy
✔✔What is a schema (databricks) - ✔✔Also known as a database, a schema is the second
layer of the object hierarchy and contains tables and views.
✔✔What does each metastore expose? - ✔✔a three-level namespace
(catalog.schema.table) that organizes your data.
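As a rough illustration of the three-level namespace, the sketch below builds and splits a fully qualified name in plain Python. The helper names `qualify` and `split_qualified`, and the example names `main`, `sales`, and `orders`, are hypothetical and chosen only for illustration; they are not part of any Databricks API.

```python
def qualify(catalog: str, schema: str, table: str) -> str:
    """Join the three namespace levels into one dotted name."""
    parts = [catalog, schema, table]
    # Assumption for this sketch: each level is a simple name with no dots.
    if any(not p or "." in p for p in parts):
        raise ValueError("each level must be a non-empty name without dots")
    return ".".join(parts)


def split_qualified(name: str) -> tuple:
    """Split a dotted name back into (catalog, schema, table)."""
    parts = name.split(".")
    if len(parts) != 3:
        raise ValueError("expected exactly three levels: catalog.schema.table")
    return tuple(parts)


# Hypothetical names, used only to show the shape of a full reference.
full_name = qualify("main", "sales", "orders")
catalog, schema, table = split_qualified(full_name)
```

A name with fewer than three parts is rejected here on purpose, to mirror the idea that a complete reference names the catalog, the schema, and the table.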
✔✔What is a catalog - ✔✔The first layer of the object hierarchy, used to organize your
data assets. It contains schemas (databases).
✔✔What does Databricks make available as part of Databricks Machine Learning to
support machine learning workloads? - ✔✔- Support for distributed model training on
big data
- Built-in real-time model serving
- Built-in automated machine learning development
- Optimized and preconfigured machine learning frameworks
✔✔Why does Databricks make special features available to support machine learning
workloads? - ✔✔Because data organizations need specialized environments designed
specifically for machine learning workloads.
✔✔What technology has Databricks introduced to further speed up and scale all query-
based workloads? - ✔✔Photon
✔✔What technology is Photon built on top of? - ✔✔Apache Spark
✔✔Why did Databricks introduce Photon? - ✔✔It can be challenging for a data
lakehouse to provide both performance and scalability for all of its query-based
workloads to the standards of a data warehouse and a data lake.
✔✔What is Databricks Photon? - ✔✔Photon is a high-performance Databricks-native
vectorized query engine that runs your SQL workloads and DataFrame API calls faster
to reduce your total cost per workload.
✔✔What are key features and advantages of using Photon? - ✔✔- Support for SQL and
equivalent DataFrame operations with Delta and Parquet tables.
- Accelerated queries that process data faster and include aggregations and joins.
- Faster performance when data is accessed repeatedly from the disk cache.
- Robust scan performance on tables with many columns and many small files.
- Faster Delta and Parquet writing using UPDATE, DELETE, MERGE INTO, INSERT,
and CREATE TABLE AS SELECT, including wide tables that contain thousands of
columns.
- Replaces sort-merge joins with hash-joins.
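The last item contrasts two classic join strategies. The sketch below shows the general idea of a hash join in plain Python; it is only an illustration of the technique, not Photon's implementation, and the function name `hash_join` and the sample rows are made up for this example. A sort-merge join would first sort both inputs on the join key, whereas a hash join builds a lookup table on one input and probes it with the other.

```python
from collections import defaultdict


def hash_join(left, right, key):
    """Inner-join two lists of dicts on `key` using a hash table."""
    # Build phase: index the left input (ideally the smaller one) by join key.
    buckets = defaultdict(list)
    for row in left:
        buckets[row[key]].append(row)

    # Probe phase: look up each right-hand row's key; no sorting is needed.
    joined = []
    for row in right:
        for match in buckets.get(row[key], []):
            joined.append({**match, **row})
    return joined


# Made-up sample data for illustration.
customers = [{"id": 1, "name": "Ana"}, {"id": 2, "name": "Bo"}]
orders = [{"id": 1, "total": 30}, {"id": 1, "total": 12}, {"id": 3, "total": 7}]
result = hash_join(customers, orders, "id")
```

Only rows whose key appears on both sides survive, so customer 2 and order 3 are dropped from `result`.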
✔✔What is the benefit to a business if they use Photon? - ✔✔Although it is more
expensive, it offers a more performant experience.
Overall, the total cost of ownership (TCO) is worth it for the business: cluster
maintenance and optimization exercises took time and required expensive, specialized
talent, whereas Photon just works.
✔✔What is a consequence of using Unity Catalog to manage, organize and segregate
data objects? - ✔✔Complete data object referencing requires three levels
✔✔In which of the following ways do serverless compute resources differ from classic
compute resources within the Databricks Lakehouse Platform? - ✔✔- They exist within
the Databricks cloud account
- They are always running and reserved for a single, specific customer when needed
✔✔Where do non-serverless compute resources exist? - ✔✔Inside the customer's
AWS/Azure/GCP environment
✔✔Which of the Databricks Lakehouse Platform services or capabilities provides a data
warehousing experience to its users? - ✔✔Databricks SQL
✔✔Explain Databricks to a five-year-old - ✔✔Makes little bits of big computers use data
in lots of ways and in lots of languages.