Q: What is the primary difference between Azure Blob Storage and Azure
Data Lake Storage Gen2?
ANSWER ADLS Gen2 adds a hierarchical namespace (directories) to Blob
Storage.
Q: Which storage tier is best for data that is rarely accessed but must be
stored for a minimum of 180 days?
ANSWER The Archive tier.
Q: You need to grant a security principal read access to blobs in a specific
container. Which RBAC role should you assign?
ANSWER Storage Blob Data Reader.
Q: What is the maximum size of a block blob in Azure Storage?
ANSWER Approximately 190.7 TiB with current service versions (50,000
blocks of up to 4,000 MiB each). Older service versions capped block blobs
at about 4.75 TiB (50,000 blocks of 100 MiB each).
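The block blob ceiling is the maximum block count multiplied by the maximum block size. A minimal arithmetic sketch, assuming the documented limit of 50,000 blocks per blob and block sizes of 100 MiB (older service versions) and 4,000 MiB (newer service versions); the function and constant names are illustrative only:

```python
MIB_PER_TIB = 1024 * 1024
MAX_BLOCKS = 50_000  # assumed per-blob block limit


def max_block_blob_tib(block_size_mib: int, max_blocks: int = MAX_BLOCKS) -> float:
    """Return the largest possible block blob size in TiB for a given block size."""
    return block_size_mib * max_blocks / MIB_PER_TIB


older_limit = max_block_blob_tib(100)    # 100 MiB blocks
newer_limit = max_block_blob_tib(4000)   # 4,000 MiB blocks
print(f"older: {older_limit:.2f} TiB, newer: {newer_limit:.2f} TiB")
```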
Q: Which feature allows you to replicate data asynchronously to a
secondary region for disaster recovery?
ANSWER Geo-redundant storage (GRS).
Q: How do you enable hierarchical namespace on an existing storage
account?
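ANSWER Run the one-way upgrade to Data Lake Storage Gen2 on the account.
Once enabled, the hierarchical namespace cannot be disabled. (Older
guidance that it could be set only at creation time is outdated.)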
Q: What is the purpose of a Shared Access Signature (SAS)?
ANSWER To grant limited access (time, permissions) to storage resources
without sharing the account key.
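The idea behind a SAS can be sketched as a signed token: the key holder signs the permissions, expiry, and resource, and the recipient receives only the signature. This is a conceptual illustration using Python's standard library, not the actual Azure string-to-sign or token format; the key, resource path, and function names are hypothetical:

```python
import base64
import hashlib
import hmac
from datetime import datetime, timezone


def sign(key: bytes, permissions: str, expiry: datetime, resource: str) -> str:
    """Sign the grant with HMAC-SHA256; the key itself never leaves the issuer."""
    message = "\n".join([permissions, expiry.isoformat(), resource]).encode()
    digest = hmac.new(key, message, hashlib.sha256).digest()
    return base64.b64encode(digest).decode()


def verify(key, permissions, expiry, resource, signature, wanted, now):
    """Accept a request only if the signature matches, the grant has not
    expired, and the requested permission is part of the grant."""
    expected = sign(key, permissions, expiry, resource)
    return (
        hmac.compare_digest(expected, signature)
        and now < expiry
        and wanted in permissions
    )


key = b"hypothetical-account-key"          # stays with the issuer
expiry = datetime(2030, 1, 1, tzinfo=timezone.utc)
resource = "/container/report.csv"         # hypothetical resource path
token = sign(key, "r", expiry, resource)   # only this is handed out

before = datetime(2029, 6, 1, tzinfo=timezone.utc)
after = datetime(2030, 6, 1, tzinfo=timezone.utc)
print(verify(key, "r", expiry, resource, token, "r", before))
```

Changing the permissions or expiry without re-signing invalidates the token, which is why the recipient cannot widen the grant.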
Q: Which data format is optimized for read-heavy analytical workloads
and is columnar?
ANSWER Parquet.
Q: Which file format supports schema evolution and ACID transactions?
ANSWER Delta Lake.
Q: In the context of ADLS Gen2, what is the "folder" technically?
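ANSWER With the hierarchical namespace enabled, a directory is a real
object that can be renamed or deleted atomically and can carry its own
ACLs. Only in flat-namespace Blob Storage is a "folder" a virtual concept,
that is, a prefix in the blob name.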
Q: Which CLI command uploads a local file to a blob path?
ANSWER az storage blob upload.
Q: You need to enforce encryption at rest for data in Azure Storage. Which
encryption is used by default?
ANSWER Microsoft-managed keys.
Q: What is the purpose of the "Lifecycle Management" policy in Azure
Storage?
ANSWER To automate moving data to cooler tiers (Hot to Cool to
Archive) or deleting data based on age.
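A lifecycle management policy is a JSON document of rules. The sketch below builds one rule in Python following the documented policy schema; the rule name, container prefix, and day thresholds are hypothetical values chosen for illustration:

```python
import json


def build_lifecycle_policy(prefix, cool_after, archive_after, delete_after):
    """Build a single-rule policy that tiers and then deletes block blobs by age."""
    return {
        "rules": [
            {
                "enabled": True,
                "name": "age-out-logs",  # hypothetical rule name
                "type": "Lifecycle",
                "definition": {
                    "filters": {
                        "blobTypes": ["blockBlob"],
                        "prefixMatch": [prefix],
                    },
                    "actions": {
                        "baseBlob": {
                            "tierToCool": {
                                "daysAfterModificationGreaterThan": cool_after
                            },
                            "tierToArchive": {
                                "daysAfterModificationGreaterThan": archive_after
                            },
                            "delete": {
                                "daysAfterModificationGreaterThan": delete_after
                            },
                        }
                    },
                },
            }
        ]
    }


# Hypothetical container/prefix and thresholds.
policy = build_lifecycle_policy("mycontainer/logs/", 30, 90, 365)
print(json.dumps(policy, indent=2))
```

The printed JSON could then be saved to a file and applied to the storage account through the portal or a management tool.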
Q: Which tool allows you to mount an ADLS Gen2 container as a file
system on Linux?
ANSWER BlobFuse, a FUSE-based virtual file system driver for Azure Storage.
Q: You have CSV files with varying schemas. Which feature in Synapse
Serverless SQL pool helps infer the schema automatically?
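ANSWER OPENROWSET without a WITH clause, which triggers automatic schema
inference (for CSV, with PARSER_VERSION = '2.0'). Adding a WITH clause
specifies the schema explicitly instead.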
Q: What is the delimiter used by default in Hadoop/Spark ecosystem files?
ANSWER Comma for CSV (the default of the sep option in Spark's CSV
reader). The delimiter is configurable; tab-separated files are also common,
and Hive text tables default to Ctrl-A (\001).
Q: Which storage endpoint is used for Data Lake Storage Gen2?
ANSWER <account>.dfs.core.windows.net. The blob.core.windows.net endpoint
reaches the same data, but directory and ACL operations are exposed
through the dfs endpoint.
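As a small sketch of how the dfs endpoint shows up in practice, the helpers below build an HTTPS URL and an abfss:// URI (the scheme Spark and Hadoop use for ADLS Gen2). The account, file system, and path values are hypothetical, and the helper names are illustrative:

```python
def dfs_url(account: str, filesystem: str, path: str = "") -> str:
    """HTTPS URL for a path on the Data Lake (dfs) endpoint."""
    return f"https://{account}.dfs.core.windows.net/{filesystem}/{path.lstrip('/')}"


def abfss_uri(account: str, filesystem: str, path: str = "") -> str:
    """abfss:// URI of the form used by Spark and Hadoop for ADLS Gen2."""
    return f"abfss://{filesystem}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


# Hypothetical account and container names.
print(dfs_url("myaccount", "raw", "/sales/2023"))
print(abfss_uri("myaccount", "raw", "sales/2023"))
```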
Q: You want to query a specific folder partitioned by date (e.g.,
/year=2023/month=01/). What is this called?
ANSWER Partition Pruning.
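Partition pruning means the engine skips every directory whose key=value segments do not match the query filter, so only matching files are read. A minimal pure-Python sketch of that selection step, assuming Hive-style paths; the file names and helper names are hypothetical:

```python
import re

# Matches Hive-style partition segments such as "year=2023".
PARTITION_RE = re.compile(r"(\w+)=([^/]+)")


def partition_values(path: str) -> dict:
    """Extract partition keys and values from a path."""
    return dict(PARTITION_RE.findall(path))


def prune(paths, **wanted):
    """Keep only paths whose partition values match every requested key."""
    return [
        p for p in paths
        if all(partition_values(p).get(k) == v for k, v in wanted.items())
    ]


paths = [
    "sales/year=2023/month=01/part-0000.parquet",
    "sales/year=2023/month=02/part-0000.parquet",
    "sales/year=2022/month=01/part-0000.parquet",
]
selected = prune(paths, year="2023", month="01")
print(selected)
```

Real engines apply the same idea to directory listings before opening any file, which is where the I/O savings come from.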
Q: Which format is compressed by default in Spark and is splittable?
ANSWER Parquet (uses Snappy compression by default).
Q: What is the minimum retention period for the Archive access tier?
ANSWER 180 days.
Q: Which storage class offers the lowest access latency?
ANSWER Premium Blob Storage (or Hot tier in standard accounts).
Q: How can you secure access to the management plane (creating storage
accounts) vs the data plane (reading files)?
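ANSWER The management plane uses Azure RBAC management roles (for
example, Storage Account Contributor). The data plane uses Azure RBAC
data roles (for example, Storage Blob Data Reader), ACLs, Shared Key, or SAS.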
Q: In ADLS Gen2, what are Access Control Lists (ACLs)?
ANSWER POSIX-like permissions that control read/write/execute access
for users and groups.
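The POSIX-like evaluation order can be sketched as follows: owner entry first, then a named user entry, then group entries, then other. This is a simplified illustration that ignores the mask entry, default ACLs, and RBAC; the principal names are hypothetical (real ACL entries identify principals by object ID):

```python
def parse_acl(acl: str) -> dict:
    """Parse 'scope:qualifier:perms' entries into a lookup table."""
    entries = {}
    for entry in acl.split(","):
        scope, qualifier, perms = entry.strip().split(":")
        entries[(scope, qualifier)] = perms
    return entries


def is_allowed(acl, wanted, user, groups=(), owner="", owning_group=""):
    """Return True if `user` holds permission `wanted` ('r', 'w', or 'x')."""
    entries = parse_acl(acl)
    if user == owner:  # owning user entry wins
        return wanted in entries.get(("user", ""), "---")
    if ("user", user) in entries:  # then a named user entry
        return wanted in entries[("user", user)]
    # Then group entries: any matching group that grants the permission suffices.
    group_perms = [entries[("group", g)] for g in groups if ("group", g) in entries]
    if owning_group in groups and ("group", "") in entries:
        group_perms.append(entries[("group", "")])
    if group_perms:
        return any(wanted in perms for perms in group_perms)
    return wanted in entries.get(("other", ""), "---")  # finally, other


acl = "user::rwx,user:alice:r--,group::r-x,group:analysts:rw-,other::---"
print(is_allowed(acl, "w", user="bob", groups=("analysts",), owner="owner1"))
```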
Q: What is the command to create a directory in ADLS Gen2 using Data
Factory?
ANSWER Data Factory has no dedicated create-directory activity. A Copy
activity whose sink is an ADLS Gen2 path creates any missing folders in
that path when it writes its output.
Q: Which option allows you to limit the ingress or egress of data from a
Storage Account?
ANSWER Service Endpoints or Private Endpoints + Network Rules.
Q: What is the purpose of a "Soft Delete" in Azure Blob Storage?
ANSWER To recover data that was accidentally deleted within a retention
period.
Q: You need to change the tier of a blob from Hot to Cool. Which
operation do you use?
ANSWER Set Blob Tier.
Q: Which SDK is commonly used for Python to interact with ADLS Gen2?