1. Which of the following best describes a data lake?
A. A system used to store structured data only
B. A repository for storing large amounts of unstructured or semi-structured data
C. A tool used for creating data models
D. A method for querying real-time data
Answer: B) A repository for storing large amounts of unstructured or
semi-structured data
Rationale: A data lake stores vast amounts of raw data in its native format,
including unstructured and semi-structured data, which allows more flexible
data processing and analysis.
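A minimal sketch of that flexibility, assuming a few made-up JSON records (the field names `user`, `action`, `amount`, and `sensor` are illustrative only): the records are kept as they arrive, and structure is applied only when they are read.

```python
import json

# Raw events as they might land in a lake: fields vary from record to record.
raw_lines = [
    '{"user": "ada", "action": "login"}',
    '{"user": "grace", "action": "purchase", "amount": 42.5}',
    '{"sensor": "t-1", "reading": {"temp_c": 21.3}}',
]

# Structure is applied at read time: parse everything, then pick only the
# fields this particular analysis cares about.
events = [json.loads(line) for line in raw_lines]
purchases = [(e["user"], e["amount"]) for e in events if e.get("action") == "purchase"]

print(purchases)
```

Records that do not fit the analysis are simply skipped at read time rather than rejected at write time.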
2. What is the difference between a JOIN and a UNION in SQL?
A. JOIN combines rows from two tables based on a related column; UNION
combines the result sets of two queries
B. JOIN combines columns from two tables; UNION combines rows from two
queries
C. JOIN creates a new table; UNION deletes duplicate rows
D. JOIN can only be used on primary keys; UNION can only be used on foreign
keys
Answer: A) JOIN combines rows from two tables based on a related column;
UNION combines the result sets of two queries
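Rationale: A JOIN combines rows from two or more tables based on a related
column, so each result row carries columns from both tables. A UNION appends the
result sets of multiple queries into a single result set and removes duplicate
rows; the queries must return the same number of columns.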
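The contrast can be sketched with Python's built-in sqlite3 module. The tables `customers`, `orders`, and `suppliers` and their rows are hypothetical, chosen only to illustrate the two operations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, item TEXT);
    CREATE TABLE suppliers (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 'laptop'), (11, 1, 'mouse'), (12, 2, 'desk');
    INSERT INTO suppliers VALUES (1, 'Grace'), (2, 'Linus');
""")

# JOIN: match rows from two tables on a related column (customer_id = id).
join_rows = conn.execute("""
    SELECT c.name, o.item
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    ORDER BY o.id
""").fetchall()

# UNION: stack the result sets of two queries; duplicate rows are removed.
union_rows = conn.execute("""
    SELECT name FROM customers
    UNION
    SELECT name FROM suppliers
    ORDER BY name
""").fetchall()

print(join_rows)
print(union_rows)
```

In this sketch 'Grace' appears in both `customers` and `suppliers` but only once in the UNION result; UNION ALL would keep both copies.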
3. Which of the following statements is true about data redundancy?
A. It improves database performance by storing the same data in multiple places
B. It can lead to data inconsistency and inefficient use of storage
C. It is a desirable characteristic in a normalized database
D. It ensures that all data is unique and well-structured
Answer: B) It can lead to data inconsistency and inefficient use of storage
Rationale: Data redundancy can result in inconsistencies and excessive storage use,
which is why normalization aims to reduce it.
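A small sketch of how redundancy can produce inconsistency, assuming a hypothetical `orders_flat` table that repeats a customer's email on every order row:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Redundant design: the customer's email is repeated on every order row.
    CREATE TABLE orders_flat (order_id INTEGER PRIMARY KEY, customer TEXT, email TEXT);
    INSERT INTO orders_flat VALUES
        (1, 'Ada', 'ada@example.com'),
        (2, 'Ada', 'ada@example.com');
""")

# An update that touches only one of the copies leaves the data inconsistent.
conn.execute("UPDATE orders_flat SET email = 'ada@new.example.com' WHERE order_id = 1")

emails = conn.execute(
    "SELECT DISTINCT email FROM orders_flat WHERE customer = 'Ada' ORDER BY email"
).fetchall()

print(emails)  # two different emails now exist for the same customer
```

Storing the email once in a separate customers table, as a normalized design would, leaves only one copy to update.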
4. Which SQL clause is used to combine data from multiple tables based on a
related column?
A. WHERE
B. GROUP BY
C. JOIN
D. HAVING
Answer: C) JOIN
Rationale: JOIN is used to combine data from two or more tables based on a
related column.
5. Which of the following is a benefit of denormalization?
A. It increases database complexity
B. It reduces the need for indexing
C. It improves query performance by reducing joins
D. It guarantees data consistency
Answer: C) It improves query performance by reducing joins
Rationale: Denormalization combines tables or duplicates columns so that queries
need fewer joins, trading some added redundancy for faster reads.
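As a rough illustration, assuming hypothetical `customers` and `orders` tables plus a denormalized copy named `orders_denorm`, the same report can be read with or without a join:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, item TEXT);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 'laptop'), (11, 2, 'desk');

    -- Denormalized copy: the customer name is stored on each order row.
    CREATE TABLE orders_denorm AS
        SELECT o.id, c.name AS customer_name, o.item
        FROM orders AS o JOIN customers AS c ON c.id = o.customer_id;
""")

# Normalized read: the join runs every time the report is queried.
normalized = conn.execute("""
    SELECT o.id, c.name, o.item
    FROM orders AS o JOIN customers AS c ON c.id = o.customer_id
    ORDER BY o.id
""").fetchall()

# Denormalized read: a single-table scan, no join at query time.
denormalized = conn.execute(
    "SELECT id, customer_name, item FROM orders_denorm ORDER BY id"
).fetchall()

print(normalized == denormalized)
```

The join cost is paid once when `orders_denorm` is built; the trade-off is that the copied names must be kept in sync whenever `customers` changes.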
6. What is a database schema?