Exam (elaborations)

C170 Data Management Applications Notes 2022 with complete solution

Rating

Sold

Pages

Grade

A+

Uploaded on

17-04-2022

Written in

2021/2022

Lesson 1 - Conceptual Models A Logical View of Data - Placing the DBMS between the application and the database eliminates most of the file system’s inherent limitations. - The relational model enables you to view data logically rather than physically. Tables and Their Characteristics - The logical view of the relational database is facilitated by the creation of data relationships based on a logical construct known as a relation. - A table is perceived as a two-dimensional structure composed of rows and columns, and is called a relation. - As far as the table’s user is concerned, a table contains a group of related entity occurrences, or an entity set. - The characteristics of a relational table are: 1. A table is perceived as a two dimensional structure composed of rows and columns. 2. Each table row (tuple) represents a single entity occurrence within the entity set. 3. Each table column represents an attribute, and each column has a distinct name. 4. Each intersection of a row and column represents a single data value. 5. All values in a column must conform to the same data format. 6. Each column has a specific range of values known as the attribute domain. 7. The order of the rows and columns is immaterial to the DBMS. 8. Each table must have an attribute or combination of attributes that uniquely identifies each row. Keys - A key consists of one or more attributes that determine other attributes. - Determination is that state in which knowing the value of one attribute makes it possible to determine the value of another. - Functional dependence means that the value of one or more attributes determines the value of one or more other attributes. - The attribute whose value determines another is called the determinant. - The attribute whose value is determined by the other attribute is called the dependent. - Full functional dependence is used to refer to functional dependencies in which the entire collection of attributes in the determinant is necessary for the relationship. Types of Keys - A composite key is a key composed of more than one attribute. - An attribute that is part of a key is called a key attribute. - A superkey is a key that can uniquely identify any row in the table — in other words, it functionally determines every attribute in the row. - A candidate key is a minimal superkey without any unnecessary attributes, based on a full functional dependency. - Entity integrity is the condition in which each row in the table has its own unique identity. - All of the values in the primary key must be unique - No key attribute in the primary key can contain a null. - A foreign key (FK) is the primary key of one table that has been placed into another table to create a common attribute. - Referential integrity is the condition in which every reference to an entity instance by another entity instance is valid. - A secondary key is defined as a key that is used strictly for data retrieval purposes. Relational Algebra - Relational algebra defines the theoretical way of manipulating table contents using relational operators. Formal Definitions and Terminology - A relvar is a variable that holds a relation. The table structure is properly called a relvar, and the data in the structure would be a relation. The relvar is a container (variable) for holding relation data, not the relation itself. - A relvar has two parts: - the heading contains the names of the attributes - the body contains the relation Relational Set Operators - The relational operators have the property of closure; the use of relational algebra operators on existing relations produces new relations. - SELECT (Restrict): - SELECT, also known as RESTRICT, is referred to as a unary operator because it only uses one table as input. - - PROJECT - PROJECT yields all values for selected attributes. It is also a unary operator, accepting only one table as input. - - UNION - UNION combines all rows from two tables, excluding duplicate rows. To be used in the UNION, the tables must have the same attribute characteristics. The columns and domains must be compatible, when they are said to be union-compatible. - - INTERSECT - INTERSECT yields only the rows that appear in both tables. - The tables must be union-compatible to yield valid results. - - DIFFERENCE - DIFFERENCE yields all rows in one table that are not found in the other table; that is, it subtracts one table from the other. - The tables must be union-compatible to yield valid results. - - PRODUCT - PRODUCT yields all possible pairs of rows from two tables. - - JOIN - JOIN allows information to be intelligently combined from two or more tables. JOIN is the real power behind the relational database, allowing the use of independent tables linked by common attributes. - The CUSTOMER and AGENT tables will be used to illustrate several types of joins., - - A natural join links tables by selecting only the rows with common values in their common attributes - First, a PRODUCT of the tables is created. - Second, a SELECT is performed to yield only the rows for which the AGENT_CODE values are equal - - A PROJECT is performed to yield a single copy of each attribute, eliminating duplicate columns. - - The final outcome of a natural join yields a table that does not include unmatched pairs and provides only the copies of the matches. - If no match is made between the table rows, the new table does not include the unmatched row. - The column on which the join was made, occurs only once in the new table. - If the same AGENT_CODE were to occur several times in the AGENT table, a customer would be listed for each match. - An equijoin links tables on the basis of an equality condition that compares specified columns of each table. The outcome of the equijoin does not eliminate duplicate columns, and the condition used to join the tables must be explicitly defined. - The equijoin takes its name from the equality comparison operator used in the condition. If any other comparison operator is used, the join is called a theta join. - Each of the following is an inner join, which only returns matched records from the tables that are being joined. - In an outer join, the matched pairs would be retained, and any unmatched values in the other table would be left null. - A left outer join yields all of the rows in the CUSTOMER table, including those that do not have a matching value in the AGENT table. - - A right outer join yields all of the rows in the AGENT table, including those that do not have matching values in the CUSTOMER table. - - Outer joins are especially useful when you are trying to determine what values in related tables cause referential integrity problems. - DIVIDE - The DIVIDE operator is used to answer questions about one set of data being associated with all values of data in another set of data. - The DIVIDE operation uses one 2-column table as the dividend and one single-column table as the divisor. - The output of the DIVIDE operation on the right is a single column that contains all values from the second column of the dividend that are associated with every row in the divisor. - The Data Dictionary and the System Catalog - The data dictionary provides a detailed description of all tables in the database created by the user and designer. - It contains at least all of the attribute names and characteristics for each table in the system, a.k.a metadata. - The system catalog can be described as a detailed system data dictionary that describes all objects within the database, including data about table names, table’s creator and creation date, etc. - Homonyms indicate the use of the same name to label different attributes. - A synonym is the opposite of a homonym, and indicates the use of different names to describe the same attribute. Relationships within the Relational Database M:N Relationships - The problems inherent in many-to-many relationship can easily be avoided by creating a composite entity / bridge entity / associative entity. - A linking table is the implementation of a composite entity. Indexes - Index is an orderly arrangement used to logically access rows in a table. - The index key is the index’s reference point. - An index is an ordered arrangement of keys and pointers. - Each key points to the location of the data identified by the key. - - In a unique index, the index key can have only one pointer value (row) associated with it. Codd’s Relational Database Rules Rule Rule Name Description 1 Information All information in a relational database must be logically represented as column values in rows within tables. 2 Guaranteed access Every value in a table is guaranteed to be accessible through a combination of table name, primary key value, and column name. 3 Systematic treatment of nulls Nulls must be represented and treated in a systematic way, independent of data type. 4 Dynamic online catalog based on the relational model The metadata must be stored and managed as ordinary data — that is, in tables within the database; such data must be available to authorized users using the standard database relational language. 5 Comprehensive data sublanguage The relational database may support many languages; however it must support one well-defined, declarative language as well as data definition, view definition, data manipulation (interactive and by program), integrity constraints, authorization, and transaction management (begin, commit, and rollback) 6 View updating Any view that is theoretically updatable must be updatable through the system 7 High-level insert, update, and delete The database must support set-level inserts, updates, and deletes 8 Physical data independence Application programs and ad hoc facilities are logically unaffected when physical access methods or storage structures are changed. 9 Logical data independence Application programs and ad hoc facilities are logically unaffected when changes are made to the table structures that preserve the original table values (changing order of columns or inserting columns) 10 Integrity independence All relational integrity constraints must be definable in the relational language and stored in the system catalog, not at the application level 11 Distribution The end users and application programs are unaware of and independence unaffected by data location 12 Nonsubversion If the system supports low-level access to the data, users must not be allowed to bypass the integrity rules of the database 13 Rule zero All preceding rules are based on the notion that to be considered relational, a database must use its relational facilities exclusively for management Lesson 3 - Basic SQL The Database Schema - A schema is a logical group of database objects — such as tables and indexes — that are related to each other. - A single database can hold multiple schemas that belong to different users or applications. - Schemas are useful in that they group tables by owner and enforce a first level of security by allowing each user to see only the tables that belong to that user. - ANSI SQL standards define a command to create a database schema: - CREATE SCHEMA AUTHORIZATION {creator}; Some Common SQL Types Data Type Format Comments Numeric NUMBER(L, D) or NUMERIC(L, D) NUMBER(7,2) or NUMERIC(7,2) indicates numbers will be stored with two decimal places and may be up to seven digits long. INTEGER May be abbreviated as INT. SMALLINT Like INT but limited up to six digits. DECIMAL(L, D) Like the NUMBER specification, but the storage length is a minimum specification. Character CHAR(L) Fixed-length character data for up to 255 characters. VARCHAR(L) or VARCHAR2(L) Variable length character data. Unlike CHAR, VARCHAR will not leave unused spaces. Date DATE Stores data in the Julian date format. Creating Table Structures - CREATE TABLE - CREATE TABLE tablename ( column1 column2 data type data type [constraint] [, [constraint] ] [, PRIMARY KEY (column1 [,column2]) ] [, FOREIGN KEY (column1 [,column2]) REFERENCES tablename] [, CONSTRAINT constraint ] ); - example: - CREATE TABLE VENDOR ( V_CODE INTEGER NOT NULL UNIQUE, V_NAME VARCHAR(35) NOT NULL, V_CONTACT VARCHAR(25) NOT NULL, V_AREA CHAR(3) NOT NULL, V_PHONE CHAR(8) NOT NULL, V_STATE CHAR(2) NOT NULL, V_ORDER CHAR(1) NOT NULL, PRIMARY KEY (V_CODE)); - CREATE TABLE PRODUCT ( P_CODE VARCHAR(10) NOT NULL UNIQUE, P_DESCRIPT VARCHAR(35) NOT NULL, P_INDATE DATE NOT NULL, P_QOH SMALLINT NOT NULL, P_MIN SMALLINT NOT NULL, P_PRICE NUMBER(8,2) NOT NULL, P_DISCOUNT NUMBER(5,2) NOT NULL, V_CODE INTEGER, PRIMARY KEY (P_CODE), FOREIGN KEY (V_CODE) REFERENCES VENDOR ON UPDATE CASCADE); SQL Constraints - The NOT NULL constraint ensures a column does not accept nulls - The UNIQUE constraint ensures all values in a column are unique - The DEFAULT constraint assigns a value to an attribute when a new row is added to a table. - The CHECK constraint is used to validate data when an attribute value is entered. SQL Indexes - Using the CREATE INDEX command, SQL indexes can be created on the basis of any selected attribute. - CREATE [UNIQUE] INDEX indexname ON tablename(column1 [, column2]) - example: - CREATE INDEX P_INDATEX ON PRODUCT(P_INDATE); - Using the UNIQUE index qualifier, you can create an index that prevents you from using a value that has been used before. - CREATE UNIQUE INDEX P_CODEX ON PRODUCT(P_CODE); - A common practice is to create an index on any field that is used as a search key, in comparison operations in a conditional expression, or when you want to list rows in a specific order. - Unique composite indexes are often used to prevent data duplication. - By default all indexes produce results that are listed in ascending order, but you can yield output in descending order - CREATE INDEX PROD_PRICEX ON PRODUCT(P_PRICE DESC); - To delete an index, use the DROP INDEX command: - DROP INDEX indexname - example: - DROP INDEX PROD_PRICEX; Data Manipulation Commands - INSERT - Enter data into a table. - INSERT INTO tablename VALUES (value1, value2, , valueN) - example: - INSERT INTO VENDOR VALUES (21225, ‘Bryson, Inc.’, ‘Smithson’, ‘615’, ‘223-3234’, ‘TN’, ‘Y’); - Inserting rows with optional attributes: - INSERT INTO PRODUCT(P_CODE, P_DESCRIPT) VALUES (‘W’, ‘abc’); Saving Table Changes - Any changes made to the table contents are not saved on disk until you close the database, close the program you are using, or use the COMMIT command. - COMMIT [WORK] Updating Table Rows - Use the UPDATE command to modify data in a table. - UPDATE tablename SET columnname = expression [, columnname = expression ] [WHERE conditionlist ]; - example: - UPDATE PRODUCT SET P_INDATE = ‘18-JAN-2016’, P_PRICE = 17.99, P_MIN = 12 WHERE P_CODE = ‘13-Q2/P2’; Restoring Table Contents - If you have not yet used the COMMIT command, you can restore the database to its previous condition with the ROLLBACK command. - ROLLBACK; - COMMIT and ROLLBACK work only with data manipulation commands that add, modify, or delete table rows. All data definition commands are automatically committed to the data dictionary and cannot be rolled back. Deleting Table Rows - Delete a table row using the DELETE statement. - DELETE FROM tablename [WHERE conditionlist ]; Inserting Table Rows with a Select Subquery - You can add multiple rows to a table using another table as the source of the data. - INSERT INTO tablename SELECT columnlist FROM tablename; - In this case, the INSERT statement uses a SELECT subquery. Additional Data Definition Commands - All changes in the table structure are made by using the ALTER TABLE command, followed by a keyword that produces the specific change you want to make. - ALTER TABLE tablename {ADD | MODIFY} ( columnname datatype ); - It can also be used to add table constraints - ALTER TABLE tablename ADD constraint [ ADD constraint ]; - You can also remove a column or table constraint - ALTER TABLE tablename DROP {PRIMARY KEY | COLUMN columname | CONSTRAINT constraintname }; Changing a Column’s Data Type - ALTER TABLE PRODUCT MODIFY (V_CODE CHAR(5)); Changing a Column’s Data characteristics - ALTER TABLE PRODUCT MODIFY (P_PRICE DECIMAL(9,2)); Adding a Column - ALTER TABLE PRODUCT ADD (P_SALESCHODE CHAR(1)); Dropping a Column - ALTER TABLE VENDOR DROP COLUMN V_ORDER; Copying Parts of Tables - SQL allows you to copy the contents of selected table columns so that the need not be re-entered manually into the new created table(s). - If you want to copy some fields from the PRODUCt table to a new table named PART, you create the PART table structure first, as follows: - CREATE TABLE PART ( PART_CODE CHAR(8), PART_DESCRIPT CHAR(35), PART_PRICE DECIMAL(8,2), V_CODE INTEGER, PRIMARY_KEY (PART_CODE)); - Next you need to add the rows to the new PART table, using the PRODUCT table rows and the INSERT command - INSERT INTO target_tablename [(target_columnlist)] SELECT source_columnlist FROM source-tablename; - Note that the target column list is required if the source column list does not match all of the attribute names and characteristics of the target table (including the order of the columns) - INSERT INTO PART (PART_CODE, PART_DESCRIPT, PART_PRICE, V_CODE) SELECT P_CODE, P_DESCRIPT, P_PRICE, V_CODE FROM PRODUCT; - SQL provides another way to rapidly create a new table based on selected columns and rows of an existing table. In this case, the new table will copy the attribute names, data characteristics, and rows of the original table - CREATE TABLE PART AS SELECT P_CODE AS PART_CODE, P_DESCRIPT AS PART_DESCRIPT, P_PRICE AS PART_PRICE, V_CODE FROM PRODUCT; Adding Primary and Foreign Key Designations - When you create a new table based on another table, the new table does not include integrity rules from the old table. To define the primary key for the new PART table, use the following command: - ALTER TABLE PART ADD PRIMARY KEY (PART_CODE); - If the PART table’s foreign key has not yet been designed, it can be designated by: - ALTER TABLE PART ADD FOREIGN KEY (V_CODE) REFERENCES VENDOR; Deleting a Table from the Database - A table can be deleted from the database using the DROP TABLE command. - DROP TABLE PART; - You can drop a table only if it is not the “one” side of any relationship. Otherwise, the RDBMS will generate an error message indicated that a foreign key integrity violation has occurred. Grouping Data - Sometimes you do not want to treat the entire table as a single collection of data for summarizing. - Rows can be grouped into smaller collections quickly and easily using the GROUP BY clause within the SELECT statement. - SELECT columnlist FROM tablelist [WHERE conditionlist] [GROUP BY columnlist] [HAVING conditionlist] [ORDER BY columnlist [ASC | DESC] ]; - The GROUP BY clause is valid only when used in conjunction with one of the SQL aggregate functions, such as COUNT, MIN, MAX, AVG, and SUM. - The HAVING clause operated very much like the WHERE clause, however the HAVING clause is applied to the output of a GROUP BY operation. - For example, suppose you want to generate a listing of the number of products in the inventory supplied by each vendor. However, this time you want to limit the listing to products whose prices average less than $10. The first part of that statement is satisfied with the help of the GROUP BY clause, but the HAVING clause is used for the second part of the SQL command. Joining Database Tables - A join is performed when data is retrieved from more than one table at a time. - To join tables, you simply list the tables in the FROM clause of the SELECT statement. - The DBMS will create the Cartesian product of every table in the FROM clause. - However, to get the correct result — that is, a natural join — you must select only the rows in which the common attribute values match. - Use the WHERE clause to indicate the common attributes used to link the tables. - The join condition is generally composed of an equality comparison between the foreign key and the primary key of related tables. - SELECT P_DESCRIPT, P_PRICE, V_NAME, V_CONTACT, V_AREACODE, V_PHONE FROM PRODUCT, VENDOR WHERE PRODUCT.V_CODE = VENDOR.V_CODE ORDER BY PRODUCT.P_PRICE; - When joining three or more tables, you need to specify a join condition for each pair of tables. The number of join conditions will always be n-1, where n represents the number of tables. - SELECT CUS_LNAME, INVOICE.INV_NUMBER, INV_DATE, P_DESCRIPT FROM CUSTOMER, INVOICE, LINE, PRODUCT WHERE CUSTOMER.CUS_CODE = INVOICE.CUS_CODE AND INVOICE.INV_NUMBER = LINE.INV_NUMBER AND LINE.P_CODE = PRODUCT.P_CODE AND CUSTOMER.CUS_CODE = 10014 ORDER BY INV_NUMBER; Joining Tables with an Alias - SELECT P_DESCRIPT, P_PRICE, V_NAME, V_CONTACT, V_AREACODE, V_PHONE FROM PRODUCT P, VENDOR V WHERE P.V_CODE = V.V_CODE ORDER BY P_PRICE; Recursive Joins - An alias is useful when a table must be joined to itself in a recursive query. - For example, suppose you are working with the EMP table and want to generate a list of all employees with their managers - SELECT E.EMP_NUM, E.EMP_LNAME, E.EMP_MGR, M.EMP_LNAME FROM EMP E, EMP M WHERE E.EMP_MGR = M.EMP_NUM ORDER BY E.EMP_MGR; Lesson 4 - Advanced SQL SQL Join Operators - The relational join operation merges rows from two tables and returns the rows with one of the following conditions: - Have common values in common columns (natural join) - Meet a given join condition (equality or inequality) - Have common values in common columns or have no matching values (outer join) - Joining two tables through their common column is sometimes referred to as an “old-style” join. - SELECT P_CODE, P_DESCRIPT, P_PRICE, V_NAME FROM PRODUCT, VENDOR WHERE PRODUCT.V_CODE = VENDOR.V_CODE; - Generally the join condition will be an equality comparison of the primary key in one table and the related foreign key in the second table - Join operations can be classified as inner joins and outer joins. - The inner join is the traditional join in which only rows that meet a given criterion are selected. - The criterion can be an equality condition (called a natural join, or an equijoin), or an inequality condition (also called a theta join). - An outer join returns not only the matching rows, but the rows with unmatched attribute values for one table or both tables to be joined. - The SQL standard also introduced a special type of join, called a cross join, that returns the same result as the Cartesian product of two sets or tables. Join Classificatio n Join Type SQL Syntax Example Description CROSS Cross join SELECT * FROM T1, T2 Returns the Cartesian product of T1 and T2 (old style) SELECT * FROM T1 CROSS JOIN T2 Returns the Cartesian product of T1 and T2 INNER Old-style Join SELECT * FROM T1, T2 WHERE T1.X = T2.X Returns only the rows that meet the join condition in the WHERE clause (old style); only rows with matching values are selected. Natural join SELECT * FROM T1 NATURAL JOIN T2 Returns only the rows with matching values in the matching columns; the matching columns must have the same names and similar data types Join using SELECT * FROM T1 JOIN T2 USING (C1) Returns only the rows with matching values in the columns indicated in the USING clause. Join on SELECT * FROM T1 JOIN T2 ON T1.C1 = T2.C1 Returns only the rows that meet the join condition indicated in the ON clause. OUTER Left join SELECT * FROM T1 LEFT OUTER JOIN T2 ON T1.C1 = T2.C1 Returns rows with matching values and includes all rows from the left table (T1) with unmatched values Right join SELECT * FROM T1 RIGHT OUTER JOIN T2 ON T1.C1 = T2.C1 Returns rows with matching values and includes all rows from the right table (T2) with unmatched values. Full join SELECT * FROM T1 FULL OUTER JOIN T2 ON T1.C1 = T2.C! Returns rows with matching values and includes all rows from both tables (T1 and T2) with unmatched values. Cross Join - A cross join performs a relational product (also known as the Cartesian product) of two tables. - The cross join syntax is - SELECT columnlist FROM table1 CROSS JOIN table2 - example: - SELECT * FROM INVOICE CROSS JOIN LINE; - Performs a cross join of the INVOICE and LINE tables that generates N * M number of rows. - You can also perform a cross join that yields only specified attributes. For example: - SELECT INVOICE.INV_NUMBER, CUS_CODE, INV_DATE, P_CODE FROM INVOICE CROSS JOIN LINE; - The results from the previous statement are the same as: - SELECT INVOICE.INV_NUMBER, CUS_CODE, INV_DATE, P_CODE FROM INVOICE, LINE; Natural Join - A natural join returns all rows with matching values in the matching columns and eliminates duplicate columns. - The natural join syntax is: - SELECT columnlist FROM table1 NATURAL JOIN table2 - The natural join will perform the following tasks: - Determine the common attributes by looking for attributes with identical names and compatible data types. - Select only the rows with common values in the common attributes. - If there are no common attributes, return the relational product of the two tables. - example: - SELECT CUS_CODE, CUS_LNAME, INV_NUMBER, INV_DATE FROM CUSTOMER NATURAL JOIN INVOICE; - You are not limited to two tables when performing a natural join. - SELECT INV_NUMBER, P_CODE, P_DESCR, LINE_UNITS, LINE_PRICE FROM INVOICE NATURAL JOIN LINE NATURAL JOIN PRODUCT; JOIN USING Clause - A second way to express a join is through the USING keyword. The query returns only the rows with matching values in the column indicated in the USING clause — and that column must exist in both tables. The syntax is: - SELECT columnlist FROM table1 JOIN table2 USING (common-column) - Example: - SELECT INV_NUMBER, P_CODE, P_DESCR, LINE_UNITS, LINE_PRICE FROM INVOICE JOIN LINE USING (INV_NUMBER) JOIN PRODUCT USING (P_CODE) JOIN ON Clause - The previous two join styles use common attribute names in the joining tables. Another way to express a join when the tables have no common attribute names is to use the JOIN ON operand. - The query will return only the rows that meet the join condition, typically an equality comparison. - SELECT columnlist FROM table1 JOIN table2 ON join-condition - example: - SELECT INVOICE.INV_NUMBER, PRODUCT.P_CODE, P_DESCRIPT, LINE_UNITS, LINE_PRICE FROM INVOICE JOIN LINE ON INVOICE.INV_NUMBER = LINE.INV_NUMBER JOIN PRODUCT ON LINE.P_CODE = PRODUCT.P_CODE; - The JOIN ON clause requires a table qualifier for the common attributes. Outer Joins - An outer join returns not only the rows matching the join condition, it returns the rows with unmatched values. - SELECT columnlist FROM table1 {LEFT | RIGHT | FULL} [OUTER] JOIN table2 ON join-condition - example: - SELECT P_CODE, VENDOR.V_CODE, V_NAME FROM VENDOR LEFT JOIN PRODUCT ON VENDOR.V_CODE = PRODUCT.V_CODE; Subqueries and Correlated Queries - It is often necessary to process data based on other processed data. - Suppose you want to generate a list of vendors who do not provide products. You could generate such a list by writing the following query: - SELECT V_CODE, V_NAME FROM VENDOR WHERE V_CODE NOT IN (SELECT V_CODE FROM PRODUCT); - Similarly, to generate a list of products with a price greater than or equal to the average product price, you can write the following query: - SELECT P_CODE, P_PRICE FROM PRODUCT WHERE P_PRICE = (SELECT AVG(P_PRICE) FROM PRODUCT); WHERE Subqueries - The most common type of subquery uses an inner SELECT subquery on the right side of a WHERE comparison expression. - SELECT P_CODE, P_PRICE FROM PRODUCT WHERE P_PRICE = (SELECT AVG(P_PRICE) FROM PRODUCT); IN Subqueries - SELECT DISTINCT CUS_CODE, CUS_LNAME, CUS_FNAME FROM CUSTOMER JOIN INVOICE USING (CUS_CODE) JOIN LINE USING (INV_NUMBER) JOIN PRODUCT USING (P_CODE) WHERE P_CODE IN (SELECT P_CODE FROM PRODUCT WHERE P_DESCRIPT LIKE ‘%hammer%’ OR P_DESCRIPT LIKE ‘%saw%’); HAVING Subqueries - The HAVING clause is used to restrict the output of a GROUP BY query by applying conditional criteria to the grouped rows. - To list all products with a total quantity sold greater than the average quantity sold, you would write the following query: - SELECT P_CODE, SUM(LINE_UNITS) FROM LINE GROUP BY P_CODE HAVING SUM(LINE_UNITS) (SELECT AVG(LINE_UNITS) FROM LINE); Multirow Subquery Operators: ANY and ALL - You must use an IN subquery to compare a value to a list of values. - Suppose you want to know which products cost more than all individual products provided by vendors from Florida: - SELECT P_CODE, P_QOH * P_PRICE FROM PRODUCT WHERE P_QOH * P_PRICE ALL ( SELECT P_QOH * P_PRICE FROM PRODUCT WHERE V_CODE IN ( SELECT V_CODE FROM VENDOR WHERE V_STATE = ‘FL’ ) ); FROM Subqueries - You can use a SELECT subquery in the FROM clause. - For example, assume you want to know all customers who have purchased products P2 and HB. All product purchases are stored in the LINE table, so you can easily find out who purchased any given product by searching the P_CODE attribute in the LINE table. In this case, you want to know all customers who purchased both products, not just one. You could write the following query: - SELECT DISTINCT CUSTOMER.CUS_CODE, CUSTOMER.CUS_LNAME FROM CUSTOMER, (SELECT INVOICE.CUS_CODE FROM INVOICE NATURAL JOIN LINE WHERE P_CODE = ‘P2’) AS CP1, (SELECT INVOICE.CUS_CODE FROM INVOICE NATURAL JOIN LINE WHERE P_CODE = ‘HB’) AS CP2 WHERE CUSTOMER.CUS_CODE = CP1.CUS_CODE AND CP1.CUS_CODE = CP2.CUS_CODE; Attribute List Subqueries - The SELECT statement uses the attribute list to indicate what columns to project in the resulting set. - Those columns can be attributes of base tables, computed attributes, or the result of an aggregate function. The attribute list can also include a subquery expression, also known as an inline subquery. SELECT P_CODE, P_PRICE, (SELECT AVG(P_PRICE) FROM PRODUCT) AS AVGPRICE, P_PRICE - (SELECT AVG(P_PRICE) FROM PRODUCT) AS DIFF FROM PRODUCT; - Note that if you try to use the alias in the difference expression, you will get an error message. - The column alias cannot be used in computations in the attribute list when the alias is defined in the same attribute list. Correlated Subqueries - A correlated subquery is a subquery that executes once for each row in the outer query. - Suppose you want to know all product sales in which the units sold value is greater than the average units sold value for that product. In that case the following procedure must be completed: - 1. Compute the average units sold for a product. - 2. Compare the average computered in Step 1 to the units sold in each sale row, and select only the rows in which the number of units sold is greater. - The following correlated query completes the two-step process: - SELECT INV_NUMBER, P_CODE, LINE_UNITS FROM LINE AS LS WHERE LS.LINE_UNITS ( SELECT AVG(LINE_UNITS) FROM LINE AS LA WHERE LA.P_CODE = LS.P_CODE ); - Correlated subqueries can also be used with the EXISTS special operator. - For example, suppose you want to know the names of all customers who have placed an order lately. In that case, you could use a correlated subquery like the first one. - SELECT CUS_CODE, CUS_LNAME, CUS_FNAME FROM CUSTOMER WHERE EXISTS ( SELECT CUS_CODE FROM INVOICE WHERE INVOICE.CUS_CODE = CUSTOMER.CUS_CODE ); - Another example. - Suppose you want to know what vendors you must contact to order products approaching the minimum quantity-on-hand value. In particular, you want to know the vendor code and vendor name for products with a quantity on hand that is less than double the minimum quantity. The query is as follows: - SELECT V_CODE, V_NAME FROM VENDOR WHERE EXISTS ( SELECT * FROM PRODUCT WHERE P_QOH P_MIN * 2 AND VENDOR.V_CODE = PRODUCT.V_CODE ); SQL Functions - You’ll need to use SQL functions when you want to list all employees ordered by year of birth, or when your marketing department wants you to generate a list of all customers ordered by zip code and the first three digits of their telephone numbers - In both cases, you’ll need to use databases not present as such in the database. Instead, you will need a SQL function that can be derived from an existing attribute. - Functions always use a numerical, data, or string value. Function Example TO_CHAR Returns a character string or a formatted string from a date value. WHERE TO_CHAR(EMP_DOB, ‘YYYY’) = ‘1982’; TO_DATE Returns a date value using a character string and a date format mask. TO_DATE(‘11/25/2004’, ‘MM/DD/YYYY’); SYSDATE TO_DATE(‘25-Dec-2016’, ‘DD-MON-YYYY’ - SYSDATE Returns today’s date. ADD_MONTHS Adds a number of months or years to a date. Syntax: ADD_MONTHS(date_value , n) ADD_MONTHS(P_INDATE, 24) LAST_DAY Returns the date of the last day of the month given in a date. LAST_DAY(EMP_HIRE_DATE); Equivalent functions for MySQL: Function Example DATE_FORMAT Returns a character string or a formatted string from a date value. SELECT DATE_FORMAT(P_INDATE, ‘%M %d %Y’) YEAR Returns a four-digit year WHERE YEAR(EMP_DOB) = 1982; MONTH Returns a two-digit month code WHERE MONTH(EMP_DOB) = 11; DAY Returns the number of the day WHERE DAY(EMP_DOB) = 14; ADDDATE Adds a number of days to a date. ADDATE(P_INDATE, 30) DATE_ADD Adds a number of days, months, or years to a date. Syntax: DATE_ADD(date, INTERVAL n unit) DATE_ADD(P_INDATE, INTERVAL 2 YEAR); LAST_DAY Returns the date of the last day of the month given in a date. LAST_DAT(EMP_HIRE_DATE) Relational Set Operators - SQL data manipulation commands are set-oriented, that is, they operate over entire sets of rows and columns at once. - You can combine two or more sets to create new sets using UNION, INTERSECT, and EXCEPT (MINUS). - These commands work properly only if relations are union-compatible, which means the number of attributes must be the same and their corresponding data types must be alike. UNION - The UNION query is a perfect tool for generating a combined listing of customers — on that excludes duplicate records. - The UNION statement combines rows from two or more queries without including duplicate rows. - query UNION query - In other words, the UNION combines the output of two SELECT queries. - SELECT CUS_LNAME, CUS_FNAME, CUS_PHONE FROM CUSTOMER UNION SELECT CUS_LNAME, CUS_FNAME, CUS_PHONE FROM CUSTOMER_2; UNION ALL - If SaleCo’s management wants to know how many customers are on both the CUSTOMER and Customer_2 lists, a UNION ALL query can be used to produce a relation that retains the duplicate rows. The following query will keep all rows from both queries, including duplicates. - SELECT ... FROM CUSTOMER UNION ALL SELECT ... FROM CUSTOMER_2 INTERSECT - If SaleCo’s management wants to know which customer records are duplicated in the CUSTOMER and CUSTOMER_2 tables, the INTERSECT statement can be used to combine rows from two queries, returning only rows that appear in both sets. - query INTERSECT query - To generate a list of duplicate customer records, you can use the following query: - SELECT CUS_LNAME, CUS_FNAME, CUS_PHONE FROM CUSTOMER INTERSECT SELECt CUS_LNAME, CUS_FNAME, CUS_PHONE FROM CUSTOMER_2; - The INTERSECT statement can be used to generate additional useful information. For example, the following query returns the customer codes for all customers who are in area code 615 and who have made purchases. - SELECT CUS_CODE FROM CUSTOMER WHERE CUS_AREACODE = ‘615’ INTERSECT SELECT DISTINCT CUS_CODE FROM INVOICE; EXCEPT (MINUS) - The EXCEPT statement in SQL combines rows from two queries and returns only the rows that appear in the first set but not in the second. - query EXCEPT query - query MINUS query - If SaleCo managers want to know which customers in the CUSTOMER table are not found in the CUSTOMER_2 table, they can use the following commands: - SELECT ... FROM CUSTOMER MINUS SELECT ... FROM CUSTOMER_2 - Reverse the table designations to find customers in CUSTOMER_2 that are not in CUSTOMER. Syntax Alternatives - If your DBMS does not support the INTERSECT or EXCEPT statements, you can use IN and NOT IN subqueries to obtain similar results. - For example, the following query will produce the same results as the INTERSECT query shown earlier: - SELECT CUS_CODE FROM CUSTOMER WHERE CUS_AREACODE = ‘615’ AND CUS_CODE IN (SELECT DISTINCT CUS_CODE FROM INVOICE); Virtual Tables: Creating a View - A view is a virtual table based on a SELECT query. The query can contain columns, computed columns, aliases, and aggregate functions from one or more tables. - The tables on which the view is based are called base tables. - You can create a view by using the CREATE VIEW command: - CREATE VIEW viewname AS SELECT query - The CREATE VIEW statement is a data definition command that stores the subquery specification in the data dictionary. - A relational view has several special characteristics - You can use the name of a view anywhere a table name is expected in a SQL statement. - Views are dynamically updated each time they are invoked. - Views provide a level of security in the database because they can restrict users to seeing only specific columns and rows in a table. - Views may also be used as the basis for reports. - Example: - CREATE VIEW PROD_STATS AS SELECT V_CODE, SUM(P_QOH * P_PRICE) AS TOTCOST, MAX(P_QOH) AS MAXQTY, MIN(P_QOH) AS MINQTY, AVG(P_QOH) AS AVGQTY FROM PRODUCT GROUP BY V CODE; Updatable Views - A batch update routine pools multiple transactions into a single batch to update a master table field in a single operation. - An updatable view can be used to update attributes in any base tables used in the view. Not all views are updatable. - The most common updatable view restrictions are as follows: - GROUP BY expressions or aggregate functions cannot be used. - You cannot use set operators such as UNION, INTERSECT, and MINUS - Most restrictions are based on the use of JOINs or group operators in views. The base table to be updated must be key-preserved, meaning the values of the primary key of the base table must still be unique by definition in the view. Lesson 5 - Indexes Database Performance-Tuning Concepts - Database performance tuning refers to a set of activities and procedures designed to reduce the response time of the database system — that is, to ensure an end user query is processed by the DBMS in the minimum amount of time. - The performance of a typical DBMS is constrained by three main factors: - CPU processing power - Available RAM - Input/output hard disk and network throughput Performance Tuning: Client and Server - On the client side, the objective is to generate a SQL query that returns the correct answer in the least amount of time, using the minimum amount of resources at the server end. The activities required to achieve that goal are commonly referred to as SQL performance tuning. - On the server side, the DBMS environment must be properly configured to respond to clients’ requests in the fastest way possible, while making optimal use of existing resources. This is referred to as DBMS performance tuning. DBMS Architecture - All data in a database is stored in data files. A data file can contain rows from a single table, or rows from many different tables. - - The data files can automatically expand as required in predefined increments known as extents. - Data files are generally grouped in file groups or table spaces. A table space or file group is a logical grouping of several data files that store data with similar characteristics. - The data cache / buffer cache is a shared memory area that stores the most recently accessed data blocks in RAM. - The SQL cache or procedure cache is a shared reserved memory area that stores the most recently executed SQL statements or procedures, including triggers and functions. - The listener process listens for clients’ requests and handles the processing of the SQL requests to other DBMS processes. Once a request is received, the listener passes the request to the appropriate user process. - The DBMS creates a user process to manage each client session. This process handles all requests you submit to the server. - The scheduler process organizes the concurrent execution of the SQL requests. - The lock manager manages all locks placed on database objects, including disk pages. - The optimizer process analyzes SQL queries and finds the most efficient way to access the data. Database Query Optimization Modes - Most of the algorithms proposed for query optimization are based on two principles: - The selection of the optimum execution order to achieve the fastest execution time - The selection of sites to baccessed to minimize communication costs - Operation modes can be classified as manual or automatic. - Automatic query optimization means the DBMS finds the most cost-effective access path without user intervention. - Manual query optimization requires that the optimization be selected and scheduled by the end user or programmer. - Query optimization algorithms can also be classified according to when the optimization is done. - Static query optimization takes place at compilation time. In other words, the best optimization strategy is selected when the query is compiled by the DBMS. - This approach is common when SQL statements are embedded in procedural programming languages such as C# or .NET. - Dynamic query optimization takes place at execution time. Database access strategy is defined when the program is executed. Therefore, access strategy is dynamically determined by the DBMS at run time, using up-to-date information about the database. - Finally query optimization techniques can be classified according to the type of information that is used to optimize the query. - A statistically based query optimization algorithm uses statistical information about the database. Within statistically based optimizers, some DBMSs allow setting a goal to specify that the optimizer should attempt to minimize the time to retrieve the first row or the last row. - The statistical information is managed by the DBMS and is generated in one of two different modes, dynamic or manual. - In dynamic statistical generation mode, the DBMS automatically evaluates and updates the statistics after each data access operation. - In manual statistical generation mode, the statistics must be updated periodically through a user selected utility such as IBM’s RUNSTAT command. - A rule-based query optimization algorithm is based on a set of user-defined rules to determine the best query access strategy. The rules are entered by the end user or database administrator, and are typically general in nature. Database Statistics - The term database statistics refers to a number of measurements about database objects, number of processors used, processor speed, and temporary space available. Query Processing - the DBMS processes a query in three phases: - Parsing - Execution - Fetching - The processing of SQL DDL statements is different from the processing required by the DML statements. - A DDL statement actually updates the data dictionary tables or system catalog, while a DBML statement mostly manipulates end-user data. - SQL Parsing Phase - The SQL parsing activities are performed by the query optimizer, which analyzes the SQL query and finds the most efficient way to access the data. - Parsing a SQL query requires several steps, in which the SQL query is: - Validated for syntax compliance. - Validated against the data dictionary to ensure table names and column names are correct - Validated against the data dictionary to ensure the user has proper access rights - Analyzed and decomposed into more atomic components - Optimized through transformation into a fully equivalent but more efficient SQL query - Prepared for execution by determining the most efficient execution or access plan. - An access plan is the result of parsing a SQL statement; it contains the series of steps a DBMS will use to execute the query and return the result set in the most efficient way. First, the DBMS checks to see if an access plan already exists for the query in the SQL cache, if it does the DBMS reuses the access plan to save time. If not, the optimizer evaluates various plans and decides which indexes to use and how to best perform join operations. Query Processing Bottleneck - A query processing bottleneck is a delay introduced in the processing of an I/O operation that causes the overall system to slow down. Indexes and Query Optimization - Indexes are crucial in speeding up data access because they facilitate searching, sorting, and using aggregate functions and even join operations. - The improvement in data access speed occurs because an index is an ordered set of values that contains the index key and pointers. The pointers are the row IDs for the actual table rows. - An index scan is more efficient than a full table scan because the index data is preordered and the amount of data is usually much smaller. Therefore, when performing searches, it is almost always better for the DBMS to use the index to access a table than to scan all rows in a table sequentially. - For example, suppose you submit the following query: SELECT CUS_NAME, CUS_STATE FROM CUSTOMER WHERE CUS_STATE = ‘FL’; - If there is no index, the DBMS will perform a full-table scan and read all 14,000 customer rows. Assuming the index STATE_NDX is created and analyzed, the DBMS will automatically use the index to locate the first customer with a state equal to ‘FL’ and then proceed to read all subsequent CUSTOMER rows, using the row IDs in the index as a guide. - If indexes are so important, why not index every column in every table? This is because indexing every column overtaxes the DBMS in terms of index maintenance processing. - One measure that determines the need for an index is the data sparsity. - Data sparsity refers to the number of different values a column could have. For example, a STU_SEX column in a STUDENT table can only have two possible values, M or F; therefore that column is sad to have low sparsity. - In contrast the STU_DOB column that stores the student date of birth can have many different date values; therefore the column is sad to have high sparsity. - When you perform a search in a column with low sparsity, you are likely to read a high percentage of the table rows anyways, therefore index processing might be unnecessary work. - Most DBMSs implement indexes using one of the following data structures: - Hash index based on on an ordered list of hash values. - B-tree index with an index tree stored separately. - Bitmap index uses a bit array to represent the existence of a value or condition. Mostly used in data warehouse applications in tables with a large number of rows in which a small number of column values repeat many times.

Show more Read less

Institution

Course

Content preview

C170 Data Management
Applications
Downloaded by:Evatee
lOMoARcPSD|5967629

C170 Data Management Applications

Data Management Applications (Western Governors
University)

,C170 Data Management
Applications
Downloaded by:Evatee

, lOMoARcPSD|5967629

C170 Data Management
Applications Notes

, lOMoARcPSD|5967629

Lesson 1 - Conceptual Models

A Logical View of Data
- Placing the DBMS between the application and the database eliminates
most of the file system’s inherent limitations.
- The relational model enables you to view data logically rather than physically.

Tables and Their Characteristics
- The logical view of the relational database is facilitated by the
creation of data relationships based on a logical construct known as
a relation.
- A table is perceived as a two-dimensional structure composed of rows and
columns, and is called a relation.
- As far as the table’s user is concerned, a table contains a group of
related entity occurrences, or an entity set.
- The characteristics of a relational table are:
1. A table is perceived as a two dimensional structure composed of
rows and columns.
2. Each table row (tuple) represents a single entity occurrence within the entity set.
3. Each table column represents an attribute, and each column has a distinct name.
4. Each intersection of a row and column represents a single data value.
5. All values in a column must conform to the same data format.
6. Each column has a specific range of values known as the attribute domain.
7. The order of the rows and columns is immaterial to the DBMS.
8. Each table must have an attribute or combination of attributes
that uniquely identifies each row.

Keys
- A key consists of one or more attributes that determine other attributes.
- Determination is that state in which knowing the value of one attribute makes
it possible to determine the value of another.
- Functional dependence means that the value of one or more attributes
determines the value of one or more other attributes.
- The attribute whose value determines another is called the determinant.
- The attribute whose value is determined by the other attribute is called the dependent.
- Full functional dependence is used to refer to functional dependencies in
which the entire collection of attributes in the determinant is necessary
for the relationship.

Report Copyright Violation

Written for

Course: C170 Data Management Applications Notes

All documents for this subject (3)

Document information

Uploaded on: April 17, 2022
Number of pages: 42
Written in: 2021/2022
Type: Exam (elaborations)
Contains: Questions & answers

Subjects

c170 data management applications notes
c170 data management applications notes 2022 with complete solution

$14.99

Get access to the full document:

Written by students who passed

Immediately available after payment

Read online or as PDF

Get to know the seller

EvaTee

3.8

(947)

Get to know the seller

EvaTee Phoenix University

View profile

Sold

5201

Member since

4 year

Number of followers

3567

Documents

55573

Last sold

8 hours ago

TIGHT DEADLINE? I CAN HELP

Many students don\'t have the time to work on their academic papers due to balancing with other responsibilities, for example, part-time work. I can relate. kindly don\'t hesitate to contact me, my study guides, notes and exams or test banks, are 100% graded

3.8

947 reviews

451

167

171

110

Why students choose Stuvia

Created by fellow students, verified by reviews

Quality you can trust: written by students who passed their tests and reviewed by others who've used these notes.

Didn't get what you expected? Choose another document

No worries! You can instantly pick a different document that better fits what you're looking for.

Pay as you like, start learning right away

No subscription, no commitments. Pay the way you're used to via credit card and download your PDF document instantly.

“Bought, downloaded, and aced it. It really can be that simple.”

Alisha Student

Frequently asked questions

What do I get when I buy this document?

You get a PDF, available immediately after your purchase. The purchased document is accessible anytime, anywhere and indefinitely through your profile.

Satisfaction guarantee: how does it work?

Our satisfaction guarantee ensures that you always find a study document that suits you well. You fill out a form, and our customer service team takes care of the rest.

Who am I buying these notes from?

Stuvia is a marketplace, so you are not buying this document from us, but from seller EvaTee. Stuvia facilitates payment to the seller.

Will I be stuck with a subscription?

No, you only buy these notes for $14.99. You're not tied to anything after your purchase.

Can Stuvia be trusted?

4.6 stars on Google & Trustpilot (+1000 reviews) 48766 documents were sold in the last 30 days Founded in 2010, the go-to place to buy study notes for 16 years now

C170 Data Management Applications Notes 2022 with complete solution

Content preview

Written for

Document information

Subjects

Get to know the seller

Recently viewed by you

Why students choose Stuvia

Created by fellow students, verified by reviews

Didn't get what you expected? Choose another document

Pay as you like, start learning right away

Working on your references?

Frequently asked questions

What do I get when I buy this document?

Satisfaction guarantee: how does it work?

Who am I buying these notes from?

Will I be stuck with a subscription?

Can Stuvia be trusted?