1. The first task after obtaining data: analyze its quality
2. data will be usable for analyses as is: rarely
3. Most data is : dirty
4. dirty data: quality problems with data, needs to be cleaned before it can be used; for example
duplicate data, outdated data, incomplete data, incorrect/inaccurate data, or inconsistent data
5. duplicate data, outdated data, incomplete data, incorrect/inaccurate data, or inconsistent data: dirty
data
6. reportingerrors, inaccurate predictions/conclusions/decisions, economic losses, and inefficient
buisness processes are caused by data quali- ty:
poor
7. IBM predicts that the yearly cost of poor quality data is $ trillion in the US: 3.1
8. Data cleaning is especially important if your data is from sources-
: multiple
9. What are the most common tools for data cleaning?: excel spreadsheets and SQL
10.T or F: Not all issues that have been identified can be addressed every time: T
11.Goal: get data as as possible: clean
12.T or F: Excel datasheets are categorized into text, date. and number data types:
13.What does this function fix? Trim(): leading and trailing spaces 0-32
14.What does this function fix? clean(): presence of unprintable characters 0-32
15.What does this function fix? substitute () + Char(): nonbreaking space 202
16.What does this function fix? Proper(), Upper (), Lower(): Case inconsistency
17.ASCII Code: code representing english characters as numbers, with each letter assigned a
number from 0-255...ex: 0-32 lead space/non printable character, 202 non-breaking space
18.To remove non breaking space, after doing =CODE() to ID which need it:
=SUBSTITUTE(D6,CHAR(160),CHAR(32))
19.Clean (): eliminates non-printable characters
20.Trim (): eliminates leading and trailing spaces
21.Substitute (): replaces instances of a given character or a text string with a specific
character
22.Proper () -: capitalizes the tex
23.Nested
Functions: =PROPER(TRIM(CLEAN(SUBSTITUTE(C6,CHAR(160),CHAR(32)))))
1/
6
, FBA Analytics Final Exam
24.most common mistake with nested functions: ensure number of parenthe- ses to close all
functions
25.With nested functions the functions is first and then then it goes
: inner; outwards
26.Anytime you have text data this is a good function to run: (=trim(clean(sub-
stitute(column,char(160)," ")))
27.Len (): returns the number of characters in a string
28.Right ( ): extracts text from the right side of string
29.Left ( ): extracts text from the left side of string
30. =CONCATENATE: =CONCATENATE(text 1, text2) - Joins two or more text strings into one
string
31. =IF (): (logical_test,[value_if_true],[value_if_false])
32.VLOOKUP function: A function that returns values from a vertical lookup table.
33.if error (): displays a message or value that you specify, rather than the one
automatically generated by Excel
34. =IFERROR(VLOOKUP(I6,lookuptable,2,FALSE),I6): find and fix abbreviations with abbreviated
inconsistencies and a lookup table
35.is nummber (): returns true if a value is a number or else will return false
36.date (): formats a value asa date in different forms
37.excel only treats dates as a : number
38. =Text (): =TEXT(value, num_format) - Converts a numeric value to text and lets you specify
the display formatting by using special format strings
39.Value (): converts text into numbers
40.Value does not work if there is a space in between text characters, how would you fix it:
value(substitute(I6, " ", "") to delete the space.
41.How to detect duplicate records: conditional formatting>highlight cell
rules>duplicate values to passport nnumber colum
42.How to remove duplicate records: data>remove duplicates and deselect all the columns
that have dif values
43.goal of data analytics: go from data to a decision
44.t or f: data must be timely, relevant, and reliable: true
45.t or f: A database maintains transformations, integrity, and backups: true
46.an organized collection of related information is a: database
47.organized collection: all data is described and associated with other data
48.how should data about a certain entity be stored?: together
2/
6