Algorithm - correct answer a set of steps a computer uses to solve a problem. Helps us
manipulate data
ASCII - correct answer American Standard Code for Information Interchange
-allows us to easily share data
-developed to represent text on all US computers
-ASCII text files are the most portable data files available
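A minimal Python sketch of the ASCII idea, offered as an illustration rather than part of the original notes: every ASCII character corresponds to a code number from 0 to 127, which is why plain text moves so easily between computers. The sample word is made up.

```python
# Each ASCII character corresponds to a code number from 0 to 127.
text = "Data"  # hypothetical sample text

codes = [ord(ch) for ch in text]            # character -> code number
restored = "".join(chr(c) for c in codes)   # code number -> character
ascii_bytes = text.encode("ascii")          # one byte per ASCII character

print(codes)             # [68, 97, 116, 97]
print(restored)          # Data
print(len(ascii_bytes))  # 4
```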
File extension - correct answer a tag of three or four letters, preceded by a period,
which identifies a data file's format or the application used to create the file.
Text file - correct answer a file that holds text without any formatting and can be opened
in numerous applications
-the most useful because they can be read, processed, and exported by all computers and
data analysis programs
-two flavors: fixed-width and delimited
-can be examined using a text editor
Fixed width file - correct answer a type of ASCII text file where each field takes up a
set number of characters, so the data lines up in columns
- looks clean, like how a data table should look
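A rough sketch of how a fixed-width file can be read, assuming the column widths are known in advance. The layout (10, 12, and 2 characters) and the sample people are hypothetical; a real file's widths would come from its documentation.

```python
# Hypothetical column layout: name = 10 chars, city = 12 chars, age = 2 chars.
people = [("Smith", "St. Louis", 42), ("Jones", "Columbia", 37)]

# Build fixed-width lines by padding every value to its column width.
lines = [f"{name:<10}{city:<12}{age:>2}" for name, city, age in people]

# Parse by slicing at the known character positions, then trim the padding.
rows = [
    (line[0:10].strip(), line[10:22].strip(), int(line[22:24]))
    for line in lines
]

for line in lines:
    print(line)   # every line is exactly 24 characters wide
print(rows)
```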
Delimited file - correct answer an ASCII text file that uses special characters to mark
column breaks
-looks like a mess
-commas, pipes, ~, !, or ^ separate different pieces of data in each row
-easier to import
Delimiters - correct answer ASCII characters that separate data entries from one another;
a computer reads them to determine where column breaks go. Most common are
comma and tab
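As a sketch of how a program uses a delimiter to find column breaks, the example below reads pipe-delimited text with Python's standard csv module. The sample data is hypothetical, and it is held in a string here; a real file would be opened with open().

```python
import csv
import io

# Hypothetical pipe-delimited data; a real file would be opened with open().
raw = "name|city|age\nSmith|St. Louis|42\nJones|Columbia|37\n"

# The delimiter argument tells the reader which character marks column breaks.
reader = csv.reader(io.StringIO(raw), delimiter="|")
rows = list(reader)

header, records = rows[0], rows[1:]
print(header)   # ['name', 'city', 'age']
print(records)
```

Changing delimiter to "," or "\t" would handle comma- or tab-delimited data the same way.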
Socrata - correct answer company whose data-hosting web platform is often used by
governments' open data sites
CKAN - correct answer open-source data-hosting web portal often used by governments'
open data platforms
CSV file - correct answer abbreviation of "comma-separated values"; a type of delimited
text file where commas separate values
Text editor program - correct answer program that allows us to view and edit ASCII text
files, showing hidden ASCII characters
Ex: Notepad++ or TextWrangler
Text qualifiers - correct answer characters, usually quotation marks, sometimes used in
delimited data to show that the info within them should be kept together (for example, a
value that itself contains the delimiter)
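A small sketch of why text qualifiers matter, using a made-up CSV line in which a name contains a comma. Splitting on every comma breaks the name apart, while Python's csv module treats the quoted text as one value.

```python
import csv
import io

# Hypothetical CSV line: the first value contains a comma, so it is quoted.
raw = '"Smith, John",St. Louis,42\n'

naive = raw.strip().split(",")                 # ignores the qualifier
parsed = next(csv.reader(io.StringIO(raw)))    # honors the quotation marks

print(len(naive))   # 4
print(parsed)       # ['Smith, John', 'St. Louis', '42']
```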
Records retention schedule - correct answer details about records kept by a
government agency and rules for how long the agency needs to keep the records
Computer server - correct answer a computer system that provides and manages
resources such as data storage, file management, and email over a network; also
processes data
Desktops - correct answer usually don't host database programs
Outliers - correct answer extreme values that don't appear to belong with the rest of the
data. Could be errors
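One common rule of thumb for flagging possible outliers is the 1.5 x IQR rule. It is sketched below as an assumption of this example, not as a method named in these notes, and the values are hypothetical. A flagged value still needs a human check, since it may be a real value rather than an error.

```python
import statistics

# Hypothetical values; 95 looks suspicious next to the rest.
values = [10, 12, 11, 13, 12, 11, 95]

# Quartiles split the sorted data into four parts.
q1, _median, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1

# Flag anything more than 1.5 * IQR outside the middle half of the data.
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [v for v in values if v < low or v > high]

print(outliers)  # [95]
```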
Dirty data - correct answer flawed data; could contain duplicates, inaccuracies, or
misspellings
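As one illustration of spotting dirty data, the sketch below counts how often each record appears and reports exact duplicates. The records are hypothetical, and only identical rows are caught; near-duplicates with different spellings need other tools.

```python
from collections import Counter

# Hypothetical records; the first and third rows are exact duplicates.
rows = [
    ("Smith", "St. Louis"),
    ("Jones", "Columbia"),
    ("Smith", "St. Louis"),
]

counts = Counter(rows)
duplicates = [row for row, n in counts.items() if n > 1]

print(duplicates)  # [('Smith', 'St. Louis')]
```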
Data integrity check - correct answer a way to examine data for problems
Big-picture checks to do first - correct answer - what is the file format
- how many rows (command-end)
- are all of the columns present with proper headers (make sure headers are the same as
metadata)
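The row-count and header checks above can also be done in code. This is a minimal sketch with hypothetical data, where expected_headers stands in for the column names listed in the metadata.

```python
import csv
import io

# Hypothetical file contents and the column names promised by the metadata.
raw = "name,city,age\nSmith,St. Louis,42\nJones,Columbia,37\n"
expected_headers = ["name", "city", "age", "zip"]

rows = list(csv.reader(io.StringIO(raw)))
headers, records = rows[0], rows[1:]

row_count = len(records)                                      # data rows only
missing = [h for h in expected_headers if h not in headers]   # promised, absent
unexpected = [h for h in headers if h not in expected_headers]

print(row_count)   # 2
print(missing)     # ['zip']
print(unexpected)  # []
```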
Metadata, data documentation - correct answer set of data that gives information about
other data
Pivot table - correct answer tool used to summarize data according to categories, used
to organize and filter data in various ways before finalizing the analysis
- made by sorting, averaging or summing
- helps rearrange data
- allows you to draw actionable conclusions from data
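What a pivot table does can be sketched in plain Python: group rows by a category, then sum or average a value within each group. The cities and amounts are hypothetical, and a spreadsheet does the same work without code.

```python
from collections import defaultdict

# Hypothetical rows: (city, amount).
rows = [
    ("St. Louis", 100),
    ("Columbia", 40),
    ("St. Louis", 50),
    ("Columbia", 10),
]

totals = defaultdict(int)
counts = defaultdict(int)
for city, amount in rows:
    totals[city] += amount   # sum per category
    counts[city] += 1        # number of rows per category

averages = {city: totals[city] / counts[city] for city in totals}

print(dict(totals))   # {'St. Louis': 150, 'Columbia': 50}
print(averages)       # {'St. Louis': 75.0, 'Columbia': 25.0}
```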
Excel's text to columns - correct answer parses (splits) data stored in one column into
two or more columns
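The same kind of split can be sketched in Python. This is an analogy to the Excel feature, not how Excel works internally, and the city and state values are hypothetical.

```python
# Hypothetical single-column values that hold both city and state.
combined = ["St. Louis, MO", "Columbia, MO", "Chicago, IL"]

# Split each value once at the comma, then trim leftover spaces.
split_rows = [[part.strip() for part in value.split(",", 1)] for value in combined]

cities = [row[0] for row in split_rows]
states = [row[1] for row in split_rows]

print(cities)
print(states)
```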
Concatenation - correct answer merging 2+ text values into one using spreadsheets
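A short sketch of the same idea in Python, much like joining two cells with & in a spreadsheet formula. The names are hypothetical.

```python
# Hypothetical values from two separate columns.
first_names = ["John", "Ann"]
last_names = ["Smith", "Jones"]

# Merge each pair into one text value, with a space in between.
full_names = [f"{first} {last}" for first, last in zip(first_names, last_names)]

print(full_names)  # ['John Smith', 'Ann Jones']
```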
OpenRefine - correct answer open-source program that helps with data cleaning
Clustering - correct answer a tool in OpenRefine that uses algorithms to sniff out text
values that might be the same (ex: different spellings of St. Louis)
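A simplified sketch of how clustering can work, loosely modeled on the key-collision ("fingerprint") idea. It is an illustration under stated assumptions, not OpenRefine's actual code, and the fingerprint helper and sample values are hypothetical.

```python
import string
from collections import defaultdict

def fingerprint(value):
    """Simplified key: trim, lowercase, drop punctuation, sort unique words."""
    cleaned = value.strip().lower()
    cleaned = cleaned.translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(cleaned.split())))

# Hypothetical column values with inconsistent spellings.
values = ["St. Louis", "st louis", " ST. LOUIS", "Kansas City"]

clusters = defaultdict(list)
for value in values:
    clusters[fingerprint(value)].append(value)

# Keys shared by more than one value are candidates for merging.
candidates = {key: group for key, group in clusters.items() if len(group) > 1}
print(candidates)
```

Values that reduce to the same key land in one group, which a person can then review and merge.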
Summary statistics - correct answer provide a snapshot of a data set; include counts,
sums, and averages (usually mean and median)
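A minimal sketch of these summary statistics using Python's standard statistics module. The values are hypothetical and include one extreme entry to show how the mean and median can differ.

```python
import statistics

values = [2, 4, 4, 10, 80]  # hypothetical values with one extreme entry

count = len(values)
total = sum(values)
mean = statistics.mean(values)      # pulled upward by the extreme value
median = statistics.median(values)  # middle value, less affected

print(count, total, mean, median)
```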