Dataset
All the data collected for a particular analysis.
Data
The facts and figures collected, analyzed, and summarized for presentation and
interpretation.
Element
The entity on which data is collected.
Variable
A characteristic of interest for an element.
Observation
The set of values of the variables recorded for an individual element.
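To make the terms above concrete, here is a minimal Python sketch of a tiny dataset. The company names, the variable names, and every value are made up for illustration; only the roles (element, variable, observation) come from the definitions above.

```python
# Hypothetical dataset: each dict holds the values recorded for one element.
dataset = [
    {"company": "A", "industry": "Retail", "revenue": 120.5},
    {"company": "B", "industry": "Tech", "revenue": 310.0},
    {"company": "C", "industry": "Retail", "revenue": 95.2},
]

# Elements: the entities on which data is collected (here, companies).
elements = [row["company"] for row in dataset]

# Variables: the characteristics of interest recorded for every element.
variables = [key for key in dataset[0] if key != "company"]

# Observation: the values of all variables for one element (company "B").
observation_b = {name: dataset[1][name] for name in variables}

print(elements)
print(variables)
print(observation_b)
```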
Categorical Data
Uses labels or names to identify categories; measured on a nominal or ordinal scale.
Quantitative Data
Uses numeric values that indicate how much or how many.
Cross-sectional Data
Data collected at the same, or approximately the same, point in time.
Time Series
Data collected over several time periods.
Panel Data
A combination of cross-sectional and time series data.
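One way to see how the three kinds of data relate is to store a small panel and slice it two ways. This is only an illustrative sketch: the companies, years, and revenue figures are invented, and the dictionary layout is an assumption rather than a standard format.

```python
# Hypothetical panel data: (company, year) -> revenue. All values are made up.
panel = {
    ("A", 2021): 100, ("A", 2022): 110, ("A", 2023): 125,
    ("B", 2021): 200, ("B", 2022): 190, ("B", 2023): 210,
}

# Cross-sectional slice: every element at one point in time.
cross_section_2022 = {c: v for (c, y), v in panel.items() if y == 2022}

# Time series slice: one element across several time periods.
time_series_a = {y: v for (c, y), v in panel.items() if c == "A"}

print(cross_section_2022)
print(time_series_a)
```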
Descriptive Statistics
Summarizes data using tables, graphs, and numerical measures.
Population
The set of all elements of interest in a particular study.
Sample
A subset of the population.
Statistical Inference
Uses data from a sample to make estimates and test hypotheses about the
characteristics of a population.
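As a rough sketch of the idea, the Python snippet below draws a sample from a simulated population and uses the sample mean as an estimate of the population mean. The population is generated with made-up parameters (mean 50, standard deviation 10) and the sample size of 50 is an arbitrary choice; none of these numbers come from real data.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 1,000 simulated values (not real data).
population = [random.gauss(50, 10) for _ in range(1000)]

# Sample: a subset of the population, drawn without replacement.
sample = random.sample(population, 50)

# Use the sample mean as a point estimate of the population mean.
population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"population mean: {population_mean:.2f}")
print(f"sample estimate: {sample_mean:.2f}")
```

In practice the population mean is usually unknown; it is computed here only so the estimate can be compared against it.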
What does row 1 usually contain in Excel?
Typically contains the variable's names.
What does Column A usually contain in Excel?
Typically contains the elements; the rest of the worksheet contains the data in the dataset.
How do you calculate the mean in Excel?
=AVERAGE(range)
How do you calculate the median in Excel?
=MEDIAN(range)
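For comparison, the same two calculations can be sketched in Python with the standard `statistics` module. The list `values` is a hypothetical stand-in for one worksheet column.

```python
import statistics

# Hypothetical values standing in for one worksheet column.
values = [12, 15, 11, 18, 24]

mean_value = statistics.mean(values)      # counterpart of =AVERAGE(range)
median_value = statistics.median(values)  # counterpart of =MEDIAN(range)

print(mean_value)    # 80 / 5 = 16
print(median_value)  # middle of 11, 12, 15, 18, 24 is 15
```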
What is data analytics?
The scientific process of transforming data into insight for decision making. There are
three broad areas of data analytics: descriptive, predictive, and prescriptive.
Descriptive Analytics
Describes what has happened in the past.
Predictive Analytics
Uses statistical models built from past data to predict the future [forecasting] or assess
the impact of one variable on another [inference].
Prescriptive Analytics
Uses models that seek a best (optimal) solution. Often these are some type of
optimization model.
The differences between data and big data are
Big data is characterized by the four V's listed below. We will use data (not big data).
1. Volume - the number of observations.
2. Velocity - the speed at which data is collected.
3. Variety - the different forms and types of data.
4. Veracity - the reliability of the data generated.
Data Mining
Focuses on extracting predictive information from big data.
Frequency Distribution
A tabular summary of data showing the number (i.e., frequency) of observations in each
of several nonoverlapping categories.
Relative Frequency
Frequency of the class / n, where n is the total number of observations.
Percent Frequency
relative frequency x 100
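The three measures can be computed together, as in this minimal Python sketch. The survey responses are invented for illustration, and `Counter` from the standard library does the tallying.

```python
from collections import Counter

# Hypothetical categorical data (made-up survey responses).
responses = ["Yes", "No", "Yes", "Unsure", "Yes",
             "No", "Yes", "Unsure", "No", "Yes"]

n = len(responses)  # total number of observations

# Frequency: number of observations in each category.
frequency = Counter(responses)

# Relative frequency: frequency of the class / n.
relative = {cat: count / n for cat, count in frequency.items()}

# Percent frequency: relative frequency x 100 (computed as count * 100 / n).
percent = {cat: count * 100 / n for cat, count in frequency.items()}

for cat in frequency:
    print(f"{cat:<7} {frequency[cat]:>3} {relative[cat]:>6.2f} {percent[cat]:>6.1f}%")
```

The relative frequencies should sum to 1 and the percent frequencies to 100, which is a quick check on any frequency table.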
Bar Chart
A graphical display of a frequency, relative frequency, or percent frequency distribution
for categorical data, with one bar per category.
How would you make a frequency table in Excel?
a. Select any cell in Column A.
b. Click the Insert tab on the Ribbon.
c. In Tables, click Recommended PivotTables.
d. Click OK.
Pie chart
A graphical display of a relative frequency or percent frequency distribution for
categorical data, with one slice of the circle per category.
How would you create a bar chart in Excel?
a. Select any cell from A1 to A51.
b. Click the Insert tab on the Ribbon.
c. In Charts, click Recommended Charts.
d. Click OK. [bar chart appears in a new worksheet]
What are the basic steps to using Excel?
Access the data, enter functions and formulas, apply tools, and use editing options.
For quantitative data, the classes of a frequency distribution are defined in three
steps:
a. determine the number of nonoverlapping classes;
b. determine the width of each class;
c. determine the class limits.
Number of Classes
Typically between 5 and 20. Smaller datasets need fewer classes; larger datasets need more.
Width of the Class
Generally, it should be the same for each class. Approximate class width = (largest data
value - smallest data value)/number of classes.
Class Limits
Must be chosen so that each observation belongs to one and only one class.
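The three steps (number of classes, class width, class limits) can be sketched in Python as follows. The data values are made up, the choice of 5 classes is an assumption, and rounding the approximate width up to a whole number is one convenient convention rather than a fixed rule. The sketch also assumes integer data, so each class covers a run of consecutive integers.

```python
import math

# Hypothetical quantitative data (made-up integer values).
data = [7, 12, 15, 9, 21, 18, 25, 30, 11, 14,
        16, 22, 27, 8, 19, 13, 24, 10, 17, 29]

# Step a: choose the number of nonoverlapping classes (assumed here).
num_classes = 5

# Step b: approximate width = (largest - smallest) / number of classes,
# rounded up to a convenient whole number.
approx_width = (max(data) - min(data)) / num_classes  # (30 - 7) / 5 = 4.6
width = math.ceil(approx_width)                        # 5

# Step c: class limits as (lower, upper) pairs starting at the smallest value.
lower = min(data)
classes = [(lower + i * width, lower + (i + 1) * width - 1)
           for i in range(num_classes)]

# Count observations; each value falls into exactly one class.
frequency = {limits: 0 for limits in classes}
for x in data:
    index = (x - lower) // width
    frequency[classes[index]] += 1

for (lo, hi), count in frequency.items():
    print(f"{lo}-{hi}: {count}")
```

Because the width is rounded up, the last class is guaranteed to reach the largest data value, and the frequencies sum to the number of observations.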
Relative Frequency Distributions
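A tabular summary showing the relative frequency of each class, where relative frequency
= frequency of the class / n and n is the total number of observations.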
To construct a frequency distribution in excel