Probability and statistics
Representation of data
A continuous variable is one that can take any value within the limits of its range.
A discrete variable is one that cannot take every value within the limits of its range; it takes only separate, distinct values.
Raw data is data collected at source that has not been subjected to processing or
any other manipulation. It is also known as primary data.
44 46 47 49 63 64 66 68 68 72 72 75 76 81 84 88 106
A stemplot (or stem-and-leaf plot) is a device for presenting quantitative data in a
graphical format, similar to a histogram, to assist in visualizing the shape of a distribution.
The stemplot of the data above is:
 4|4679
 5|
 6|34688
 7|2256
 8|148
 9|
10|6
key: 6|3 = 63
leaf unit: 1.0
stem unit: 10.0
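The construction above can be sketched in Python. This is a minimal illustration, assuming whole-number data with a leaf unit of 1 and a stem unit of 10; the function name `stemplot` is a hypothetical helper, not part of any library.

```python
from collections import defaultdict

def stemplot(values, stem_unit=10):
    """Return the rows of a stem-and-leaf plot as strings (integer data assumed)."""
    leaves = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, stem_unit)   # e.g. 63 -> stem 6, leaf 3
        leaves[stem].append(leaf)
    lo, hi = min(leaves), max(leaves)
    width = len(str(hi))
    # Empty stems are kept so that gaps in the data stay visible.
    return [
        f"{stem:>{width}}|" + "".join(str(leaf) for leaf in leaves.get(stem, []))
        for stem in range(lo, hi + 1)
    ]

data = [44, 46, 47, 49, 63, 64, 66, 68, 68, 72, 72, 75, 76, 81, 84, 88, 106]
rows = stemplot(data)
print("\n".join(rows))
```

Sorting first means the leaves on each row appear in increasing order, which is what lets the plot double as an ordered list of the data.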
Box-And-Whisker Plots
A box-and-whisker plot provides a simple summary of a data set using only five points.
The five-number summary consists of the minimum, the lower quartile Q1, the median, the upper quartile Q3, and the maximum of the distribution.
A box-and-whisker plot shows at a glance the center, the spread, and the overall range of the
distribution.
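As a rough sketch, the five-number summary can be computed as below. Quartile conventions differ between textbooks and software; this version assumes Q1 and Q3 are the medians of the lower and upper halves of the ordered data, with the middle value left out when n is odd. The function name `five_number_summary` is hypothetical.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum); assumes at least two values."""
    xs = sorted(values)
    n = len(xs)
    half = n // 2
    lower = xs[:half]            # values before the median position
    upper = xs[half + n % 2:]    # values after it (middle value skipped when n is odd)
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

data = [44, 46, 47, 49, 63, 64, 66, 68, 68, 72, 72, 75, 76, 81, 84, 88, 106]
summary = five_number_summary(data)
print(summary)  # (44, 56.0, 68, 78.5, 106)
```

Other quartile definitions (for example, interpolating between positions) can give slightly different values for Q1 and Q3 on the same data.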
A histogram is a graphical representation that gives a visual impression of the distribution of
data. It is an estimate of the probability distribution of a continuous variable.
The area of each bar is proportional to the frequency, so
$$\text{height} = \frac{\text{frequency}}{\text{class width}}$$
The height of a bar is the frequency density of its class.
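The relationship between frequency, class width, and bar height can be illustrated with a short sketch. The grouped data below is hypothetical and chosen only to show unequal class widths; `frequency_densities` is an illustrative helper name.

```python
# Hypothetical grouped data: (lower bound, upper bound, frequency).
classes = [(0, 10, 5), (10, 20, 12), (20, 40, 8)]

def frequency_densities(classes):
    """Bar height for each class: frequency divided by class width."""
    return [freq / (upper - lower) for lower, upper, freq in classes]

densities = frequency_densities(classes)

# Multiplying each height by its class width recovers the class frequency.
areas = [d * (upper - lower) for d, (lower, upper, _) in zip(densities, classes)]

for (lower, upper, freq), d in zip(classes, densities):
    print(f"{lower}-{upper}: frequency {freq}, frequency density {d}")
```

Here the widest class (20 to 40) has a higher frequency than the first class but a lower bar, which is why frequency density, not raw frequency, is plotted when class widths differ.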
The mean, $\bar{x}$, of a data set of $n$ values is given by
$$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{\sum x_i}{n}$$
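Easier calculation of mean
Subtract a convenient value $a$ from every data value, find the mean of the differences, then add $a$ back:
$$\bar{x} = \frac{\sum (x - a)}{n} + a$$
Example: 80.1, 80.3, 80.7, 80.8, taking $a = 80$:
$$\bar{x} = \frac{0.1 + 0.3 + 0.7 + 0.8}{4} + 80 = \frac{1.9}{4} + 80 = 80.475$$
Figure: histogram bar, with area = frequency = frequency density × width.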
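The shortcut for the mean can be checked numerically against the direct definition. This is a small sketch; `mean_with_assumed_value` is a hypothetical name, and the choice a = 80 follows the example.

```python
def mean_with_assumed_value(values, a):
    """Mean computed as the mean of (x - a), plus a."""
    return sum(x - a for x in values) / len(values) + a

data = [80.1, 80.3, 80.7, 80.8]
shortcut = mean_with_assumed_value(data, 80)
direct = sum(data) / len(data)

# The two agree up to floating-point rounding.
print(shortcut, direct)
```

Any value of a gives the same mean; choosing one close to the data simply keeps the numbers being added small.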
To find the median of a data set of $n$ values
• Arrange the values in order of increasing size
• If $n$ is odd, the median is the $\tfrac{1}{2}(n + 1)$th value
• If $n$ is even, the median is halfway between the $\tfrac{1}{2}n$th value and the following value
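The steps for the median translate directly into code. The sketch below assumes a non-empty list of numbers; note that Python lists are indexed from 0, so the k-th value is at index k - 1.

```python
def median(values):
    """Median by position in the ordered data; assumes at least one value."""
    xs = sorted(values)                    # arrange in increasing order
    n = len(xs)
    if n % 2 == 1:
        return xs[(n + 1) // 2 - 1]        # the (n + 1)/2-th value
    # halfway between the (n/2)-th value and the following value
    return (xs[n // 2 - 1] + xs[n // 2]) / 2

data = [44, 46, 47, 49, 63, 64, 66, 68, 68, 72, 72, 75, 76, 81, 84, 88, 106]
print(median(data))          # 68 (the 9th of 17 values)
print(median([1, 2, 3, 4]))  # 2.5
```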
The mean, $\bar{x}$, of a data set in which the variable takes the value $x_1$ with frequency $f_1$,
$x_2$ with frequency $f_2$ and so on is given by
$$\bar{x} = \frac{x_1 f_1 + x_2 f_2 + \dots + x_n f_n}{f_1 + f_2 + \dots + f_n} = \frac{\sum x_i f_i}{\sum f_i}$$
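A short sketch of the frequency form of the mean, using a frequency table built from the data set listed earlier. The helper name `mean_from_frequencies` is hypothetical; `Counter` is used here only as a convenient way to tally each value.

```python
from collections import Counter

def mean_from_frequencies(table):
    """Mean from a mapping of value -> frequency: sum(x * f) / sum(f)."""
    total = sum(x * f for x, f in table.items())
    count = sum(table.values())
    return total / count

data = [44, 46, 47, 49, 63, 64, 66, 68, 68, 72, 72, 75, 76, 81, 84, 88, 106]
table = Counter(data)                # e.g. 68 and 72 each have frequency 2
grouped_mean = mean_from_frequencies(table)
direct_mean = sum(data) / len(data)
print(grouped_mean, direct_mean)
```

Both calculations give the same result, since the frequency form only groups equal values before adding them.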
The mode is the value of the variable corresponding to the highest frequency
- no mode if all values have the same frequency
- more than one mode if two or more values have the same maximum frequency
The modal class is the class with the highest frequency density.
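The rules for the mode can be sketched as follows. The function name `modes` is hypothetical, and one edge case is an assumption of this sketch: a data set containing a single distinct value is treated as having that value as its mode. The input is assumed to be non-empty.

```python
from collections import Counter

def modes(values):
    """Return the list of modes, or an empty list when there is no mode."""
    counts = Counter(values)
    top = max(counts.values())
    # All values equally frequent (and more than one distinct value): no mode.
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == top)

data = [44, 46, 47, 49, 63, 64, 66, 68, 68, 72, 72, 75, 76, 81, 84, 88, 106]
print(modes(data))       # [68, 72]
print(modes([1, 2, 3]))  # []
```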