Class notes

Statistics

Pages
36
Uploaded on
10-09-2024
Written in
2021/2022

INTRODUCTION AND DESCRIPTIVE STATISTICS





Bstats Notes


Business Statistics (University of Technology Sydney)








Studocu is not sponsored or endorsed by any college or university
Downloaded by Shebnoor Ahmed ()





LECTURE 1: INTRODUCTION AND DESCRIPTIVE STATISTICS I

TYPES OF DATA


QUALITATIVE/CATEGORICAL

• Mutually exclusive labels (one label cannot mean two things)
• Usually not numbers; where numbers are used, they have no mathematical meaning
- Nominal: ordering/ranking makes no sense; numerical labels are arbitrary
- Ordinal: ordering/ranking has meaning and can be interpreted; numerical labels respect the ordering

QUANTITATIVE/NUMERICAL

• Numbers used to record certain events; the numbers have mathematical meaning
- Interval: the difference between two quantities is meaningful, but their ratio is not; zero has no natural meaning (e.g. temperature in degrees Celsius)
- Ratio: both the difference and the ratio of two quantities are meaningful; zero is meaningful (e.g. income)

WORKING WITH CATEGORICAL DATA


• Intuitive to tabulate and visualise; the technique is the frequency distribution
• Frequency count: total number of occurrences of each category
• Relative frequency: fraction/proportion of the total number of data items belonging to that category
• Percent frequency: relative frequency x 100 (%)
• In Excel, the COUNTIF function produces the frequency counts
• To visualise: bar chart (categories on x-axis; frequency, relative frequency or percent frequency on y-axis) or pie chart
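
The three frequency measures above can be sketched in a few lines of Python. This is only an illustration: the `responses` data is made up, and `collections.Counter` stands in for the Excel COUNTIF approach mentioned in the notes.

```python
from collections import Counter

# Hypothetical categorical sample (made-up data for illustration only).
responses = ["bus", "train", "car", "bus", "bus", "train", "car", "bus"]

counts = Counter(responses)       # frequency count per category
total = sum(counts.values())      # total number of data items

# Relative frequency = count / total; percent frequency = relative x 100.
table = {
    category: {
        "frequency": count,
        "relative": count / total,
        "percent": count / total * 100,
    }
    for category, count in counts.items()
}

for category, row in table.items():
    print(f"{category:<6} {row['frequency']:>3} {row['relative']:.3f} {row['percent']:.1f}%")
```

The relative frequencies always sum to 1 and the percent frequencies to 100, which is a quick check on any frequency table.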

INTERMEZZO: THE LANGUAGE


• Random variable (r.v.): a variable whose value is the outcome of a random process
- Usually denoted by capital letters
- Realisations/observations of an r.v. are denoted by lowercase letters
- N and n both denote a number of observations: N is the population size, n is the sample size (number of data points collected in a sample)
• Population: collection of people, objects or items of interest; the complete pool of values of a given random variable
• Sample: subset of a population; a random collection of a given size drawn from the population
• Probability distribution: describes how probability is spread over the values that a random variable may assume

DESCRIPTIVE STATISTIC: CENTRAL TENDENCY








• A measure of central tendency yields information about the centre of a set of numbers (the distribution of an r.v.); it does not describe the span of the dataset or how far values are from the middle
• Gives an idea of the typical, middle or average value that an r.v. can take
• Sometimes called measures of location

THREE MEASURES OF CENTRAL TENDENCY
Mode - most frequently occurring value in a set of data
- In case of a tie for the most frequently occurring value, two modes are listed
and the data is said to be bimodal
- Datasets with two or more modes are referred to as multimodal
- Concept of mode is often used in determining sizes
- Appropriate descriptive summary measure for categorical data

Median - middle value in an ordered array of numbers
- A way to locate the median is by finding the $(n+1)/2$th term in the ordered array; when n is even, this position falls halfway between two terms, so take their average
- Large and small values do not inordinately influence the median – hence the
best measure of location to use in the analysis of variables in which extreme but
acceptable values can occur at just one end of the data
- Not all info from the dataset is used
- Data must be quantitative or be able to be ranked
Mean - Average of a set of numbers
- Sample mean is represented by $\bar{X}$
- Population mean is represented by $\mu$
- Data should be quantitative as it needs to be summed
- Affected by all values – advantage because it reflects all the data, but
disadvantage because extreme values pull the mean towards extremes
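
As a minimal sketch of the three measures, the snippet below uses Python's standard `statistics` module on a made-up list of shoe sizes (the data and variable names are assumptions for illustration, not part of the lecture).

```python
import statistics

# Hypothetical sample of shoe sizes (made-up data).
sizes = [7, 8, 8, 9, 9, 9, 10, 11, 12]

mode = statistics.mode(sizes)          # most frequently occurring value
modes = statistics.multimode(sizes)    # every tied mode, returned as a list
median = statistics.median(sizes)      # middle value of the ordered data
mean = statistics.mean(sizes)          # sum of the values divided by n

# Locating the median by position: the (n + 1) / 2-th term (1-based), n odd.
ordered = sorted(sizes)
n = len(ordered)
position = (n + 1) // 2
median_by_position = ordered[position - 1]

print(mode, modes, median, median_by_position, round(mean, 2))
```

For tied data, `statistics.multimode([1, 1, 2, 2, 3])` returns both modes, which matches the bimodal case described above.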



• Can consider the population mean or the sample mean. If the r.v. is denoted by X:
- Population mean is denoted by $\mu$ or $E(X)$, computed by
$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$
- Sample mean is denoted by $\bar{X}$, computed by
$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} x_i$


Outlier: an observation of the r.v. of interest whose value is far outside the range of the other realisations. Outliers often bias impressions about the distribution of the r.v. in the dataset, so we may want to correct for such biases or simply remove the data point
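
To illustrate why an outlier matters, here is a small sketch comparing the mean and median before and after one extreme value is added. The income figures are invented for the example.

```python
import statistics

# Hypothetical weekly incomes (made-up data); 5000 is the outlier added later.
incomes = [480, 500, 520, 540, 560]
with_outlier = incomes + [5000]

mean_before = statistics.mean(incomes)
median_before = statistics.median(incomes)
mean_after = statistics.mean(with_outlier)      # pulled towards the extreme value
median_after = statistics.median(with_outlier)  # barely moves

print(mean_before, median_before)
print(round(mean_after, 2), median_after)
```

The mean more than doubles while the median shifts only from 520 to 530, which is the sense in which extreme values "pull" the mean but not the median.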








DESCRIPTIVE STATISTIC: VARIABILITY


• Measures of variability yield information about how likely a realisation of the r.v. is to lie far from the centre of its distribution; they describe the spread/dispersion of a dataset
• Give an idea of fluctuation and volatility across realisations of the r.v.
• The more variability in a dataset, the less typical the central values are of the whole set
• Using measures of variability in conjunction with measures of central tendency gives a more complete numerical description of the data (a measure of variability is needed to complement the mean when describing data)

FIVE MEASURES OF VARIABILITY
Range - Maximum minus minimum
- Crude measure of variability
- Advantage: ease of calculation; disadvantage: affected by extreme
values (thus application as a measure of variability is limited)
Inter-quartile range - Distance between the first and third quartiles, $IQR = Q_3 - Q_1$
- Essentially the range of the middle 50% of the data
- Useful when there is interest in values towards the middle rather than values in the extremes
Variance and standard deviation - One is obtained from the other, so they are presented together
- Variance and standard deviation measure how spread out an r.v. is; the larger the value, the more spread out
- Involves considering how far each data value is from the mean and describing this dispersion on average
- Subtracting the mean from each data value yields the deviation from the mean, $x - \mu$; negative deviations represent values below the mean, positive deviations represent values above the mean




VARIANCE
- Average squared distance between data points and their mean
- The sum of squared deviations from the mean of a set of values is called the sum of squares of x, $SS_x$

STANDARD DEVIATION
- Square root of the variance; has the same unit as the original data
- Estimate of the average distance that individual values are away from the mean




Coefficient of variation - Standard deviation ÷ mean
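
The five measures can be sketched together on one made-up data set. Two assumptions are worth flagging: the quartiles use the default "exclusive" method of Python's `statistics.quantiles`, which may not match Excel or a textbook convention, and both the population (divide by N) and sample (divide by n - 1) versions of the variance are shown, since the notes do not say which one is intended.

```python
import statistics

# Hypothetical data set (made-up values).
data = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(data) - min(data)            # maximum minus minimum

# Quartile conventions differ; this is the module's default "exclusive" method.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1                                 # spread of the middle 50%

mean = statistics.mean(data)
ss_x = sum((x - mean) ** 2 for x in data)     # sum of squared deviations, SS_x

pop_variance = ss_x / len(data)               # average squared deviation (divide by N)
sample_variance = ss_x / (len(data) - 1)      # sample version (divide by n - 1)
pop_sd = pop_variance ** 0.5                  # same unit as the original data
cv = pop_sd / mean                            # standard deviation relative to the mean

print(data_range, iqr, pop_variance, pop_sd, cv)
```

Because the coefficient of variation is a ratio, it has no unit, so it can be used to compare the spread of data sets measured on different scales.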



