STATISTICS
DATA DESCRIPTION
Vuong Ba Thinh
Statistics
ACKNOWLEDGMENT
These slides are composed using the book:
[1] Allan G. Bluman, Elementary Statistics: A Step by
Step Approach, eighth edition, 2012.
OUTLINE
Introduction
Measures of Central Tendency
Measures of Variation
Measures of Position
Exploratory Data Analysis
Q&A
The Mean
The mean is the sum of the values, divided by the total
number of values. The symbol X̄ (X-bar) represents the sample mean.
For a population, the Greek letter 𝜇 (mu) is used for the
mean.
The Mean (1)
Ex1: The data represent the number of days off per year for a
sample of individuals selected from nine different countries.
Find the mean.
20, 26, 40, 36, 23, 42, 35, 24, 30
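As a quick check of Ex1, the mean can be computed directly from its definition (sum of the values divided by the number of values). This is a minimal sketch; the variable names `days_off` and `sample_mean` are chosen here for illustration and do not come from the slides.

```python
from statistics import mean

# Days off per year for the nine sampled countries (data from Ex1).
days_off = [20, 26, 40, 36, 23, 42, 35, 24, 30]

# Mean from the definition: sum of the values divided by the number of values.
sample_mean = sum(days_off) / len(days_off)

# Cross-check against the standard library implementation.
assert abs(sample_mean - mean(days_off)) < 1e-9

print(round(sample_mean, 1))  # 30.7
```

The sum of the nine values is 276, so the sample mean is 276 / 9, or about 30.7 days.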
Ex2: Miles Run per Week
The Median
The median is the midpoint of the data array. The symbol for the median is MD.
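The "midpoint of the data array" can be sketched as follows: sort the values, then take the middle value (odd count) or the average of the two middle values (even count). The function name `median_of` is hypothetical, and reusing the days-off data from the mean example is an assumption made only to have concrete numbers.

```python
def median_of(values):
    """Return the midpoint of the sorted data array."""
    ordered = sorted(values)          # the "data array" is the data in order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]           # odd count: the single middle value
    # Even count: average of the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2

# Assumption: the days-off sample is reused here purely for illustration.
days_off = [20, 26, 40, 36, 23, 42, 35, 24, 30]
print(median_of(days_off))            # 30
print(median_of([1, 2, 3, 4]))        # 2.5
```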
The Mode (2)
Ex3: The data show the number of licensed nuclear reactors
in the United States for a recent 15-year period. Find the
mode.
104 104 104 104 104
107 109 109 109 110
109 111 112 111 109
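A short sketch for Ex3: count how often each value occurs and keep every value that reaches the highest count. The names `reactors` and `modes` are illustrative. A data set can have more than one mode, and this one turns out to be bimodal.

```python
from collections import Counter

# Number of licensed nuclear reactors over the 15-year period (data from Ex3).
reactors = [104, 104, 104, 104, 104,
            107, 109, 109, 109, 110,
            109, 111, 112, 111, 109]

counts = Counter(reactors)            # value -> number of occurrences
highest = max(counts.values())        # the largest frequency in the data

# Every value that occurs with the highest frequency is a mode.
modes = sorted(value for value, freq in counts.items() if freq == highest)

print(modes, highest)  # [104, 109] 5
```

Both 104 and 109 occur five times, so both are reported as modes.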
Ex4: Miles Run per Week
Outliers
An outlier is an extremely high or an extremely low data
value when compared with the rest of the data values.
Ex: Salaries of Personnel: A small company consists of the
owner, the manager, the salesperson, and two technicians, all
of whose annual salaries are listed here. (Assume that this is
the entire population.)
Find the mean, median, and mode.
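The effect of an outlier on the mean and the median can be sketched with a small data set. The numbers below are HYPOTHETICAL, invented only for illustration; they are not the salary table from the slide.

```python
from statistics import mean, median

# HYPOTHETICAL values (in thousands), invented for illustration only.
# The last value is deliberately far larger than the rest (an outlier).
values = [30, 32, 35, 38, 250]
without_outlier = values[:-1]

mean_all, median_all = mean(values), median(values)
mean_trim, median_trim = mean(without_outlier), median(without_outlier)

# The outlier pulls the mean far upward, while the median barely moves.
print("with outlier:   ", mean_all, median_all)
print("without outlier:", mean_trim, median_trim)
```

With these made-up numbers the mean drops from 77 to 33.75 once the outlier is removed, while the median only moves from 35 to 33.5.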
16,400
107,000
1. First, assume you work for the school board in Greenwood
and do not wish to raise taxes to increase salaries. Compute the
mean, median, and mode, and decide which one would best
support your position to not raise salaries.
Applying the Concepts (1)
2. Second, assume you work for the teachers’ union and want a
raise for the teachers. Use the best measure of central tendency
to support your position.
3. Explain how outliers can be used to support one or the other
position.
4. If the salaries represented every teacher in the school
district, would the averages be parameters or statistics?
5. Which measure of central tendency can be misleading when
a data set contains outliers?
6. When you are comparing the measures of central tendency,
does the distribution display any skewness? Explain.
Measures of Variation
Ex: Comparison of Outdoor Paint
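Population Variance
The variance is the average of the squares of the distance each value is from the mean.
The symbol for the population variance is σ² (σ is the Greek lowercase letter sigma).
The formula is
$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$
where X is an individual value, 𝜇 is the population mean, and N is the population size.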
Population Standard Deviation
The standard deviation is the square root of the variance.
The symbol for the population standard deviation is 𝜎.
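The formula is
$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (X - \mu)^2}{N}}$$
where X is an individual value, 𝜇 is the population mean, and N is the population size.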
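A minimal sketch of the population variance (the mean squared deviation from 𝜇) and the population standard deviation (its square root). The function name `population_variance` is hypothetical, and the data set is HYPOTHETICAL, chosen only because it gives round numbers.

```python
import math
from statistics import pvariance, pstdev

def population_variance(values):
    """Mean of the squared deviations from the population mean (divide by N)."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)

# HYPOTHETICAL data, treated as an entire population for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

sigma_squared = population_variance(data)   # population variance
sigma = math.sqrt(sigma_squared)            # population standard deviation

# Cross-check against the standard library's population versions.
assert math.isclose(sigma_squared, pvariance(data))
assert math.isclose(sigma, pstdev(data))

print(sigma_squared, sigma)  # 4.0 2.0
```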
Sample Variance and Standard Deviation
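The formula for the sample variance is
$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$
The formula for the sample standard deviation is
$$s = \sqrt{s^2} = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$
where X̄ is the sample mean and n is the sample size.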
Ex: Find the sample variance and standard deviation for the
amount of European auto sales for a sample of 6 years shown. The
data are in millions of dollars.
11.2, 11.9, 12.0, 12.8, 13.4, 14.3
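The auto sales example can be worked through as a sketch: compute the sample mean, sum the squared deviations, and divide by n − 1 rather than n. The variable names (`sales`, `x_bar`, `s2`, `s`) are illustrative.

```python
import math
from statistics import variance, stdev

# European auto sales for a sample of 6 years, in millions of dollars.
sales = [11.2, 11.9, 12.0, 12.8, 13.4, 14.3]

n = len(sales)
x_bar = sum(sales) / n                               # sample mean, about 12.6

# Sample variance: squared deviations from the mean, divided by n - 1.
s2 = sum((x - x_bar) ** 2 for x in sales) / (n - 1)
s = math.sqrt(s2)                                    # sample standard deviation

# Cross-check against the standard library's sample versions.
assert math.isclose(s2, variance(sales))
assert math.isclose(s, stdev(sales))

print(round(s2, 2), round(s, 2))  # 1.28 1.13
```

The squared deviations sum to about 6.38, so the sample variance is 6.38 / 5, about 1.28, and the sample standard deviation is about 1.13.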
Range Rule of Thumb
A rough estimate of the standard deviation is
$$s \approx \frac{\text{range}}{4}$$
Ex: Estimate the standard deviation of the data set 5, 8, 8, 9, 10, 12, and 13.
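A small sketch comparing the range rule of thumb with the exact sample standard deviation for this data set. The names `estimate` and `exact` are illustrative; the rule is only an approximation, so the two numbers are not expected to match.

```python
from statistics import stdev

data = [5, 8, 8, 9, 10, 12, 13]

# Range rule of thumb: s is roughly the range divided by 4.
data_range = max(data) - min(data)   # 13 - 5 = 8
estimate = data_range / 4            # 2.0

# Exact sample standard deviation for comparison.
exact = stdev(data)

print(estimate, round(exact, 2))     # 2.0 2.69
```

Here the rough estimate (2.0) is in the same ballpark as the exact value (about 2.69), but it is clearly not a substitute for it.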
Chebyshev’s Theorem
The proportion of values from a data set that will fall within k standard
deviations of the mean will be at least
$$1 - \frac{1}{k^2}$$
where k is a number greater than 1 (k is not necessarily an integer).
Ex1: The mean price of houses in a certain neighborhood is
$50,000, and the standard deviation is $10,000. Find the price
range for which at least 75% of the houses will sell.
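Ex1 can be sketched by solving 1 − 1/k² = 0.75 for k and then stepping k standard deviations either side of the mean. The function names `chebyshev_k` and `chebyshev_interval` are hypothetical helpers introduced for this illustration.

```python
import math

def chebyshev_k(proportion):
    """Return the k for which 1 - 1/k**2 equals the given proportion."""
    if not 0 < proportion < 1:
        raise ValueError("proportion must be strictly between 0 and 1")
    # 1 - 1/k^2 = p  =>  k = 1 / sqrt(1 - p)
    return 1 / math.sqrt(1 - proportion)

def chebyshev_interval(mean, sd, proportion):
    """Interval mean +/- k*sd that contains at least `proportion` of the values."""
    k = chebyshev_k(proportion)
    return mean - k * sd, mean + k * sd

# Ex1: mean price $50,000, standard deviation $10,000, at least 75% of houses.
k = chebyshev_k(0.75)
low, high = chebyshev_interval(50_000, 10_000, 0.75)

print(k, low, high)  # 2.0 30000.0 70000.0
```

With k = 2, at least 75% of the houses will sell for between $30,000 and $70,000.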