Statistics
Statistics is the science of collecting, organizing and interpreting numerical facts, which we call data. Synonyms for data are scores, measurements and observations. The study and collection of data involves classifying the data under various heads. Representing a characteristic by numbers is termed measurement. In other words, data are measurements of the situation under consideration.
Example 1:
The measured heights of all creatures in the world form a set of data. Numerical characteristics such as height are called variables. A large number of observations on a single variable can be summarized in a table of frequencies. Any particular pattern of variation is termed a distribution.
Frequency is the number of times a particular value occurs in a set of data. Usually we would record the frequency of data in a frequency table.
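As a minimal sketch, a frequency table can be built with Python's standard `collections.Counter`. The values below are the point values that appear in Example 2 later in this section; the variable names are illustrative only.

```python
from collections import Counter

# Point values borrowed from Example 2 of this section.
points = [15, 14, 10, 8, 12, 8, 16, 13]

# Counter maps each distinct value to the number of times it occurs.
frequency = Counter(points)

# Print the frequency table in ascending order of value.
for value in sorted(frequency):
    print(value, frequency[value])
```

Each printed row pairs a value with its frequency, and the frequencies add up to the number of observations.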
In statistics, the mode, median, mean and range are summary values commonly used to represent a set of numerical observations. They are calculated from the observations themselves.
Measures of Central Tendency
Mode
Mode is the most common value among the given observations. For example, a person who sells ice creams might want to know which flavor is the most popular.
Median
Median is the middle value, dividing the ordered data into two halves. In other words, 50% of the observations are below the median and 50% are above it. If the number of observations n is odd, the median is the $\left(\frac{n+1}{2}\right)$th observation. If n is even, the median is the mean of the $\left(\frac{n}{2}\right)$th and $\left(\frac{n}{2}+1\right)$th observations.
Mean
Mean is the average of all the values in the set. For n observations $x_1, x_2, \ldots, x_n$ its value is given by $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$. For example, a teacher may want to know the average marks of a test in his class.
Example 2:
Find the mean, mode and median of the following set of points:
15, 14, 10, 8, 12, 8, 16, 13
Solution:
First arrange the point values in an ascending order (or descending order).
8, 8, 10, 12, 13, 14, 15, 16
Mean = (8+8+10+12+13+14+15+16)/8 = 96/8
= 12.
Mode = 8 (since it has maximum frequency)
The number of point values is 8, an even number. Hence the median is the average of the 2 middle values.
Median = (12 + 13)/2 = 12.5
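The three measures for the point values of Example 2 can be checked with Python's standard `statistics` module. This is only a quick verification sketch; the variable names are not part of the original example.

```python
import statistics

# Point values from Example 2.
points = [15, 14, 10, 8, 12, 8, 16, 13]

mean_value = statistics.mean(points)      # sum of values / number of values
median_value = statistics.median(points)  # average of the two middle values
mode_value = statistics.mode(points)      # most frequent value

print(mean_value, median_value, mode_value)  # 12 12.5 8
```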
Skewness
In a normal distribution, the mean, median, and mode are all the same value. In various other symmetrical distributions it is possible for the mean and median to be the same even though there may be several modes, none of which is at the mean. By contrast, in asymmetrical distributions the mean and median are not the same. Such distributions are said to be skewed, i.e., more than half the cases are either above or below the mean.
Example 3:
The distribution of salaries shown in the figure has a pronounced positive skew.
Figure: A distribution with a very large positive skew
The table shows the measures of central tendency for these data. The large skew results in very different values for these measures. No single measure of central tendency is sufficient for data such as these, and there is no need to summarize a distribution with a single number. When the various measures differ, our opinion is that you should report the mean, the median, and either the trimean or the mean trimmed 50%. Sometimes it is worth reporting the mode as well. In the media, the median is usually reported to summarize the center of skewed distributions: you will hear about median salaries, median prices of houses sold, and so on. This is better than reporting only the mean, but it would be more informative to hear more statistics.
Table: Measures of central tendency
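To illustrate how these measures pull apart under positive skew, here is a small sketch on a hypothetical salary list. The numbers are invented for illustration and are not the data behind the figure or table. The trimean is taken as (Q1 + 2 × median + Q3)/4, and "trimmed 50%" is read as dropping the lowest 25% and highest 25% of the values; both helper functions are assumptions of this sketch.

```python
import statistics

# Hypothetical salaries (in thousands); NOT the data behind the figure.
salaries = [20, 22, 25, 27, 30, 32, 35, 40, 60, 150, 400, 900]


def trimmed_mean(values, proportion):
    """Mean after dropping proportion/2 of the values from each end."""
    ordered = sorted(values)
    cut = int(len(ordered) * proportion / 2)
    return statistics.mean(ordered[cut:len(ordered) - cut])


def trimean(values):
    """(Q1 + 2*median + Q3) / 4, using inclusive quartiles."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (q1 + 2 * q2 + q3) / 4


mean_salary = statistics.mean(salaries)
median_salary = statistics.median(salaries)
trimean_salary = trimean(salaries)
trimmed_salary = trimmed_mean(salaries, 0.5)

print(mean_salary, median_salary, trimean_salary, trimmed_salary)
```

For this sample the mean is far above the median, while the trimean and the trimmed mean stay close to the median, because the few very large values affect only the mean strongly.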
Measures of Dispersion
Dispersion measures the degree of scatter of a variable about a central value. The following are the measures of dispersion:
Range
Range is the difference between the maximum and minimum values in the set. It is the simplest measure of variation to find.
RANGE = MAXIMUM VALUE - MINIMUM VALUE
Example 4:
Ten students were given a mathematics test. The time taken by each of them to complete the test is listed below. Find the range of these times.
8 12 7 11 12 9 8 10 8 13 (in min.)
Solution:
It can be seen that maximum time taken by a student to complete the test is 13 min and minimum time taken is 7 min. So,
Range = Max. value - Min. value = 13 - 7 = 6 min
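The same calculation as a short Python sketch, using the times from Example 4 (variable names are illustrative):

```python
# Completion times in minutes from Example 4.
times = [8, 12, 7, 11, 12, 9, 8, 10, 8, 13]

# Range = maximum value - minimum value.
time_range = max(times) - min(times)

print(time_range)  # 6
```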
Mean Deviation
It is the arithmetic mean of the absolute deviations of the terms of the distribution from a statistical average (mean, median or mode). It is least when taken about the median.
Mean Deviation for Ungrouped Data:
Let x1, x2, x3, ..., xn be n values of a variable X and let k be the statistical average (A.M., median or mode) about which we have to find the mean deviation. The mean deviation about k is given by
$$\text{M.D.}(k) = \frac{1}{n}\sum_{i=1}^{n} |x_i - k|$$
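A minimal sketch of the mean deviation of ungrouped data about a chosen value k. The function name and the sample values are hypothetical; the sample is deliberately lopsided so that the mean deviation about the median comes out smaller than the one about the mean.

```python
import statistics


def mean_deviation(values, k):
    """Arithmetic mean of the absolute deviations of the values from k."""
    return sum(abs(x - k) for x in values) / len(values)


# Hypothetical observations, chosen only for illustration.
values = [1, 2, 3, 4, 100]

md_about_mean = mean_deviation(values, statistics.mean(values))
md_about_median = mean_deviation(values, statistics.median(values))

print(md_about_mean, md_about_median)
```

Any value can be passed as k, so the same function covers the mean deviation about the mean, the median or the mode.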
Mean Deviation for Grouped data
a) Discrete Frequency Distribution:
$$\text{M.D.}(k) = \frac{1}{N}\sum_{i=1}^{n} f_i\,|d_i|$$
where $d_i = x_i - k$, $f_i$ are the frequencies and $N = \sum f_i$ is the total frequency.
The mean of the given discrete frequency distribution is given by
$$\bar{x} = \frac{1}{N}\sum_{i=1}^{n} f_i x_i$$
The median of the given discrete frequency distribution is found by arranging the observations in ascending order and then calculating the cumulative frequencies. The observation whose cumulative frequency is equal to or just greater than N/2 is the required median.
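The cumulative-frequency rule for the median can be sketched as follows. The function name is hypothetical, and the data are the tower counts of Example 5 later in this section. The code follows the rule exactly as worded in the text: it returns the first observation whose cumulative frequency reaches N/2.

```python
from itertools import accumulate


def discrete_median(values, frequencies):
    """Median of a discrete frequency distribution via cumulative frequency."""
    # Arrange the observations in ascending order, keeping their frequencies.
    pairs = sorted(zip(values, frequencies))
    total = sum(f for _, f in pairs)
    cumulative = accumulate(f for _, f in pairs)
    # Return the first observation whose cumulative frequency is >= N/2.
    for (value, _), cf in zip(pairs, cumulative):
        if cf >= total / 2:
            return value
    raise ValueError("empty distribution")


towers = [5, 6, 7, 8, 9, 10]
villages = [8, 12, 9, 15, 4, 2]

median_towers = discrete_median(towers, villages)
print(median_towers)  # 7
```

Here the cumulative frequencies are 8, 20, 29, ..., and 29 is the first one to reach N/2 = 25, so the median is 7.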
b) Continuous Frequency Distribution:
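The mean of a continuous frequency distribution is calculated on the assumption that the frequency of each class is centered at its mid-point $x_i$. The mean deviation about $k$ is then
$$\text{M.D.}(k) = \frac{1}{N}\sum_{i=1}^{n} f_i\,|d_i|$$
where $d_i = x_i - k$, $f_i$ are the frequencies and $N$ is the total frequency.
The arithmetic mean is given by
$$\bar{x} = a + \frac{h}{N}\sum_{i=1}^{n} f_i u_i, \qquad u_i = \frac{x_i - a}{h}$$
where a = assumed mean, h = common factor and N = total frequency.
The median is given by
$$\text{Median} = l + \frac{\frac{N}{2} - c}{f} \times h$$
where,
l = lower limit of median class
f = frequency of the median class
h = width of the median class
c = cumulative frequency of the class just preceding the median class.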
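A short sketch of the median of a continuous frequency distribution in terms of l, f, h and c as defined above, computed as l + ((N/2 - c)/f) × h. The function name and the class intervals are hypothetical, and equal class widths are assumed.

```python
def grouped_median(lower_limits, width, frequencies):
    """Median of a continuous frequency distribution with equal class widths."""
    total = sum(frequencies)  # N
    if total == 0:
        raise ValueError("total frequency must be positive")
    cumulative_before = 0  # c: cumulative frequency before the current class
    for lower, freq in zip(lower_limits, frequencies):
        # The median class is the first class whose cumulative frequency
        # reaches N/2.
        if cumulative_before + freq >= total / 2:
            return lower + (total / 2 - cumulative_before) / freq * width
        cumulative_before += freq


# Hypothetical classes 0-10, 10-20, 20-30, 30-40, 40-50.
lower_limits = [0, 10, 20, 30, 40]
frequencies = [5, 8, 15, 16, 6]

median_value = grouped_median(lower_limits, 10, frequencies)
print(median_value)
```

For these classes N = 50, the median class is 20-30 with l = 20, f = 15, c = 13 and h = 10, which gives 20 + (25 - 13)/15 × 10 = 28.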
Example 5:
Fifty villages are surveyed to count the number of towers in each. The list below shows the distribution. Find the mean deviation of the distribution about its mean.
No. of towers 5, 6, 7, 8, 9, 10
No. of villages 8, 12, 9, 15, 4, 2
Solution:
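The mean can be calculated from the formula given above as
Mean = (5×8 + 6×12 + 7×9 + 8×15 + 9×4 + 10×2)/50 = 351/50 = 7.02
The mean deviation about the mean is calculated as
Mean Deviation = (8×2.02 + 12×1.02 + 9×0.02 + 15×0.98 + 4×1.98 + 2×2.98)/50 = 57.16/50 = 1.1432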
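The calculation for Example 5 can be reproduced with a few lines of Python. This is a sketch with illustrative variable names; it takes the mean deviation about the mean.

```python
# Data from Example 5: number of towers and number of villages (frequency).
towers = [5, 6, 7, 8, 9, 10]
villages = [8, 12, 9, 15, 4, 2]

total = sum(villages)  # N, the total frequency

# Mean of a discrete frequency distribution: sum(f * x) / N.
mean = sum(x * f for x, f in zip(towers, villages)) / total

# Mean deviation about the mean: sum(f * |x - mean|) / N.
mean_deviation = sum(f * abs(x - mean) for x, f in zip(towers, villages)) / total

print(round(mean, 4), round(mean_deviation, 4))
```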
Variance
The variance of a variate is the arithmetic mean of the squares of all deviations from the mean. For a frequency distribution, $\sigma^2 = \frac{1}{N}\sum_{i=1}^{n} f_i (x_i - \bar{x})^2$.
Standard Deviation
It is a measure of dispersion about the mean of a set of observations and is expressed as the positive square root of the variance: $\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{n} f_i (x_i - \bar{x})^2}$.
Example 6:
In the previous example we have calculated the mean deviation of the data.
Using the same data calculate variance and standard deviation.
Solution:
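Variance is given as:
Variance = [8(5 − 7.02)² + 12(6 − 7.02)² + 9(7 − 7.02)² + 15(8 − 7.02)² + 4(9 − 7.02)² + 2(10 − 7.02)²]/50 = 92.98/50 = 1.8596
Now, Standard Deviation is
S.D. = √1.8596 ≈ 1.3637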
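A minimal sketch that computes the variance and standard deviation of the tower data directly from the definitions (mean of squared deviations, then its positive square root). Variable names are illustrative.

```python
import math

# Data from Example 5: number of towers and number of villages (frequency).
towers = [5, 6, 7, 8, 9, 10]
villages = [8, 12, 9, 15, 4, 2]

total = sum(villages)
mean = sum(x * f for x, f in zip(towers, villages)) / total

# Variance: frequency-weighted mean of squared deviations from the mean.
variance = sum(f * (x - mean) ** 2 for x, f in zip(towers, villages)) / total

# Standard deviation: positive square root of the variance.
std_dev = math.sqrt(variance)

print(round(variance, 4), round(std_dev, 4))
```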
Analysis of Frequency Distribution (Measures of Variability)
To compare the variability of two series that have the same mean but are measured in different units, calculating the measures of dispersion alone is not sufficient; we need a measure that is independent of the units. Such a measure of variability is called the coefficient of variation (C.V.) and is defined as
$$\text{C.V.} = \frac{\sigma}{\bar{x}} \times 100, \qquad \bar{x} \neq 0$$
where $\sigma$ is the standard deviation and $\bar{x}$ is the mean.
Example 7:
From the data in the above two examples, find the coefficient of variation (C.V.) of no. of towers.
Solution:
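From the formula above, C.V. can be calculated from the standard deviation and the mean of the tower data. The variance is 92.98/50 = 1.8596, so the S.D. is √1.8596 ≈ 1.3637, and
C.V. = 1.3637 (which is the S.D.)/7.02 (which is the mean) × 100 ≈ 19.43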
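As a check on the hand calculation, the coefficient of variation can be recomputed directly from the tower frequencies of Example 5. The helper function below is a hypothetical sketch and assumes a non-zero mean.

```python
import math


def coefficient_of_variation(values, frequencies):
    """C.V. = (standard deviation / mean) * 100 for a frequency distribution."""
    total = sum(frequencies)
    mean = sum(x * f for x, f in zip(values, frequencies)) / total
    variance = sum(f * (x - mean) ** 2 for x, f in zip(values, frequencies)) / total
    return math.sqrt(variance) / mean * 100


towers = [5, 6, 7, 8, 9, 10]
villages = [8, 12, 9, 15, 4, 2]

cv = coefficient_of_variation(towers, villages)
print(round(cv, 2))
```

Because the standard deviation and the mean carry the same unit, the ratio is unit-free, which is what allows series measured in different units to be compared.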
Example 8:
Given sets
A = {0,5,10,15,25,30,40,45,50,71,72,73,74,75,76,77,78,100} and
B = {0,22,23,24,25,26,27,28,29,50,55,60,65,70,75,80,85,90,95,100}.
Simple inspection shows that the two sets are distributed very differently; however, it is found that
Min. = 0, Max. = 100
So range = 100,
Median = 50,
Middle of bottom half of the set (Q1) = 25,
Middle of the upper half of the set (Q3) = 75 is same in both cases.