Study Notes on Statistics for GATE 2018

By Himanshu Verma|Updated : November 6th, 2017

Statistics

Statistics is the science of collecting, organizing and interpreting numerical facts, which we call data. Synonyms for data are scores, measurements and observations. Collecting and studying data involves classifying it under various heads. Representing a characteristic by a number is termed measurement. In other words, data are measurements of the situation under consideration.

Example 1:

The measured heights of all creatures in the world form a data set. Numerical characteristics such as height are called variables. A large number of observations on a single variable can be summarized in a table of frequencies. Any particular pattern of variation is termed a distribution.

Frequency is the number of times a particular value occurs in a set of data. Usually we would record the frequency of data in a frequency table.
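As a quick illustration, a frequency table can be built by counting how often each value occurs. The sketch below uses Python's collections.Counter on the point values that appear in Example 2 of these notes; the variable names are chosen here for illustration only.

```python
from collections import Counter

# Point values taken from Example 2 of these notes.
points = [15, 14, 10, 8, 12, 8, 16, 13]

# Counter maps each distinct value to the number of times it occurs.
frequency = Counter(points)

# Print the frequency table in ascending order of value.
for value in sorted(frequency):
    print(value, frequency[value])
```

In this table the value 8 has frequency 2 and every other value has frequency 1.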

In statistics, mode, median, mean and range are typical values to represent a pool of numerical observations. They are calculated from the pool of observations.

Measures of Central Tendency

Mode

Mode is the most common value among the given observations. For example, a person who sells ice creams might want to know which flavor is the most popular.

Median

Median is the middle value of the observations once they are arranged in order, dividing the data into two halves. In other words, 50% of the observations are below the median and 50% of the observations are above the median. If the number of observations n is odd, then the median is the $\left(\frac{n+1}{2}\right)$th observation. If n is even, then the median is the mean of the $\left(\frac{n}{2}\right)$th and $\left(\frac{n}{2}+1\right)$th observations.

Mean

Mean is the average of all the values in the set. For n observations $x_1, x_2, \ldots, x_n$ its value is given by $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$. For example, a teacher may want to know the average marks of a test in his class.

Example 2:

Find the mean, mode and median of the following set of points:

15, 14, 10, 8, 12, 8, 16, 13

Solution:

First arrange the point values in an ascending order (or descending order).

8, 8, 10, 12, 13, 14, 15, 16

Mean = (8+8+10+12+13+14+15+16)/8 = 96/8 = 12

Mode = 8 (since it has maximum frequency)

The number of point values is 8, an even number. Hence the median is the average of the 2 middle values.

The 2 middle values are 12 and 13.

Median = (12 + 13)/2 = 12.5
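The three measures in Example 2 can be cross-checked with Python's standard statistics module. This is only a verification sketch; the variable names are illustrative.

```python
import statistics

# Point values from Example 2.
points = [15, 14, 10, 8, 12, 8, 16, 13]

mean = statistics.mean(points)      # sum of the values divided by their count
median = statistics.median(points)  # average of the two middle values when n is even
mode = statistics.mode(points)      # most frequent value

print(mean, median, mode)
```

Running it should reproduce a mean of 12, a median of 12.5 and a mode of 8.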

Skewness

In a normal distribution, the mean, median, and mode are all the same value. In various other symmetrical distributions it is possible for the mean and median to be the same even though there may be several modes, none of which is at the mean. By contrast, in asymmetrical distributions the mean and median are not the same. Such distributions are said to be skewed, i.e., more than half the cases are either above or below the mean.
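A small numerical sketch can make the effect of skew concrete. The salary figures below are hypothetical and chosen only to produce a positive skew; they are not the data behind the figure in Example 3.

```python
import statistics

# Hypothetical salaries (in thousands); one very large value creates a positive skew.
salaries = [30, 32, 35, 36, 38, 40, 200]

mean_salary = statistics.mean(salaries)
median_salary = statistics.median(salaries)

# Count how many cases fall below the mean.
below_mean = sum(1 for s in salaries if s < mean_salary)

print(mean_salary, median_salary, below_mean)
```

Here the single large salary pulls the mean well above the median, and 6 of the 7 cases lie below the mean.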

Example 3:

The distribution of salaries shown in the figure has a pronounced positive skew.

image006

Figure: A distribution with a very large positive skew

The table shows the measures of central tendency for these data. The large skew results in very different values for these measures. No single measure of central tendency is sufficient for data such as these. There is no need to summarize a distribution with a single number. When the various measures differ, our opinion is that you should report the mean, median, and either the trimean or the mean trimmed 50%. Sometimes it is worth reporting the mode as well. In the media, the median is usually reported to summarize the center of skewed distributions. You will hear about median salaries and median prices of houses sold, etc. This is better than reporting only the mean, but it would be informative to hear more statistics.

image007

Table: Measures of central tendency

Measures of Dispersion

Dispersion measures the degree of scatter of a variable about a central value. The following are the measures of dispersion:

Range

Range is the difference between the maximum and minimum values in the set. It is the simplest measure of variation to find.

RANGE = MAXIMUM VALUE - MINIMUM VALUE

Example 4:

Ten students were given a mathematics test. The times they took to complete the test are listed below. Find the range of these times.

8 12 7 11 12 9 8 10 8 13 (in min.)

Solution:

It can be seen that maximum time taken by a student to complete the test is 13 min and minimum time taken is 7 min. So,

Range = Max. value - Min. value = 13 - 7 = 6 min
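The same calculation as a short Python check, using the times from Example 4:

```python
# Completion times in minutes from Example 4.
times = [8, 12, 7, 11, 12, 9, 8, 10, 8, 13]

# Range = maximum value - minimum value.
time_range = max(times) - min(times)

print(time_range)  # 6
```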

Mean Deviation

It is the arithmetic mean of the absolute deviations of the terms of the distribution from a central value such as the mean, median or mode. The mean deviation is smallest when it is taken about the median.

Mean Deviation for Ungrouped Data:

Let x1, x2, x3, ..., xn be n values of a variable X, and let k be the central value (arithmetic mean, median or mode) about which the mean deviation is to be found. The mean deviation about k is given by

$$\text{M.D.}(k) = \frac{1}{n}\sum_{i=1}^{n} |x_i - k|$$
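The definition above translates directly into a few lines of Python. The sketch below is a minimal illustration, with a helper name (mean_deviation) invented here, applied to the test times from Example 4 with k taken in turn as the mean, the median and the mode.

```python
import statistics

def mean_deviation(values, k):
    """Arithmetic mean of the absolute deviations of the values from k."""
    return sum(abs(x - k) for x in values) / len(values)

# Completion times in minutes from Example 4.
times = [8, 12, 7, 11, 12, 9, 8, 10, 8, 13]

md_mean = mean_deviation(times, statistics.mean(times))      # k = 9.8
md_median = mean_deviation(times, statistics.median(times))  # k = 9.5
md_mode = mean_deviation(times, statistics.mode(times))      # k = 8

print(md_mean, md_median, md_mode)
```

For these times the mean deviation about the mode (2.0) is larger than the one about the median (1.8), in line with the statement that the mean deviation is least about the median. The mean (9.8) happens to lie between the two middle observations, so its mean deviation matches the median's here.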

Mean Deviation for Grouped data

a) Discrete Frequency Distribution:

$$\text{M.D.}(k) = \frac{1}{N}\sum_{i=1}^{n} f_i\,|d_i|$$

where $d_i = x_i - k$, $f_i$ are the frequencies and N = total frequency.

The mean of a given discrete frequency distribution is given by

$$\bar{x} = \frac{1}{N}\sum_{i=1}^{n} f_i x_i$$

The median of a given discrete frequency distribution is found by arranging the observations in ascending order and then calculating the cumulative frequencies. The observation whose cumulative frequency is equal to or just greater than N/2 is the required median.
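The cumulative-frequency rule for the median can be sketched as follows. The function name is hypothetical, and the data are the tower counts that appear in Example 5 of these notes.

```python
def discrete_median(values, freqs):
    """Return the observation whose cumulative frequency first reaches N/2."""
    total = sum(freqs)  # N, the total frequency
    cumulative = 0
    # Walk through the observations in ascending order.
    for value, freq in sorted(zip(values, freqs)):
        cumulative += freq
        if cumulative >= total / 2:
            return value

# Data from Example 5: number of towers and number of villages.
towers = [5, 6, 7, 8, 9, 10]
villages = [8, 12, 9, 15, 4, 2]

print(discrete_median(towers, villages))  # 7
```

Here N = 50, so N/2 = 25. The cumulative frequencies are 8, 20, 29, ..., and 29 is the first to reach 25, which gives a median of 7 towers.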

b) Continuous Frequency Distribution:

The mean of a continuous frequency distribution is calculated with the assumption that the frequency in each class is centered at its mid-point.

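$$\text{M.D.}(k) = \frac{1}{N}\sum_{i=1}^{n} f_i\,|d_i|$$

where $d_i = x_i - k$, $x_i$ is the mid-point of the i-th class, $f_i$ are the frequencies and N = total frequency.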

Arithmetic mean

$$\bar{x} = a + h\,\frac{\sum_{i=1}^{n} f_i u_i}{N}, \qquad u_i = \frac{x_i - a}{h}$$

where a = assumed mean, h = common factor and N = total frequency.
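The assumed-mean calculation can be checked numerically. The sketch below assumes the usual step-deviation form, mean = a + h × (Σ f_i u_i)/N with u_i = (x_i − a)/h, and uses hypothetical class intervals and frequencies that are not taken from these notes.

```python
# Hypothetical class intervals and frequencies, used only for illustration.
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [5, 8, 12, 10, 5]

# Each class is represented by its mid-point.
midpoints = [(lower + upper) / 2 for lower, upper in classes]
N = sum(freqs)  # total frequency

a = 25  # assumed mean (a convenient mid-point)
h = 10  # common factor (here the class width)

# Step deviations u_i = (x_i - a) / h.
u = [(x - a) / h for x in midpoints]
mean_step = a + h * sum(f * ui for f, ui in zip(freqs, u)) / N

# Direct calculation for comparison: sum of f_i * x_i divided by N.
mean_direct = sum(f * x for f, x in zip(freqs, midpoints)) / N

print(mean_step, mean_direct)
```

Both routes give 25.5 for this data, which shows that the assumed mean only simplifies the arithmetic and does not change the result.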

$$\text{Median} = l + \frac{\frac{N}{2} - c}{f} \times h$$

where,

l = lower limit of median class

f = frequency of the median class

h = width of the median class

c = cumulative frequency of the class just preceding the median class.
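As a rough sketch of how these quantities combine, the code below assumes the usual interpolation formula Median = l + ((N/2 − c)/f) × h. The function name and the class data are hypothetical and serve only as an illustration.

```python
def grouped_median(classes, freqs):
    """Median of a continuous frequency distribution by interpolation."""
    N = sum(freqs)  # total frequency
    c = 0           # cumulative frequency of the classes before the current one
    for (lower, upper), f in zip(classes, freqs):
        # The median class is the first whose cumulative frequency reaches N/2.
        if c + f >= N / 2:
            h = upper - lower  # width of the median class
            return lower + (N / 2 - c) / f * h
        c += f

# Hypothetical class intervals and frequencies, used only for illustration.
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [5, 8, 12, 10, 5]

print(grouped_median(classes, freqs))
```

For this data N = 40, the median class is 20 to 30 with l = 20, f = 12, h = 10 and c = 13, so the median is 20 + (7/12) × 10, roughly 25.83.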

Example 5:

50 villages are inspected to count the number of towers in each. The list below shows how many villages have each number of towers. Find the mean deviation of the distribution about its mean.

No. of towers 5, 6, 7, 8, 9, 10

No. of villages 8, 12, 9, 15, 4, 2

Solution:

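The mean can be calculated from the formula given above as

$$\bar{x} = \frac{5(8) + 6(12) + 7(9) + 8(15) + 9(4) + 10(2)}{50} = \frac{351}{50} = 7.02$$

The mean deviation about the mean is calculated as

$$\text{M.D.} = \frac{8(2.02) + 12(1.02) + 9(0.02) + 15(0.98) + 4(1.98) + 2(2.98)}{50} = \frac{57.16}{50} = 1.1432$$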
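The arithmetic of Example 5 can be reproduced with a short script. The variable names are illustrative.

```python
# Data from Example 5: number of towers and number of villages.
towers = [5, 6, 7, 8, 9, 10]
villages = [8, 12, 9, 15, 4, 2]

N = sum(villages)  # total frequency, 50

# Mean of the discrete frequency distribution.
mean = sum(f * x for x, f in zip(towers, villages)) / N

# Mean deviation about the mean.
mean_dev = sum(f * abs(x - mean) for x, f in zip(towers, villages)) / N

print(round(mean, 4), round(mean_dev, 4))  # 7.02 1.1432
```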

Variance

The variance of a variate is the arithmetic mean of the squares of all deviations from the mean.

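$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$$

For a frequency distribution with frequencies $f_i$ and total frequency N, this becomes $\sigma^2 = \frac{1}{N}\sum f_i (x_i - \bar{x})^2$.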

Standard Deviation

It is a measure of dispersion about the mean of a set of observations and is expressed as the positive square root of the variance.

$$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

Example 6:

In the previous example we have calculated the mean deviation of the data.

Using the same data calculate variance and standard deviation.

Solution:

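Variance is given as:

$$\sigma^2 = \frac{8(2.02)^2 + 12(1.02)^2 + 9(0.02)^2 + 15(0.98)^2 + 4(1.98)^2 + 2(2.98)^2}{50} = \frac{92.98}{50} = 1.8596$$

Now, Standard Deviation is

$$\sigma = \sqrt{1.8596} \approx 1.3637$$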
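The variance and standard deviation for the tower data can be verified numerically. The sketch below treats the 50 villages as the whole population, so it divides by N rather than N − 1.

```python
import math

# Data from Example 5: number of towers and number of villages.
towers = [5, 6, 7, 8, 9, 10]
villages = [8, 12, 9, 15, 4, 2]

N = sum(villages)
mean = sum(f * x for x, f in zip(towers, villages)) / N

# Variance: frequency-weighted mean of the squared deviations from the mean.
variance = sum(f * (x - mean) ** 2 for x, f in zip(towers, villages)) / N

# Standard deviation: positive square root of the variance.
std_dev = math.sqrt(variance)

print(round(variance, 4), round(std_dev, 4))  # 1.8596 1.3637
```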

Analysis of Frequency Distribution (Measures of Variability)

In order to compare the variability of two series with the same mean which are measured in different units, merely calculating the measures of dispersion is not sufficient; we require a measure that is independent of the units. The measure of variability which is independent of units is called the coefficient of variation (C.V.) and is defined as

$$\text{C.V.} = \frac{\sigma}{\bar{x}} \times 100$$

where $\sigma$ is the standard deviation and $\bar{x}$ is the mean.

Example 7:

From the data in the above two examples, find the coefficient of variation (C.V.) of no. of towers.

Solution:

From the formula above C.V. can be calculated as

C.V. = (1.3637 / 7.02) × 100 ≈ 19.43, where 1.3637 is the S.D. (the square root of the variance 1.8596) and 7.02 is the mean.
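A self-contained check of the coefficient of variation for the tower data is sketched below. It recomputes the mean and the standard deviation from the frequency table and takes the C.V. as (standard deviation / mean) × 100.

```python
import math

# Data from Example 5: number of towers and number of villages.
towers = [5, 6, 7, 8, 9, 10]
villages = [8, 12, 9, 15, 4, 2]

N = sum(villages)
mean = sum(f * x for x, f in zip(towers, villages)) / N
variance = sum(f * (x - mean) ** 2 for x, f in zip(towers, villages)) / N
std_dev = math.sqrt(variance)

# Coefficient of variation, expressed as a percentage of the mean.
cv = std_dev / mean * 100

print(round(cv, 2))
```

Because the standard deviation and the mean carry the same unit, the ratio is unit-free, which is what allows series measured in different units to be compared.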

Example 8:

Given sets

A = {0,5,10,15,25,30,40,45,50,71,72,73,74,75,76,77,78,100} and

B = {0,22,23,24,25,26,27,28,29,50,55,60,65,70,75,80,85,90,95,100}.

Here simple inspection indicates very different distributions; however, it is found that

Min. = 0, Max. = 100

So range = 100,

Median = 50,

Middle of bottom half of the set (Q1) = 25,

Middle of the upper half of the set (Q3) = 75 is same in both cases.
