Statistics

This page provides an introduction to Statistics.

Overview

Statistics involves summarizing and describing the main features of a dataset, as well as drawing conclusions and making decisions based on data.

Data is commonly presented in one of three forms:

Ungrouped Data

Data in which each observation is recorded separately, with no grouping or classification.

For example, marks of $10$ students:

$1, 2, 5, 4, 1, 3, 7, 1, 3, 9$.

Discrete Frequency Data

Data in which the observations take distinct values, each listed together with its frequency: the number of times that value occurs.

For example, student scores in a math test:

$\text{Score}$	$\text{Number of students}$
$40$	$2$
$45$	$5$
$50$	$8$
$55$	$3$
$60$	$2$
$70$	$1$

Continuous Frequency Data

Data in which the observations are grouped into continuous class intervals, with each frequency giving the number of observations that fall in that interval.

For example, heights of people:

$\text{Height Interval}$	$\text{Number of people in interval}$
$150-155$	$5$
$155-160$	$8$
$160-165$	$12$
$165-170$	$10$
$170-175$	$6$

Measure of Central Tendency

Mean, median and mode are measures of central tendency. Each is a single number that represents the centre, or typical value, of the whole dataset.

Mean

The mean is the arithmetic average of the observations.

For ungrouped data, the mean is:

\frac{\sum \text{x}_\text{i}}{\text{n}}

Where,

$\text{x}_\text{i}$ is the $\text{i}^{\text{th}}$ observation,
$\text{n}$ is the number of observations.
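A minimal Python sketch of the ungrouped mean, applied to the marks example above. The function name `ungrouped_mean` is illustrative only.

```python
def ungrouped_mean(data):
    """Mean of ungrouped data: sum of the observations divided by their count."""
    return sum(data) / len(data)


marks = [1, 2, 5, 4, 1, 3, 7, 1, 3, 9]  # marks of 10 students from the example
print(ungrouped_mean(marks))  # 3.6
```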

For discrete frequency data, the mean is:

\frac{\sum_{\text{i} = 1}^\text{n} \text{x}_\text{i} * \text{f}_\text{i}} {\sum_{\text{i = 1}}^\text{n} \text{f}_\text{i}}

Where,

$\text{x}_\text{i}$ is the value of the $\text{i}^{\text{th}}$ observation group,
$\text{f}_\text{i}$ is the frequency of the $\text{i}^{\text{th}}$ observation group,
$\text{n}$ is the number of observation groups.
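The frequency-weighted mean can be sketched in Python as follows, using the score table above; `discrete_mean` is a hypothetical helper name.

```python
def discrete_mean(values, freqs):
    """Mean of discrete frequency data: sum(f_i * x_i) / sum(f_i)."""
    weighted_sum = sum(f * x for x, f in zip(values, freqs))
    return weighted_sum / sum(freqs)


scores = [40, 45, 50, 55, 60, 70]  # x_i from the score table
students = [2, 5, 8, 3, 2, 1]      # f_i from the score table
print(round(discrete_mean(scores, students), 2))  # 50.48
```

Here $\sum \text{f}_\text{i} = 21$ and $\sum \text{f}_\text{i} * \text{x}_\text{i} = 1060$, so the mean is $\frac{1060}{21} \approx 50.48$.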

For continuous frequency data, the mean is:

\frac{\sum_{\text{i = 1}}^\text{n} \text{x}_\text{i} * \text{f}_\text{i}}{\sum_{\text{i = 1}}^\text{n} \text{f}_\text{i}}

Where,

$\text{x}_\text{i}$ is the midpoint (class mark) of the $\text{i}^{\text{th}}$ class interval,
$\text{f}_\text{i}$ is the frequency of the $\text{i}^{\text{th}}$ class interval,
$\text{n}$ is the number of class intervals.
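For grouped data the individual observations are not known, so the usual convention is to represent each class by its midpoint. A Python sketch under that assumption, applied to the height table above (helper names are hypothetical):

```python
def midpoints(intervals):
    """Class mark of each (lower, upper) interval, used as x_i."""
    return [(lower + upper) / 2 for lower, upper in intervals]


def grouped_mean(values, freqs):
    """sum(f_i * x_i) / sum(f_i)."""
    return sum(f * x for x, f in zip(values, freqs)) / sum(freqs)


heights = [(150, 155), (155, 160), (160, 165), (165, 170), (170, 175)]
people = [5, 8, 12, 10, 6]
print(round(grouped_mean(midpoints(heights), people), 2))  # 162.99
```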

Median

The median is the middle value of the data once it is arranged in order.

For ungrouped data, the median is calculated as follows:

First, arrange the data in ascending or descending order.

If the total number of observations $\text{n}$ is odd, the median is the $\large(\frac{\text{n} \ \ + \ \ 1}{2})^{\text{th}}$ term.
If the total number of observations $\text{n}$ is even, the median is the arithmetic mean of the $\large(\frac{\text{n}}{2})^{\text{th}}$ and $\large(\frac{\text{n} \ \ + \ \ 2}{2})^{\text{th}}$ terms.
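The odd and even cases translate directly into Python. The positions in the text are 1-indexed while Python lists are 0-indexed, hence the `- 1` offsets; `ungrouped_median` is a hypothetical name.

```python
def ungrouped_median(data):
    """Median of ungrouped data (term positions are 1-indexed, as in the text)."""
    s = sorted(data)                 # arrange in ascending order
    n = len(s)
    if n % 2 == 1:
        return s[(n + 1) // 2 - 1]   # ((n + 1) / 2)-th term
    # arithmetic mean of the (n / 2)-th and ((n + 2) / 2)-th terms
    return (s[n // 2 - 1] + s[(n + 2) // 2 - 1]) / 2


marks = [1, 2, 5, 4, 1, 3, 7, 1, 3, 9]
print(ungrouped_median(marks))      # 3.0
print(ungrouped_median([5, 1, 3]))  # 3
```

For the marks, the sorted data is $1, 1, 1, 2, 3, 3, 4, 5, 7, 9$; $\text{n} = 10$ is even, so the median is the mean of the $5^{\text{th}}$ and $6^{\text{th}}$ terms, both $3$.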

For discrete frequency data, the median is calculated as follows:

First, arrange all observations in increasing order.
Next, calculate the cumulative frequency ( $\text{C}_\text{f}$ ) of each observation.
The median is the observation ( $\text{x}_\text{i}$ ) whose $\text{C}_\text{f}$ is equal to or just greater than $\large\frac{\text{Sum of all frequencies}}{2}$.
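A Python sketch of the cumulative-frequency rule exactly as stated, using `itertools.accumulate` from the standard library to build $\text{C}_\text{f}$; `discrete_median` is a hypothetical name.

```python
from itertools import accumulate


def discrete_median(values, freqs):
    """First x_i (in increasing order) whose cumulative frequency >= sum(f_i) / 2."""
    pairs = sorted(zip(values, freqs))            # increasing order of x_i
    half = sum(freqs) / 2
    cum_freqs = accumulate(f for _, f in pairs)   # C_f for each x_i
    for (x, _), cf in zip(pairs, cum_freqs):
        if cf >= half:                            # equal to or just greater than N / 2
            return x


scores = [40, 45, 50, 55, 60, 70]
students = [2, 5, 8, 3, 2, 1]
print(discrete_median(scores, students))  # 50
```

For the score table the cumulative frequencies are $2, 7, 15, \ldots$ and half the total is $10.5$, so the median is $50$.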

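For continuous frequency data, the median is calculated as follows:

First, arrange the class intervals in increasing order.
Next, calculate the cumulative frequency ( $\text{C}_\text{f}$ ) of each class.
The median class is the class whose $\text{C}_\text{f}$ is equal to or just greater than $\large\frac{\text{N}}{2}$, where $\text{N}$ is the sum of all frequencies.
The median is then found by interpolating within the median class:

\text{l} + (\frac{\frac{\text{N}}{2} - \text{C}}{\text{f}}) * \text{h}

Where,

$\text{l}$ is the lower limit of the median class,
$\text{N}$ is the sum of all frequencies,
$\text{C}$ is the cumulative frequency of the class preceding the median class,
$\text{f}$ is the frequency of the median class,
$\text{h}$ is the width of the class interval.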
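For grouped data, a common textbook convention (assumed here) locates the median class, the first class whose cumulative frequency reaches $\frac{\text{N}}{2}$, and interpolates linearly inside it with $\text{l} + (\frac{\frac{\text{N}}{2} - \text{C}}{\text{f}}) * \text{h}$. The Python sketch below applies this to the height table, assuming the intervals are listed in increasing order; `grouped_median` is a hypothetical name.

```python
def grouped_median(intervals, freqs):
    """Median of continuous frequency data: l + ((N/2 - C) / f) * h."""
    half = sum(freqs) / 2                 # N / 2
    cum = 0                               # C: cumulative frequency before the current class
    for (lower, upper), f in zip(intervals, freqs):
        if cum + f >= half:               # first class whose C_f reaches N / 2
            h = upper - lower             # class width
            return lower + (half - cum) / f * h
        cum += f


heights = [(150, 155), (155, 160), (160, 165), (165, 170), (170, 175)]
people = [5, 8, 12, 10, 6]
print(grouped_median(heights, people))  # 163.125
```

Here $\text{N} = 41$, the median class is $160-165$ with $\text{C} = 13$ and $\text{f} = 12$, giving $160 + \frac{20.5 - 13}{12} * 5 = 163.125$.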

Mode

The mode is the most frequently occurring value.

For ungrouped data, the mode is:

The observation $\text{x}_\text{i}$ that occurs the maximum number of times.
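A Python sketch using `collections.Counter` from the standard library. It returns a list because a dataset can have more than one mode; `ungrouped_modes` is a hypothetical name.

```python
from collections import Counter


def ungrouped_modes(data):
    """All observations that occur the maximum number of times."""
    counts = Counter(data)
    top = max(counts.values())
    return [x for x, c in counts.items() if c == top]


marks = [1, 2, 5, 4, 1, 3, 7, 1, 3, 9]
print(ungrouped_modes(marks))  # [1]
```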

For discrete frequency data, the mode is:

The observation $\text{x}_\text{i}$ with the highest frequency $\text{f}_\text{i}$.

Where,

$\text{x}_\text{i}$ is the value of an observation group,
$\text{f}_\text{i}$ is the frequency of that observation group.
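In Python this is a single `max` over the frequencies. This sketch returns the first value on a tie, which is a choice of the sketch rather than part of the definition; `discrete_mode` is a hypothetical name.

```python
def discrete_mode(values, freqs):
    """x_i with the highest f_i; on a tie, the first such x_i is returned."""
    best = max(range(len(values)), key=lambda i: freqs[i])
    return values[best]


scores = [40, 45, 50, 55, 60, 70]
students = [2, 5, 8, 3, 2, 1]
print(discrete_mode(scores, students))  # 50
```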

For continuous frequency data, the mode is:

\text{l} + (\frac{\text{f}_1 - \text{f}_0}{2\text{f}_1 - \text{f}_0 - \text{f}_2}) * \text{h}

Where,

$\text{l}$ is the lower limit of the modal class,
$\text{f}_0$ is the frequency of the class preceding the modal class,
$\text{f}_1$ is the frequency of the modal class,
$\text{f}_2$ is the frequency of the class succeeding the modal class,
$\text{h}$ is the width of the class interval.

The modal class is the class interval with the greatest frequency.

info

If the modal class is the last class interval, take $\text{f}_2 = 0$. Likewise, if it is the first class interval, take $\text{f}_0 = 0$.
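A Python sketch of the modal-class formula for the height table. It assumes the intervals are listed in increasing order and are equally wide, and it takes the frequency of a missing neighbouring class (before the first interval or after the last) as $0$. `grouped_mode` is a hypothetical name.

```python
def grouped_mode(intervals, freqs):
    """Mode of continuous frequency data: l + (f1 - f0) / (2*f1 - f0 - f2) * h."""
    k = max(range(len(freqs)), key=lambda i: freqs[i])  # index of the modal class
    lower, upper = intervals[k]
    f1 = freqs[k]
    f0 = freqs[k - 1] if k > 0 else 0                    # preceding class (0 if none)
    f2 = freqs[k + 1] if k + 1 < len(freqs) else 0       # succeeding class (0 if none)
    h = upper - lower
    return lower + (f1 - f0) / (2 * f1 - f0 - f2) * h


heights = [(150, 155), (155, 160), (160, 165), (165, 170), (170, 175)]
people = [5, 8, 12, 10, 6]
print(round(grouped_mode(heights, people), 2))  # 163.33
```

The modal class is $160-165$, so $\text{l} = 160$, $\text{f}_0 = 8$, $\text{f}_1 = 12$, $\text{f}_2 = 10$, $\text{h} = 5$, giving $160 + \frac{4}{6} * 5 \approx 163.33$.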

Measure of Dispersion

Measures of dispersion describe how spread out the data are around a central value; they tell us how well a measure of central tendency represents the data.

There are $4$ measures of dispersion:

Range
Mean deviation about $\alpha$, where $\alpha$ can be the mean, median or mode
Variance ( $\sigma^2$ )
Standard Deviation ( $\sigma$ )

Range

The range is the difference between the largest and smallest values in the dataset.

For all types of data, the range is:

\text{largest value} - \text{smallest value}

Mean Deviation

Mean deviation is the average absolute distance between each value in a dataset and the mean. It can also be calculated about the median or the mode.

For ungrouped data, the mean deviation about the mean is:

\frac{\sum_{\text{i} = 1}^\text{n} | \text{x}_\text{i} - \overline{\text{x}}|}{\text{n}}

Where,

$\text{x}_\text{i}$ is the $\text{i}^{\text{th}}$ observation,
$\overline{\text{x}}$ is the mean of all the observations,
$\text{n}$ is the total number of observations.

info

Replacing $\overline{\text{x}}$ in the formula above with the median or the mode gives the mean deviation about the median or about the mode, respectively.
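A Python sketch computing the mean deviation of the marks about all three centres. It uses `mean`, `median` and `mode` from Python's standard `statistics` module; `mean_deviation` is a hypothetical helper name.

```python
import statistics


def mean_deviation(data, center):
    """Average absolute distance of the observations from `center`."""
    return sum(abs(x - center) for x in data) / len(data)


marks = [1, 2, 5, 4, 1, 3, 7, 1, 3, 9]
about_mean = mean_deviation(marks, statistics.mean(marks))      # center 3.6
about_median = mean_deviation(marks, statistics.median(marks))  # center 3.0
about_mode = mean_deviation(marks, statistics.mode(marks))      # center 1
print(round(about_mean, 2), about_median, about_mode)  # 2.12 2.0 2.6
```

The value about the median is the smallest of the three, consistent with the property that absolute deviations are minimised about the median.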

For discrete frequency data, the mean deviation about the mean is:

\frac{\sum_{\text{i} = 1}^\text{n} \text{f}_\text{i} * |\text{x}_\text{i} - \overline{\text{x}}|}{\sum_{\text{i} = 1}^\text{n} \text{f}_\text{i}}

Where,

$\text{x}_\text{i}$ is the value of the $\text{i}^{\text{th}}$ observation group,
$\overline{\text{x}}$ is the mean of the discrete frequency data,
$\text{f}_\text{i}$ is the frequency of the $\text{i}^{\text{th}}$ observation group,
$\text{n}$ is the number of observation groups.

info

Replacing $\overline{\text{x}}$ in the formula above with the median or the mode gives the mean deviation about the median or about the mode, respectively.
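A Python sketch of the frequency-weighted mean deviation for the score table, about the mean and about the median ($50$ for this table, by the cumulative-frequency rule). `grouped_mean_deviation` is a hypothetical name.

```python
def grouped_mean_deviation(values, freqs, center):
    """sum(f_i * |x_i - center|) / sum(f_i)."""
    return sum(f * abs(x - center) for x, f in zip(values, freqs)) / sum(freqs)


scores = [40, 45, 50, 55, 60, 70]
students = [2, 5, 8, 3, 2, 1]
mean = sum(f * x for x, f in zip(scores, students)) / sum(students)
print(round(grouped_mean_deviation(scores, students, mean), 2))  # 4.97
print(round(grouped_mean_deviation(scores, students, 50), 2))    # 4.76 (about the median)
```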

For continuous frequency data, the mean deviation about the mean is:

\frac{\sum_{\text{i = 1}}^\text{n} \text{f}_\text{i} * |\text{x}_\text{i} - \overline{\text{x}}|}{\sum_{\text{i = 1}}^\text{n} \text{f}_\text{i}}

Where,

$\text{x}_\text{i}$ is the midpoint of the $\text{i}^{\text{th}}$ class interval,
$\overline{\text{x}}$ is the mean of the continuous frequency data,
$\text{f}_\text{i}$ is the frequency of the $\text{i}^{\text{th}}$ class interval,
$\text{n}$ is the number of class intervals.

info

Replacing $\overline{\text{x}}$ in the formula above with the median or the mode gives the mean deviation about the median or about the mode, respectively.
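The continuous case is the same weighted formula with class midpoints as $\text{x}_\text{i}$. A Python sketch for the height table (helper name hypothetical):

```python
def grouped_mean_deviation(values, freqs, center):
    """sum(f_i * |x_i - center|) / sum(f_i)."""
    return sum(f * abs(x - center) for x, f in zip(values, freqs)) / sum(freqs)


heights = [(150, 155), (155, 160), (160, 165), (165, 170), (170, 175)]
people = [5, 8, 12, 10, 6]
mids = [(lower + upper) / 2 for lower, upper in heights]  # x_i = class midpoints
mean = sum(f * x for x, f in zip(mids, people)) / sum(people)
md = grouped_mean_deviation(mids, people, mean)
print(round(md, 3))  # 4.985
```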

Variance

Variance is the average of the squared differences between each value in a dataset and the mean. It is denoted by $\sigma^2$.

For ungrouped data, the variance is:

\frac{\sum_{\text{i} = 1}^\text{n} (\text{x}_\text{i})^2}{\text{n}} - (\overline{\text{x}})^2

Where,

$\text{x}_\text{i}$ is the $\text{i}^{\text{th}}$ observation,
$\overline{\text{x}}$ is the mean of all the observations,
$\text{n}$ is the total number of observations.
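A Python sketch of the shortcut formula (mean of the squares minus the square of the mean). The formula divides by $\text{n}$, so it is the population variance and matches the standard library's `statistics.pvariance`; `statistics.variance` instead computes the sample variance, dividing by $\text{n} - 1$. `ungrouped_variance` is a hypothetical name.

```python
import statistics


def ungrouped_variance(data):
    """Population variance: mean of the squares minus the square of the mean."""
    n = len(data)
    mean = sum(data) / n
    return sum(x * x for x in data) / n - mean ** 2


marks = [1, 2, 5, 4, 1, 3, 7, 1, 3, 9]
print(round(ungrouped_variance(marks), 2))  # 6.64
# statistics.pvariance also divides by n, so the two agree up to rounding.
print(abs(ungrouped_variance(marks) - statistics.pvariance(marks)) < 1e-9)  # True
```

For the marks, $\frac{196}{10} - 3.6^2 = 19.6 - 12.96 = 6.64$.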

For discrete frequency data, the variance is:

\frac{\sum_{\text{i = 1}}^\text{n} \text{f}_\text{i} * (\text{x}_\text{i} - \overline{\text{x}})^2}{\sum_{\text{i = 1}}^\text{n} \text{f}_\text{i}}

Where,

$\text{x}_\text{i}$ is the value of the $\text{i}^{\text{th}}$ observation group,
$\overline{\text{x}}$ is the mean of the discrete frequency data,
$\text{f}_\text{i}$ is the frequency of the $\text{i}^{\text{th}}$ observation group,
$\text{n}$ is the number of observation groups.
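A Python sketch of the frequency-weighted variance for the score table; `grouped_variance` is a hypothetical name.

```python
def grouped_variance(values, freqs):
    """sum(f_i * (x_i - mean)^2) / sum(f_i)."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    return sum(f * (x - mean) ** 2 for x, f in zip(values, freqs)) / n


scores = [40, 45, 50, 55, 60, 70]
students = [2, 5, 8, 3, 2, 1]
print(round(grouped_variance(scores, students), 2))  # 47.39
```

With every frequency equal to $1$, this reduces to the ungrouped formula.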

For continuous frequency data, the variance is:

\frac{\sum_{\text{i = 1}}^\text{n} \text{f}_\text{i} * (\text{x}_\text{i} - \overline{\text{x}})^2}{\sum_{\text{i = 1}}^\text{n} \text{f}_\text{i}}

Where,

$\text{x}_\text{i}$ is the midpoint of the $\text{i}^{\text{th}}$ class interval,
$\overline{\text{x}}$ is the mean of the continuous frequency data,
$\text{f}_\text{i}$ is the frequency of the $\text{i}^{\text{th}}$ class interval,
$\text{n}$ is the number of class intervals.
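The same weighted formula applies to the height table once each class is represented by its midpoint. A Python sketch (helper name hypothetical):

```python
def grouped_variance(values, freqs):
    """sum(f_i * (x_i - mean)^2) / sum(f_i)."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    return sum(f * (x - mean) ** 2 for x, f in zip(values, freqs)) / n


heights = [(150, 155), (155, 160), (160, 165), (165, 170), (170, 175)]
people = [5, 8, 12, 10, 6]
mids = [(lower + upper) / 2 for lower, upper in heights]  # class midpoints as x_i
print(round(grouped_variance(mids, people), 2))  # 37.57
```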

Standard Deviation

The standard deviation is the square root of the variance. It measures the spread of a dataset in the same units as the data, and is denoted by $\sigma$.
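A short Python check that taking the square root of the population variance gives the same value as the standard library's `statistics.pstdev`:

```python
import math
import statistics

marks = [1, 2, 5, 4, 1, 3, 7, 1, 3, 9]
sigma = math.sqrt(statistics.pvariance(marks))        # square root of the variance
print(round(sigma, 4))                                # 2.5768
print(math.isclose(sigma, statistics.pstdev(marks)))  # True
```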

Coefficient of Variation

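The coefficient of variation expresses the standard deviation as a percentage of the mean. Because it measures variability relative to the mean, it can be used to compare datasets with different means or units.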

\frac{\sigma}{\overline{\text{x}}} * 100

A higher coefficient of variation means the data are more variable; a lower one means they are more consistent, and hence more reliable.
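A Python sketch of the coefficient of variation for the marks, using the population standard deviation from the standard `statistics` module. `coefficient_of_variation` is a hypothetical name, and it assumes a non-zero mean.

```python
import statistics


def coefficient_of_variation(data):
    """Population standard deviation as a percentage of the mean (mean must be non-zero)."""
    return statistics.pstdev(data) / statistics.mean(data) * 100


marks = [1, 2, 5, 4, 1, 3, 7, 1, 3, 9]
print(round(coefficient_of_variation(marks), 1))  # 71.6
```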

Important Points

If every observation in a dataset is increased or decreased by the same constant $\alpha$, then:

\overline{\text{x}}_{\text{new}} = \overline{\text{x}}_{\text{old}} \pm \alpha

\sigma^2_{\text{new}} = \sigma^2_{\text{old}}

If all observations are multiplied by the same non-zero number $\alpha$, then:

\overline{\text{x}}_{\text{new}} = \overline{\text{x}}_{\text{old}} * \alpha

\sigma^2_{\text{new}} = \alpha^2 * \sigma^2_{\text{old}}
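The shift and scale rules can be checked numerically in Python on the marks example. The choice $\alpha = 3$ is arbitrary, and `math.isclose` is used because floating-point results may differ in the last digit.

```python
import math
import statistics

data = [1, 2, 5, 4, 1, 3, 7, 1, 3, 9]
alpha = 3

shifted = [x + alpha for x in data]  # every observation increased by alpha
scaled = [x * alpha for x in data]   # every observation multiplied by alpha

# Shifting moves the mean by alpha but leaves the variance unchanged.
assert math.isclose(statistics.mean(shifted), statistics.mean(data) + alpha)
assert math.isclose(statistics.pvariance(shifted), statistics.pvariance(data))

# Scaling multiplies the mean by alpha and the variance by alpha ** 2.
assert math.isclose(statistics.mean(scaled), statistics.mean(data) * alpha)
assert math.isclose(statistics.pvariance(scaled), alpha ** 2 * statistics.pvariance(data))
print("all properties hold")
```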

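The sum of squared deviations is smallest when the deviations are taken from the mean:

\sum_{\text{i = 1}}^{\text{n}} (\text{x}_\text{i} - \overline{\text{x}})^2 \le \sum_{\text{i = 1}}^{\text{n}} (\text{x}_\text{i} - \text{a})^2 \text{ for any value a}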

The sum of deviations from the mean is zero.

\sum_{\text{i = 1}}^{\text{n}} (\text{x}_\text{i} - \overline{\text{x}}) = 0
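Both properties of the mean can be checked numerically in Python. The grid of candidate points $0.0, 0.1, \ldots, 10.0$ is an arbitrary choice for illustration, and `sum_sq_dev` is a hypothetical helper.

```python
data = [1, 2, 5, 4, 1, 3, 7, 1, 3, 9]
mean = sum(data) / len(data)


def sum_sq_dev(a):
    """Sum of squared deviations of the data from the point a."""
    return sum((x - a) ** 2 for x in data)


# Deviations from the mean cancel out (up to floating-point rounding).
print(abs(sum(x - mean for x in data)) < 1e-9)  # True

# Among the candidate points, the mean gives the least sum of squared deviations.
candidates = [k / 10 for k in range(101)]
print(min(candidates, key=sum_sq_dev))  # 3.6
```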

Extreme values affect the median much less than the mean. For example, for the dataset $1, 2, 3, 400, 500$, the median is $3$, while the mean is $181.2$.

The sum of the absolute deviations of the observations is smallest when the deviations are taken from the median:

\sum_{\text{i = 1}}^{\text{n}} |\text{x}_\text{i} - \alpha| \text{ is least when } \alpha \text{ is the median}
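A Python sketch using the same dataset: it compares the median and mean from the standard `statistics` module, then scans integer candidates $0$ to $500$ (an arbitrary search range) to confirm the median minimises the total absolute deviation. `sum_abs_dev` is a hypothetical helper.

```python
import statistics

data = [1, 2, 3, 400, 500]
print(statistics.median(data), statistics.mean(data))  # 3 181.2


def sum_abs_dev(a):
    """Sum of absolute deviations of the data from the point a."""
    return sum(abs(x - a) for x in data)


# Among integer candidates 0..500, the median minimises the sum of absolute deviations.
print(min(range(501), key=sum_abs_dev))  # 3
```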

The variance of a dataset is at most the square of half its range:

\text{Variance} (\sigma^2) \le (\frac{\text{range}}{2})^2
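A quick Python check of the bound on the marks, plus a small hypothetical dataset where equality holds because half the values sit at each end of the range:

```python
import statistics

data = [1, 2, 5, 4, 1, 3, 7, 1, 3, 9]
variance = statistics.pvariance(data)
bound = ((max(data) - min(data)) / 2) ** 2   # (range / 2) ** 2
print(variance, bound, variance <= bound)    # 6.64 16.0 True

# Equality holds when half the values sit at each end of the range.
extreme = [0, 0, 10, 10]
print(statistics.pvariance(extreme) == ((10 - 0) / 2) ** 2)  # True
```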
