Statistics
This page provides an introduction to Statistics.
Overview
Statistics involves summarizing and describing the main features of a dataset, as well as drawing conclusions and making decisions based on data.
Data is usually presented in one of three forms:
Ungrouped Data
Data in which each observation is listed separately, with no grouping or classification.
For example, marks of $10$ students:
$1, 2, 5, 4, 1, 3, 7, 1, 3, 9$.
Discrete Frequency Data
Data in which observations are grouped by distinct value, with a frequency giving the number of times each value occurs.
For example, student scores in a math test:
$\text{Score}$  $\text{Number of students}$ 

$40$  $2$ 
$45$  $5$ 
$50$  $8$ 
$55$  $3$ 
$60$  $2$ 
$70$  $1$ 
Continuous Frequency Data
Data in which observations are grouped into continuous intervals (classes), with a frequency giving the number of observations in each interval.
For example, heights of people:
$\text{Height interval}$  $\text{Number of people}$ 

$150 - 155$  $5$ 
$155 - 160$  $8$ 
$160 - 165$  $12$ 
$165 - 170$  $10$ 
$170 - 175$  $6$ 
Measure of Central Tendency
Mean, median and mode are measures of central tendency. Each is a single number that represents the whole dataset.
Mean
Mean is an average value.
For Ungrouped data, mean formula is:
$\frac{\sum \text{x}_\text{i}}{\text{n}}$
Where,
$\text{x}_\text{i}$ is $\text{i}^{\text{th}}$ observation,
$\text{n}$ is number of observations.
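As a small illustration, the formula above can be applied to the marks of $10$ students from the ungrouped data example. The function name `mean_ungrouped` is made up for this sketch.

```python
# Marks of 10 students, taken from the ungrouped-data example on this page.
marks = [1, 2, 5, 4, 1, 3, 7, 1, 3, 9]


def mean_ungrouped(values):
    """Return sum(x_i) / n for a non-empty list of observations."""
    return sum(values) / len(values)


mean_marks = mean_ungrouped(marks)
print(mean_marks)  # 36 / 10 = 3.6
```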
For discrete frequency data, mean formula is:
$\frac{\sum_{\text{i} = 1}^\text{n} \text{x}_\text{i} * \text{f}_\text{i}}{\sum_{\text{i} = 1}^\text{n} \text{f}_\text{i}}$
Where,
$\text{x}_\text{i}$ is $\text{i}^{\text{th}}$ observation group,
$\text{f}_\text{i}$ is frequency of $\text{i}^{\text{th}}$ observation group,
$\text{n}$ is number of observation groups.
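A minimal sketch of this weighted mean, using the math test scores from the discrete frequency data example. The helper name `mean_frequency` is an assumption of this sketch, not something defined on this page.

```python
# Scores and their frequencies, from the discrete frequency example.
scores = [40, 45, 50, 55, 60, 70]
freqs = [2, 5, 8, 3, 2, 1]


def mean_frequency(values, frequencies):
    """Return sum(x_i * f_i) / sum(f_i)."""
    total = sum(frequencies)
    weighted = sum(x * f for x, f in zip(values, frequencies))
    return weighted / total


mean_score = mean_frequency(scores, freqs)
print(round(mean_score, 2))  # 1060 / 21 ≈ 50.48
```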
For continuous frequency data, mean formula is:
$\frac{\sum_{\text{i = 1}}^\text{n} \text{x}_\text{i} * \text{f}_\text{i}}{\sum_{\text{i = 1}}^\text{n} \text{f}_\text{i}}$
Where,
$\text{x}_\text{i}$ is midpoint of $\text{i}^{\text{th}}$ observation class interval,
$\text{f}_\text{i}$ is frequency of $\text{i}^{\text{th}}$ observation class interval,
$\text{n}$ is number of observation class intervals.
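For the heights example, the same calculation might look as follows. This sketch assumes that each $\text{x}_\text{i}$ is taken as the midpoint of its class interval, which is the usual convention for grouped data; the variable names are illustrative.

```python
# Height class intervals and frequencies, from the continuous frequency example.
intervals = [(150, 155), (155, 160), (160, 165), (165, 170), (170, 175)]
counts = [5, 8, 12, 10, 6]

# Assumption: each class is represented by its midpoint.
midpoints = [(lower + upper) / 2 for lower, upper in intervals]

mean_height = sum(m * f for m, f in zip(midpoints, counts)) / sum(counts)
print(midpoints)              # [152.5, 157.5, 162.5, 167.5, 172.5]
print(round(mean_height, 2))  # 6682.5 / 41 ≈ 162.99
```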
Median
The median is the middle value of the ordered data.
For ungrouped data, median is calculated as below:
 First arrange the given data in ascending or descending order.
 If the total number of observations is odd, the median is the $\large(\frac{\text{n} + 1}{2})^{\text{th}}$ term.
 If the total number of observations is even, the median is the arithmetic mean of the $\large(\frac{\text{n}}{2})^{\text{th}}$ and $\large(\frac{\text{n} + 2}{2})^{\text{th}}$ terms.
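The steps above can be sketched in Python as follows. The function name `median_ungrouped` is hypothetical; the terms in the formulas are 1-based, so the code subtracts one when indexing.

```python
def median_ungrouped(values):
    """Median of a non-empty list, following the odd/even rule above."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        # Odd n: the ((n + 1) / 2)-th term, converted to a 0-based index.
        return ordered[(n + 1) // 2 - 1]
    # Even n: mean of the (n / 2)-th and ((n + 2) / 2)-th terms.
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


marks = [1, 2, 5, 4, 1, 3, 7, 1, 3, 9]
median_marks = median_ungrouped(marks)
print(median_marks)  # sorted: 1 1 1 2 3 3 4 5 7 9 -> (3 + 3) / 2 = 3.0
```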
For discrete frequency data, median is calculated as below:
 First arrange all observations in increasing order.
 Now calculate the cumulative frequency ($\text{C}_\text{f}$).
 The median is the observation ($\text{x}_\text{i}$) whose $\text{C}_\text{f}$ is equal to or just greater than $\large\frac{\text{Sum of all frequencies}}{2}$.
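A minimal sketch of the cumulative frequency rule above, applied to the math test scores. The name `median_discrete` is an assumption of this sketch.

```python
from itertools import accumulate


def median_discrete(values, frequencies):
    """Return the first value whose cumulative frequency reaches half the total."""
    pairs = sorted(zip(values, frequencies))       # increasing order of x_i
    half = sum(frequencies) / 2
    cumulative = accumulate(f for _, f in pairs)   # running C_f
    for (x, _), cf in zip(pairs, cumulative):
        if cf >= half:
            return x


scores = [40, 45, 50, 55, 60, 70]
freqs = [2, 5, 8, 3, 2, 1]
median_score = median_discrete(scores, freqs)
print(median_score)  # C_f = 2, 7, 15, ...; half = 10.5 -> 50
```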
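For continuous frequency data, median is calculated as below:
 First arrange all class intervals in increasing order.
 Now calculate the cumulative frequency ($\text{C}_\text{f}$).
 The median class is the class interval whose $\text{C}_\text{f}$ is equal to or just greater than $\large\frac{\text{N}}{2}$, where $\text{N}$ is the sum of all frequencies.
 The median is then $\text{l} + (\frac{\frac{\text{N}}{2} - \text{C}}{\text{f}}) * \text{h}$, where $\text{l}$ is the lower limit of the median class, $\text{C}$ is the cumulative frequency of the class before the median class, $\text{f}$ is the frequency of the median class, and $\text{h}$ is the width of the class interval.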
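For grouped intervals, the cumulative frequency step locates the class that contains the median, and a single value is then usually obtained with the interpolation formula $\text{l} + (\frac{\text{N}/2 - \text{C}}{\text{f}}) * \text{h}$. The sketch below assumes that formula and uses the heights example; the function name is illustrative.

```python
def median_continuous(intervals, frequencies):
    """Interpolated median for class intervals given in increasing order."""
    half = sum(frequencies) / 2
    before = 0  # cumulative frequency of the classes before the current one
    for (lower, upper), f in zip(intervals, frequencies):
        if before + f >= half:
            # Median class found: interpolate inside it.
            return lower + (half - before) / f * (upper - lower)
        before += f


intervals = [(150, 155), (155, 160), (160, 165), (165, 170), (170, 175)]
counts = [5, 8, 12, 10, 6]
median_height = median_continuous(intervals, counts)
print(median_height)  # 160 + (20.5 - 13) / 12 * 5 = 163.125
```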
Mode
Mode is the most frequent value.
For ungrouped data, mode is:
The observation $\text{x}_\text{i}$ occurring the maximum number of times.
For discrete frequency data, mode is:
The observation $\text{x}_\text{i}$ which has the highest value of $\text{f}_\text{i}$.
Where,
$\text{x}_\text{i}$ is value of observation group,
$\text{f}_\text{i}$ is frequency of observation group.
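Both cases can be illustrated with the example data from this page. This is only a sketch: it returns a single mode and does not report ties.

```python
from collections import Counter

# Ungrouped data: the value that occurs most often.
marks = [1, 2, 5, 4, 1, 3, 7, 1, 3, 9]
mode_marks, occurrences = Counter(marks).most_common(1)[0]
print(mode_marks, occurrences)  # 1 3  (the mark 1 occurs three times)

# Discrete frequency data: the value with the highest frequency.
scores = [40, 45, 50, 55, 60, 70]
freqs = [2, 5, 8, 3, 2, 1]
mode_score = scores[freqs.index(max(freqs))]
print(mode_score)  # 50  (frequency 8)
```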
For continuous frequency data, mode formula is:
$\text{l} + (\frac{\text{f}_1 - \text{f}_0}{2\text{f}_1 - \text{f}_0 - \text{f}_2}) * \text{h}$
Where,
$\text{l}$ is lower limit of the modal class,
$\text{f}_0$ is frequency of the class preceding the modal class,
$\text{f}_1$ is frequency of the modal class,
$\text{f}_2$ is frequency of the class succeeding the modal class,
$\text{h}$ is width of the class interval.
The modal class is the class interval whose frequency is greatest.
If the modal class is the last class interval, then the value of $\text{f}_2$ will be $0$.
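A short sketch of this formula on the heights example. Treating $\text{f}_0$ as $0$ when the class with the greatest frequency is the first class is an assumption made here by symmetry with the last-class rule; the sketch also does not handle the case where the denominator is zero.

```python
def mode_continuous(intervals, frequencies):
    """Mode for class intervals via l + (f1 - f0) / (2*f1 - f0 - f2) * h."""
    k = frequencies.index(max(frequencies))  # index of the class with greatest frequency
    lower, upper = intervals[k]
    f1 = frequencies[k]
    f0 = frequencies[k - 1] if k > 0 else 0                     # assumed 0 for first class
    f2 = frequencies[k + 1] if k + 1 < len(frequencies) else 0  # 0 for last class
    return lower + (f1 - f0) / (2 * f1 - f0 - f2) * (upper - lower)


intervals = [(150, 155), (155, 160), (160, 165), (165, 170), (170, 175)]
counts = [5, 8, 12, 10, 6]
mode_height = mode_continuous(intervals, counts)
print(round(mode_height, 2))  # 160 + (12 - 8) / (24 - 8 - 10) * 5 ≈ 163.33
```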
Measure of Dispersion
A measure of dispersion describes how spread out the data is, and so indicates how reliable a measure of central tendency is as a summary of the data.
There are $4$ measures of dispersion:
 Range
 Mean deviation about $\alpha$, where $\alpha$ can be the mean, median or mode
 Variance ($\sigma^2$)
 Standard deviation ($\sigma$)
Range
Range is the difference between the largest and smallest values in a dataset.
For all types of data, range formula is:
$\text{largest value} - \text{smallest value}$
Mean Deviation
Mean deviation is the average absolute distance between each value in a dataset and the mean value. It can also be calculated about the median or the mode.
For ungrouped data, mean deviation formula is:
$\frac{\sum_{\text{i} = 1}^\text{n} |\text{x}_\text{i} - \overline{\text{x}}|}{\text{n}}$
Where,
$\text{x}_\text{i}$ is $\text{i}^{\text{th}}$ observation,
$\overline{\text{x}}$ is mean of all the observations,
$\text{n}$ is total number of observations.
By replacing $\overline{\text{x}}$ in the above formula with the median or the mode, we can calculate the mean deviation about the median or the mean deviation about the mode respectively.
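As an illustration, one helper can cover all three variants by passing the chosen central value as an argument. The name `mean_deviation` and its `center` parameter are assumptions of this sketch.

```python
import statistics


def mean_deviation(values, center):
    """Average absolute distance of the observations from `center`."""
    return sum(abs(x - center) for x in values) / len(values)


marks = [1, 2, 5, 4, 1, 3, 7, 1, 3, 9]
md_mean = mean_deviation(marks, statistics.mean(marks))      # about the mean (3.6)
md_median = mean_deviation(marks, statistics.median(marks))  # about the median (3)
md_mode = mean_deviation(marks, statistics.mode(marks))      # about the mode (1)

print(round(md_mean, 2))  # 21.2 / 10 = 2.12
print(md_median)          # 20 / 10 = 2.0
print(md_mode)            # 26 / 10 = 2.6
```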
For discrete frequency data, mean deviation formula is:
$\frac{\sum_{\text{i} = 1}^\text{n} \text{f}_\text{i} * |\text{x}_\text{i} - \overline{\text{x}}|}{\sum_{\text{i} = 1}^\text{n} \text{f}_\text{i}}$
Where,
$\text{x}_\text{i}$ is $\text{i}^{\text{th}}$ observation group,
$\overline{\text{x}}$ is mean of all the observations for discrete frequency data,
$\text{f}_\text{i}$ is frequency of $\text{i}^{\text{th}}$ observation group.
By replacing $\overline{\text{x}}$ in the above formula with the median or the mode, we can calculate the mean deviation about the median or the mean deviation about the mode respectively.
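A frequency-weighted version might look like the sketch below, shown on the math test scores. The helper name is hypothetical; the same function could be reused for class intervals by passing their midpoints as `values`.

```python
def mean_deviation_freq(values, frequencies, center):
    """Return sum(f_i * |x_i - center|) / sum(f_i)."""
    total = sum(frequencies)
    return sum(f * abs(x - center) for x, f in zip(values, frequencies)) / total


scores = [40, 45, 50, 55, 60, 70]
freqs = [2, 5, 8, 3, 2, 1]

mean_score = sum(x * f for x, f in zip(scores, freqs)) / sum(freqs)
md_about_mean = mean_deviation_freq(scores, freqs, mean_score)
md_about_median = mean_deviation_freq(scores, freqs, 50)  # 50 is the median score

print(round(md_about_median, 2))  # 100 / 21 ≈ 4.76
```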
For continuous frequency data, mean deviation formula is:
$\frac{\sum_{\text{i = 1}}^\text{n} \text{f}_\text{i} * |\text{x}_\text{i} - \overline{\text{x}}|}{\sum_{\text{i = 1}}^\text{n} \text{f}_\text{i}}$
Where,
$\text{x}_\text{i}$ is midpoint of $\text{i}^{\text{th}}$ observation class interval,
$\overline{\text{x}}$ is mean of all the observations for continuous frequency data,
$\text{f}_\text{i}$ is frequency of $\text{i}^{\text{th}}$ observation class interval.
By replacing $\overline{\text{x}}$ in the above formula with the median or the mode, we can calculate the mean deviation about the median or the mean deviation about the mode respectively.
Variance
Variance is the average of the squared differences between each value in a dataset and the mean value. It is denoted by $\sigma^2$.
For ungrouped data, variance formula is:
$\frac{\sum_{\text{i} = 1}^\text{n} (\text{x}_\text{i})^2}{\text{n}} - (\overline{\text{x}})^2$
Where,
$\text{x}_\text{i}$ is $\text{i}^{\text{th}}$ observation,
$\overline{\text{x}}$ is mean of all the observations,
$\text{n}$ is total number of observations.
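The shortcut form above, mean of squares minus square of the mean, gives the same result as averaging the squared deviations directly. A quick sketch that checks this on the marks example (function names are illustrative):

```python
def variance_shortcut(values):
    """sum(x_i^2) / n - mean^2, the form used in the formula above."""
    n = len(values)
    mean = sum(values) / n
    return sum(x * x for x in values) / n - mean ** 2


def variance_definition(values):
    """Average of squared deviations from the mean."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


marks = [1, 2, 5, 4, 1, 3, 7, 1, 3, 9]
var_marks = variance_shortcut(marks)
print(round(var_marks, 2))                   # 196 / 10 - 3.6^2 = 6.64
print(round(variance_definition(marks), 2))  # 66.4 / 10 = 6.64
```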
For discrete frequency data, variance formula is:
$\frac{\sum_{\text{i = 1}}^\text{n} \text{f}_\text{i} * (\text{x}_\text{i} - \overline{\text{x}})^2}{\sum_{\text{i = 1}}^\text{n} \text{f}_\text{i}}$
Where,
$\text{x}_\text{i}$ is $\text{i}^{\text{th}}$ observation group,
$\text{f}_\text{i}$ is frequency of $\text{i}^{\text{th}}$ observation group,
$\text{n}$ is number of observation groups.
For continuous frequency data, variance formula is:
$\frac{\sum_{\text{i = 1}}^\text{n} \text{f}_\text{i} * (\text{x}_\text{i} - \overline{\text{x}})^2}{\sum_{\text{i = 1}}^\text{n} \text{f}_\text{i}}$
Where,
$\text{x}_\text{i}$ is midpoint of $\text{i}^{\text{th}}$ observation class interval,
$\overline{\text{x}}$ is mean of all the observations for continuous frequency data,
$\text{f}_\text{i}$ is frequency of $\text{i}^{\text{th}}$ observation class interval,
$\text{n}$ is number of observation class intervals.
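A minimal sketch of the frequency-weighted variance, applied to the heights example with class midpoints as the $\text{x}_\text{i}$ values. The function name `variance_freq` is an assumption; the same helper works for discrete frequency data if the observed values are passed instead of midpoints.

```python
def variance_freq(values, frequencies):
    """Return sum(f_i * (x_i - mean)^2) / sum(f_i)."""
    total = sum(frequencies)
    mean = sum(f * x for x, f in zip(values, frequencies)) / total
    return sum(f * (x - mean) ** 2 for x, f in zip(values, frequencies)) / total


intervals = [(150, 155), (155, 160), (160, 165), (165, 170), (170, 175)]
counts = [5, 8, 12, 10, 6]
midpoints = [(lower + upper) / 2 for lower, upper in intervals]

var_height = variance_freq(midpoints, counts)
print(round(var_height, 2))  # ≈ 37.57
```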
Standard Deviation
Standard deviation is square root of the variance, representing the spread or dispersion of a dataset. It is represented as ($\sigma$).
Coefficient of Variation
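The coefficient of variation expresses the standard deviation as a percentage of the mean, so it measures variation relative to the size of the values.
$\frac{\sigma}{\overline{\text{x}}} * 100$
A higher coefficient of variation means the data is more variable; a lower coefficient of variation means the data is more consistent, and so more reliable.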
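As a small worked example, the standard deviation and coefficient of variation of the marks data can be computed with Python's `statistics` module. The sketch uses `pstdev`, the population form that divides by $\text{n}$, to match the variance formulas on this page.

```python
import statistics

marks = [1, 2, 5, 4, 1, 3, 7, 1, 3, 9]

sigma = statistics.pstdev(marks)   # square root of the variance 6.64
mean_marks = statistics.mean(marks)
cv = sigma / mean_marks * 100      # coefficient of variation, in percent

print(round(sigma, 2))  # 2.58
print(round(cv, 1))     # 71.6
```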
Important Points
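 If every observation in a dataset is increased or decreased by the same constant value $\alpha$, then the mean, median and mode increase or decrease by $\alpha$, while the range, mean deviation, variance and standard deviation are unchanged.
 If all observations are multiplied by the same nonzero number $\alpha$, then the mean, median and mode are multiplied by $\alpha$, the range, mean deviation and standard deviation are multiplied by $|\alpha|$, and the variance is multiplied by $\alpha^2$.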
 The sum of the squares of the deviations is smallest when the deviations are taken from the mean.
 The sum of the deviations from the mean is zero.
 Extreme values do not affect the median as strongly as they affect the mean. For example, for the dataset $1, 2, 3, 400, 500$ the median is $3$, while the mean is $\approx 181$.
 The sum of the absolute deviations is smallest when the deviations are taken from the median.
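 The maximum value of the variance for data lying between a smallest value $\text{m}$ and a largest value $\text{M}$ is $\frac{(\text{M} - \text{m})^2}{4}$, that is, $\frac{(\text{Range})^2}{4}$.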
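Several of the points in this list can be checked numerically. The sketch below uses the dataset $1, 2, 3, 400, 500$ from the example and an arbitrary constant `alpha = 3`; it relies on the standard facts that adding a constant leaves the variance unchanged and that multiplying by $\alpha$ scales the variance by $\alpha^2$. The helper names are illustrative.

```python
import statistics

data = [1, 2, 3, 400, 500]
mean = statistics.mean(data)      # 181.2
median = statistics.median(data)  # 3

# Sum of deviations from the mean is zero (up to floating-point rounding).
dev_sum = sum(x - mean for x in data)

# Effect of shifting and scaling every observation by the same constant.
alpha = 3
var = statistics.pvariance(data)
var_shifted = statistics.pvariance([x + alpha for x in data])  # unchanged
var_scaled = statistics.pvariance([x * alpha for x in data])   # alpha^2 times var


def abs_dev_sum(center):
    """Sum of absolute deviations of the data from `center`."""
    return sum(abs(x - center) for x in data)


def sq_dev_sum(center):
    """Sum of squared deviations of the data from `center`."""
    return sum((x - center) ** 2 for x in data)


print(mean, median)                           # 181.2 3
print(abs_dev_sum(median), abs_dev_sum(mean)) # smaller about the median
print(sq_dev_sum(mean) < sq_dev_sum(median))  # True: smaller about the mean
```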