Normal distribution of data
Data that follow a normal distribution have:
- symmetry about the center
- 50% of the values greater than the mean and 50% less than the mean.
The normal distribution, also known as the Gaussian distribution, appears as a bell-shaped curve when plotted.
Many statistical tests assume that the data are normally distributed. Tests that make this assumption are commonly called parametric tests. Before performing a parametric test, we should therefore check the distribution of the data. If the normality assumption holds, we can use a parametric test. If it is violated, we should either use a non-parametric test or transform the data to achieve normality.
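However, according to the central limit theorem, the sampling distribution of the mean tends toward a normal distribution as the sample size grows, and n > 30 is a common rule of thumb for "large enough". In such cases, parametric tests are often used even when the data themselves are not normally distributed, although strongly skewed data or data with extreme outliers may need a larger sample.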
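The effect of the central limit theorem can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the population is taken to be exponential with rate 1 (a strongly right-skewed choice made here, not something from the text), each sample has n = 40 values, and `share_below_one_sd` is a hypothetical helper. It compares how much of the data lies more than one standard deviation below the mean, which is about 15.9% for a normal distribution.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable


def share_below_one_sd(values, mean, sd):
    """Fraction of values lying more than one SD below the mean."""
    cutoff = mean - sd
    return sum(v < cutoff for v in values) / len(values)


# Assumed population: exponential with rate 1 (mean 1, standard deviation 1).
POP_MEAN, POP_SD = 1.0, 1.0
N = 40          # size of each sample
REPEATS = 5000  # number of samples drawn

# Individual raw values from the skewed population.
raw = [random.expovariate(1.0) for _ in range(REPEATS)]

# Means of many independent samples of size N.
means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(N)])
    for _ in range(REPEATS)
]

normal_share = statistics.NormalDist().cdf(-1.0)  # reference value for a normal curve
raw_share = share_below_one_sd(raw, POP_MEAN, POP_SD)
# The standard deviation of a sample mean is POP_SD / sqrt(N).
mean_share = share_below_one_sd(means, POP_MEAN, POP_SD / N ** 0.5)

print(f"normal reference: {normal_share:.3f}")
print(f"raw values:       {raw_share:.3f}")
print(f"sample means:     {mean_share:.3f}")
```

The raw exponential values cannot fall below zero, so none of them lie in that lower tail, while the sample means should land close to the normal reference value.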
Examine the normal distribution
- a) By statistical tests
There are several statistical tests for normality. The Shapiro-Wilk test and the Kolmogorov-Smirnov (K-S) test are the most popular. The Shapiro-Wilk test is considered the more appropriate method for smaller samples (<50), although it can also handle larger samples. The Kolmogorov-Smirnov test is used for samples of ≥50. For both tests, the null hypothesis states that the data are drawn from a normally distributed population. When the p-value is > 0.05, we fail to reject the null hypothesis and the data can be treated as normally distributed. When the p-value is < 0.05, the null hypothesis is rejected and a normal distribution cannot be assumed.
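As a minimal sketch of how the two tests might be run, the example below uses `scipy.stats.shapiro` and `scipy.stats.kstest` from SciPy (a third-party library assumed to be installed). The two data sets are synthetic and `check_normality` is a hypothetical helper name. The K-S test is run against a normal distribution whose mean and standard deviation are estimated from the same data, which makes its p-value only approximate.

```python
import random
import statistics

from scipy import stats  # third-party: SciPy (assumed to be installed)

ALPHA = 0.05  # significance level used in the text


def check_normality(data, alpha=ALPHA):
    """Run Shapiro-Wilk and K-S tests and report p-values and decisions."""
    _, shapiro_p = stats.shapiro(data)

    # K-S test against a normal distribution with the sample's mean and SD.
    # Estimating these from the data makes the p-value only approximate.
    mean, sd = statistics.fmean(data), statistics.stdev(data)
    _, ks_p = stats.kstest(data, "norm", args=(mean, sd))

    return {
        "shapiro_p": float(shapiro_p),
        "ks_p": float(ks_p),
        # True means the null hypothesis of normality is rejected.
        "shapiro_rejects_normality": bool(shapiro_p < alpha),
        "ks_rejects_normality": bool(ks_p < alpha),
    }


random.seed(0)  # fixed seed so the run is repeatable

# Synthetic examples: a small bell-shaped sample and a larger skewed one.
bell = [random.gauss(50, 10) for _ in range(40)]
skewed = [random.expovariate(1.0) for _ in range(200)]

bell_result = check_normality(bell)
skewed_result = check_normality(skewed)

for name, result in (("bell", bell_result), ("skewed", skewed_result)):
    print(name, result)
```

A result of `False` for a "rejects" entry means only that the test found no evidence against normality at the 0.05 level, not that normality has been proven.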
- b) By graphical method
We can check for normal distribution using plots such as a histogram, density plot, QQ plot, and box plot. For normal data:
- Histogram and density plot should be approximately bell-shaped and symmetric about the mean.
- A QQ plot, which is created by plotting observed quantiles against expected quantiles, should form an approximately straight line.
- A box plot should have a median line approximately at the center of the box and symmetric whiskers.
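The QQ plot and box plot criteria above can also be expressed as numbers, which may help when a plot is hard to judge by eye. The sketch below is one possible way to do this, using only the Python standard library. The helper names `qq_correlation` and `median_position_in_box`, the plotting positions (i + 0.5)/n, and the synthetic data are all assumptions made for illustration.

```python
import random
import statistics


def qq_correlation(data):
    """Pearson correlation between observed and expected normal quantiles.

    Values close to 1 mean the QQ plot points lie near a straight line.
    """
    n = len(data)
    observed = sorted(data)
    std_normal = statistics.NormalDist()
    # Expected standard-normal quantiles at plotting positions (i + 0.5) / n.
    expected = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]

    mean_o = statistics.fmean(observed)
    mean_e = statistics.fmean(expected)
    cov = sum((o - mean_o) * (e - mean_e) for o, e in zip(observed, expected))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_e = sum((e - mean_e) ** 2 for e in expected)
    return cov / (var_o * var_e) ** 0.5


def median_position_in_box(data):
    """Where the median sits in the box: 0 = at Q1, 0.5 = center, 1 = at Q3."""
    q1, q2, q3 = statistics.quantiles(data, n=4)
    return (q2 - q1) / (q3 - q1)


random.seed(1)  # fixed seed so the run is repeatable

# Synthetic examples: bell-shaped data and right-skewed data.
bell = [random.gauss(0, 1) for _ in range(1000)]
skewed = [random.expovariate(1.0) for _ in range(1000)]

for name, data in (("bell", bell), ("skewed", skewed)):
    print(
        f"{name}: QQ correlation = {qq_correlation(data):.3f}, "
        f"median position in box = {median_position_in_box(data):.2f}"
    )
```

For the bell-shaped sample the QQ correlation should be very close to 1 and the median should sit near the center of the box. For the skewed sample both values should drift away from those targets.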
The figures below show the different types of plots for data with a perfectly normal distribution:
A right-skewed distribution is one in which the tail is on the right side. It is also called a positively skewed distribution, since the long tail extends in the positive direction on the number line. The figures below show the different types of plots for a right (positively) skewed distribution:
A left-skewed distribution is one in which the tail is on the left side. It is also called a negatively skewed distribution, since the long tail extends in the negative direction on the number line. The figures below show the different types of plots for a left (negatively) skewed distribution:
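The terms "positively" and "negatively" skewed match the sign of the sample skewness coefficient, the third central moment divided by the cube of the standard deviation. The short sketch below shows this on synthetic data. The `skewness` helper is a hypothetical name, and the exponential and Gaussian samples are assumptions chosen only for illustration.

```python
import random
import statistics


def skewness(data):
    """Sample skewness: third central moment divided by SD cubed."""
    mean = statistics.fmean(data)
    m2 = statistics.fmean([(x - mean) ** 2 for x in data])
    m3 = statistics.fmean([(x - mean) ** 3 for x in data])
    return m3 / m2 ** 1.5


random.seed(7)  # fixed seed so the run is repeatable

right_skewed = [random.expovariate(1.0) for _ in range(1000)]  # long right tail
left_skewed = [-x for x in right_skewed]                       # mirror image
symmetric = [random.gauss(0, 1) for _ in range(1000)]          # bell-shaped

print(f"right skewed: {skewness(right_skewed):+.2f}")
print(f"left skewed:  {skewness(left_skewed):+.2f}")
print(f"symmetric:    {skewness(symmetric):+.2f}")
```

The right-skewed sample should give a clearly positive value, its mirror image the same value with a negative sign, and the symmetric sample a value near zero.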
- Daniel W.W. Biostatistics: A Foundation for Analysis in the Health Sciences. 9th ed. John Wiley & Sons, Inc.; 2009.