Descriptive statistics is the study of numerical and graphical ways to describe and display data. It summarizes a sample or a population so that the main features of the data are easier to understand.
Central Tendency of Data
The mean is the average of the data. It is calculated by adding up all the values and dividing the sum by the number of observations: $\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$.
Example: data = 10, 20, 30, 40, 50; number of observations n = 5.
Mean = (10 + 20 + 30 + 40 + 50) / 5 = 150 / 5 = 30
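As a minimal sketch, the example can be reproduced in Python both directly from the formula and with the standard-library `statistics.mean`. The helper name `arithmetic_mean` is hypothetical, chosen only for illustration.

```python
from statistics import mean

def arithmetic_mean(values):
    """Hypothetical helper: sum of the values divided by the number of observations."""
    if not values:
        raise ValueError("the mean requires at least one value")
    return sum(values) / len(values)

data = [10, 20, 30, 40, 50]
print(arithmetic_mean(data))  # 30.0
print(mean(data))             # 30 (standard-library version)
```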
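The median is the 50th percentile of the data, the center point that splits the ordered data into two equal halves. To find it, sort the data and take the middle value; if the number of observations is even, the median is the average of the two middle values. Because it depends only on the order of the values, the median is much less affected by outliers than the mean, which makes it a robust measure of the center of the data.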
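The following sketch follows the sort-then-take-the-middle procedure for both odd and even counts. `median_of` is a hypothetical helper; `statistics.median` and `statistics.mean` are standard-library functions used here to show how an outlier affects each measure.

```python
from statistics import mean, median

def median_of(values):
    """Hypothetical helper: sort, then take the middle value or the average of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median requires at least one value")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                         # odd count: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2    # even count: average of the two middle values

print(median_of([50, 10, 40, 20, 30]))  # 30
print(median_of([10, 20, 30, 40]))      # 25.0

with_outlier = [10, 20, 30, 40, 1000]
print(median(with_outlier))  # 30  (the outlier does not move the median)
print(mean(with_outlier))    # 220 (the outlier pulls the mean far upward)
```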
The mode is the most frequently occurring value in the data. A dataset can have more than one mode if two or more values tie for the highest frequency.
The mode can be calculated for both quantitative and qualitative data.
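A short sketch using the standard-library `statistics.mode` and `statistics.multimode` (available in Python 3.8 and later) shows a dataset with two modes and a mode of qualitative data. The sample values are made up for illustration.

```python
from statistics import mode, multimode

scores = [3, 5, 5, 7, 8, 8, 9]            # quantitative data
colors = ["red", "blue", "red", "green"]  # qualitative data

print(multimode(scores))  # [5, 8]  (two values tie for the highest frequency)
print(mode(colors))       # red
```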
Dispersion of Data
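- Interquartile Range (IQR)
Quartiles are special percentiles that divide the ordered data into four equal parts.
The 1st quartile, Q1, is the 25th percentile.
The 2nd quartile, Q2, is the 50th percentile (the median).
The 3rd quartile, Q3, is the 75th percentile.
The interquartile range is the spread of the middle 50% of the data: IQR = Q3 − Q1.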
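Several conventions exist for computing quartiles, and they can give slightly different values on the same data. This sketch assumes the `method="inclusive"` option of the standard-library `statistics.quantiles`; the text itself does not specify a method.

```python
from statistics import quantiles

data = [10, 20, 30, 40, 50]

# n=4 returns the three cut points Q1, Q2, Q3 that split the data into four parts.
# method="inclusive" is an assumption; the default "exclusive" method can differ.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

print(q1, q2, q3)  # 20.0 30.0 40.0
print(iqr)         # 20.0
```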
The range is the difference between the largest and the smallest values in the data:
Range = Max − Min
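The range needs only the two extreme values. A one-line Python sketch for the same example data:

```python
data = [10, 20, 30, 40, 50]
data_range = max(data) - min(data)  # Range = Max − Min
print(data_range)  # 40
```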
- Standard Deviation
The standard deviation is the most common measure of spread. It measures how far the data values typically lie from the mean.
If x is a data value, then the difference x − mean is its deviation. The deviations are used to calculate the standard deviation.
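A small sketch computing the deviation of each value from the mean for the example data. The deviations from the mean always sum to zero, which is one reason they are squared before being combined in the variance and standard deviation.

```python
data = [10, 20, 30, 40, 50]
mean_value = sum(data) / len(data)  # 30.0

# Deviation of each value: x − mean
deviations = [x - mean_value for x in data]
print(deviations)       # [-20.0, -10.0, 0.0, 10.0, 20.0]

# Positive and negative deviations cancel out.
print(sum(deviations))  # 0.0
```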
Sample standard deviation: $s = \sqrt{s^2} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$, where $\bar{x}$ is the sample mean and $n$ is the sample size.
Population standard deviation: $\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (x - \mu)^2}{N}}$, where $\mu$ is the population mean and $N$ is the population size.
The standard deviation is always positive or zero; it is zero only when all the data values are equal. It is large when the data values are spread far from the mean and small when they cluster close to it.
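As a sketch, both standard deviation formulas can be implemented directly and checked against the standard-library `statistics.stdev` (sample, divides by n − 1) and `statistics.pstdev` (population, divides by N). The helper `std_dev` and its `sample` flag are hypothetical names chosen for illustration.

```python
import math
from statistics import pstdev, stdev

def std_dev(values, sample=True):
    """Hypothetical helper: square root of the summed squared deviations over n - 1 (sample) or N (population)."""
    n = len(values)
    if n < (2 if sample else 1):
        raise ValueError("not enough values")
    mean_value = sum(values) / n
    squared = sum((x - mean_value) ** 2 for x in values)
    return math.sqrt(squared / (n - 1 if sample else n))

data = [10, 20, 30, 40, 50]
print(std_dev(data))                # s ≈ 15.81 (sqrt(1000 / 4))
print(std_dev(data, sample=False))  # σ ≈ 14.14 (sqrt(1000 / 5))
print(math.isclose(std_dev(data), stdev(data)))                 # True
print(math.isclose(std_dev(data, sample=False), pstdev(data)))  # True
print(std_dev([7, 7, 7]))           # 0.0 (all values equal)
```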
The variance is a measure of variability. It is the average squared deviation from the mean.
The symbol σ² represents the population variance, and s² represents the sample variance.
Population variance: $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$
Sample variance: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
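The standard-library functions `statistics.variance` (sample, n − 1) and `statistics.pvariance` (population, N) follow these two formulas. A minimal sketch on the example data, also confirming that the standard deviation is the square root of the variance:

```python
import math
from statistics import pstdev, pvariance, stdev, variance

data = [10, 20, 30, 40, 50]

s2 = variance(data)       # sample variance: 1000 / (5 - 1)
sigma2 = pvariance(data)  # population variance: 1000 / 5
print(s2, sigma2)         # s² = 250, σ² = 200

# The standard deviation is the square root of the variance.
print(math.isclose(math.sqrt(s2), stdev(data)))       # True
print(math.isclose(math.sqrt(sigma2), pstdev(data)))  # True
```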
Shape of the Data
The shape describes how the data values are distributed, as seen in a graph of the data such as a histogram. Shape is important because decisions about the probability of different data values are based on it.
In a symmetric distribution, the graph looks the same on both sides of the center. For symmetric data, the mean and median are equal or very close together.
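A quick check on a small, made-up symmetric dataset, where the values mirror each other around 6:

```python
from statistics import mean, median

symmetric = [2, 4, 4, 6, 6, 6, 8, 8, 10]
print(mean(symmetric), median(symmetric))  # 6 6
```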
Skewness is a measure of the asymmetry of a distribution. Skewed data is not symmetric; it has a longer tail on one side.
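The text does not give a formula for skewness. One common choice, assumed here, is the Fisher-Pearson moment coefficient g1 = m3 / m2^(3/2), where m_k is the average of (x − mean)^k; `scipy.stats.skew` computes this by default. The stdlib-only helper `skewness` below is hypothetical, and the datasets are made up for illustration.

```python
def skewness(values):
    """Hypothetical helper: Fisher-Pearson moment coefficient g1 = m3 / m2**1.5."""
    n = len(values)
    mean_value = sum(values) / n
    m2 = sum((x - mean_value) ** 2 for x in values) / n  # average squared deviation
    m3 = sum((x - mean_value) ** 3 for x in values) / n  # average cubed deviation
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return m3 / m2 ** 1.5

print(skewness([2, 4, 4, 6, 6, 6, 8, 8, 10]))             # 0.0  (symmetric)
print(skewness([2, 3, 3, 3, 4, 4, 5, 6, 20]) > 0)         # True (longer right tail)
print(skewness([1, 15, 16, 17, 17, 18, 18, 18, 19]) < 0)  # True (longer left tail)
```

The sign of g1 gives the direction of the skew: positive for a longer right tail, negative for a longer left tail.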
Skewness is classified into two types.
- Positive Skew
In a positively skewed (right-skewed) distribution, the data values are clustered on the left side of the distribution and the right tail is longer.
Typically, the mean is greater than the median, and both are greater than the mode.
- Negative Skew
In a negatively skewed (left-skewed) distribution, the data values are clustered on the right side of the distribution and the left tail is longer.
Typically, the mean is less than the median, and both are less than the mode.
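The typical ordering of mean, median, and mode for the two kinds of skew can be checked on two small, made-up datasets, one with a long right tail and one with a long left tail. This is a rule of thumb that these examples follow, not a guarantee for every skewed dataset.

```python
from statistics import mean, median, mode

right_skewed = [2, 3, 3, 3, 4, 4, 5, 6, 20]        # long right tail
left_skewed = [1, 15, 16, 17, 17, 18, 18, 18, 19]  # long left tail

for label, values in [("positive", right_skewed), ("negative", left_skewed)]:
    print(label, round(mean(values), 2), median(values), mode(values))
# positive 5.56 4 3     -> mean > median > mode
# negative 15.44 17 18  -> mean < median < mode
```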