JUMP TO TOPIC
Variance|Definition & Meaning
The variance is the standard deviation squared and represents the spread of a given set of data points. Mathematically, it is the average of squared differences of the given data from the mean. Since the formula involves sums of squared differences in the numerator, variance is always positive, unlike standard deviation.
Figure 1: Graphical representation of variance.
The variance is determined as the sum of the squared deviations from the mean of each data point divided by the number of data points. The equation looks like this:
Variance = Σ(X – μ)2 / N
Where X is a single data point, is the data set’s average, and N represents the total number of data points in the set.
Importance of Variance
When analyzing data distribution and forecasting upcoming data points, variance can serve as a useful tool. For instance, a significant variance in a group of stock prices may be a sign of high market volatility, but a little variance may be a sign of stability.
As values that deviate greatly from the mean are likely to be viewed as outliers, variance can also be used to spot outliers or abnormalities in a data set.
Properties of Variance
The characteristics of variance consist of the following:
- Non-Negativity: Variance is never negative, which means it can never be less than zero. Since the sum of squared deviations from the mean, which is always positive or zero, is used to determine variance, this is the case.
- Zero for constant data: If there is no dispersion in the data and all the values in a dataset are the same, the variance will be zero.
- Units: Because variance is expressed as the data’s squared units, it might be challenging to understand. The standard deviation, which is the variance’s square root, is frequently employed to make variance more understandable.
- Sensitivity to extreme values: Variance can be sensitive to extreme values and can occasionally be altered by their presence.
- Linear transformations: When a linear transformation is done to a dataset, the variance is altered by a nominal quantity. The variance will be multiplied by a constant, for instance, if a dataset is multiplied by that constant.
- Additivity: The variance of two independent datasets added together equals the variance of the independent datasets added separately.
- Independence: The total of the variances of the independent random variables in a linear combination is the variance of the linear combination.
- Consistency: The sample variance approaches the population variance as the sample size grows.
Uses of Variance in Different Areas
Variance is useful in many different applications, including:
- Analysis of the distribution of data: Variance, which analyzes the dispersion between the particular values in a dataset, makes it possible to better comprehend the distribution of data.
- Finding outliers: Because values that differ significantly from the mean are likely to be regarded as outliers, variance can be used to find outliers or extreme values in a data set.
- Significance of differences: Variance can be used to examine the significance of differences between two groups or to establish whether a relationship between two variables is true or the result of chance in hypothesis testing.
- Regression analysis: Regression analysis makes forecasts of future values based on the regression equation and makes use of variance to assess the strength of the link between variables. Variance can be used in regression analysis to assess the strength of the link between variables and to forecast future values using the regression equation.
- Portfolio optimization: Variance is applied in portfolio optimization in finance to assess the level of risk in an investment portfolio and to choose a portfolio that is both well-diversified and has a manageable level of risk.
- Quality control: Variance can be utilized in quality control in engineering and manufacturing to assess a process’ consistency and pinpoint opportunities for development.
- Experimental design: Variance can also be used in experimental design in scientific research to ascertain the effects of various treatments or conditions on a response variable and to draw conclusions about population parameters.
- Hypothesis testing: Variance can greatly be useful in hypothesis testing as well as other statistical procedures, in addition to helping us understand how data are distributed. For instance, variance can be used in hypothesis testing to determine the importance of differences between two groups or to establish if a correlation between two variables is causal or coincidental.
Types of Variance
Variance falls into two categories: sample variance and population variance. In contrast, to sample variance, which is an assessment of the variance in a sample of data collected from a population, population variance is a measure of the variance in the entire population of data. Population variance is calculated as follows:
Var(pop) = Σ(X – μ)2 / N
while the formula for sample variance can be written as:
Var(samp) = Σ(X – x̄)2 / (N-1)
Variance and Standard Deviation
The concept of standard deviation, which is the square root of the variance, is similarly related to variance. Given that it is given in the same units as the data points, the standard deviation is a more understandable way to assess spread.
Figure 2: Formulas for standard deviation and variance.
The square root of the variance is used to determine the standard deviation, which is represented as follows:
Standard Deviation = √Variance
The range of values that are most inclined to lie within a particular number of standard deviations from the mean can be determined using standard deviation. For instance, a normal distribution has data that falls roughly 68% of the time within one standard deviation of the mean and 95% of the time within two standard deviations.
Sensitivity of Variance
It is crucial to remember that variance can be susceptible to outliers and extreme values and that their presence can occasionally have an impact on variance. This may result in erroneous interpretations of the data distribution and skewed predictions in statistical models.
In order to address this problem, researchers frequently change the data to lessen the impact of outliers or use alternate measurements of dispersion, for instance, the median or interquartile range.
- As a statistical measure of the variation in numbers within a collection of data, a variance is a crucial tool in many domains for interpreting data distribution and making predictions.
Figure 3: Dispersion of data from the mean value.
- Instead of employing more extensive mathematical techniques like grouping the data set’s numbers into quartiles, statistics use variance to discover how distinct numbers correlate to one another.
- Variance takes into account that all departures from the mean, regardless of their direction, are the same. The squared deviations, however, cannot add up to 0 and show that there is absolutely no variability in the provided data set.
- Finding variance has the drawback of giving combined weight to extreme results or numbers that deviate greatly from the mean. There is a possibility that squaring these numbers will distort the available data set.
- The fact that variation occasionally results from sophisticated calculations is another drawback.
Examples Explaining the Concept of Variance
Consider the following heights: 715, 360, 120, 220, and 175 millimeters. Find the variance.
Mean and Variance are related terms. Finding the mean is the first stage, which is done as follows:
Mean = ( 715 + 360 + 120 + 220 + 175) / 5 = 318
Therefore, the median is 318 mm.
Determining the difference between each term and the mean value will be the next step.
715 – 318 = 397
360 – 318 = 42
120 – 318 = -198
220 – 318 = -98
175 – 318 = -143
Calculate each individual deviation from the mean, square it, and then find the average once more to determine the variance.
The variance in this situation is, therefore:
= (397)2 + (42)2 + (-198)2 + (-98)2 + (-143)2 /5 = 228630/5
The final answer is Variance = 45726.
Calculate the population variance of the following data:
1.5, 2.6, 3.7, 4.8
Let us calculate the mean of the data in the first step:
Mean = (1.5 + 2.6 + 3.7 + 4.8) / 4 = 3.15
Population Variance = Σ(X – μ)2 / N
Now, we will find the difference between each term in the data set and the mean value:
1.5 – 3.15 = -1.65
2.6 – 3.15 = -0.55
3.7 – 3.15 = 0.55
4.8 – 3.15 = 1.65
The final answer will be:
= (-1.65)2 + (-0.55)2 + (0.55)2 + (1.65)2 /4
All the images/mathematical drawings were created using GeoGebra.