Have you ever wondered where to start for being a successful data analyst? Here is an article that shows everything about data analysis.
Standard deviation is a statistical term to show us the range between average and the actual data. It allows us to understand the variability and consistency of the data. The higher standard deviation means larger dispensation while lower standard deviation means the data is consistent.
When working with a numeric data set, you usually want to know is what the standard element of the set is. You can try finding the mean (the sum of all the individuals divided by the number of individuals) or the median (the individual that stands in the middle of sorted from low to high) but the middle of the set does not give reliable information about the set.
The standard deviation shows how consistent are the individuals are in this data set. Therefore it is commonly used and very important.
How To Calculate Standard Deviation Mathematically
Let us show you how to calculate the Standard date by giving a very simple example. Let’s say you have a set of numeric data: 4, 6, 8, 12, 15.
First, you should find the mean by you add all the numbers and divide them to the number of individuals in the data set. In this case, it is 45/9. So the mean is 5.
Then, subtract all the numbers from the mean and square them so all the results will be positive instead of negative. It will be added up all these squared numbers to each other and divide them to one less of the number of individuals in our data set (which is 4 in this example).
Just take the square root of the equation and there you are! You calculated the standard deviation. It is 4, 47.
Why Standard Deviation Is an Important Statistic?
The standard deviation has vital importance in matters like finances. You can’t calculate the risk by considering the only average. It might give you a superficial opinion of the data but it is not healthy or informed enough.
Let’s say you are about to invest some money on a company and they tell you investors gain 30% back at the end of the year. What this data will tell you is that the gain can differ from -20% to 80%.
You applied to a job and you are told the average salary of a company’s personnel is 20.000 dollars. This might seem amazing but what if the salaries variated from 1.000 dollars to 19.000 dollars?
Or when a teacher says the average score in the class in 80 out of 100. This means the scores can be 70 and 10 or 80 and 80. This is a very wide range. The standard deviation shows us if data in a data set are consistent therefore reliable. A low standard deviation is more appealing than the high average.
It is also not possible to compare data sets accurately without using standard deviations. Let’s say you have two data sets that have the same mean. One of them is 600,30,0 and 299,210,211. Both of them have the same average but the standard deviation is very different. While our first set is unreliable our second set is consistent.
In summary, the standard deviation is important because
- They are a key element of calculating the perimeters of error.
- It allows us the compare two different sets of data effectively.
- It shows us the risk by showing the range of data
Using standard deviation in real-life:
People use this statistic every day in their lives actually. It gives you a realistic opinion about the range in our data set so it is a popular formula. Some examples of standard deviation from our everyday life that might help you to understand the importance of this statistic:
When calculating the range to the average in a data set:
A class of students takes a physics test. The teacher finds out that the mean score on the test is 75%. Then the teacher decides to calculate the standard deviation to see if the grades student took were near 75%. Standard deviations turn out to be very small and that means student’ grades were very close to 75%.
A class of students took a test in English Language and Literature. The teacher decides that the mean grade on the exam is 55%. She is concerned that this is very low, so she decides to calculate the standard deviation to see how many students scored close to the mean. The teacher finds that the standard deviation is large. After closely checking all of the tests, the teacher is capable to conclude that numerous students with very low scores were the outliers that dragged down the mean of the entire class’s scores.
To find the outliers in a set of data:
A veterinary wants to learn if the cats on under his care are close in weight or not close in weight. He takes the average of the weight of all ten cats (the mean). Then he calculates the variance, and then the standard deviation. His standard deviation is remarkably high. This implies that the cats are of many different weights, or that he has a few cats whose weights are outliers that are distorting the data.
While determining the reliability of the data:
A store researcher is examining the results of a recent customer survey. He wants to have some measure of the reliability of the answers taken in the survey in order to predict how a larger group of people might answer the same questions. A low standard deviation shows that the answers are very suitable for a larger group of people.
A weather reporter is analyzing the high temperature calculated for a series of dates versus the actual high temperature recorded on each date. A low standard deviation would show a reliable weather forecast.
An employer wants to determine if the salaries in one department to see if it is fair for all employees, or if there is a big difference. She discovers the average of the salaries in that department and then calculates the variety, and then the standard deviation. The employer finds that the standard deviation is slightly higher than he expected, so he studies the data further and finds that while most employees make similar salaries, three old employees who have been in the department for 15 years or more, notably longer than the others, are making far more due to their continuation with the company. Doing the analysis helped the employer to understand the range of salaries of the people in the department.
When determining a standard measurement
In production design, designers try to develop general sizing systems to determine the variations needed in the size of the product to produce sizes for everyone. Shirts for example. Or baby seats for cars. To do that designers use standard deviation so they can calculate how much difference they should have in the product line.
Standard Deviation In Sports
Why do we need statistics in sports? Just think it like you own a team:
- Ability to examine your team’s average of parameters in the main panel graphics in the graphs that compare the statistical average of other teams in the league.
- Determine the up / down trends by comparing the statistics of your own match with the other matches of your choice
- To be able to compare your own match data with match data of other teams in order to detect tactical changes.
- To be able to see the offensive, defensive, passing, and one-to-one parameters of your team’s success/failure very quickly side by side with the ratios of other teams.
Sports teams use standard deviation. For example, a team that may be consistently successful may have a low standard deviation, but a team that scores a lot of standard deviation might be high.
One of the features of sports such as football is that you’ll know how a team did by the end of the season. When the season ends, you’re done. This in-built cut-off and an integrated fun factor allow sports to be a sandbox or testing ground for trying out different modelling strategies and coming to grasps with the real-world meaning of statistical constructs, including standard deviation.
In recent years, the use of statistical analytics in sports has been on the rise. ESPN (sports streaming platform) has even given an entirely new segment of its webpage to sports analytics. New metrics to measure an athlete’s performance are seemingly being developed on a daily basis.
Given this ongoing happening, it is interesting how the standard deviation of athlete performance has seemingly been ignored. There is very little real information about it on the internet.
A player that scores 15 points every game is simply not the same player that rotates between scoring zero and 30 points every two games, but averages, of course, would not discriminate between the two players.
The finance business can be used as a parallel example to show the importance of volatility. An investor might be most concerned with significant ratios when constructing his documents; these would be similar to today’s sports statistics of Point Per Game and Player Efficiency Ratings. The investor would concurrently take into account his fund’s volatility to guarantee it does not top his risk tolerance. Likewise, a sports general manager would want to create a team with an optimal amount of reliable consistent performers and less predictable performers.
Sports fans have observed underlying teams that have run off extraordinary playoff win streaks. A common characteristic of these teams is all had players that were based on the naked-eye performed rather a volatility. Maybe general managers who lack the foundational players to win a championship as a favourite should invest more in riskier assets the can offer surprise performances.
The Moneyball Theory
Have you seen the Hollywood movie Moneyball? The movie is about a baseball team (Oakland A’s) start using data to improve their games. It is basically the start of our sports world. Today, luring good players to the games is less important than collecting and process the data. Because data will show the managers were to improve this will improve the game and chance of winning the championship (assuming this is the ultimate goal).
This situation eventually created a theory called The Moneyball Theory. It gives answers to two simple things: Can the player hit? Can the player create runs? If the answer is yes then probably your team is in good hands.
This is the method Oakland A’s general manager Billy Beane used in 2002. He used the Moneyball Theory to select a team of underestimated players. He and his team ended up winning 20 games in a row. Between 2000 to 2006, the Oakland A’s went on to average 95 wins, win four American League West medals, and made five playoff displays.
Just because someone thought determining the standard deviation and process it other than just settle with the mean or naked-eye observation today’s sports world changed entirely.
How to calculate standard deviation in excel
Now, since you have the information on what standard deviation is and how to use it you can start learning how to calculate it on Microsoft Excel.
Excel is an office program that allows us to calculate the collected data. It features calculation, graphing tools, pivot tables, and a macro programming language called Visual Basic for Applications. But you can easily calculate the standard deviation with simple formulas.
There are eight standard deviation formulas in Excel. You can calculate the standard deviation of a sample or the entire data you have. Calculating sample only will not give you reliable results because they are estimated. But by using the entire populations you will have more reliable results because you have all the data you collected.
You can use different formulas for sample calculation and entire data.
While calculating samples, use STDEV.S, STDEVA, and STDEV.
And while calculating entire data (entire population), just use STDEV.P, STDEVPA, and STDEVP
You can distinguish these two types by looking at their last word. P stands for population data and S stands for the sample.
Sample calculations are common because sometimes it’s not possible to calculate entire data. And since you use sample calculations more often, let’s start with their formulas.
Tip: Write the formulas on line that says fx above the columns with letters.
STDEV.S – This formula is for the times when all your data is in numbers. It ignores the text and logical values while calculating. This available in versions after 2007. If you have a prior version you can use STDEV instead.
STDEVA – This is for when you want to calculate texts and logical values with numeric data. (In logical value 0 stands for false and 1 stand for true.)
How to Use STDEV.S or STDEV?
To use STDEV.S you enter a really simple formula:
STDEV.S(number1, number2, number3…) or STDEV.S([number1], [number2], [number3],…
Number 1: This is the necessary argument in the formula. This is the very first element of the sample. You can use a single array, named range, or a reference to an array instead of using commas.
Number 2 and number 3: This is the optional argument. You can add arguments up to 254 arguments in Excel.
Let’s say you have ten individuals from column A and row 1 to column A row 10.
You simply use the formula: =STDEV.S(A1:A10)
But what if you have a prior version of excel? You use STDEV instead of STDEV.S
In the same scenario of STDEV.S, you use the formula: =STDEV(A2:A10)
If you want to learn more about excel formulas check this Excel Practice worksheet.
How To Make Standard Deviation Calculations Manually?
Maybe you are not good with excel formulas or you find them too complicated to use. You can always manually calculate your data. Let’s how to do that step by step.
Let’s say you entered your data from column A row 1 to column A row 10.
- After entering your data to your excel sheet you should determine the deviation. It means value minus the mean. The formula for that is = A1 -AVERAGE($A$1:$A$10). The deviations values will be written in the B column starting with B1.
- Once you calculate deviation you will need the squared value of your deviations. So if you had negatives values in the first calculations they will all turn to positive values. For that, you will use the formula: = (B1 )^2. The squared values of the deviation will be written in column C starting with C1.
- In H5 you can calculate the standard deviation for the population with this formula: =SQRT(SUM(C5:C10)/ COUNT (C5: C10))
- In H6 you can calculate standard deviation for a sample with: =SQRT(SUM(C1: C10) /(COUNT(C1 :C10)- 1))
That is it. You can easily calculate the standard deviation on excel even when you have a large quantity of data. If you want to learn more about things you can do with Excel don’t forget to check our Complete List of Things You Can Do with Excel article.