Have you ever wondered where to start for being a successful data analyst? Here is an article that shows everything about data analysis.
Imagine two flower-seller who grow rose in their rose garden. They want to find out who is better at getting more roses to grow in their garden. One of them Jack says, “I have only 5 rose branch but this year I picked 30 roses! and the other one Mary replies, “You are not good at this, I picked 100 roses this month!
If you just compare Jack’s 80 roses and Mary’s 100 roses, it does seem like Mary is the better gardener. But what if you should think that Mary had to plant a twenty-five rose branches to get that harvest?
Just “normalize” the data to get a more fair way of comparing these two. Instead of counting the total number of roses, let’s compare both gardeners with a fair standard – how many rose did they each grow per branch?
Jack grew 30 roses on 5 branches. 30/5= 6 rose per branch.
Maria grew 100 roses on 25 branches. 100/25 4 rose per branch.
Now that you’ve normalized the data, you can clearly see that gardener Jack is much better!
What Is Normalized Data?
Normalized data is the data transformed to fit within a certain and usually simpler range. Let’s say we have a list of values ranging from y to z.
y: the minimum numerical value in the current list
z: the maximum numerical value in the current list
We want to simplify this list by making it range differently such as ranging from a to b.
a: the minimum numerical value that we want to see in the new list
b: the maximum numerical value that we want to see in the new list
To establish the formula, let’s say the list begins with a number X.
X: the first value appearing in the list
The formula to normalize the value X is;
X_normalized= ((b – a)* (X-y) /( z-y)) + a
After establishing the formula for the first value X, we can duplicate it for the other cells to normalize all values in the list. Hence, at the end, we get a simpler range of data which is easier to read and understand.
This is how the list will basically become after the normalization calculation:
What’s the difference between Normalization and Standardization?
Normalization and Standardization are concepts that are often confused with each other.
We can normalize a list of values via the calculation above, however, if we want to standardize values, the excel has also its own formula to calculate it, called Standardize.
While normalization transforms the original values to fit within a certain range, standardization transforms them to fit within a distribution that has a mean of 0 and standard deviation of 1. This operation is also called getting Z-scores or Mean Centering.
The calculation of Standardization is quite easy. Let’s say, we again have a list of values ranging from y to z.
Let’s say the list begins with a number X.
X: the first value appearing in the list
The formula to standardize the value X is;
X_standardized = (X – mean of range) / standart deviation of the range
The excel formula for this calculation is;
=STANDARDIZE (X, mean of range, standard deviation of the range)
So obviously to write this formula, we also need to know the mean calculating formula and standard deviation calculating formula.
Mean calculating formula :
=AVERAGE(range of values)
Standard deviation calculating formula
=STDEV(range of values)
Methods Used to Normalize & Standardize Data:
Normalization can be used mainly in 2 ways:
- to simply a range of data to make it easier to understand and assess.
For instance; let’s say we have a list of a class of 10 students’scores on the Math exam. The scores are ranging from 0 to 100, but we want them to range from 0 to 1 so as to assess it more easily. When we make the calculation we get the scores like this: (Please click to use the excel sheet and understand the formula properly.)
Thanks to normalization, we can deduce that the more successful students are Jason and Mike.
To crosscheck the calculation, we can make graphs and see the line graphs of both columns have the same trends (but different ranges).
- If you compare more than one set of data with different ranges.
For example; let’s say we have a list of a class of 10 students’ scores on the Math exam and Physics exam and we want to compare who is more successful on what. However; the Math exam was scored out of 100 and the Physics exam was scored out of 50. Since the ranges are different, the assessment out of original numerical values may be confusing. When we normalize the scores, we get:
Now, we see all scores as if the exams were scored out of 1.
When we make graphs from both original scores and normalized scores, we see that original data is misleading because, for example, Jason looks like he is better at Math, however actually he is better at Physics. Also, Harvey’s success on both exams is alike, however, actually, he is way much better at Physics than Math.
Standardization can be used mainly in 2 ways:
- You have a list of different variables whose mean and standard deviations are certain numbers. However, you want to simplify it with a mean of 0 and standard deviation of 1.
For instance; let’s say we have a portfolio of 8 stock exchange accounts with a mean of 23,5 and standard deviation of 22,1. When we standardize the values, it is much more convenient to read and evaluate:
Standardization tells us the standard deviation of the value from the mean. If a value has a negative standardized value, it means its value is less than the mean. Conversely, if a value has a positive standardized value, it means its value is bigger than the mean. For example; here Walmart has 0.610 standard deviations below the mean (since it has a minus) and Apple has 1.513 standard deviation above (since it is a positive number).
- To make it simple to evaluate fluctuations of different data sets if they have different means and/or standard deviation.
For example; let’s say we have monthly values of 2 different stock exchange accounts. When we standardized the data, we get;
Why Should You Normalize Variables
Normalization and Standardization are great tools especially when we are dealing with a large number of data or data with different scales because they help to make the large data to shrink into a smaller size and make it easy to make deductions and use the data correctly.
If data analysis, cluster analysis, or statistical modeling is a part of your job, these two major data pre-processing techniques can help you to form predictive models and detect relations and correlations between data sets if there is any.
All in all, whether you want to make a presentation to the top management of your company in a particular topic or you want to reduce the size of the data that you are dealing with to make clear cut deductions for yourself, consider using normalization or standardization functions of excel.
You can find some other easy feats excel tricks in our other blog posts, please click here to read.