Have you ever wondered where to start for being a successful data analyst? Here is an article that shows everything about data analysis.
When you are researching something it is ideal to make research that includes every individual in that population. But it is almost impossible if the population is large. In this case, you choose a little group from that large population and work on that.
The result you got from the sample you set gives us a clue about the whole population but it would cause errors to act like these results are the same for the whole population. Because of this, it is more correct to give the result in a range rather than give it as a single value and say that most of the population will be in this range. This interval is called the confidence interval.
- It is calculated for odds of and risk factors. This is why it is commonly used in engineering and medical areas.
- The closer the lower and upper limit of the confidence interval means the narrower the confidence interval. It means the prediction of the entire population is more accurate.
- The confidence interval is calculated by using standard error. A lower standard error means a more accurate confidence interval.
- The larger population sample means more accurate results because there are more individuals you calculate and this makes our sample closer to the entire population.
Why Do You Need Confidence Interval?
When you are making researches on large groups it is really hard to collect data from each individual in this population in terms of time and resources. So instead you pick a smaller group you believe represents our whole population the best and make our calculations based on that group. This group is called the sample size.
For example, you are designing a new running app for runners. It is impossible to track down every runner in the country and ask them their reviews on this app. Instead, you choose a small group of people that you think represent the whole population best work from there.
Confidence intervals show us how much our sample mean might differ from the population mean. It gives us that result in a range. This range has an upper and lower value. This value allows us to guess more accurately while you read the calculations.
Another good thing about confidential intervals is that you can calculate depending on how much you want to be sure of your results. Default accuracy is usually 95%. That means you can be 95% sure that the confidence interval from the sample contains the population mean.
If you want to be more definitely you can calculate a 99% confidence interval. That will, again, mean you can be 99% sure that the confidence interval of your sample size contains the population mean. Only 1% of the population will not contain the population mean in this example.
These levels of confidence tell us if you make the research over groups of 100, 99 of the individuals in these groups will have the same result. If you do them in groups of 1000, 990 of the individuals will have the results, and so on.
While you calculate confidence interval with a sample size something problematic might occur. The sample size’s mean might be too similar to the population mean or the sample size’s mean might be very different from the population mean. That is called the sampling error and you will see how to use it while calculation confidence interval in the next chapter of this article.
Where Do You Use Confidence Interval?
As it is mentioned before, confidence interval commonly used in medical and engineering areas. So let us give you an example of that.
Let’s say you are head of a pharmacy firm that produces medicine for the diabetic. You and your team managed to eliminate it two formulas that are going to be used on patients. Now in an ideal scenario, you give medicine X to half of the patients and medicine Y to the other half so you can evaluate them correctly. But it is impossible to manage it for the target population is so large you choose only a group of patients that you think represent the whole population better and give them the medicine.
The results will give us a good point to guess which medicine is working better but since you could not try it on the whole population you cannot stick to one result. So you gather up all of our information from this experiment and calculate the range. This is the confidence interval and gives us a better idea on the medicine.
Let’s say after a few months of giving these medications to patients you noticed the healing rate among these patients is 20%. Some of them are getting 60% better and some of them get 40% better. In our 95% confidence interval, the lower confidence interval is 40% and the upper confidence interval is 60% percent. This means if you try the same with 100 other groups you might reach the same result in 95 of them. And you can be 95% sure the healing rate among these two medications will be between 40% and 60%. If the limit between them were narrow that would mean you were more successful.
How To Calculate Confidence Interval Mathematically
You may not be good at math and you don’t have to be to calculate the confidence interval for it can be calculated quickly by following a few easy steps. Let us show you through the calculating process. We strongly advise you to take your time and calculate with us so you can understand it better.
Let’s say you are veterinary and you have to buy a transport cage for cats in your clinic. Every cat has a different weight of course so you have to guess the weight of cats you might take care of in the future. To do so you can weigh all the cats nearby but in fact, it would be very dreadful and long works for there are hundreds of cats everywhere. It brings us to our first step:
- Choosing a test group for your calculation. You can choose ten cats to weigh that you think suits your experiment better. It is easier to work on a less crowded group. The raw calculations won’t do any good for you but you will process them together in the other step.
The ten cats you choose have different weights. Let’s say our data set is 3, 3, 3.5, 4, 4.5, 3.7, 3.3, 4.2, 3.8, and 4. These are kilograms of the cats you weighed.
- Calculate the sample mean. Mean is a very easy statistical term to calculate. It gives us the average value in a data set. All you have to do is add up all your data and divide them into the number of your data points. In our example, it will be 37 /10 which will give us a result of 3, 7. Our mean is 3, 7 for this data set.
- Calculate the standard deviation. To calculate the sample standard deviation, you need to find the mean of the data which you did in the second step. Next, you need to find the variance of the data or the mean square of their difference from the mean. Once you’ve found this number, take its square root.
It might sound complicated but you can use a calculator for this step. The result of this calculation will be 0, 5011. Our standard deviation value is approximately 0,5
- Choose the desired confidence level. The most common confidence levels are 90%, 95%, and 99%. You are going to choose 95% in this example.
Note: You may remember these levels from our z score graphic. If you don’t know what z score is check our article How To Calculate Z Score in Excel for the graphic and more.
- Calculate the standard error. That means multiply the critical value by the standard deviation.
To calculate the critical value you turn 95% (our confidence level) to 0, 95. By dividing it into 2 you will get 0,475. As we reminded you before these parts concern the z score. So open yourself a z score chart and you will see that the closest value is 1.96 at the intersection of 1.9 rows and 0.06 columns.
Our confidence value is approximately 0, 31
- Calculate the Confidence Interval by adding the confidence value to the mean and subtracting it from the mean.
The lower limit will be 3, 39 and the upper limit will be 4,01. Now you can guess the weight of the cats and buy the correct transport cage.
How to Calculate Confidence Interval In Excel?
Now that you learned what Confidence Interval is, why do you need it, how can you use it and how to calculate it mathematically, you can start learning how to use this function in excel.
Microsoft Excel has a lot of different formulas for calculating various features of data you enter. It has one for confidence interval too. There are two ways to calculate, you can use formulas and that requires you to write down a different formula for each value you want to calculate. The second way is by using some settings and Excel calculates everything you need automatically. We will show you how to do both of them step by step.
How to Calculate Confidence Interval Manually?
Step 1: Write Down Your Data Set
What do you want to calculate? How fast a fast food restaurant delivers the food or maybe the dosage of the medicine you use on your patients? Let’s start with something simpler. You are learning how to make cookies but you never seem to arrange the right portion of sugar the bake a perfect one. So you noted how much sugar you use every time you try to bake and now you are trying to find the right portion.
Let’s say the portion of sugar you used for your last ten experiment goes like 100, 52, 54, 67, 73, 86, 40, 102, 88, and 90. These ten data points are written from column B and row 2 to column B and row 11. We wrote “Experiments” on column A, row 1, and “Portion of sugar (gr)” to column B row 1.
Step 2: Interpret your formula
The Microsoft Excel formula for the confidence interval is simply:
=CONFIDENCE (alpha, standard deviation, size)
This means you need to determine three different statistics before you calculate the confidence interval.
- Alpha: “Alpha is the point level which is calculated as 1 – confidence level; a 95% confidence level has a 0.05 significance level.” It means the ratio of the individuals that does not contain your population mean.
- Standard deviation: Standard deviation is a statistical term that shows us the range between the mean of the population and the actual data.
- Size: Size is the population size. The number of individuals you entered in your data set.
Step 3: Calculate the Required Values
Now you will calculate these three values.
- The confidence interval is usually calculated with 95% confidence value and you will stick to that in our example. You can change it if you wish. So the alpha value for us will be 0,05. You will write it in cell E4.
- To calculate the standard deviation you have to determine the mean first. So we are going to type “Mean” is column D, row 2 enter the formula of it to formula space above our sheet. The formula for calculating the mean is:
Our mean is 75, 2 for this example. This will be written in cell E2 in our worksheet.
After calculating the mean you can start calculating standard deviation. It is by subtracting all the individuals from the mean and squares the results. Add up all these squared numbers to each other and divide them to one less of the number of individuals in our data set.
All the calculation can be hard to do with large data sets so Excel created formulas for that.
I am going to write “Standard deviation” to column D, row 3.
If you are using a version of Excel after 2007, Excel the formula should be:
=STDEV.S (B2: B11)
But if you use an old version, you use the STDEV formula. This is:
=STDEV (B2: B11)
The standard deviation for our example is 21, 42584. This will be written on cell E3 in this example.
Tip: The formula you type might not work on excel due to some version issues. But don’t worries Excel correct the formula for your version!
- The sample size for our example is 10 because you have 10 individual data points. We typed this in E5 cell in our example worksheet.
Step 4: Place Your Calculations into the Formula
Let’s put these values into our formula. It should look like:
=CONFIDENCE (0.05, 21.42584, 10) or
=CONFIDENCE (E4, E3, E5)
Standard deviation: 21, 43
This formula will give us a value of 13, 27963 which is approximately 13,28. That is our confidence value.
The confidence interval is added to and subtracted from the mean. Which is
75, 2 + 13, 28 = 88.48 the upper CI
75, 2 – 13, 28 = 61.92 the lower CI.
How to Calculate Confidence Interval Automatically?
First of you need to activate the Tool Pak in Excel. To do that click the button “files” and from there “options”. There will be an “Add-ins” section in the left column. When you click it a page will appear. At the bottom, it will say “Manage: Excel Add-ins” and you will see a button says “go”. When you click it a new window will appear. This window will show you available add-ins. These are tools that help you with statistical or engineering analysis.
There are two tools you need to activate on the Tool Pak: “Analysis Tool Pak” and “Solver Add-in”. You must tick them both to activate and then press the button “OK”.
Once you do that go to the Data tab and there you are going to see a “Data Analysis” button. The next step is clicking it. A new window with an analysis tools list will appear. Go and select the “Descriptive Statistics” option.
Doing this will bring you a new window with options. The first option will require you to enter the data set on your sheet. You might type $B$2:$B$11 or you can click it and select the cells with your data in it with your mouse.
Note: The dollar sign ($) before the column letter and the row number is used to fix the cells. You lock them by using the dollar sign.
The output options allow you to enter your analysis anywhere you want. It can be the very sheet you work on, another clean sheet, or an entirely different workbook. We recommend using the “New worksheet” button for a separate worksheet from the one you work on.
Under that, you tick “Summary statistics” and “Confidence level of mean”. As it was said before you can change the confidence you want to calculate as you wish but we decided to do this example with a standard 95%. Once you finish doing these click the “OK” button to run the analysis.
Now that you have our analysis in a new separate sheet. This tool gave us the values of mean, standard error, median, mode, standard deviation, sample variance, range, minimum, maximum, confidence level %, and more. Of course, you won’t use all of these values but it can be really useful to access all of these values quickly and simple.
I gave us the confidence level (95%) but you can also calculate the upper confidence interval and the lower confidence interval. To do so you add the mean to the confidence level. The formula is going to be:
=the cell with the mean value + confidence level value cell
=B4+B7 for example
When subtracting the confidence level from the mean it will give us the “lower confidence” interval. The formula should look like:
=the cell with the mean value – confidence level value cell
=B4-B7 for example
The confidence interval is a helpful and useful statistical term. It makes it easier to calculate the results in a large group of data. It saves us time and it is really easy to calculate by using Microsoft Excel. Don’t forget to check out our other articles on how to use excel formulas.