Have you ever wondered where to start for being a successful data analyst? Here is an article that shows everything about data analysis.
What is Data Inconsistency?
Other than entering the wrong data into a database, perhaps, the most annoying and frustrating situation to come across is inconsistent data. It can seem like an amateur mistake for those who are looking from the outside. However, it is actually a quite common situation that even experts and professionals have to deal with.
What is meant by the term ‘data inconsistency’? What negative effects does it have on your workflow? How can you prevent it before it happens? And how can you recover from it? If you are interested in finding answers to these questions, you have come to the right place! Let’s dive into it.
Data inconsistency is discrepancy of the same data in the database. Data inconsistency means that at least two of the same data is entered into the system with a different format. In Microsoft Excel spreadsheet-based software, these data inconsistencies can occur from cell to cell. Two (or more cells) can seem exactly the same to your human eyes; however, they are not always the same if your data is processed by your computer.
Data Inconsistency Examples
Let’s give a few examples in regards to inconsistent data and data discrepancy in general.
Example 1: Periods
For this first example, try to find the difference between the two data cells:
Company Inc. Annual Profit = $22,051,998
Company Inc Annual Profit = $22,051,998
Can you figure it out? The number 1 example has a period (.) for the name of the company which is ‘Company Inc.’. Now, you can say ‘So what?’ due to the fact it registers as the same to your eyes and is processed as they are the exact same in your brain. However, they are not exactly the same to your computer!
They are basically the same company and forgetting to put a period to its name would not make much of a difference for you. But the software you are using (specifically Microsoft Excel in this situation) would not process them as ‘exact matches’. And that is a big loss for you.
Because these exact matches and the find or search functions built into the Excel software are time-savers. They will help you save time and energy, so you can focus on other things and continue with your workflow. But forgetting only a little dot can establish data inconsistency in your database and will hurt you in terms of time and workflow.
Example 2: Spaces
Unnecessary and extra spaces in cells are the defining enemies of the exact match function. They belong to a type of data inconsistency that is sometimes not even visible to the human eye. Now, this means that your eyes might not detect it with a glance due to the fact that spaces disappear with white backgrounds.
In order to demonstrate inconsistencies, the example below will be presented with quotation marks to both sides of the cells:
As it was mentioned above, without the quotation marks the difference between the two data cells would be close to impossible to identify. That single space blocks the exact match function to present you the matches of the word ‘Company’.
They seem the same, but they are not. And the processor of your software knows it.
Example 3: Capital Letters
This third and last example for this chapter is perhaps the most identifiable one here. The difference between uppercase letters compared to lowercase letters can be seen easily. But, again, that does not mean the computer will process them as the same data.
Let’s look at the example:
There are multiple buttons and features you can turn on and off in terms of upper and lowercase letters. Check those features and tick the boxes you prefer when you are searching for exact matches of any data.
The ‘Matches’: Exact, Close and Fuzzy
It can come as a surprise to you that although computers and digital devices are very good at spotting exact matches of the same data input, they are not as good as humans in terms of identifying ‘close matches’. Let’s look at the varying definitions of these match types.
An exact match refers to at least two data inputs that are exactly the same. And by ‘exactly’, they mean exactly. If you are dealing with letters or characters, every single one has to be the same. Even the locations of the spaces you use as well. If you are dealing with numbers, every one of them has to be in the same and correct order. And if you are using decimals, then, you should check (and double-check) the correct punctuations as well. Let’s examine the punctuation differences among the different languages and cultures in brief.
Punctuation Differences in Decimal Separations
The English language uses periods to separate decimals and thousands. That means if you are living in Great Britain or the United States of America, you are using periods for decimal and thousand separations. In addition to countries that have English as their main language, most of the Asian countries prefer to use periods in order to separate groups of decimals and thousands. However, not the majority of the world!
All of Europe, a big part of Africa and almost every country in South America use a comma in order to separate their decimals and thousands. So, if you want to keep your data consistent in your spreadsheets and workbooks in general, decide on one of them.
Imagine that you are a new start-up from the United States and you are presenting your database to a company from Indonesia that wants to invest in your company. Because you are from the United States, you will be separating the decimals and thousands with periods (.). And in the Indonesian language, commas (,) are used to separate groups of decimals and thousands. If you change those periods into commas, the Indonesian company can feel more ‘at home’ when they are looking into your data set.
The human brain and eyes are incredible tools in order to find a pattern. Humans can search for similarities, find them and continue them if they want to. And you would expect that computers and digital devices accomplish this processing activity better than humans. You would think that computers process data perfectly and can find the similarities among and between groups of data easily… In fact, that is wrong!
Surprisingly, humans are better at finding similar patterns and ‘almost exact’ matches than computers processors. And the reason? It is because humans are just humans. Humans do not process binary similar to a computer, we are ‘visual processors’. This leads to humans who are looking at the clear big picture can find similarities and differences easier than a computer.
This can be looked at as an advantage, but due to the fact that we function from our daily life to workflows in the digital world; it is actually a disadvantage that computers cannot process similarities among data as successful as humans do.
A close match is an ‘almost equivalent’ data. The similarity between two (or more) data sets that are ‘close matches’ can range between 50% to almost 99%. However, keep in mind that close matches are never 100% matches. These one hundred percent (100%) matches are called exact matches as you can find the definition right above.
The difference between a fuzzy match and a close match is that fuzzy matches take into consideration the context of the data as well. For example, you are entering data for your retail department and want to look for matches in it. A word (or numerical data) you search in the retail department group can give you a fuzzy match because the context is the same. However, if you search data related to the retail department in different department spreadsheets or workbooks, you will get close matches. Because now, the context would be lost.
The similarity percentage for the type of fuzzy match is the same for close matches. It fluctuates between 50 and 99 percent. However, in some of the software and applications used in finding ‘matches’, you can adjust the percentage. You can say that ‘I want to see the matches in the 25% area as well’. In order to do that you need to find the match adjustment settings and lower the similarity percentage if you want to see broader similarities.
The Disadvantages of Data Inconsistency
Now that you understand the details of data inconsistency, it is time to look for some of the effects of it. This chapter will focus on especially the disadvantages and the negative effects of data inconsistency. Let’s see how data inconsistency hurts your time, your tidy spreadsheets and database.
The Mix-Ups in Summarizations
One of the most useful tools and functions of Microsoft Excel and its subsequent templates is the ability to generate reports. These reports provide you summaries and details automatically. It comes in handy because it generates all of these reports depending on your needs in a matter of seconds. Otherwise, you would be dealing and struggling with these analyses and summarizations manually. You would be spending your precious time on manual summarizations, but do not worry, Excel has its summary and analysis functions! However, does it provide you accurate reports?
It depends on you! The accuracy of the summarizations actually depends on the data you enter into the database. Let’s use the same example featured in the first chapter above.
You have entered only two cells of data:
Company Inc Weekly Loss
Company Inc. Weekly Loss
You want to find the exact number of uses of the name ‘Company Inc.’. Therefore, you search for ‘Company Inc.’ using Excel’s various functions and formulas. You are certain that there are two cases of the word ‘Company Inc.’ Because you can clearly see the two data cells. But your software says that you have only one case of ‘Company Inc.’ in your spreadsheet.
The period (.) you forgot to put into the first data cell can cost you accurate results when you get the summary. But this example can be fixed easily, right? Just put a period in the first cell and that is it… But now, imagine you have thousands of rows and columns of cells filled with large amounts of data containing letters, words and sentences. And imagine you have thousands of different company and department names. Can you fix that easily as well?
Time Consuming Corrections
Of course, fixing your data in order to make it consistent throughout your spreadsheet is time-consuming. And that is one of the biggest negative effects of data inconsistency to your workflow. It makes you take time from your crucial work to fix past mistakes. It forces you to waste your time because of technical errors of individuals.
Let’s give an example on how corrections on your inconsistent data can become time-consuming:
Imagine you have to use the name of the United States of America in your database. Now, how would you approach to use this name? You can use ‘the US’ as an abbreviation; keep it as ‘the United States’ or use another abbreviation like ‘USA’. The choice is yours. You used each of them throughout the Excel spreadsheet. Now, the country is the same; however, the data input is not. Therefore, when you search for one of them, you are not going to get the others.
The solution for this is to go back to every time you use a version of the name and fix it. By fixing, you need to decide on one of the versions of the name and use only that version. Otherwise, this can turn into a huge nightmare, if additionally, you are working on the same spreadsheet with other coworkers. In order to avoid any confusion, before you enter the data, determine a version to use.
Transfer and Integration into Other Data Systems
Different software, various applications and programs have their own systems for you to put in and process data. That is why it becomes almost impossible to transfer inconsistent data into another system and expect to work it perfectly.
Let’s say that you are running an e-commerce website which is a retail store. You want to upload your database in a .xlsx format to an ERP system. The clothes you are selling come with an extra-large option as well. Now, how are you going to present that option? If you transfer your database and look at each of the clothes and their size options, you can see ‘XL’, ‘XLarge’, ‘Xtra Large’, etc. The different versions and inputs of the same data become a mess when you upload it to another system.
If you want to avoid chaos, decide on only one version to utilize. In addition to that make sure every employee who can add and update the database knows which version is chosen.
How You Can Prevent Data Inconsistency in Your Excel Spreadsheets
There are multiple ways you can prevent creating an inconsistent database before and during the process of entering it. Some are basic and simple and some require you to sacrifice some concepts in other areas. Let’s look at some of the methods which you can stop inconsistencies and discrepancies with:
Limiting Manual Data Input
There always will be manually entered data into databases. There is no way around this. However, the action you can take which is going to help you to keep your text consistent is to limit manual data inputs.
Because humans make mistakes. Employees who enter data, no matter how much an expert or how professional they are, are going to make mistakes. That is one of the unfortunate disadvantages of the ‘free text’ input method.
By reducing an individual manual input and letting the automation process take over, you are going to diminish the number of mistakes as well. And this applies to every party. The company (perhaps the seller), as well as the customer (the buyer), has to take into consideration that automation of data input will help to eliminate mistakes from the spreadsheet as a whole. And do you want to take a guess on one of the most made mistakes? Inconsistency.
This way of eliminating data inconsistencies actually is the lack of flexibility, not more of it. Back in the day and still, in some areas, every customer and company would fill out physical forms for every transaction. The pen and paper method of data input makes this process more vulnerable for mistakes. Thus, inconsistencies.
However, limiting flexibility and forcing companies and customers to choose from a ‘predefined’ set of data reduces the chance of mistakes. The more of a ‘free text’ an individual constructs, the more the database includes mistakes and inconsistencies.
Excel Templates Ready to Use
Third-party Excel templates take some of the responsibilities away from you so that you can focus on your data. Majority of the templates establish automation for entering and processing your data. That way the risk of inconsistencies decreases.
For example, most of Someka’s Excel templates feature an additional ‘Settings’ section. You can adjust your data and keep it consistent in a much easier way than manual adjustments.
How You Can Fix Data Inconsistency in Your Excel Spreadsheet
You have tried but you could not prevent making mistakes and establishing an inconsistent data set. Now, you are going to try to fix it after the fact. In the end, it is still your computer or digital device processor to identify the mistakes and inconsistencies. But it needs a human hand manually feeding the correct data and adjusting the settings.
The computer which is processing and identify the consistency of your data has to be able to identify the difference between a ‘variation’ and a ‘mistake’. Let’s go on by using the ‘Company Inc.’ example here as well:
‘Company Inc.’ and ‘Company INC.’ terms are the same and mean the same thing: The company name. Now, you have to feed the necessary data into the software to say that they are the same thing. You need to establish the different versions and variations of Company Inc. in order to make your data consistent and reports accurate.
However, if you put in ‘Company Inc.’ and ‘Companyy Inc.’, then, what happens? It is obviously a typo and a critical mistake. If this goes unnoticed by you, it is not supposed to go unnoticed by the computer as well. Then in this example, you need to feed in the data that other than the word ‘Company’ every variation of the word is a mistake.
Yes, it is not a very clear line that is drawn here. And you can move this line if you want to adjust what constitutes as a ‘variation’ of a data and what constitutes as an ‘inconsistency mistake’. It is hard to establish concrete rules and settings for everyone to abide by in terms of data consistency. That is why you need to set your own rules and decide on your principles on how to put in and process your data.
This blurry line is called ‘fuzzy logic’ in the fields of mathematics, computer engineering and in today’s world, also in the field of artificial intelligence. Fuzzy logic takes the values between a binary 0 and 1, and merges them. Therefore, making every variable between the value of 0 and 1, only one variable. This is applied to numerical data in the fields mentioned above. However, it is possible to use this fuzzy logic in data cells that uses letters and words as well.
But the important part and the essential action you need to take is the adjustment. You need to adjust and enter your own rules and your principles for the points of 0 and 1. Due to the fact that everything between will become ‘one’, it is important for your database and sets of data to establish the limits. With that adjustment, you can merge your texts with similar attributes (close matches and fuzzy matches) into one cell, spreadsheet or even a Microsoft Excel workbook.
Not every identification of an error is up to the computer which is processing your data. And also, finding all the mistakes and inconsistencies in your data is not an action you can 100% be in charge of. The ideal is the mix of the two.
By reducing manual inputs and stopping mistakes and inconsistencies from happening, you would be allowing the computer to function more properly. You are helping the processor in order to generate accurate results based on your data for you. You are also helping your employees and coworkers who are working on the same spreadsheet. And, of course, you are helping yourself as well.
It is sort of a chain of positive reactions. If you are more careful about how you input your data, then you can avoid data inconsistencies. If you make less inconsistencies, thus, less mistakes, you would be saving time and energy. The prevention of data inconsistency problems in your Excel spreadsheets starts with you and it ends up aiding you.