What is Data Analysis?
Data analysis is the process of gathering information from data and evaluating it with a logical, analytical approach. This process includes several phases, such as data collection, data processing and data cleaning.
This concept is easily confused with data science or business analysis, which is understandable given how similar the names are. Data analysis acts as a bridge between data science and business analysis. Data science tries to find the right questions to ask, data analysis finds answers to those questions, and business analysis uses insights from data analysis to guide business decisions.
To be more clear:
Data Science is a general term that also encompasses data analysis. Data scientists create models and algorithms to tackle big data using multiple tools. Besides data analysis, it can refer to data mining, machine learning or software development.
Business Analysis is responsible for turning existing data into strategic business ideas. For business analysts, data is a tool to get insight from. Even though technical knowledge is necessary, soft skills such as problem solving and critical thinking are more involved here.
Data analysis, however, is the phase where insights are created to solve problems. It sits under data science and supports business analysis with information.
So, what does a data analyst do?
A data analyst is the person who handles the process of gathering data, performing statistical analysis of datasets, developing solutions and building models for strategic approaches. A regular day for a data analyst is about sifting through big datasets and extracting data to develop analytic models. They first understand the trends and then, by analyzing the data, provide critical insights to the company they work for. That is why companies hire them in the first place: so that they can make critical decisions based on that information and improve their business.
Work means data, and in today’s world the amount of data collected in a workplace can be enormous. If you are a company, you will inevitably face massive amounts of data and may not know what to do with them. If you are a company aiming to be better, complicated datasets have a lot to offer you. To benefit from them, you need a professional called a “data analyst”.
Data Analyst Duties – Responsibilities
Now you know what a data analyst does in general terms. Their main responsibility is collecting and analyzing data, yet what the job description covers may differ from company to company. Let’s look at what a data analyst’s duties might be.
- Collecting and interpreting data
- Performing statistical analysis
- Testing and correcting errors
- Identifying data trends
- Writing SQL code
- Designing business systems
- Giving insights and recommendations
- Identifying data issues and proposing solutions
- Providing technical consulting on data systems
- Improving data performance
- Developing KPIs for strategic decisions
- Mentoring other data analysts
- Working with business analysts and business partners
If you are planning a career as a data professional, you may be wondering what your title is going to be and what type of job you should look for. Data science intertwines with other professions and disciplines, so there is no single job title or salary level. The list could go on, but here are some of the most common job titles besides data analyst, along with estimated salaries in data science today.
Data Scientist
Unlike data analysts, data scientists also build machine learning models. They work experimentally with the data they handle, and that is what makes them scientists. A data scientist takes care of the data technically and statistically: they interpret, process and manage it with technological methods. This role requires serious IT knowledge.
Data Engineer
Data engineers give structure to what data scientists do. They build the data pipelines and infrastructure for a company’s data and turn that data into something functional. Data engineers can be considered the backup for data teams: when a problem occurs, they are there to fix it. Their software development skills are more prominent than their statistical skills.
Business Analyst
Business analysts are the people who put the provided data and insights to effective use. They take data and turn it into business strategies. They might need data science skills in some cases; however, analyzing data with detailed technical methods is not their focus. They guide companies toward the right actions to improve their business.
AI Architect
An Artificial Intelligence Architect is like the chief of the data science team. These people understand the needs and decide on the best solutions to implement, leading other team members to gather the right information. They know how to put theoretical knowledge into practice and handle the issues that arise. This role mostly requires problem-solving skills.
AI Product Manager
This title is relatively new. Product management has been familiar for quite a long time, but artificial intelligence has only recently started to be built into products. AI product managers are in charge of products with AI, and they help companies with product adoption processes. Critical thinking is the key qualification for this position.
Average monthly salaries for these jobs are as follows.
- Data Analyst: $5,608
- Data Scientist: $10,000
- Business Analyst: $6,860
- AI Architect: $10,760
- AI Product Manager: $9,714
- Data Engineer: $10,800
Skills Needed To Be a Data Analyst
Being a data analyst requires both soft skills and technical skills. If you want to be a data analyst, you must learn certain tools while sharpening your soft skills. Let’s see which key skills a data analyst needs.
When it comes to technical skills, we should start with programming languages.
- R: a free, open-source language used especially for statistical analysis of data. It is commonly used by statisticians, and also by data scientists for data mining or machine learning.
- Python: perhaps the top programming language for data analysts. What makes it so versatile is that you can use Python not only for web development, but also for other types of programming and software development.
- SQL: short for “Structured Query Language”, a common language designed for managing data by accessing, reading, manipulating and changing it.
- MATLAB: not the most popular tool when it comes to data science; however, it is definitely beneficial to know because its toolboxes and machine learning algorithms can be used for data analysis.
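To make the SQL item a bit more concrete, here is a minimal sketch of a typical aggregation query, run from Python through the standard-library sqlite3 module. The sales table, its columns, the sample rows and the top_regions helper are all made up for illustration; in a real job you would query whatever database your company actually uses.

```python
import sqlite3


def top_regions(rows, limit=2):
    """Return (region, total) pairs, largest total first.

    `rows` is a list of (region, amount) tuples. The table layout is a
    hypothetical example, not a standard schema.
    """
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    try:
        conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        cursor = conn.execute(
            "SELECT region, SUM(amount) AS total "
            "FROM sales GROUP BY region "
            "ORDER BY total DESC LIMIT ?",
            (limit,),
        )
        return cursor.fetchall()
    finally:
        conn.close()


# Hypothetical sample data
sample = [("North", 120.0), ("South", 80.0), ("North", 30.0), ("East", 95.0)]
print(top_regions(sample))
```

The same GROUP BY and ORDER BY pattern covers a large share of everyday reporting questions, such as totals per region, per product or per month.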
Statistical analysis is a vital part of data analysis, as we have mentioned. Naturally, such analysis requires the related skills listed below. Note that the statistical skills you need depend on your job; not every data analyst position requires detailed knowledge.
- Basic statistical elements: mean, median, mode, standard deviation and variance, hypothesis testing.
- Statistical methods for data mining: analysis of variance, multivariate logistic regression analysis, nonlinear regression analysis, discriminant analysis, etc.
- Database principles: data model, database design.
- Types of data distribution: standard normal, exponential/Poisson, binomial, chi-square.
- Tests for significance: Z-test, t-test, Mann-Whitney U, chi-squared, ANOVA.
- Quantitative methods: time series analysis, probability models, optimization.
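A few of the items in this list can be tried out in a handful of lines. The sketch below uses Python’s built-in statistics module for the basic elements and computes a Welch’s t statistic by hand as one example of a significance test. The describe and welch_t helpers and the sample scores are invented for illustration; for real work you would likely reach for a dedicated statistics library that also reports p-values.

```python
import math
import statistics


def describe(values):
    """Basic descriptive statistics for a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "variance": statistics.variance(values),  # sample variance (divides by n - 1)
        "stdev": statistics.stdev(values),        # sample standard deviation
    }


def welch_t(a, b):
    """Welch's t statistic for two independent samples.

    Only the statistic is computed here; turning it into a p-value
    needs the t distribution, which is left out of this sketch.
    """
    var_a = statistics.variance(a)
    var_b = statistics.variance(b)
    standard_error = math.sqrt(var_a / len(a) + var_b / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / standard_error


# Hypothetical sample data
scores = [2, 4, 4, 4, 5, 5, 7, 9]
summary = describe(scores)
print(summary)
print(welch_t([5, 6, 7], [1, 2, 3]))
```

A large absolute t value suggests the two group means really differ, while a value near zero suggests they do not.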
Analysis and mathematics can’t be considered separately; analysis needs maths to be performed properly. Think about the data analysis concept for a minute: the keystones of the process, and almost every phase of it, require mathematical understanding.
- Problem Solving: Data, especially big data, comes with problems to solve, and sometimes those problems get really complicated. That’s why problem solving is one of the most important qualifications of a data analyst. It is a strong asset when dealing with bugs or errors along the way.
- Linear Algebra and Calculus: Linear algebra methods are used in machine learning, which works with vectors and matrices. Calculus, on the other hand, is used to optimize problems and build functions.
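One place where linear algebra and calculus meet is gradient descent, the optimization idea behind many machine learning models. The sketch below fits a straight line to a few points by repeatedly stepping against the derivative of the mean squared error. The fit_line function, the learning rate, the step count and the data are all assumptions chosen for this small example, not a recommended setup.

```python
def fit_line(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Prediction error for every point with the current w and b
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error (the calculus part);
        # the first sum is a dot product of errors and inputs (the linear algebra part)
        grad_w = (2 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2 / n) * sum(errors)
        # Step downhill
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Hypothetical data lying exactly on y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_line(xs, ys)
print(round(w, 4), round(b, 4))
```

With a learning rate that is too large the steps overshoot and the fit diverges, which is why the value here is kept small for this particular dataset.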
Data Visualization Skills
Creativity makes most things better. Data visualizations make the data easier to understand. Data science is a complicated topic, especially for those who are not familiar with the concept. For a data analyst, visualizing data is a nice-to-have skill. When you look at a visual rather than trying to read numbers, it is much easier to catch critical points and interpret them.
Another way to visualize data is to create dashboards or reports. When you have data to gather in an organized way and want it shown visually, chart-dashboards are what you are looking for. Check this Sales Dashboard Excel template and this Heat Map Generator to see what such chart-dashboards look like. When a process involves a lot of data, chart-dashboards are life savers. Creating dashboards and reports is a good data visualization skill for a data analyst.
Google Analytics
Google Analytics is a popular web service designed by Google for tracking and reporting website traffic. It is a great platform for managing marketing activities and search engine optimization. For data analysis, Google Analytics provides its users with detailed insights and even reports about their website traffic, so that they can improve the ROI of their websites. This service is not complicated at all, and you can learn how to read its insights in a very short time. So Google Analytics is a skill that a data analyst can improve quickly.
Machine Learning
You will not necessarily use machine learning in your data analysis work, but a little knowledge never hurts. Knowing about machine learning is still a great gain for someone who works in data science. Understanding it may open doors you haven’t even imagined. Even a general idea of it can give you a competitive advantage in the job search.
Artificial Intelligence
While machine learning focuses on the idea that machines can learn things and adapt, artificial intelligence adds a broader perspective: machines can get smart and do whatever they do intelligently. Artificial intelligence, in its simplest definition, is the intelligence demonstrated by machines and their ability to rationalize. In data analysis, AI refers to the collection of technologies that help data analysts extract insights from datasets. You can learn how to use AI effectively in data analysis and start getting insights.
Writing / Communication Skills
Soft skills are great supporters of the technical ones. No matter how well trained in IT you are, data has no point if you are not capable of expressing yourself. Findings must be communicated clearly to colleagues, and any misunderstandings should be cleared up afterwards. A data analyst should be communicative not only in explaining complex data, but also in listening to the people they collaborate with. Good overall communication is essential in a job like data analysis.
Strong Microsoft Excel Knowledge
A well-known, multi-purpose tool such as Excel may not come to mind when speaking of complex and hard-to-understand subjects like data science. You might expect to hear about new technologies and specialized programs. Some of you are probably thinking, “What does Excel have to do with it?” In fact, Excel loves data and has its own programming language. You can do more with it than you think. Your Excel knowledge may suddenly become a data analyst skill. You need to know some fundamentals to use Excel in data analysis.
Here are some of the important features and functions of Excel.
- Basic features: print, query, filter, sort and simple table data processing.
- Functions and formulas: common functions, array formulas, multiple lookups, advanced data functions like sum, if, concatenate.
- Visualization options: advanced charts, maps, graphics, chart plugins.
- PivotTable, VBA program development.
Job Interview Questions for Data Analysts
Wondering what you will come up against in a job interview as a data analyst? You will probably prepare yourself as best you can, but with a general view of the questions they might ask, the interview will go more smoothly. There will be both generic questions to find out what type of data analyst you are, and specific, detailed questions to examine your logic and thinking. Some of the top questions for a job interview are listed below.
- What is your motivation to be a data analyst?
- Which particular area would you like to work in, and why?
- Which software and programming language are you trained in?
- What would you do with missing data?
- Which part of a data analysis project do you enjoy working on the most?
- Explain “Data Cleansing” and best ways to practice this.
- Which statistical methods would you prefer to use for an analysis?
- What are the characteristics of a good data model?
- What are the possible problems you encounter while performing data analysis?
- Which tools do you use for creating dashboards?
- How would you estimate how many jeans could be sold in New York every March?
- Explain the difference between data mining and data profiling.
- How would you use Excel as a data analysis tool?
- How do you handle messy datasets?
- Which data analyst skills do you believe you have?
There are some must-read books for people in data science. Some of the best books for data analysis are listed below.
Data Science from Scratch, by Joel Grus
A great book to start learning about data science. You will find introductory information in Data Science from Scratch and quickly get familiar with concepts like algorithms, statistics and libraries.
Practical Statistics for Data Scientists, by Peter Bruce and Andrew Bruce
This is another beginner-level book for data analysts. It gives readers the principles of statistics and shows how to use them in data analysis. You can learn the statistical methods of data science in depth from this guide, which is written in very accessible language.
Python for Data Analysis, by Wes McKinney
This book approaches data analysis from a Python perspective. You will get both a general idea of data analysis and detailed information about Python features.
Data Analytics Made Accessible, by A. Maheshwari
If you are new to data analysis and intend to learn more, this book will serve your purpose. It teaches data analytics with real-world examples. Advanced readers will benefit from it as much as beginners.
Big Data: A Revolution That Will Transform How We Live, Work, and Think, by V. Mayer-Schönberger and K. Cukier
This book focuses on the importance of data analysis. You will learn more about how crucial data analytics can be in different fields and how big data concepts take shape over time.
Too Big To Ignore: The Business Case for Big Data, by P. Simon
This book is essential, especially for those who are interested in big data in business. It provides insights into how to turn data into intelligence and then how to use it.
Business unIntelligence: Insight and Innovation Beyond Analytics and Big Data, by B. Devlin
This book highlights some issues in the data analytics field. Besides clearing up misconceptions, it covers the history of data analytics and business intelligence, where they stand today and where they are heading.
Course & Training
Learning is a never-ending process. Even if you believe you are fully trained in something, there is always room for new things. Data science is a multi-dimensional subject that requires investing in yourself. You can support your existing knowledge with additional training materials. Let’s see what types of courses or training a data analyst can take.
IBM Data Analyst Program: This Data Analyst Master’s program is provided by simplilearn.com in collaboration with IBM. You will learn about analytics tools, statistics, SQL databases, R and Python, and data visualization.
Cloudera Data Analyst Training: This is a four-day data analyst course by Cloudera Educational Services. Participants learn how to transform and analyze massive datasets in a Hadoop cluster using SQL and familiar scripting languages.
Udemy Data Science Course: The Data Science Course 2020: Complete Data Science Bootcamp is a comprehensive guide to data science. You will learn mathematics, statistics, Python, machine learning and deep learning.
Data Analyst in Python by Dataquest: This course focuses on probability, statistics, and cleaning and visualizing data while teaching Python fundamentals. You can learn and sharpen your Python skills with this course.
Introduction to Data Analysis Using Excel: Provided by Coursera, this course mainly covers how to perform data analysis in Microsoft Excel. You will learn to read, organize and manipulate data by using Excel functions.
Certified Analytics Professional
CAP is a general analytics certification that covers business and analytics problem framing, deployment, methodology selection, model building and model lifecycle management. The certificate needs to be renewed every three years.
Price: $495 for INFORMS members, $695 for non-members
Requirements: CAP exam and adhering to the CAP Code of Ethics.
Certification of Professional Achievement in Data Sciences
This is a non-degree program that consists of four parts: Algorithms for Data Science, Probability & Statistics, Machine Learning for Data Science, and Exploratory Data Analysis and Visualization.
Organization: Columbia University
Price: $1,968 per credit, $85 application fee for the on-campus program, $150 for the online program and $395 technology fee per course.
Requirements: Undergraduate degree, prior quantitative and computer programming coursework.
Microsoft Certified Data Analyst Associate
This certification is one of several important certifications under the “Microsoft Certified Solutions Expert (MCSE): Data Management and Analytics”. The certificate provides its holders analytic capabilities, and the ability to manage data by building scalable models, cleaning and transforming the data.
Price: $165 per exam
Requirements: Free online training and a paid course.
Cloudera Certified Associate (CCA) Data Analyst
This certified program provides candidates the relevant core qualifications to learn how to use Impala and Hive to generate reports in Cloudera’s CDH environment.
Requirements: CCA Data Analyst Exam
EMC Proven Professional Data Scientist Associate (EMCDSA)
This certification is an opportunity for data analysts to contribute to big data analytics projects. With this certification, you learn how to apply analytic techniques to big data and create statistical models.
Organization: Dell EMC Education Services
Price: $230 for exam voucher
Requirements: Data science and big data analytics exam, and experience with SQL, Java, Perl, Python, or R.
Open Certified Data Scientist
This is an independent global certificate that sharpens the skills that data analysts have. The certification program doesn’t require any written exam and doesn’t provide training courses.
Organization: The Open Group
Price: $1,250 for certification, $250 for recertification.
Requirements: Experience application form and a board review.
SAS Certified Advanced Analytics Professional Using SAS 9
This SAS certification, when earned, is a confirmation that the holder has the ability to analyze big data with statistical analyses, machine learning, optimization, and predictive modeling techniques. Therefore, candidates are expected to have experience in machine learning, optimization, time series forecasting, pattern detection, etc.
Price: $250 for one exam, $180 each for the other two
Requirements: Three exams:
- Predictive Modeling Using SAS Enterprise Miner 7, 13, or 14
- SAS Advanced Predictive Modeling
- SAS Text Analytics, Time Series, Experimentation and Optimization
KPIs in Data Analytics
Determining data analytics KPIs is important because it helps companies focus on the right information. Knowing what the key performance indicators are brings companies closer to their business goals.
The question is: how do you measure the success of big data initiatives? Which metrics capture success in data analysis? KPIs may differ from sector to sector or company to company; however, some KPIs are common to almost every scenario.
- Customer Service: How does big data contribute to customer service? Customer experience is likely to change after data initiatives. Customer lifetime value, customer retention rate and net promoter score (NPS) are relevant metrics for seeing how big data affects customer service.
- Cost Efficiency: Talking about big data means talking about big technological investments. The aim of some data initiatives might be to reduce costs in the long term, but they may also cause extra costs during implementation. So cost savings should be measured carefully, taking into account company-related expenses like downtime, maintenance costs or indirect expenses.
- Operational Efficiency: It is important to measure the effect of big data initiatives on operations. Data analytics may disrupt the entire operational flow or enhance it. Capacity utilization and the adoption of analytic products deployed into the business are among the metrics to consider when evaluating operational efficiency.
- Employee Satisfaction: There is an undeniable relation between employee satisfaction and data initiatives. Above all, employees need training to become at least generally familiar with big data, and training means extra time and effort. Overtime hours, training, and how employees gain the capabilities to work with big data are metrics for this KPI.
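Two of the customer service metrics mentioned above are simple enough to compute by hand. The sketch below uses the common definitions: NPS is the percentage of promoters (ratings of 9 or 10) minus the percentage of detractors (ratings of 0 to 6), and retention rate is the share of customers from the start of a period who are still customers at the end. The function names and sample numbers are made up for illustration, and your company may define these KPIs slightly differently.

```python
def net_promoter_score(ratings):
    """NPS from 0-10 survey ratings: % promoters (9-10) minus % detractors (0-6)."""
    if not ratings:
        raise ValueError("at least one rating is required")
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100 * (promoters - detractors) / len(ratings)


def retention_rate(customers_at_start, customers_at_end, new_customers):
    """Percentage of starting customers who are still customers at period end."""
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    retained = customers_at_end - new_customers
    return 100 * retained / customers_at_start


# Hypothetical survey answers and customer counts
ratings = [10, 9, 9, 8, 7, 6, 3, 10, 5, 9]
print(net_promoter_score(ratings))   # 5 promoters, 3 detractors out of 10
print(retention_rate(200, 210, 40))  # 170 of the original 200 stayed
```

Tracking these numbers before and after a data initiative is one way to see whether it actually changed the customer experience.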
Python
Python is a free, open-source tool and one of the most essential programming languages for data analysis. You can interpret large datasets, make visualizations and evaluate your data effectively with Python’s strong features.
Tableau
Data analysis and visualization can easily be done with Tableau. It offers a fast and powerful platform for data analysis, with features such as data blending, real-time analysis and data collaboration.
Power BI
Power BI is an analytics service and data visualization program created by Microsoft. Like the other programs, it helps you manage data from beginning to end: analyzing, visualizing and getting insights.
Google Data Studio
Google Data Studio is a free tool by Google that helps you turn your data into customizable dashboards and reports. Charts, graphs, tables and more options are available for creating dashboards. You can not only visualize your data, but also easily share insights with others. It is very simple to use.
R Programming / R Studio
R is one of the leading tools for analytics. It is super helpful if you are modeling data or working on statistics. R runs on various platforms, such as UNIX, Windows and macOS.
SAS
SAS is statistical analysis software and a leading programming language. It is accessible, easy to use and well suited to manipulating data.
A career is always a tough path to walk. There are unexpected curves and challenges to face, whatever your profession. What we listed and examined above are the crucial points for building a career in data analysis. Whether you aim to become a data analyst or already are one, you now know what awaits you and how to prepare yourself. You can get closer to the peak of your career by following the right steps, taking advantage of informative courses and certification programs, learning more about the tools, and reading essential books. Use this guide as a preparation kit and start sharpening your data analyst skills.