# Exploratory Data Analysis

Exploratory Data Analysis (EDA) is an approach for data analysis that employs a variety of techniques (mostly graphical) to

1. uncover underlying structure;
2. extract important variables;
3. detect outliers and anomalies;
4. test underlying assumptions;
5. develop parsimonious models; and
6. determine optimal factor settings.

# Types of exploratory data analysis

EDA methods are commonly grouped into four types: univariate non-graphical, univariate graphical, multivariate non-graphical, and multivariate graphical. The two graphical types are described below:

• Univariate graphical: Non-graphical methods don’t provide a full picture of the data, so graphical methods are also needed. Common types of univariate graphics include:
1. Histograms, bar plots in which each bar represents the frequency (count) or proportion (count/total count) of cases for a range of values.
2. Box plots, which graphically depict the five-number summary of minimum, first quartile, median, third quartile, and maximum.
• Multivariate graphical: Multivariate graphical EDA uses graphics to display relationships between two or more sets of data. The most commonly used graphic is a grouped bar plot (bar chart), with each group representing one level of one of the variables and each bar within a group representing a level of the other variable. Other common types of multivariate graphics include:
1. Multivariate chart, which is a graphical representation of the relationships between factors and a response.
2. Run chart, which is a line graph of data plotted over time.
3. Bubble chart, which is a data visualization that displays multiple circles (bubbles) in a two-dimensional plot.
4. Heat map, which is a graphical representation of data where values are depicted by color.
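To make the definitions above concrete, here is a minimal sketch of the numbers that sit behind a histogram, a box plot, and a grouped bar plot. It assumes NumPy and pandas are installed; the values, the bin edges, and the column names `gender` and `smoker` are made up purely for illustration.

```python
import numpy as np
import pandas as pd

# Made-up illustrative values, not taken from any real dataset.
ages = np.array([23, 35, 41, 47, 52, 58, 61, 67, 74, 80])

# Histogram: the count of cases in each range of values (bin),
# and the same counts expressed as proportions of the total.
counts, bin_edges = np.histogram(ages, bins=[20, 40, 60, 80, 100])
proportions = counts / counts.sum()
print(counts)        # [2 4 3 1]
print(proportions)   # [0.2 0.4 0.3 0.1]

# Box plot: the five-number summary (min, Q1, median, Q3, max).
# np.percentile uses linear interpolation between data points by default.
five_number = np.percentile(ages, [0, 25, 50, 75, 100])
print(five_number)   # [23.  42.5 55.  65.5 80. ]

# Grouped bar plot: one group per level of the first variable,
# one bar per level of the second. The cross-tabulation holds the bar heights.
toy = pd.DataFrame({
    "gender": ["F", "F", "M", "M", "F", "M"],
    "smoker": ["yes", "no", "no", "no", "yes", "yes"],
})
table = pd.crosstab(toy["gender"], toy["smoker"])
print(table)
```

If matplotlib is available, `table.plot(kind="bar")` should draw the grouped bar plot directly from this table.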

# EDA explained using the stroke-prediction-dataset from Kaggle

Let’s work through an example dataset to see how EDA is applied in practice. I have used the stroke-prediction-dataset, which is available on Kaggle. This dataset is used to predict whether a patient is likely to have a stroke based on input parameters such as gender, age, various diseases, and smoking status. Each row provides the relevant information about one patient.
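Before drawing any plots, it usually helps to take a first look at the table itself. The sketch below is one possible way to do that with pandas. The helper name `first_look` is hypothetical, the column names are assumptions based on the parameters described above, and the four rows are a made-up stand-in so the example runs without downloading anything.

```python
import numpy as np
import pandas as pd


def first_look(df: pd.DataFrame) -> dict:
    """Collect basic facts worth checking before plotting anything."""
    return {
        "shape": df.shape,                           # (rows, columns)
        "dtypes": df.dtypes.astype(str).to_dict(),   # type of each column
        "missing": df.isna().sum().to_dict(),        # missing values per column
        "numeric_summary": df.describe(),            # count, mean, std, quartiles
    }


# Made-up stand-in rows; column names are assumed, not confirmed by the article.
# With the real file you would load it instead, for example with pd.read_csv(...)
# pointed at wherever you saved the Kaggle download.
sample = pd.DataFrame({
    "gender": ["Male", "Female", "Female", "Male"],
    "age": [67.0, 61.0, 49.0, 80.0],
    "smoking_status": ["formerly smoked", "never smoked", "smokes", "never smoked"],
    "bmi": [36.6, np.nan, 34.4, 32.5],
    "stroke": [1, 1, 0, 0],
})

report = first_look(sample)
print(report["shape"])
print(report["missing"])
print(report["numeric_summary"])
```

Columns with missing values or unexpected types show up here first, which tells you what needs cleaning before the graphical steps.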

1. Line Graphs
2. Pie Graphs
3. Correlation Matrix
4. Pair Plot
5. Scatter Plot
6. Box Plots
7. Multivariate Analysis
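Several of the plots listed above are drawn from simple summary tables. The sketch below computes three of them with pandas: the category shares behind a pie graph, the correlation matrix, and the points a box plot would flag as outliers. The rows and column names are made up for illustration, the helper `iqr_outliers` is hypothetical, and the 1.5 × IQR rule is a common plotting convention rather than something the dataset prescribes.

```python
import pandas as pd

# Made-up illustrative rows; column names are assumptions.
df = pd.DataFrame({
    "age": [45, 52, 60, 67, 74, 81],
    "avg_glucose_level": [80, 85, 90, 95, 100, 250],
    "smoking_status": ["never smoked", "smokes", "never smoked",
                       "formerly smoked", "never smoked", "smokes"],
    "stroke": [0, 0, 0, 1, 0, 1],
})

# Pie graph: the share of rows in each category.
shares = df["smoking_status"].value_counts(normalize=True)

# Correlation matrix: pairwise Pearson correlation of the numeric columns.
corr = df[["age", "avg_glucose_level", "stroke"]].corr()


def iqr_outliers(values: pd.Series) -> pd.Series:
    """Return the values lying more than 1.5 * IQR outside the quartiles."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    return values[(values < q1 - 1.5 * iqr) | (values > q3 + 1.5 * iqr)]


# Box plot: points beyond the whiskers under the 1.5 * IQR convention.
outliers = iqr_outliers(df["avg_glucose_level"])

print(shares)
print(corr.round(2))
print(outliers.tolist())  # [250]
```

A heat map of `corr` or a pie chart of `shares` can then be drawn with whichever plotting library you prefer.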

Sharat Kedari, Machine Learning Engineer
