TheMrityunjayPathak/ExploratoryDataAnalysis

Exploratory Data Analysis

Hello everyone,

This is my EDA project on the Iris dataset, in which I analyze the data using Seaborn and Matplotlib.

Dataset

  • This Dataset covers 3 different species of Iris Flower (Setosa, Versicolor, and Virginica).

  • For each flower it records the Petal Length, Sepal Length, Petal Width, Sepal Width, and Variety of the Flower.

Link to the Dataset : Iris Dataset

Problem Statement

  • The objective of this Project is to perform Exploratory Data Analysis (EDA) on the Iris dataset.

  • The Iris dataset is a popular and well-known dataset in the field of machine learning and statistics.

  • It consists of measurements of four features sepal length, sepal width, petal length, and petal width of three different species of iris flowers: setosa, versicolor, and virginica.

  • The goal of this EDA is to gain insights into the dataset, understand the relationships between the features, and extract meaningful information that can aid in further analysis or modeling tasks.

Setting up the Environment

Jupyter Notebook is required for this project. You can install and launch it from the terminal.

  • Install the Notebook - pip install notebook

  • Run the Notebook - jupyter notebook

Libraries required for the Project

Pandas

  • Go to Terminal and run this code - pip install pandas

  • Go to Jupyter Notebook and run this code from a cell - !pip install pandas

Matplotlib

  • Go to Terminal and run this code - pip install matplotlib

  • Go to Jupyter Notebook and run this code from a cell - !pip install matplotlib

Seaborn

  • Go to Terminal and run this code - pip install seaborn

  • Go to Jupyter Notebook and run this code from a cell - !pip install seaborn

Getting Started

  • Clone the repository to your local machine using the following command :
git clone https://github.com/TheMrityunjayPathak/ExploratoryDataAnalysis.git

Steps involved in the Project

Importing libraries required for Project

Reading CSV File

Exploring the Dataset

Checking Null Values in Dataset

Splitting the Dataset based on Species of Flower
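The steps listed above, from importing the libraries to splitting the data by species, could look roughly like the sketch below. The column names (`sepal_length`, `sepal_width`, `petal_length`, `petal_width`, `species`) and the inlined sample rows are assumptions made so the example runs on its own; the project's CSV may name its columns differently (for example `variety` instead of `species`), and in the notebook you would pass the path of the real CSV file to `pd.read_csv`.

```python
import io

# Importing libraries required for the project
import pandas as pd
import matplotlib.pyplot as plt  # used in the plotting steps
import seaborn as sns            # used in the plotting steps

# A few rows of the Iris dataset, inlined so the sketch runs without the CSV file.
# Column names here are an assumption, not necessarily those of the project's file.
SAMPLE_CSV = """sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,Setosa
4.9,3.0,1.4,0.2,Setosa
7.0,3.2,4.7,1.4,Versicolor
6.4,3.2,4.5,1.5,Versicolor
6.3,3.3,6.0,2.5,Virginica
5.8,2.7,5.1,1.9,Virginica
"""

# Reading the CSV file (replace the StringIO object with the path to the real file)
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Exploring the dataset: first rows, dimensions, and summary statistics
print(df.head())
print(df.shape)
print(df.describe())

# Checking null values: number of missing entries per column
null_counts = df.isnull().sum()
print(null_counts)

# Splitting the dataset by species: one DataFrame per species name
by_species = {name: group for name, group in df.groupby("species")}
print({name: len(group) for name, group in by_species.items()})
```

On the full dataset, `df.shape` and the per-species lengths are a quick way to confirm the sample counts before plotting.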

Data Visualization

Count Plot on IRIS Dataset

  • A Count Plot is a Bar Plot of frequencies: like a Histogram it shows counts, but for the values of a categorical variable.
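A count plot of the species column might be drawn as in this minimal sketch. The helper name `plot_species_counts` and the default column name `species` are hypothetical; adjust the column name to match the CSV.

```python
import matplotlib.pyplot as plt
import seaborn as sns


def plot_species_counts(df, species_col="species"):
    """Draw one bar per species whose height is the number of rows of that species."""
    fig, ax = plt.subplots()
    sns.countplot(data=df, x=species_col, ax=ax)
    ax.set_title("Samples per species")
    return ax
```

Call `plot_species_counts(df)` and then `plt.show()` (or simply end a notebook cell with the call) to display the figure.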

Scatter Plot on IRIS Dataset

  • A Scatter Plot shows the relationship between two variables in a dataset by drawing one point per observation.
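As an illustration, two features can be plotted against each other and colored by species. The helper `plot_feature_scatter` and the column names used when calling it are assumptions, not names taken from the notebook.

```python
import matplotlib.pyplot as plt
import seaborn as sns


def plot_feature_scatter(df, x, y, hue="species"):
    """Scatter plot of column `x` against column `y`, one color per value of `hue`."""
    fig, ax = plt.subplots()
    sns.scatterplot(data=df, x=x, y=y, hue=hue, ax=ax)
    ax.set_title(f"{y} vs {x}")
    return ax
```

For example, `plot_feature_scatter(df, "sepal_length", "sepal_width")` and `plot_feature_scatter(df, "petal_length", "petal_width")` would give one plot for the sepal measurements and one for the petal measurements.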

Pair Plot on IRIS Dataset

  • To Plot Multiple Pairwise Bivariate Distributions in a Dataset, you can use the sns.pairplot() function.

Distribution of Sepal Length and Petal Length of Different Species of Flower

  • Distribution plots show how a variable (or multiple variables) is distributed.

  • To Plot distribution plots you can use the sns.displot() function.

Box Plot on IRIS Dataset

  • A box and whisker plot, also called a box plot, displays the five-number summary of a set of data.

  • The five-number summary is the minimum, first quartile, median, third quartile, and maximum.
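The five-number summary can also be computed directly, which helps when reading the plot. The sketch below assumes a `species` column; both helper names are hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def five_number_summary(values):
    """Return the minimum, first quartile, median, third quartile, and maximum."""
    s = pd.Series(values, dtype="float64")
    return {
        "min": s.min(),
        "q1": s.quantile(0.25),
        "median": s.median(),
        "q3": s.quantile(0.75),
        "max": s.max(),
    }


def plot_box(df, column, by="species"):
    """Box plot of `column`, with one box per value of the `by` column."""
    fig, ax = plt.subplots()
    sns.boxplot(data=df, x=by, y=column, ax=ax)
    return ax
```

Note that with seaborn's default settings the whiskers extend at most 1.5 times the interquartile range beyond the box, so extreme values may be drawn as separate outlier points rather than as the whisker ends.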

Heat Map on IRIS Dataset

  • A Heat Map is used here to show the Correlation between the different features in the Dataset.

  • A Correlation value close to +1 or -1 shows that two features are strongly linearly related; a value near 0 shows little linear relationship.

  • A Correlation Heatmap is a graphical tool that displays the correlation between multiple variables as a color-coded matrix.
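A correlation heat map could be produced as in this sketch. The helper name `plot_correlation_heatmap` and the color map are illustrative choices, not necessarily those used in the notebook.

```python
import matplotlib.pyplot as plt
import seaborn as sns


def plot_correlation_heatmap(df):
    """Plot the pairwise Pearson correlations of the numeric columns and return them."""
    # Keep only numeric columns; the species label cannot be correlated directly.
    corr = df.select_dtypes(include="number").corr()
    fig, ax = plt.subplots()
    # Fix the color scale to [-1, 1] so colors are comparable between plots.
    sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
    ax.set_title("Feature correlation")
    return corr
```

The returned `corr` DataFrame holds the same numbers that are annotated on the plot, so individual values can be looked up with `corr.loc[feature_a, feature_b]`.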

Conclusion

  • In conclusion, the exploratory data analysis (EDA) conducted on the Iris dataset using Seaborn and Matplotlib has provided valuable insights into the dataset's characteristics and relationships between variables.

  • The EDA revealed that the Iris dataset consists of 150 samples, each representing a different Iris flower, with four features: sepal length, sepal width, petal length, and petal width.

  • The dataset is balanced, with 50 samples for each of the three Iris species: Setosa, Versicolor, and Virginica.

  • Using Seaborn and Matplotlib, we created count, scatter, pair, distribution, and box plots, plus a correlation heat map, to explore the dataset.

Link to the Notebook

Exploratory Data Analysis
