Hello everyone,
Here is my EDA project on the Iris dataset, in which I analyze the data using Seaborn and Matplotlib.
- This dataset covers three species of iris flower (Setosa, Versicolor, and Virginica).
- For each flower it records the petal length, sepal length, petal width, sepal width, and variety.
Link to the Dataset: Iris Dataset
- The objective of this project is to perform Exploratory Data Analysis (EDA) on the Iris dataset.
- The Iris dataset is a well-known dataset in machine learning and statistics.
- It consists of measurements of four features (sepal length, sepal width, petal length, and petal width) for three species of iris flowers: setosa, versicolor, and virginica.
- The goal of this EDA is to gain insight into the dataset, understand the relationships between the features, and extract information that can aid further analysis or modeling.
- Setting up the Environment
- Libraries required for the Project
- Getting started with Repository
- Steps involved in the Project
- Conclusion
- Link to the Notebook
Jupyter Notebook is required for this project. You can install and launch it from the terminal.
- Install the Notebook:
pip install notebook
- Run the Notebook:
jupyter notebook
Pandas
- Go to the terminal and run this command:
pip install pandas
- Or go to Jupyter Notebook and run this from a cell:
!pip install pandas
Matplotlib
- Go to the terminal and run this command:
pip install matplotlib
- Or go to Jupyter Notebook and run this from a cell:
!pip install matplotlib
Seaborn
- Go to the terminal and run this command:
pip install seaborn
- Or go to Jupyter Notebook and run this from a cell:
!pip install seaborn
- Clone the repository to your local machine using the following command:
git clone https://github.com/TheMrityunjayPathak/ExploratoryDataAnalysis.git
Importing libraries required for Project
Reading CSV File
Exploring the Dataset
Checking Null Values in Dataset
Splitting the Dataset based on Species of Flower
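The data-preparation steps above could look roughly like the sketch below. It is not the notebook's actual code: the CSV text is a small stand-in with a few illustrative rows, and the column names (`sepal_length`, `species`, and so on) are assumptions that may differ from the real file.

```python
import io

import pandas as pd

# Stand-in for the project's CSV file: a few illustrative rows.
# Column names are assumptions; adjust them to match the real file.
CSV_TEXT = """sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
5.4,3.9,1.7,0.4,setosa
7.0,3.2,4.7,1.4,versicolor
5.5,2.3,4.0,1.3,versicolor
6.3,3.3,6.0,2.5,virginica
5.8,2.7,5.1,1.9,virginica
"""

# Reading the CSV (with a real file, pass its path to pd.read_csv instead)
df = pd.read_csv(io.StringIO(CSV_TEXT))

# Exploring the dataset: first rows, dimensions, summary statistics
print(df.head())
print(df.shape)
print(df.describe())

# Checking null values: number of missing entries per column
null_counts = df.isnull().sum()
print(null_counts)

# Splitting the dataset: one DataFrame per species
by_species = {name: group for name, group in df.groupby("species")}
setosa = by_species["setosa"]
print(setosa)
```

Keeping the per-species frames in a dictionary avoids one hand-written filter per species, but three explicit boolean filters would work just as well.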
Data Visualization
Count Plot on IRIS Dataset
- A count plot is like a bar plot or histogram for categorical data: it shows the number of observations in each category.
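As a minimal sketch, a count plot of the species column can be drawn with `sns.countplot()`. The tiny DataFrame and the `species` column name are assumptions made for illustration, not the project's data.

```python
import pandas as pd
import seaborn as sns

# Illustrative stand-in data; the column name "species" is an assumption.
df = pd.DataFrame(
    {"species": ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"]}
)

# One bar per species, with bar height equal to the number of rows
ax = sns.countplot(data=df, x="species")
ax.set_title("Samples per species")
```

In a notebook the figure renders below the cell; in a script, call `matplotlib.pyplot.show()` afterwards.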
Scatter Plot on IRIS Dataset
- A scatter plot shows the relationship between two variables in a dataset, with one point per observation.
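A scatter plot of two features, colored by species, might be sketched as follows. The rows are illustrative and the column names are assumptions; the pair of features plotted in the notebook may be different.

```python
import pandas as pd
import seaborn as sns

# Illustrative stand-in rows; column names are assumptions.
df = pd.DataFrame(
    [
        (5.1, 3.5, 1.4, 0.2, "setosa"),
        (5.4, 3.9, 1.7, 0.4, "setosa"),
        (7.0, 3.2, 4.7, 1.4, "versicolor"),
        (5.5, 2.3, 4.0, 1.3, "versicolor"),
        (6.3, 3.3, 6.0, 2.5, "virginica"),
        (5.8, 2.7, 5.1, 1.9, "virginica"),
    ],
    columns=["sepal_length", "sepal_width", "petal_length", "petal_width", "species"],
)

# One point per flower; hue assigns a color to each species
ax = sns.scatterplot(data=df, x="sepal_length", y="petal_length", hue="species")
```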
Pair Plot on IRIS Dataset
- To plot multiple pairwise bivariate distributions in a dataset, you can use the sns.pairplot() function.
Distribution of Sepal Length and Petal Length of Different Species of Flower
- Distribution plots show how a variable (or multiple variables) is distributed.
- To plot distribution plots, you can use the sns.displot() function.
Box Plot on IRIS Dataset
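- A box-and-whisker plot, also called a box plot, displays the five-number summary of a set of data.
- The five-number summary is the minimum, first quartile, median, third quartile, and maximum. In Seaborn's default box plot, the whiskers stop at the farthest points within 1.5 times the interquartile range, and anything beyond that is drawn as an outlier.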
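To make the five-number summary concrete, the sketch below computes it with pandas and then draws a box plot per species. The rows and column names are illustrative assumptions, not the project's data.

```python
import pandas as pd
import seaborn as sns

# Illustrative stand-in rows; column names are assumptions.
df = pd.DataFrame(
    [
        (5.1, 3.5, 1.4, 0.2, "setosa"),
        (5.4, 3.9, 1.7, 0.4, "setosa"),
        (7.0, 3.2, 4.7, 1.4, "versicolor"),
        (5.5, 2.3, 4.0, 1.3, "versicolor"),
        (6.3, 3.3, 6.0, 2.5, "virginica"),
        (5.8, 2.7, 5.1, 1.9, "virginica"),
    ],
    columns=["sepal_length", "sepal_width", "petal_length", "petal_width", "species"],
)

# Five-number summary of sepal length across all rows:
# minimum, first quartile, median, third quartile, maximum
summary = df["sepal_length"].quantile([0.0, 0.25, 0.5, 0.75, 1.0])
print(summary)

# One box per species, showing the same statistics for each group
ax = sns.boxplot(data=df, x="species", y="sepal_length")
```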
Heat Map on IRIS Dataset
- A heat map is used here to show the correlation between the features in the dataset.
- Values close to +1 or -1 indicate a strong positive or negative correlation; values close to 0 indicate a weak one.
- A correlation heatmap is a graphical tool that displays the correlation between multiple variables as a color-coded matrix.
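A correlation heatmap along these lines can be sketched by computing the correlation matrix with pandas and passing it to `sns.heatmap()`. The rows, column names, and color map are assumptions chosen for illustration.

```python
import pandas as pd
import seaborn as sns

# Illustrative stand-in rows; column names are assumptions.
df = pd.DataFrame(
    [
        (5.1, 3.5, 1.4, 0.2, "setosa"),
        (5.4, 3.9, 1.7, 0.4, "setosa"),
        (7.0, 3.2, 4.7, 1.4, "versicolor"),
        (5.5, 2.3, 4.0, 1.3, "versicolor"),
        (6.3, 3.3, 6.0, 2.5, "virginica"),
        (5.8, 2.7, 5.1, 1.9, "virginica"),
    ],
    columns=["sepal_length", "sepal_width", "petal_length", "petal_width", "species"],
)

# Pearson correlation between the numeric features only
corr = df.drop(columns="species").corr()

# Color-coded matrix; annot=True prints each coefficient in its cell,
# and vmin/vmax pin the color scale to the full [-1, 1] range
ax = sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1)
```

Fixing the color scale to [-1, 1] keeps colors comparable between heatmaps, so the same shade always means the same correlation strength.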
- In conclusion, the exploratory data analysis (EDA) conducted on the Iris dataset using Seaborn and Matplotlib has provided valuable insights into the dataset's characteristics and relationships between variables.
- The EDA revealed that the Iris dataset consists of 150 samples, each representing a different Iris flower, with four features: sepal length, sepal width, petal length, and petal width.
- The dataset is balanced, with 50 samples for each of the three Iris species: Setosa, Versicolor, and Virginica.
- Using Seaborn and Matplotlib, we created various plots to explore the dataset.