Prediction of Electricity consumption in Household Units by using Random Forest Regressor

shreeyajoshi2013/Prediction-of-Electricity-Consumption

Prediction of Electricity Consumption

The goal of this project is to build a model that predicts household electricity consumption, which is stored in the KWH field of the dataset. The dataset contains information on energy costs and usage for heating, cooling, appliances and other end uses, from a sample of housing units.
The dataset is taken from link.
(Number of rows: approx. 12,000;
number of columns: approx. 940)

Built with

  • Google Colab

Highlights

  • Random Forest Regressor

Libraries used

  • Pandas
  • Numpy
  • Matplotlib
  • Seaborn
  • Scikit-learn

What is being done?

  1. Data understanding
    • Data exploration
  2. Data preparation
    • One-Hot Encoding the categorical columns
    • Handling NaN values
    • Removing the unnecessary columns
    • Assumptions and considerations:
    • Columns starting with 'Z' are imputation flags for other variables. They are removed because they do not contribute to the prediction.
    • Columns measured in a unit other than KWH are assumed not to be helpful and are removed.
    • Columns that hold the total electricity consumption across elements are redundant, because the individual contributions of those elements are already present in the data. They are removed to avoid redundancy.
  3. Data Analysis
    • Finding the correlation of features with output variable and visualizing
  4. Random Forest Regressor
    • Using GridSearchCV for selecting optimal hyperparameters for the model
    • Choosing important features by calculating feature importances
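
The data preparation and correlation steps above can be sketched roughly as follows. This is a minimal illustration on a tiny made-up DataFrame, not the notebook's actual code: only the KWH target and the 'Z' prefix come from this README, while the other column names, the "BTU" prefix for non-KWH columns, and the median fill strategy are assumptions chosen for the example.

```python
import numpy as np
import pandas as pd


def prepare_features(df, target="KWH", other_unit_prefixes=("BTU",)):
    """Return (X, y) after dropping flag/unit columns, encoding and NaN filling."""
    y = df[target]
    X = df.drop(columns=[target])

    # Drop imputation flags ('Z' prefix) and columns in other units.
    # The "BTU" prefix is a hypothetical stand-in for non-KWH columns.
    drop_prefixes = ("Z",) + tuple(other_unit_prefixes)
    keep = [c for c in X.columns if not c.startswith(drop_prefixes)]
    X = X[keep]

    # One-hot encode every non-numeric (categorical) column.
    cat_cols = [c for c in X.columns if not pd.api.types.is_numeric_dtype(X[c])]
    X = pd.get_dummies(X, columns=cat_cols)

    # Fill remaining NaN values with the column median (one simple strategy).
    for col in X.columns[X.isna().any()]:
        X[col] = X[col].fillna(X[col].median())
    return X, y


def correlation_with_target(X, y):
    """Absolute Pearson correlation of each feature with the target, sorted."""
    return X.astype(float).corrwith(y).abs().sort_values(ascending=False)


# Tiny made-up sample; column names other than KWH are hypothetical.
demo = pd.DataFrame({
    "KWH": [100.0, 200.0, 300.0, 400.0],
    "ZTOTSQFT": [0, 1, 0, 0],
    "BTUEL": [1.0, 2.0, 3.0, 4.0],
    "TOTSQFT": [1000.0, np.nan, 3000.0, 2000.0],
    "REGION": ["north", "south", "north", "west"],
})
X_demo, y_demo = prepare_features(demo)
corr_demo = correlation_with_target(X_demo, y_demo)
print(X_demo.columns.tolist())
```

The correlation series can then be plotted (for example as a bar chart) to see which features move most closely with KWH.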

Conclusion

About 14 features from the entire dataset were found to contribute the most to electricity consumption. They were identified after several steps of data cleaning, processing and feature engineering.
The Random Forest Regressor gives a fair prediction of consumption in kilowatt-hours (KWH), with an R2 score of 0.875. With more data exploration and manipulation, a better-optimised prediction can be obtained.
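
As a rough sketch of how the hyperparameter search, the feature selection and the R2 evaluation fit together, the example below runs GridSearchCV over a Random Forest Regressor, keeps the 14 most important features, and scores a refit model on held-out data. The data here is synthetic (generated with scikit-learn's make_regression), and the parameter grid, split ratio and feature names are assumptions for illustration, so the score it prints is unrelated to the 0.875 reported above.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the cleaned feature matrix (not the real dataset).
X_arr, y = make_regression(n_samples=300, n_features=30, n_informative=5,
                           noise=10.0, random_state=0)
X = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(X_arr.shape[1])])
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Small illustrative grid; the values are assumptions, not the notebook's.
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 8]}
search = GridSearchCV(RandomForestRegressor(random_state=0),
                      param_grid, scoring="r2", cv=3)
search.fit(X_train, y_train)

# Rank features by the importances of the best model found by the search.
importances = pd.Series(search.best_estimator_.feature_importances_,
                        index=X.columns).sort_values(ascending=False)
top_features = importances.head(14).index.tolist()

# Refit on the selected features only and evaluate on the held-out split.
final_model = RandomForestRegressor(random_state=0, **search.best_params_)
final_model.fit(X_train[top_features], y_train)
r2 = r2_score(y_test, final_model.predict(X_test[top_features]))
print(f"best params: {search.best_params_}, R2 on test split: {r2:.3f}")
```

Evaluating on a split that was not used for the grid search or the feature ranking keeps the reported R2 from being optimistically biased.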

Further Tasks

Other models, such as neural networks, can be used for the prediction.
The features can be explored in more depth with further EDA and with libraries such as FeatureSelector, to improve the model and refine the feature-importance analysis.
