
Hadoop-ApacheSpark-Analysis

This project was done in collaboration with my colleagues Roberta Pappolla and Lorenzo Ferri. The goal was to simulate a machine learning/data science project on a big dataset, so a cluster computing framework, Hadoop/Apache Spark, was used. Various ML techniques were applied: classification, clustering, regression, dimensionality reduction, feature engineering, and more.
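For a flavor of this kind of Spark ML workflow, here is a minimal, self-contained PySpark sketch (illustrative only, not taken from the notebooks): it assembles numeric columns into a feature vector, reduces dimensionality with PCA, and clusters the result with KMeans. The column names and toy data are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler, PCA
from pyspark.ml.clustering import KMeans

spark = SparkSession.builder.appName("spark-ml-sketch").getOrCreate()

# Hypothetical toy dataset with three numeric feature columns.
df = spark.createDataFrame(
    [(1.0, 0.1, 3.2), (0.9, 0.2, 3.0), (5.1, 4.8, 0.3), (5.0, 5.2, 0.1)],
    ["f1", "f2", "f3"],
)

# Pipeline: raw columns -> feature vector -> 2-D PCA projection -> KMeans (k=2).
pipeline = Pipeline(stages=[
    VectorAssembler(inputCols=["f1", "f2", "f3"], outputCol="features"),
    PCA(k=2, inputCol="features", outputCol="pca_features"),
    KMeans(featuresCol="pca_features", k=2, seed=42),
])

model = pipeline.fit(df)
model.transform(df).select("pca_features", "prediction").show()
```

Chaining the stages in a single Pipeline keeps the feature engineering and model fitting reproducible across the cluster; swapping KMeans for a classifier or regressor follows the same pattern.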

NOTE: Most notebooks have comments written in Italian, sorry about that! I'm happy to clarify anything, just get in touch.


Contributions are more than welcome!

Show some 💚 by starring this repository!
