emr-cluster

Here are 95 public repositories matching this topic...

BiGHeaDMaX / Extraction-features-avec-Spark

This project aims to perform feature extraction followed by PCA on large-scale data using Spark in the cloud.

aws spark hadoop bigdata aws-emr pyspark pca transfer-learning emr-cluster features-extraction bigdatacloud elastic-mapreduce

Updated Mar 14, 2024
Jupyter Notebook

abie-ramie / HQL_Cosmetic_ClickStreamData

With online sales gaining popularity, tech companies are exploring ways to improve their sales by analyzing customer behavior and gaining insights about product trends. Furthermore, these websites make it easier for customers to find the products they need without much searching.

aws hive hiveql emr-cluster

Updated Jul 7, 2021

rkr2017 / emr-slack-notify

AWS Lambda function to send EMR events to Slack via SNS

emr slack aws aws-lambda lambda-functions cloudwatch-events emr-cluster

Updated Nov 30, 2017
JavaScript

omarfessi / UDACITY-CapstoneProject

It's just my first repo; feel free to give feedback 😁

sql spark aws-s3 python3 iac aws-ec2 redshift emr-cluster

Updated Jan 19, 2021
Jupyter Notebook

jashshah-dev / AWS-Big-Data-Pipeline-orchestrated-with-Airflow

A robust data pipeline leveraging Amazon EMR and PySpark, orchestrated seamlessly with Apache Airflow for efficient batch processing

distributed-computing snowflake pyspark amazon-s3 emr-cluster airflow-dags transient-cluster

Updated Jan 1, 2024
Python

danielhaviv / emr_storage_autoscaler

emr spark yarn emr-cluster

Updated Mar 22, 2018
Shell

ustcdj / Sparkify_Churn_Analysis

Preventing churn is key to improving revenue for Sparkify, a fictitious subscription-based company. This project analyzes Sparkify data to build a model that predicts user churn. First, a sample dataset (128MB) was used on a local machine to explore relevant features and develop a working model. Then similar steps were used to develop a…

aws spark machine-learning-algorithms music-streaming emr-cluster

Updated Sep 10, 2020
Jupyter Notebook

skyler-myers-db / Common-Crawl-Analysis

Parsing the Common Crawl dataset using Scala and Spark

emr scala big-data spark s3 s3-bucket common-crawl emr-cluster

Updated Mar 17, 2021
Scala

JennaFar / elastic-data-factory

Elastic Data Factory

aws data-science machine-learning sql presto deployment athena data-acquisition data-visualization pyspark data-processing emr-cluster sagemaker sagemaker-deployment

Updated Oct 26, 2023
Python

adnanrahin / spark-rdd-df-comparison-emr

java aws scala spark dataframe rdd emr-cluster

Updated Dec 23, 2023
Scala

ramtekeabhas7 / Hive_Case_Study_using_AWS_Hadoop

The goal is to extract data and gather insights from a real-life dataset of an e-commerce company, using big data tools such as Hive, Hadoop, and AWS.

aws hive hadoop ec2 s3-bucket emr-cluster

Updated Dec 19, 2022

cloudposse-archives / terraform-aws-spotinst-mrscaler

Terraform module to provision an Elastic MapReduce (EMR) cluster on AWS using a Spotinst AWS MrScaler resource

emr cluster map-reduce spot-instances spotinst emr-cluster hcl2

Updated May 21, 2024

mwilchek / Hadoop-Testing

Repo for playing around with an AWS Elastic MapReduce (EMR) cluster

hive pig hadoop-mapreduce emr-cluster

Updated Mar 18, 2019
PigLatin

rupeshtr78 / aws-emr

Spark job on an Amazon EMR cluster

aws scala spark cluster mapreduce emr-cluster mapredue

Updated Feb 29, 2020

Tanay0510 / Data-Lake-with-Spark

Loads data from S3, processes it into analytics tables using Spark, and loads the tables back into S3. The Spark process is deployed on a cluster using AWS EMR.

spark s3 datalake emr-cluster etl-pipeline

Updated Aug 17, 2021
Python

LFattorini / capstone-project-churn-prediction-udacity

In this project, we attempt to predict customer churn for a popular (fictitious) music service. We perform data analysis and build machine learning models on a large amount of data using Spark.

aws data-science machine-learning pyspark webapp churn-prediction emr-cluster ml-pipeline

Updated Nov 11, 2021
Jupyter Notebook

sunnykan / sparkify-lake

Creates a data lake by moving data held in an AWS S3 bucket to another S3 bucket after transforming it into tables based on a star schema.

python spark databases pyspark s3-storage datalake spark-sql emr-cluster ec2-instances

Updated Oct 30, 2021
Jupyter Notebook

jpsalado92 / Udacity-DEND_DataLake-AWSEMR

Full code for a project in Udacity's Data Engineer Nanodegree: implementing a data lake in Amazon's cloud with AWS S3, AWS EMR, and Spark.

s3-bucket data-warehouse aws-emr data-lake emr-cluster

Updated Jul 22, 2020
Python

sayaliwalke30 / BigDataAnalysis-RecommenderForAmazon

Built a recommender system using the Apache Mahout machine learning library and carried out data analysis using Hadoop, Apache Hive, and Pig on the Amazon Customer Reviews dataset (130M+ reviews).

emr hadoop algorithms amazon map-reduce dataset mahout data-analysis big hadoop-mapreduce amazon-s3 emr-cluster amazonreviews

Updated Apr 13, 2020

Morgan-Sell / usa-tourism-etl

Coalesced and transformed various data sources to create a comprehensive data lake for the USA tourism sector.

python aws spark data-engineering data-lake emr-cluster etl-pipeline

Updated May 3, 2021
Jupyter Notebook
