data-engineering-pipeline

Here are 127 public repositories matching this topic...

💜🌈📊 A data engineering project that implements an ETL data pipeline using Dagster, Apache Spark, Streamlit, MinIO, Metabase, dbt, Polars, and Docker. Data comes from Kaggle and the YouTube API. 🌺

  • Updated Jun 18, 2024
  • Jupyter Notebook

Leveraging AWS Cloud Services, an ETL pipeline transforms YouTube video statistics data. Data is downloaded from Kaggle, uploaded to an S3 bucket, and cataloged using AWS Glue for querying with Athena. AWS Lambda and Glue convert the data to Parquet format and store it in a cleansed S3 bucket. AWS QuickSight then visualizes the materialised data.

  • Updated May 30, 2024
  • Python
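
The Lambda step in the description above can be sketched roughly as follows. This is a minimal illustration rather than the repository's actual code: the function name `plan_parquet_targets` and both bucket names are hypothetical, and the Parquet conversion itself is left out. It only shows how an S3 event notification could be mapped to a target key in the cleansed bucket.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote_plus


def plan_parquet_targets(event, cleansed_bucket):
    """Map each raw object in an S3 event notification to a Parquet target URI.

    Hypothetical helper: a real handler would go on to read the source
    object and write the converted file; that part is omitted here.
    """
    plans = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        # Keep the same path, swapping the file extension for .parquet.
        target_key = str(PurePosixPath(key).with_suffix(".parquet"))
        plans.append(
            (f"s3://{bucket}/{key}", f"s3://{cleansed_bucket}/{target_key}")
        )
    return plans


# Assumed sample event; bucket and key names are made up for illustration.
sample_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "raw-youtube-stats"},
                "object": {"key": "region%3Dus/USvideos.csv"},
            }
        }
    ]
}

for source, target in plan_parquet_targets(sample_event, "cleansed-youtube-stats"):
    print(source, "->", target)
```

Keeping the key layout identical between the raw and cleansed buckets is one possible design choice; it lets partition-style prefixes such as `region=us/` carry over unchanged.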

Data Engineering 🛠️ is like the backbone of data processing 📊, managing data pipelines 🚀, warehouses 🏢, and lakes 🌊. It's the bridge 🌉 between raw data and actionable insights, powering businesses 🚀 with efficient data management and analytics 📈.

  • Updated Mar 26, 2024
  • Jupyter Notebook
