- What is Data Engineering?
- Why is it needed?
- Two types of data engineers (A & B)
- Set up Environment
python -m venv ./env
.\env\Scripts\Activate.ps1
python.exe -m pip install --upgrade pip
python -m pip install -r .\requirements.txt
- Turn on MySQL (3306), ClickHouse (8123), and Airbyte (8000), then start generating data
- docker compose -f .\00_infrastructure\DockerCompose.yaml up -d
- go into data generation folder
cd .\01_data_generation
- run the python script to seed the data
python -m data_gen_and_seed
- (then wait for 5 min)
- docker compose -f .\00_infrastructure\airbyte\DockerCompose_airbyte.yaml up -d
- Show Relational Diagram
- Show how the data is generated
- Show data in source (connect to mysql using dbeaver)
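The real generator lives in 01_data_generation; as a talking point, a minimal sketch of the idea, assuming the Faker library and hypothetical patients/visits tables (not the repo's actual schema):

```python
# illustrative only -- table and column names here are assumptions
from faker import Faker

fake = Faker()

def make_patient(patient_id: int) -> dict:
    return {
        "id": patient_id,
        "name": fake.name(),
        "date_of_birth": fake.date_of_birth(minimum_age=18, maximum_age=90),
    }

def make_visit(visit_id: int, patient_id: int) -> dict:
    return {
        "id": visit_id,
        "patient_id": patient_id,
        "visit_date": fake.date_between(start_date="-1y", end_date="today"),
    }

if __name__ == "__main__":
    patients = [make_patient(i) for i in range(1, 11)]
    visits = [make_visit(i, fake.random_int(1, 10)) for i in range(1, 51)]
    print(patients[0], visits[0])
```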
- Create connection from Mysql to ClickHouse through Airbyte
- login to airbyte localhost:8000 username: airbyte password: password
- create database on clickhouse:
CREATE DATABASE mysql_extracts;
- make sure to land the data in a new schema called mysql_extracts in ClickHouse
- Talk about enabling binary logging on MySQL
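Airbyte's MySQL CDC source reads the binary log, so it helps to confirm it is on and row-based before creating the connection (these are checks only; the actual settings live in my.cnf or the compose file):

```sql
-- binary logging is on by default in MySQL 8, but CDC needs row-based logging
SHOW VARIABLES LIKE 'log_bin';          -- expect ON
SHOW VARIABLES LIKE 'binlog_format';    -- expect ROW
SHOW VARIABLES LIKE 'binlog_row_image'; -- expect FULL
```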
- describe what airbyte is
- describe what ClickHouse is
- login to airbyte localhost:8000 username: airbyte password: password
- create the dbt repo shown in 02_transformation
dbt --version
dbt init dbt_visits
- create profiles.yml file
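A minimal profiles.yml sketch, assuming the dbt-clickhouse adapter and the local ClickHouse from the compose file (credentials and target schema are placeholders):

```yaml
dbt_visits:
  target: dev
  outputs:
    dev:
      type: clickhouse
      host: localhost
      port: 8123
      user: default
      password: ""
      schema: dbt_visits
```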
- create sources.yml file
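And a sources.yml sketch pointing dbt at the tables Airbyte lands in mysql_extracts (the table names are assumptions):

```yaml
version: 2

sources:
  - name: mysql_extracts
    schema: mysql_extracts
    tables:
      - name: patients
      - name: visits
```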
- add group to dbt_project.yml file
- dbt debug
- create base and intermediate folders
- dbt models
- create models
- show documentation
- (optional)
- patient attributes model (v1 & v2)
- visits joined (v1 & v2)
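Before moving on to tests, a sketch of what one of the base models above might look like (column names are assumptions; the intermediate and v1/v2 models build on selects like this):

```sql
-- models/base/base_visits.sql (illustrative)
select
    id as visit_id,
    patient_id,
    visit_date
from {{ source('mysql_extracts', 'visits') }}
```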
- dbt tests
- what are they
- quick demo
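For the quick demo, generic tests are declared in a YAML file next to the models; a minimal sketch (model and column names are assumptions):

```yaml
version: 2

models:
  - name: base_visits
    columns:
      - name: visit_id
        tests:
          - unique
          - not_null
```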
- Create Dagster environment
- cd into dbt_visits
dagster-dbt project scaffold --project-name my_dagster
- move my_dagster folder up one level
mv .\my_dagster\ ../
- Connect dbt to Dagster
- change
dbt_project_dir = Path(__file__).joinpath("..", "..", "..").resolve()
- to
dbt_project_dir = Path('..', 'dbt_visits').resolve()
- turn on dagster
- linux
DAGSTER_DBT_PARSE_PROJECT_ON_LOAD=1 dagster dev
- windows
$env:DAGSTER_DBT_PARSE_PROJECT_ON_LOAD = "1"; dagster dev
- now let's add more models to dbt and watch Dagster pull them in
- Connect Airbyte to Dagster
- Create an airbyte resource (https://docs.dagster.io/concepts/resources & https://docs.dagster.io/integrations/airbyte#using-airbyte-with-dagster)
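A minimal sketch of wiring Airbyte into Dagster, assuming the dagster-airbyte package and the local instance started earlier (host, port, and credentials as above; exact API varies by version):

```python
from dagster_airbyte import AirbyteResource, load_assets_from_airbyte_instance

# points Dagster at the local Airbyte instance from the compose file
airbyte_instance = AirbyteResource(
    host="localhost",
    port="8000",
    username="airbyte",
    password="password",
)

# turns every enabled Airbyte connection into Dagster assets upstream of the dbt models
airbyte_assets = load_assets_from_airbyte_instance(airbyte_instance)
```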
- Set up report delivery schedules
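Report delivery can start as a plain Dagster schedule over the dbt assets; a sketch (job name and cron expression are placeholders), with both objects then registered in the project's Definitions alongside the assets:

```python
from dagster import AssetSelection, ScheduleDefinition, define_asset_job

# materialize every asset in the project once a day at 06:00
daily_report_job = define_asset_job(
    name="daily_report_job",
    selection=AssetSelection.all(),
)

daily_report_schedule = ScheduleDefinition(
    job=daily_report_job,
    cron_schedule="0 6 * * *",
)
```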
- (future task) Set up Alerting
- Working with people you have no control over to communicate the vision and convince them to give you the permissions needed to execute
- Working with Data Science to figure out what data is important to them
- How to Secure your connections and data
- How to deploy to production
- Swap out ClickHouse for DuckDB
- Improve dates in fake data
- Set up notifications of failure
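When failure notifications get picked up, a Dagster run-failure sensor is one option; a minimal sketch that only logs for now (swap the log call for Slack or email once alerting is chosen):

```python
import logging

from dagster import RunFailureSensorContext, run_failure_sensor

@run_failure_sensor
def notify_on_failure(context: RunFailureSensorContext):
    # replace this log line with a Slack/email call once alerting is set up
    logging.getLogger("dbt_visits").error(
        "Run %s failed", context.dagster_run.run_id
    )
```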
Data Pipeline Implementation at Big Mountain Data and Dev Conference