[Pycon] [new paper] "Michele De Simoni" - DATA PLUMBING 101: Building Data Pipelines for fun and profit

info at pycon.it
Sat 6 Jan 2018 15:07:23 CET


Title: DATA PLUMBING 101: Building Data Pipelines for fun and profit
Duration: 240 minutes (includes Q&A)
Q&A Session: 0
Language: en
Type: Talk

Abstract: DATA PLUMBING 101: Building Data Pipelines for fun and profit
-------------------------------------------------------------

**Prerequisites:**

 - Basic Python Knowledge
 - Familiarity with Big Data terminologies
 - Familiarity with Cloud computing (ideally Google Cloud, but AWS and Azure work fine too)
 - Familiarity with Docker (running a container)

**Learning Objectives:**

 - By the end of the tutorial, attendees will ideally have a thorough understanding of what a Data Pipeline is and all the basic notions required to start building one in Airflow

**Technical Requirements:**

- A system with access to a Bash-like shell is highly recommended but not strictly necessary

**Abstract**
Data has become the hottest commodity money can buy, and like any other raw material it needs to be extracted, refined, and cleaned. Enter the Data Plumber, the ultimate master of the Data Pipeline.
In this talk I will explore how to build and maintain a working Data Pipeline with Apache Airflow. After a tour of its main features, I will guide the audience through practical examples to better understand the flexibility and power the library offers. The complex use case analysis will revolve around using Airflow to maintain a Data Pipeline on Google Cloud by leveraging the myriad of connectors already available.

**Content/Agenda of the Tutorial (may be subject to slight variations):**

**Introduction: [20 Minutes]**

- Presenting the speaker    
- Who is a Data Plumber
- What is a Data Pipeline
- Break

**Airflow: [40 minutes]**

- Apache Airflow
- Airflow vs. other pipeline tools (Luigi, Oozie)
- DAGs (see the sketch after this list)
- Break
- Web UI & Advanced Metrics
- Scheduling and dependency management
- Break
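To give a taste of what a DAG looks like, here is a minimal sketch covering scheduling and dependency management. It assumes the Airflow 1.x import paths current at the time of writing; the `daily_etl` name and the `echo` commands are placeholders, not a real pipeline:

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator  # Airflow 1.x import path

# Default arguments shared by every task in the DAG (owner, retry policy, ...).
default_args = {
    'owner': 'data-plumber',
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
}

# A DAG is just a Python object: a name, a start date and a schedule.
dag = DAG(
    dag_id='daily_etl',                  # hypothetical name
    default_args=default_args,
    start_date=datetime(2018, 1, 1),
    schedule_interval='@daily',          # run once per day
)

# Three placeholder tasks standing in for a real extract/transform/load.
extract = BashOperator(task_id='extract', bash_command='echo extract', dag=dag)
transform = BashOperator(task_id='transform', bash_command='echo transform', dag=dag)
load = BashOperator(task_id='load', bash_command='echo load', dag=dag)

# Dependency management: transform runs after extract, load after transform.
extract >> transform >> load
```

Any Python file defining a `DAG` object that is placed in the configured dags folder gets picked up by the scheduler, which then runs the tasks in dependency order once per schedule interval.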

**Setup & First Steps: [60 minutes]**

- Installation - Live Coding
- Configuration - Live Coding
- Writing DAGs
- Simple Examples - Live Coding (see the sketch after this list)
- Break
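As a preview of the simple examples, here is a sketch of the kind of "hello world" DAG that could be live coded. The `say_hello` callable and the `hello_pipeline` name are made up for illustration; the operator usage follows the Airflow 1.x `PythonOperator` API:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator  # Airflow 1.x import path


def say_hello(**context):
    # Airflow injects the run context (execution date etc.) as keyword arguments.
    print('Hello from Airflow, run date: %s' % context['ds'])


dag = DAG(
    dag_id='hello_pipeline',             # hypothetical name
    start_date=datetime(2018, 1, 1),
    schedule_interval=None,              # no schedule: trigger manually while testing
)

hello = PythonOperator(
    task_id='say_hello',
    python_callable=say_hello,
    provide_context=True,                # pass the context dict to the callable
    dag=dag,
)
```

A single task can be exercised outside the scheduler with `airflow test hello_pipeline say_hello 2018-01-01`, which is handy while iterating on a new DAG.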

**Complex Examples: [120 minutes]**

- Exploring Connectors
- Building a complex Data Pipeline (see the sketch below)

Tags: datawarehousing, etl, pipeline, google-cloud, bigdata, cloud, DataPipelines, Airflow, DataPlumbing, Big-Data, BigQuery, DataEngineering, ware, pydata


More information about the Pycon mailing list