[Pycon] [new paper] "Manoj Pandey" - Building a Data Science pipeline
info at pycon.it
Sun 7 Jan 2018 14:21:50 CET
Title: Building a Data Science pipeline
Duration: 240 (includes Q&A)
Q&A Session: 0
Language: en
Type: Training
Abstract: This **workshop** teaches the elements of an entire **data science pipeline**. Working with some example datasets, the audience will build a small pipeline: forming hypotheses, validating them, building an algorithm or machine learning model, and finally sharing the findings with other users and team members.
The **take-away** of the workshop is that a data science pipeline is not just about building an algorithm or machine learning model: it involves many other steps, which are equally important.
`The entire data science pipeline will consist of these steps:`
1. Essential Data Science tool-kit:
Learning the important tools: Git, IPython, Jupyter, and the SciPy ecosystem
2. Data Collection and Storage:
Ways to scrape data, handle different file formats, and store data in flat files, databases, etc.
3. Cleaning and Wrangling the data:
Leveraging libraries like pandas, json to wrangle the data
4. Exploratory Data Analysis:
Using tools like pandas, matplotlib, seaborn, etc. to perform EDA
5. Building hypotheses and collecting validations
6. Reproducibility:
Discussing scientific reproducibility
7. Sharing the findings - Visualizations / papers etc:
Building visualizations for the web using D3.js, charts, graphs, etc.
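To give a feel for how the steps above fit together, here is a minimal sketch of such a pipeline. It is only an illustration, not the workshop's actual material: the dataset, the function names (`collect`, `clean`, `explore`, `share`) and the hypothesis are all made up for this example, and it uses only the Python standard library instead of pandas so that it runs anywhere.

```python
import csv
import io
import json
import statistics

# Hypothetical raw data: a tiny CSV with a missing value and inconsistent casing.
RAW_CSV = """city,temp_c
Rome,21.5
rome,22.5
Milan,
Milan,18.0
Milan,19.0
"""


def collect(raw):
    """Step 2 (collection): parse raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(raw)))


def clean(rows):
    """Step 3 (cleaning): drop rows without a temperature, normalise city names."""
    cleaned = []
    for row in rows:
        if not row["temp_c"]:
            continue  # skip rows with a missing measurement
        cleaned.append(
            {"city": row["city"].strip().title(), "temp_c": float(row["temp_c"])}
        )
    return cleaned


def explore(rows):
    """Step 4 (EDA): compute the mean temperature per city."""
    by_city = {}
    for row in rows:
        by_city.setdefault(row["city"], []).append(row["temp_c"])
    return {city: statistics.mean(temps) for city, temps in by_city.items()}


def share(summary):
    """Step 7 (sharing): serialise the findings as JSON, e.g. for a web chart."""
    return json.dumps(summary, sort_keys=True)


summary = explore(clean(collect(RAW_CSV)))

# Step 5 (hypothesis): an invented claim, "Rome is warmer than Milan on average".
hypothesis_holds = summary["Rome"] > summary["Milan"]

report = share(summary)
print(report)
print("Hypothesis holds:", hypothesis_holds)
```

Keeping each step as a small function with no hidden state is one simple way to support step 6: anyone who has the same raw input can rerun the script and get the same summary.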
Tags: scikit-learn, collection, numpy, bokeh, selenium, d3, matplotlib, lxml, pandas