[Pycon] [new paper] "Christian Barra" - Scaling up your data infrastructure

Mon 6 Nov 2017 11:52:07 CET

Title: Scaling up your data infrastructure
Duration: 45 (includes Q&A)
Q&A Session: 15
Language: en
Type: Talk

Abstract: **This talk aims to answer a few questions:**

 - What do you do when you need to move your model from your laptop to production?
 - Does big data mean I need to use the JVM?
 - What do you do when you need to have GPUs to train your model?
 - How do you apply the best software engineering practices (testing and CI, for example) inside your data science process?
 - How do you “decouple” your data scientists, developers and devops teams?
 - How do you guarantee the reproducibility of your models?
 - How do you scale your training process when your data no longer fits in memory?
 - How do you serve your models and provide an easy rollback system?

I’ll share my experience, highlighting some of the challenges I faced and the solutions I came up with to answer these questions.

The principles and best practices I will share are something you can apply, more or less easily, if you are running, or are in the process of building, a production system based on the Python stack.

This talk focuses on (my) best practices for running the Python data stack together, and is the result of more than a year of work on a project called Cassiny, which aims to simplify your life if you want to use a completely Python-based solution in your data science workflow.

Tags: java, Data-Scientist, docker, pydata