[Pycon] [new paper] "Giuseppe Di Bernardo" - GPU-accelerated data analysis in Python: a study case in Material Sciences

info at pycon.it
Sat 6 Jan 2018 20:12:44 CET


Title: GPU-accelerated data analysis in Python: a study case in Material Sciences
Duration: 60 (includes Q&A)
Q&A Session: 15
Language: en
Type: Talk

Abstract: The Max Planck Computing and Data Facility (MPCDF) develops and optimizes algorithms and applications for high-performance computing and for data-intensive projects. As a programming language for data science, Python is now used at MPCDF in the scientific area of “atom probe crystallography” (APT): a Fourier analysis in 3D space can be simulated to reveal the composition and crystallographic structure, at the atomic scale, of billions of APT experimental data sets. The Python data ecosystem has proved well suited to this task, as it has grown beyond the confines of single machines to embrace scalability. The talk describes our approach to scaling across multiple GPUs, as well as the role of visualization methods. Our data analysis workflow relies on the GPU-accelerated Python software package PyNX, an open-source library that provides fast parallel computation of scattering. The code takes advantage of the high throughput of GPUs through the PyCUDA library. Exploratory data analysis, high productivity, and rapid prototyping with high performance are enabled by Jupyter Notebooks and Python packages such as pandas and matplotlib/plotly. In the production stage, interactive visualization is done with standard scientific tools such as ParaView, an open-source 3D visualization program that relies on Python modules to generate visualization components within VTK files.
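
To make the "Fourier analysis in 3D space" concrete, here is a minimal CPU-only sketch of the direct scattering sum F(s) = sum_j exp(2*pi*i * s . r_j) over atomic positions r_j. It is not the PyNX API: the function names, the unit scattering factor per atom, and the simple cubic test lattice are assumptions made for illustration only. The sum for each reciprocal-space point s is independent of the others, which is what makes this kind of computation a natural fit for GPUs.

```python
import cmath
import math


def scattering_amplitude(positions, s):
    """Direct sum F(s) = sum_j exp(2*pi*i * s . r_j), unit weight per atom."""
    total = 0j
    for x, y, z in positions:
        phase = 2.0 * math.pi * (s[0] * x + s[1] * y + s[2] * z)
        total += cmath.exp(1j * phase)
    return total


def cubic_lattice(n, a=1.0):
    """Hypothetical test data: n*n*n atoms on a simple cubic lattice, spacing a."""
    return [(i * a, j * a, k * a)
            for i in range(n) for j in range(n) for k in range(n)]


atoms = cubic_lattice(4)  # 64 atoms

# At a reciprocal lattice point every term is in phase, so |F| equals the atom count.
on_peak = abs(scattering_amplitude(atoms, (1.0, 0.0, 0.0)))

# Halfway between lattice points, neighbouring planes cancel and |F| vanishes.
off_peak = abs(scattering_amplitude(atoms, (0.5, 0.0, 0.0)))

print(on_peak, off_peak)
```

For the 4x4x4 lattice, the amplitude is close to 64 at s = (1, 0, 0) and close to zero at s = (0.5, 0, 0), up to floating-point rounding. A pure-Python loop like this is only practical for small inputs; it serves as a reference against which a GPU implementation could be checked.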

Tags: visualization, analytics, Data Mining, GPUComputing, image-processing, data-analysis, python3, mathematical-modelling, physics, data-visualization, matplotlib, bigdata, scientific-computing

