[Pycon] [new paper] "Luca Pappalardo" - Sports performance evaluation: from cognitive mechanisms to data-driven algorithms

Sun 7 Jan 2018 12:57:06 CET

Title: Sports performance evaluation: from cognitive mechanisms to data-driven algorithms
Duration: 60 (includes Q&A)
Q&A Session: 15
Language: en
Type: Talk

Abstract: > Not everything that counts can be counted, and not everything that can be counted counts. 
A. Einstein

Humans are routinely asked to evaluate the performance of other individuals, separating success from failure and affecting outcomes from science to education and sports. Yet little is known about the aspects that determine the human perception of performance. How do expert reviewers, as well as ordinary people, arrive at their evaluations? To what extent are these evaluations based on objective performance features? How are they affected by subjective biases or contextual influences?

This talk will answer these fascinating questions by focusing on _soccer_, the most popular sport in the world. First, we will show how machine learning can accurately reproduce the mechanisms human judges use to evaluate the performance of soccer players, uncovering the limits and characteristics of the human evaluation process. Second, we will present a Python package that, in a completely unsupervised and data-driven way, makes it possible to (i) evaluate the quality of a player’s performance and (ii) rank soccer players based on their performances.

The first part of the talk will show how the ratings that sports newspapers assign to every player in a game are associated with a high-dimensional vector of features extracted from massive data describing every quantifiable aspect of soccer games. The talk will show how, using Scikit-learn, we can train an _artificial judge_ that learns the relation between technical performance and soccer ratings, thereby _accurately reproducing_ the human evaluation process. By inspecting the structure of the artificial judge, the talk will show that human evaluation follows a simple cognitive heuristic: judges first select a limited number of features that attract their attention and then rate a performance based on the presence of noticeable values, i.e., feature values far from the norm that can be easily brought to mind.
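
The heuristic described above can be illustrated with a small sketch. It is not the artificial judge from the talk and does not use Scikit-learn: it only hard-codes the two steps (attend to a few features, then react to values far from the norm) using the standard library. The feature names, the history data, the baseline rating of 6.0, the 0.5 step and the z-score threshold of 2.0 are all assumptions made for illustration.

```python
from statistics import mean, pstdev

# Assumption: the judge attends to only these features.
# +1 means a high value is good, -1 means a high value is bad.
ATTENDED = {"goals": +1, "assists": +1, "fouls": -1}

# Hypothetical past performances that define "the norm" for each feature.
HISTORY = [
    {"goals": 0, "assists": 0, "fouls": 1},
    {"goals": 0, "assists": 1, "fouls": 2},
    {"goals": 1, "assists": 0, "fouls": 1},
    {"goals": 0, "assists": 0, "fouls": 3},
    {"goals": 0, "assists": 1, "fouls": 2},
    {"goals": 1, "assists": 0, "fouls": 1},
    {"goals": 0, "assists": 0, "fouls": 2},
    {"goals": 0, "assists": 0, "fouls": 2},
]


def noticeable_features(performance, history, attended, threshold=2.0):
    """Return {feature: z-score} for attended features far from the norm."""
    flagged = {}
    for name in attended:
        values = [past[name] for past in history]
        mu, sigma = mean(values), pstdev(values)
        if sigma == 0:
            continue  # the feature never varies, so no value can stand out
        z = (performance[name] - mu) / sigma
        if abs(z) >= threshold:
            flagged[name] = z
    return flagged


def heuristic_rating(performance, history, attended, baseline=6.0, step=0.5):
    """Shift a baseline rating by one step per noticeable feature."""
    rating = baseline
    for name, z in noticeable_features(performance, history, attended).items():
        direction = attended[name]
        rating += step * direction * (1 if z > 0 else -1)
    return rating


GOOD_GAME = {"goals": 2, "assists": 0, "fouls": 2}
ROUGH_GAME = {"goals": 0, "assists": 0, "fouls": 5}

print(heuristic_rating(GOOD_GAME, HISTORY, ATTENDED))   # 6.5
print(heuristic_rating(ROUGH_GAME, HISTORY, ATTENDED))  # 5.5
```

In this toy version, an ordinary number of assists or fouls leaves the rating untouched; only the unusual values (two goals, five fouls) move it, which is the behaviour the abstract attributes to human judges.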

The second part of the talk will show how to overcome the simplicity of the human evaluation process by presenting **PlayeRank**, a Python package that implements an unsupervised, data-driven framework to evaluate the performance of soccer players in the main European leagues. The talk will show how to use PlayeRank to construct a data-driven ranking of players and highlight the factors that explain why celebrated players, like Messi and Cristiano Ronaldo, actually turn out to be the top players in the world. The modules composing PlayeRank will be presented, showing how they allow the user to define the features characterizing a performance, automatically detect the relevance of each player’s actions to a game outcome, detect the role of a player given his game data, rate every performance, and obtain a final ranking of all players in Europe. A short demo will be given during the talk in a Jupyter notebook, using interactive data visualization with the Bokeh package.
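
The rate-then-rank steps listed above can be sketched in a few lines of plain Python. This is not PlayeRank's API. The function names, feature names, weights and players below are hypothetical. The weights are fixed by hand here, whereas the abstract says the relevance of each action is detected automatically. Roles are likewise given in the data rather than detected.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical feature weights standing in for the learned relevance
# of each action to the game outcome.
WEIGHTS = {"goals": 1.0, "assists": 0.5, "key_passes": 0.25}

# Hypothetical per-game performances; the role is assumed to be known.
PERFORMANCES = [
    {"player": "player_a", "role": "forward", "goals": 2, "assists": 0, "key_passes": 2},
    {"player": "player_a", "role": "forward", "goals": 0, "assists": 1, "key_passes": 4},
    {"player": "player_b", "role": "forward", "goals": 1, "assists": 1, "key_passes": 0},
    {"player": "player_c", "role": "midfielder", "goals": 0, "assists": 2, "key_passes": 8},
    {"player": "player_d", "role": "midfielder", "goals": 0, "assists": 0, "key_passes": 4},
]


def rate_performance(performance, weights):
    """Rate one performance as a weighted sum of its features."""
    return sum(w * performance.get(name, 0) for name, w in weights.items())


def rank_players(performances, weights):
    """Average each player's ratings and rank players within each role."""
    ratings = defaultdict(list)
    for perf in performances:
        key = (perf["role"], perf["player"])
        ratings[key].append(rate_performance(perf, weights))

    by_role = defaultdict(list)
    for (role, player), values in ratings.items():
        by_role[role].append((player, mean(values)))

    # Highest average rating first within each role.
    return {
        role: sorted(players, key=lambda item: item[1], reverse=True)
        for role, players in by_role.items()
    }


RANKING = rank_players(PERFORMANCES, WEIGHTS)
for role, players in RANKING.items():
    print(role, players)
```

Ranking within a role keeps forwards from being compared directly with midfielders on features that favour one position, which is one plausible reason to detect roles before ranking.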

The audience will learn how to use Python to construct evaluation algorithms based entirely on machine learning and big data, a step toward a thorough and objective evaluation of performance that overcomes the biases and limitations of human perception. Only a basic knowledge of Python and of data mining principles is required to fully understand the talk.

Tags: Data Mining, Machine Learning, Python, sports-analytics, mathematical-modelling, bigdata, Algorithms, analytics, sklearn