[Pycon] [new paper] "Aileen Nielsen" - Algorithmic fairness in data discovery, processing, and prediction: a full tutorial

info at pycon.it
Sun 7 Jan 2018 18:26:40 CET


Title: Algorithmic fairness in data discovery, processing, and prediction: a full tutorial
Duration: 240 minutes (includes Q&A)
Q&A Session: 0
Language: en
Type: Talk

Abstract: There is mounting evidence that the widespread deployment of machine learning and artificial intelligence in business and government applications is likely reproducing, or even amplifying, existing prejudices and social inequalities. Even when an organization and its software engineers seek to maintain fairness and accuracy, it is easy to unintentionally create software that exhibits discriminatory or privacy-violating behavior. This tutorial is designed to give both software engineers and data scientists sufficient background to identify potential problems, avoid them, and apply good practices when developing new software and machine learning products.

Intended Audience: Intermediate Python users who practice or are interested in data analysis. It is helpful, but not necessary, for attendees to be familiar with popular families of machine learning models, including linear regression, decision trees, and neural networks. Familiarity with building a basic ETL pipeline and gathering data from open-source repositories is also helpful but not required.
Outline & Timing:

1.     Introduction + social relevance (15 minutes)
a.     Relevant news stories – 5 minutes
b.     Brief introduction to relevant legal concepts + their applicability to data analysis and model building – 10 minutes

2.     Data discovery – 45 minutes
a.     Examples of how ‘bad’ or incomplete data sets can lead to discriminatory models – 5 minutes
b.     How to examine and balance your input data before feeding it into an analysis pipeline (a short sketch follows this section) – 40 minutes
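
To give a flavor of this segment, here is a minimal sketch of one rebalancing strategy, using pandas on a small invented dataset with a hypothetical protected-attribute column named "gender"; downsampling the majority group is only one of several possible approaches.

import pandas as pd

# Invented dataset with a hypothetical protected-attribute column.
df = pd.DataFrame({
    "gender": ["F", "M", "M", "M", "F", "M", "M", "M"],
    "income": [52, 61, 58, 70, 49, 66, 72, 55],
})

# Step 1: inspect how each group is represented in the raw data.
counts = df["gender"].value_counts()
print(counts / len(df))  # M: 0.75, F: 0.25 -- a skewed sample

# Step 2: one simple rebalancing strategy -- downsample the majority
# group so every group appears with equal frequency.
min_n = counts.min()
balanced = (
    df.groupby("gender", group_keys=False)
      .apply(lambda g: g.sample(n=min_n, random_state=0))
)
print(balanced["gender"].value_counts())  # M: 2, F: 2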

3.     Data processing – 45 minutes
a.     Examples of how data processing has resulted in discriminatory models – 5 minutes
b.     How to examine your preprocessing pipeline to prevent discriminatory inputs – 15 minutes
c.     Examples of how data processing has resulted in privacy-violating models – 5 minutes
d.     How to examine your processing pipeline for privacy leaks (a combined sketch follows this section) – 20 minutes
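
As a preview of both concerns in this segment, here is a minimal sketch on invented data: the hypothetical columns "zip_code" and "birth_year" stand in for quasi-identifiers, and "race" for a protected attribute.

import pandas as pd

# Invented table: "zip_code"/"birth_year" are quasi-identifiers and
# "race" is a protected attribute we intend to drop before modeling.
df = pd.DataFrame({
    "race":       ["A", "A", "B", "B", "A", "B"],
    "zip_code":   [10001, 10001, 20002, 20002, 10001, 20002],
    "birth_year": [1980, 1980, 1975, 1975, 1980, 1990],
})

# (1) Proxy check: a feature that predicts the protected attribute
# almost perfectly will smuggle it back in after the column is dropped.
purity = (
    df.groupby("zip_code")["race"]
      .agg(lambda s: s.value_counts(normalize=True).max())
      .mean()
)
print(f"zip_code -> race, mean majority share: {purity:.2f}")  # 1.00 here

# (2) k-anonymity-style check: the smallest group of rows sharing the
# same quasi-identifiers. k = 1 means someone is unique and may be
# re-identifiable if the processed data are released.
k = df.groupby(["zip_code", "birth_year"]).size().min()
print(f"k-anonymity over (zip_code, birth_year): k = {k}")  # k = 1 here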

4.     Modeling – 40 minutes
a.     Examples of how choice of model can lead to discriminatory results – 5 minutes
b.     Examples of how models can be designed to be more or less vulnerable to discriminatory input data – 15 minutes
c.     How to test your model and examine final parameters/fits for discriminatory behavior across a variety of common model families (a basic check is sketched below) – 20 minutes
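
For example, one basic check, demographic parity combined with the "80% rule" for disparate impact, fits in a few lines of scikit-learn. The data here are synthetic, and the protected attribute is an invented binary group column.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic data: column 0 is an invented binary protected attribute,
# columns 1-2 are ordinary features.
X = rng.normal(size=(1000, 3))
X[:, 0] = rng.integers(0, 2, size=1000)
# Labels depend partly on group membership, so bias can creep in.
y = (X[:, 1] + 0.5 * X[:, 0] + rng.normal(scale=0.5, size=1000) > 0).astype(int)

model = LogisticRegression().fit(X, y)
pred = model.predict(X)

# Demographic parity: compare positive-prediction rates per group.
rate_0 = pred[X[:, 0] == 0].mean()
rate_1 = pred[X[:, 0] == 1].mean()
ratio = min(rate_0, rate_1) / max(rate_0, rate_1)
print(f"positive rate, group 0: {rate_0:.2f}")
print(f"positive rate, group 1: {rate_1:.2f}")
# The "80% rule": flag disparate impact if the ratio falls below 0.8.
print(f"disparate impact ratio: {ratio:.2f}", "FLAG" if ratio < 0.8 else "ok")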

5.     Auditing your model – 15 minutes
a.     Examples of how even models built following the processes above may still exhibit discriminatory behavior – 3 minutes
b.     Auditing your model as a black box with existing Python solutions (one hand-rolled version of the idea is sketched below) – 12 minutes
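
As one illustration of the idea, here is a hand-rolled counterfactual flip audit that treats a trained classifier purely as a black box. This is a sketch on synthetic data, not any particular auditing package's API.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)

# Synthetic stand-in for a production model: column 0 is an invented
# binary protected attribute; we only ever call .predict on the model.
X = rng.normal(size=(500, 4))
X[:, 0] = rng.integers(0, 2, size=500)
y = rng.integers(0, 2, size=500)
black_box = RandomForestClassifier(random_state=0).fit(X, y)

# Counterfactual flip audit: change ONLY the protected attribute and
# count how often the prediction changes. A high flip rate suggests
# the output depends directly on group membership.
X_flipped = X.copy()
X_flipped[:, 0] = 1 - X_flipped[:, 0]
changed = (black_box.predict(X) != black_box.predict(X_flipped)).mean()
print(f"predictions changed by flipping the protected attribute: {changed:.1%}")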

6.     Research frontiers – 10 minutes
a.     Updates on how computer scientists and sociologists are developing new methods to avoid discriminatory and privacy-violating models. Several newly published papers will be presented to give the audience a sense of the breadth and current state of this active area of research.



Tags: social-impact, data-science


More information about the Pycon mailing list