[Pycon] [new paper] "Massimo Nicosia" - Understanding Google SyntaxNet to build your own word taggers

info a pycon.it
Sun 7 Jan 2018 21:49:42 CET


Title: Understanding Google SyntaxNet to build your own word taggers
Duration: 45 (includes Q&A)
Q&A Session: 15
Language: en
Type: Talk

Abstract: After this talk, you will understand the SyntaxNet framework and be ready to create word taggers (or even sentence classifiers) for your own problems, building on a battle-tested system that Google uses every day to parse billions of web pages.

Word tagging is one of the main tasks in Natural Language Processing. A tagger can be used to determine the syntactic function of words, to extract named entities from news and healthcare data, or to find entity mentions in web pages and customer reviews, to name just a few applications. Google recently released SyntaxNet, a state-of-the-art neural syntactic tagger and parser. This transition-based system is production-ready and is used by Google to parse the Web.
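To make "transition-based" concrete for tagging: the tagger walks through the sentence left to right, and each transition assigns a tag to the next token, so later decisions can use features of earlier predictions. The sketch below is a generic greedy version under our own assumptions. `greedy_tag`, `toy_scorer` and the `["O", "NAME"]` tagset are hypothetical, the hand-written scorer merely stands in for a trained network, and the sketch decodes greedily without any beam search.

```python
from typing import Callable, List, Sequence

# Hypothetical interface: a scorer sees the sentence, the position being
# tagged and the tags predicted so far, and returns one score per tag.
Scorer = Callable[[Sequence[str], int, Sequence[str]], List[float]]


def greedy_tag(tokens: Sequence[str], tagset: Sequence[str], score: Scorer) -> List[str]:
    """Greedy left-to-right tagging: each transition tags the next token."""
    predicted: List[str] = []
    for i in range(len(tokens)):
        scores = score(tokens, i, predicted)
        best = max(range(len(tagset)), key=lambda k: scores[k])
        predicted.append(tagset[best])  # the decision becomes part of the state
    return predicted


def toy_scorer(tokens: Sequence[str], i: int, predicted: Sequence[str]) -> List[float]:
    """Hand-written stand-in for a trained model over the tagset ["O", "NAME"]."""
    word = tokens[i]
    prev_tag = predicted[-1] if predicted else "<S>"
    name_score = 0.0
    if word[:1].isupper() and i > 0:  # capitalised, but not sentence-initial
        name_score += 1.0
    if prev_tag == "NAME" and word[:1].isupper():  # feature on a past decision
        name_score += 0.5
    return [0.5, name_score]  # scores for "O" and "NAME"


tags = greedy_tag(["Yesterday", "Ada", "Lovelace", "visited", "Rome"], ["O", "NAME"], toy_scorer)
print(tags)  # ['O', 'NAME', 'NAME', 'O', 'NAME']
```

Because `predicted` grows as the loop advances, the scorer for "Lovelace" can see that "Ada" was just tagged NAME; this dependence on the system's own past decisions is what distinguishes a transition-based tagger from labelling each token independently.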

During the talk, we will briefly introduce SyntaxNet and transition-based systems. We will then focus on tagging and show how the SyntaxNet neural network graph can be isolated from the framework and used to train our own models. We will describe how Google's system is structured and which deep learning best practices are used to implement the neural graph.
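As a rough picture of the kind of neural graph involved: feature ids are grouped into channels (for example words and tags), each channel has its own embedding matrix, and the looked-up embeddings are concatenated and passed through a hidden layer and a softmax over tags. The following is a pure Python sketch of that general feed-forward shape under our own assumptions (toy sizes, random weights, a single ReLU hidden layer, and the hypothetical names `embed_and_concat` and `forward`). It is not SyntaxNet's TensorFlow graph code and leaves out training entirely.

```python
import math
import random
from typing import Dict, List, Sequence

Matrix = List[List[float]]


def embed_and_concat(feature_ids: Dict[str, Sequence[int]], tables: Dict[str, Matrix]) -> List[float]:
    """Look up every feature id in its channel's embedding table and concatenate."""
    x: List[float] = []
    for channel in sorted(feature_ids):  # fixed channel order
        for idx in feature_ids[channel]:
            x.extend(tables[channel][idx])
    return x


def dense(x: Sequence[float], weights: Matrix, bias: Sequence[float]) -> List[float]:
    """Affine layer; `weights` holds one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v: Sequence[float]) -> List[float]:
    return [max(0.0, a) for a in v]


def softmax(v: Sequence[float]) -> List[float]:
    m = max(v)  # subtract the max for numerical stability
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]


def forward(feature_ids, tables, hidden_w, hidden_b, out_w, out_b) -> List[float]:
    """Embeddings -> concatenation -> ReLU hidden layer -> softmax over tags."""
    x = embed_and_concat(feature_ids, tables)
    h = relu(dense(x, hidden_w, hidden_b))
    return softmax(dense(h, out_w, out_b))


rng = random.Random(0)


def random_matrix(rows: int, cols: int) -> Matrix:
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


# Toy sizes: 5-word vocabulary with 3-d embeddings, 3 tags with 2-d embeddings.
tables = {"words": random_matrix(5, 3), "tags": random_matrix(3, 2)}
ids = {"words": [1, 4], "tags": [0]}  # two word features and one tag feature
x = embed_and_concat(ids, tables)     # 2 * 3 + 1 * 2 = 8 inputs
probs = forward(ids, tables, random_matrix(4, len(x)), [0.0] * 4, random_matrix(2, 4), [0.0] * 2)
```

Keeping one embedding table per channel lets words and tags live in separate id spaces with different embedding sizes, while the concatenated vector still has a fixed length that a dense layer can consume.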

The main obstacle to adopting the SyntaxNet graph is obtaining the input representations for our models. To transform a sentence into a suitable input, we will:
1. describe the concepts underlying the SyntaxNet feature extraction process;
2. suggest a pure Python feature extraction API that mimics the behaviour of the C++ SyntaxNet components;
3. highlight the few other adaptations required to feed an input to the SyntaxNet neural graph.
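Here is a minimal sketch of what such a pure Python feature extraction API could look like. It is loosely modelled on the `input(OFFSET).…` style of SyntaxNet feature specifications, but the grammar, the `PAD`/`UNK` conventions and the names `parse_spec`, `extract` and `Vocab` are our own simplifying assumptions, not the talk's implementation or the exact behaviour of the C++ components. Extracted values are then mapped to integer ids with one vocabulary per channel, the form an embedding lookup expects.

```python
import re
from typing import Dict, Iterable, List, Sequence, Tuple

PAD, UNK = "<PAD>", "<UNK>"

# Simplified, hypothetical grammar: input.ATTR or input(OFFSET).ATTR,
# where OFFSET is relative to the token currently being tagged.
_SPEC_RE = re.compile(r"^input(?:\((-?\d+)\))?\.(word|tag)$")


def parse_spec(spec: str) -> Tuple[int, str]:
    """Turn 'input(-1).tag' into (-1, 'tag')."""
    match = _SPEC_RE.match(spec)
    if match is None:
        raise ValueError(f"unsupported feature spec: {spec!r}")
    offset = int(match.group(1)) if match.group(1) is not None else 0
    return offset, match.group(2)


def extract(specs: Sequence[str], words: Sequence[str], tags: Sequence[str], focus: int) -> List[str]:
    """Feature values for the token at `focus`; out-of-range positions give PAD.

    `tags` holds only the tags predicted so far, so future tags are PAD too.
    """
    values = []
    for spec in specs:
        offset, attr = parse_spec(spec)
        seq = words if attr == "word" else tags
        j = focus + offset
        values.append(seq[j] if 0 <= j < len(seq) else PAD)
    return values


class Vocab:
    """Maps feature values to integer ids; 0 is PAD and 1 is UNK."""

    def __init__(self, items: Iterable[str]) -> None:
        self.ids: Dict[str, int] = {PAD: 0, UNK: 1}
        for item in items:
            self.ids.setdefault(item, len(self.ids))

    def lookup(self, value: str) -> int:
        return self.ids.get(value, self.ids[UNK])


words = ["the", "cat", "sat"]
specs = ["input.word", "input(1).word", "input(-1).word", "input(-1).tag"]
values = extract(specs, words, ["DT"], focus=1)  # ['cat', 'sat', 'the', 'DT']

word_vocab = Vocab(["the", "cat"])
word_specs = ["input.word", "input(1).word", "input(-1).word"]
word_ids = [word_vocab.lookup(v) for v in extract(word_specs, words, [], focus=2)]  # [1, 0, 3]
```

Reserving fixed ids for padding and unknown values means every token yields the same number of features, so the ids can be fed directly to per-channel embedding tables regardless of sentence length or vocabulary coverage.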

We will also show that the word-representation-centric feature extraction component is very flexible and can easily be adapted to sentence classification.
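One way to picture this flexibility: if the whole sentence is treated as a single focus, word features at fixed offsets 0, 1, 2, … cover the sentence, and the same id and embedding machinery produces a sentence representation for a classifier. The sketch below pads or truncates to `max_len` and then averages the embeddings; `sentence_ids`, `mean_embedding`, the id conventions and the mean-pooling choice are illustrative assumptions, not necessarily the approach presented in the talk.

```python
from typing import Dict, List, Sequence

PAD_ID, UNK_ID = 0, 1  # assumed id conventions, as in a word vocabulary


def sentence_ids(words: Sequence[str], vocab: Dict[str, int], max_len: int) -> List[int]:
    """Treat the sentence as one 'focus': word ids at offsets 0..max_len-1."""
    ids = [vocab.get(w, UNK_ID) for w in words[:max_len]]
    return ids + [PAD_ID] * (max_len - len(ids))


def mean_embedding(ids: Sequence[int], table: List[List[float]]) -> List[float]:
    """Average the embeddings of non-padding ids into one sentence vector."""
    rows = [table[i] for i in ids if i != PAD_ID]
    dim = len(table[0])
    if not rows:
        return [0.0] * dim
    return [sum(row[d] for row in rows) / len(rows) for d in range(dim)]


vocab = {"great": 2, "talk": 3}
ids = sentence_ids(["great", "talk", "!"], vocab, max_len=5)  # [2, 3, 1, 0, 0]
table = [[0.0, 0.0], [0.5, 0.5], [1.0, 0.0], [0.0, 1.0]]    # rows: PAD, UNK, great, talk
vec = mean_embedding(ids, table)                              # [0.5, 0.5]
```

The resulting vector could feed the same kind of hidden and softmax layers used for tagging, with sentence labels in place of tags; concatenating the padded embeddings instead of averaging them would be an equally plausible variant.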

With the knowledge of SyntaxNet internals and our sample implementation, you will be ready to develop and train fast, robust and accurate taggers.


Tags: [u'google-syntaxnet', u'neural network', u'sequence-tagging', u'nlp', u'Deep-Learning']

