[Pycon] [new paper] "Kajal Puri" - Build text classification models (CBOW and Skip-gram) with FastText in Python

Sun 6 Jan 2019 23:53:01 CET

Title: Build text classification models (CBOW and Skip-gram) with FastText in Python
Duration: 45 (includes Q&A)
Q&A Session: 15
Language: en
Type: Talk

Abstract: NLP is an exciting way to interpret textual data, especially given that computers neither speak nor understand any human language. So how do we represent each word of a language as a unique numerical pattern and process it as quickly as possible? One answer is the FastText library.

FastText was open-sourced by Facebook in 2016 and, with its release, became known as a fast and accurate library for text classification and word representation, usable from Python. It can be seen as an alternative to the gensim package’s word2vec. It implements two important NLP methods for learning word representations, Continuous Bag of Words (CBOW) and Skip-gram, and it performs well in both supervised learning (text classification) and unsupervised learning (word embeddings).
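To make the difference between the two methods concrete, here is a minimal sketch in plain Python (it does not use the FastText library). CBOW predicts a target word from its surrounding context words, while Skip-gram predicts each context word from the target word. The function name `training_examples`, the `window` parameter and the sample sentence are illustrative assumptions, not part of FastText.

```python
def training_examples(tokens, window=2):
    """Build CBOW and Skip-gram training examples from a list of tokens.

    Hypothetical helper for illustration only; real libraries also apply
    subsampling, negative sampling and other optimisations.
    """
    cbow, skipgram = [], []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        # Context = neighbours inside the window, excluding the target itself.
        context = [tokens[j] for j in range(lo, hi) if j != i]
        if not context:
            continue
        # CBOW: the whole context is the input, the target is the output.
        cbow.append((context, target))
        # Skip-gram: the target is the input, each context word is one output.
        skipgram.extend((target, c) for c in context)
    return cbow, skipgram


tokens = "the quick brown fox".split()
cbow, skipgram = training_examples(tokens, window=1)

for context, target in cbow:
    print("CBOW      ", context, "->", target)
for target, context_word in skipgram:
    print("Skip-gram ", target, "->", context_word)
```

Note that CBOW yields one example per word position, while Skip-gram yields one example per (target, context word) pair, which is part of why the two models behave differently on rare words and on training time.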

The talk will be divided into the following four segments:

0-10 minutes: The talk will begin by explaining the differences between the word embeddings generated by word2vec, GloVe and FastText, and how FastText achieves better accuracy in less time than the other libraries.

10-30 minutes: The code for both models (CBOW and Skip-gram) will be shown and explained line by line on a standard labeled text dataset, with tips on hyperparameter tuning to get the best possible results.

30-50 minutes: How to use the pre-trained word embeddings released by FastText for various languages, and where to use them, followed by use cases showing what kinds of problems can be solved with FastText in Python.

50-60 minutes: Q&A session.
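As a small illustration of the labeled dataset mentioned in the 10-30 minute segment: FastText's supervised mode reads plain text with one example per line, where each label is a token prefixed with `__label__` by default. The sketch below prepares such lines using only the standard library. The helper name `to_fasttext_line` and the two sample reviews are made up for illustration.

```python
def to_fasttext_line(label, text):
    """Format one (label, text) pair as a FastText supervised training line.

    Hypothetical helper: labels get the default "__label__" prefix and
    the text is collapsed onto a single line.
    """
    # Labels must be single tokens, so replace inner spaces with underscores.
    label_token = "__label__" + label.strip().replace(" ", "_")
    # One example per line: collapse newlines and repeated whitespace.
    clean_text = " ".join(text.split())
    return f"{label_token} {clean_text}"


# Made-up sample data, not taken from any real dataset.
samples = [
    ("positive", "Great talk,  very clear"),
    ("negative", "Too fast\nto follow"),
]
lines = [to_fasttext_line(label, text) for label, text in samples]
print("\n".join(lines))
```

Writing these lines to a text file gives the kind of input that supervised training expects. Lowercasing, punctuation handling and the train/validation split are left out here and would depend on the dataset.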

Tags: Data Mining, analytics, Full Text Search, data-visualization, Machine Learning