[Pycon] [new paper] "Karishma Babbar" - Do more with Twitter data: Understanding the people behind the Tweet

Sat 19 Jan 2019 21:01:25 CET

Title: Do more with Twitter data: Understanding the people behind the Tweet 
Duration: 60 (includes Q&A)
Q&A Session: 15
Language: en
Type: Talk

Abstract: Twitter, a large social networking and microblogging website, has evolved into one of the most suitable virtual environments for monitoring and tracking information. It holds vast amounts of information about almost every industry, from entertainment to sports, politics to business, and health to major world events. One of the best things about Twitter, and perhaps its greatest appeal, is how accessible it makes sharing and collecting real-time data. It is also an important source of data for the business models of large companies.

All these characteristics put Twitter at the forefront of social media networks. They also make it possible for us, as data enthusiasts, to infer psychological traits from user data such as tweet text, user profile information, and the number of followers and followings.

This talk uses the Twitter API and Python to analyze Twitter data with machine learning and natural language processing techniques, a very powerful combination of disciplines.

The session will follow the outline below, with 15 minutes at the end for Q&A.

1. Collection of Data
Twitter data can be collected via the Twitter API. The Twitter API platform provides three tiers for searching tweets. In this talk, the Twitter Streaming API is used, which delivers tweets in real time as they are published on Twitter. These tweets are then stored in a MongoDB database.
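
As a rough illustration of the storage step, the sketch below turns lines of streamed tweet JSON into flat documents and hands them to an insert callable. It is not the speaker's code: the helper names `tweet_to_document` and `store_stream` are hypothetical, the choice of fields and the use of the tweet id as `_id` are assumptions, and the stream is assumed to carry one JSON object per line with occasional blank keep-alive lines.

```python
import json


def tweet_to_document(raw_line):
    """Turn one line of streamed JSON into a flat document, or None to skip it."""
    raw_line = raw_line.strip()
    if not raw_line:
        # Blank line (assumed to be a keep-alive), nothing to store.
        return None
    payload = json.loads(raw_line)
    if "text" not in payload:
        # Not a tweet, for example a delete notice.
        return None
    user = payload.get("user", {})
    return {
        "_id": payload["id_str"],  # assumption: reuse the tweet id as the key
        "text": payload["text"],
        "created_at": payload.get("created_at"),
        "screen_name": user.get("screen_name"),
        "followers": user.get("followers_count"),
        "followings": user.get("friends_count"),
    }


def store_stream(lines, insert_one):
    """Pass every tweet found in `lines` to `insert_one`; return how many were stored."""
    stored = 0
    for line in lines:
        document = tweet_to_document(line)
        if document is not None:
            insert_one(document)
            stored += 1
    return stored
```

With PyMongo, `insert_one` could be the `insert_one` method of a collection object; for a dry run, the `append` method of a plain list works just as well.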

2. Analysis
(a) Content Analysis
From the collected tweets, the text is analyzed and visualized as a word cloud. The tweets will also be mined for the most frequently occurring words and common named entities.
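
The word-frequency part of this step might look like the sketch below, which uses only the standard library. The stopword list and the regular expressions are illustrative assumptions, and named-entity extraction is outside the scope of the sketch.

```python
import re
from collections import Counter

# Illustrative stopword list (an assumption); a real analysis would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "or", "is", "are", "to", "of", "in",
             "on", "for", "about", "rt", "it", "this", "that"}

URL_OR_MENTION = re.compile(r"https?://\S+|@\w+")
WORD = re.compile(r"[a-z][a-z']+")  # words of at least two characters


def top_words(texts, n=10):
    """Return the n most frequent non-stopword tokens across all tweet texts."""
    counts = Counter()
    for text in texts:
        # Drop links and @mentions before counting words.
        cleaned = URL_OR_MENTION.sub(" ", text.lower())
        counts.update(w for w in WORD.findall(cleaned) if w not in STOPWORDS)
    return counts.most_common(n)
```

The resulting (word, count) pairs are the kind of input a word-cloud renderer typically needs.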
(b) Sentiment Analysis
Sentiment analysis, the study of feelings, attitudes, and opinions expressed in Twitter data, is one of the most basic yet daunting tasks of text analytics. The talk will cover two approaches for understanding a user's opinion or mood from tweets: first, a machine learning approach using logistic regression, and second, a natural language processing approach using the TextBlob library.
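
To make the logistic regression approach concrete, here is a minimal pure-Python sketch: a bag-of-words model trained with stochastic gradient descent on the log-loss. The training sentences are made-up toy data and the hyperparameters are arbitrary assumptions; a real analysis would use a labeled tweet corpus and most likely an established machine learning library.

```python
import math
from collections import defaultdict

# Made-up toy data (an assumption): 1 = positive mood, 0 = negative mood.
TRAINING = [
    ("i love this great day", 1),
    ("what a wonderful happy surprise", 1),
    ("i hate this awful traffic", 0),
    ("such terrible sad news today", 0),
]


def tokenize(text):
    return text.lower().split()


def train_logistic_regression(examples, epochs=200, learning_rate=0.1):
    """Fit one weight per word plus a bias by gradient descent on the log-loss."""
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for text, label in examples:
            tokens = tokenize(text)
            z = bias + sum(weights[t] for t in tokens)
            predicted = 1.0 / (1.0 + math.exp(-z))
            error = label - predicted  # gradient of the log-likelihood w.r.t. z
            bias += learning_rate * error
            for t in tokens:
                weights[t] += learning_rate * error
    return dict(weights), bias


def positive_probability(text, weights, bias):
    """Estimated probability that `text` is positive; unseen words are ignored."""
    z = bias + sum(weights.get(t, 0.0) for t in tokenize(text))
    return 1.0 / (1.0 + math.exp(-z))
```

For the second approach, TextBlob exposes a ready-made score: `TextBlob(text).sentiment.polarity` returns a value between -1.0 (negative) and 1.0 (positive), with no training step required.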
(c) Temporal & Spatial Analysis
Collected tweets can also be analyzed by their timestamps to determine whether and how concentrations of data change over time.
Spatial analysis can also yield very interesting results on geographical and topological grounds, which will be useful in identifying patterns.
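
One simple way to approach both analyses is to count tweets per hour and per geographic grid cell, as in the sketch below. It assumes tweets are dictionaries in the classic Twitter API layout, with a `created_at` string such as "Sat Jan 19 20:01:25 +0000 2019" and an optional GeoJSON `coordinates` point ordered as [longitude, latitude]. The function names and the one-degree cell size are hypothetical choices.

```python
import math
from collections import Counter
from datetime import datetime

# Timestamp layout assumed for the `created_at` field.
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def tweets_per_hour(tweets):
    """Count tweets in each hourly bucket, keyed as 'YYYY-MM-DD HH:00'."""
    counts = Counter()
    for tweet in tweets:
        created = datetime.strptime(tweet["created_at"], TWITTER_TIME_FORMAT)
        counts[created.strftime("%Y-%m-%d %H:00")] += 1
    return counts


def tweets_per_grid_cell(tweets, cell_degrees=1.0):
    """Count geotagged tweets per (latitude, longitude) cell; others are skipped."""
    counts = Counter()
    for tweet in tweets:
        point = tweet.get("coordinates")
        if not point:
            continue  # tweet carries no exact location
        longitude, latitude = point["coordinates"]
        cell = (math.floor(latitude / cell_degrees) * cell_degrees,
                math.floor(longitude / cell_degrees) * cell_degrees)
        counts[cell] += 1
    return counts
```

Plotting the hourly counts as a time series, or the cell counts on a map, is then a matter of handing these dictionaries to a charting tool.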

Finally, the talk will conclude by showcasing the results of the above analyses as interactive visualizations in a web application.

Tags: ML, nlp, sentiment-analysis, twitter, machine-learning, natural-language-processing, tweepy, data-science, social-media