Gaussian Processes

Preoţiuc-Pietro, Daniel, Svitlana Volkova, Vasileios Lampos, Yoram Bachrach, and Nikolaos Aletras. "Studying User Income through Language, Behaviour and Affect in Social Media." PLoS ONE 10 (2015). AbstractWebsite

Automatically inferring user demographics from social media posts is useful for both social science research and a range of downstream applications in marketing and politics. We present the first extensive study where user behaviour on Twitter is used to build a predictive model of income. We apply non-linear methods for regression, i.e. Gaussian Processes, achieving strong correlation between predicted and actual user income. This allows us to shed light on the factors that characterise income on Twitter and analyse their interplay with user emotions and sentiment, perceived psycho-demographics and language use expressed through the topics of their posts. Our analysis uncovers correlations between different feature categories and income, some of which reflect common belief e.g. higher perceived education and intelligence indicates higher earnings, known differences e.g. gender and age differences, however, others show novel findings e.g. higher income users express more fear and anger, whereas lower income users express more of the time emotion and opinions.

Preoţiuc-Pietro, Daniel, Vasileios Lampos, and Nikolaos Aletras. An analysis of the user occupational class through Twitter content In ACL., 2015. AbstractPDFSlides

Social media content can be used as a complementary source to the traditional methods for extracting and studying collective social attributes. This study focuses on the prediction of the occupational class for a public user profile. Our analysis is conducted on a new annotated corpus of Twitter users, their respective job titles, posted textual content and platform-related attributes. We frame our task as classification using latent feature representations such as word clusters and embeddings. The employed linear and, especially, non-linear methods can predict a user’s occupational class with strong accuracy for the coarsest level of a standard occupation taxonomy which includes nine classes. Combined with a qualitative assessment, the derived results confirm the feasibility of our approach in inferring a new user attribute that can be embedded in a multitude of downstream applications.

Preoţiuc-Pietro, Daniel, and Trevor Cohn. A temporal model of text periodicities using Gaussian Processes. EMNLP., 2013. AbstractPDFPoster

Temporal variations of text are usually ignored in NLP applications. However, text use changes with time, which can affect many applications. In this paper we model periodic distributions of words over time. Focusing on hashtag frequency in Twitter, we first automatically identify the periodic patterns. We use this for regression in order to forecast the volume of a hashtag based on past data. We use Gaussian Processes, a state-of-the-art bayesian non-parametric model, with a novel periodic kernel. We demonstrate this in a text classification setting, assigning the tweet hashtag based on the rest of its text. This method shows significant improvements over competitive baselines.

Lampos, Vasileios, Nikolaos Aletras, Daniel Preoţiuc-Pietro, and Trevor Cohn. Predicting and characterising user impact on Twitter. EACL., 2014. AbstractPDFPoster

The open structure of online social networks and their uncurated nature give rise to problems of user credibility and influence. In this paper, we address the task of predicting the impact of Twitter users based only on features under their direct control, such as usage statistics and the text posted in their tweets.We approach the problem as regression and apply linear as well as nonlinear learning methods to predict a user impact score, estimated by combining the numbers of the user’s followers, followees and listings. The experimental results point out that a strong prediction performance is achieved, especially for models based on the Gaussian Processes framework. Hence, we can interpret various modelling components, transforming them into indirect ‘suggestions’ for impact boosting.