User-level prediction

Preoţiuc-Pietro, Daniel, Vasileios Lampos, and Nikolaos Aletras. An analysis of the user occupational class through Twitter content. In ACL, 2015.

Social media content can be used as a complementary source to the traditional methods for extracting and studying collective social attributes. This study focuses on the prediction of the occupational class for a public user profile. Our analysis is conducted on a new annotated corpus of Twitter users, their respective job titles, posted textual content and platform-related attributes. We frame our task as classification using latent feature representations such as word clusters and embeddings. The employed linear and, especially, non-linear methods can predict a user’s occupational class with strong accuracy for the coarsest level of a standard occupation taxonomy which includes nine classes. Combined with a qualitative assessment, the derived results confirm the feasibility of our approach in inferring a new user attribute that can be embedded in a multitude of downstream applications.

Flekova, Lucie, Daniel Preoţiuc-Pietro, Jordan Carpenter, Salvatore Giorgi, and Lyle Ungar. Analyzing crowdsourced assessment of user traits through Twitter posts. In HCOMP (Work-in-Progress), 2015.

Social media allows any user to express themselves to the public through posting content. Using a crowdsourcing experiment, we aim to quantify and analyze which human attributes lead to better perceptions of the true identity of others. Using tweet content from a set of users with known age and gender information, we ask workers to rate their perception of these traits and we analyze those results in relation to the crowdsourcing workers’ age and gender. Results show that female workers are both more confident and more accurate at reporting gender, and workers in their thirties were most accurate but least confident for rating age. Our study is a first step in identifying the worker traits which contribute to a better understanding of others through their posted text content. Our findings help to identify the types of workers best suited for certain tasks.

Preoţiuc-Pietro, Daniel, Svitlana Volkova, Vasileios Lampos, Yoram Bachrach, and Nikolaos Aletras. "Studying User Income through Language, Behaviour and Affect in Social Media." PLoS ONE 10 (2015).

Automatically inferring user demographics from social media posts is useful for both social science research and a range of downstream applications in marketing and politics. We present the first extensive study where user behaviour on Twitter is used to build a predictive model of income. We apply non-linear methods for regression, i.e. Gaussian Processes, achieving strong correlation between predicted and actual user income. This allows us to shed light on the factors that characterise income on Twitter and analyse their interplay with user emotions and sentiment, perceived psycho-demographics and language use expressed through the topics of their posts. Our analysis uncovers correlations between different feature categories and income. Some reflect common belief (e.g. higher perceived education and intelligence indicate higher earnings) or known differences (e.g. gender and age differences), while others are novel findings (e.g. higher income users express more fear and anger, whereas lower income users express more of the time emotion and opinions).

Lampos, Vasileios, Nikolaos Aletras, Daniel Preoţiuc-Pietro, and Trevor Cohn. Predicting and characterising user impact on Twitter. In EACL, 2014.

The open structure of online social networks and their uncurated nature give rise to problems of user credibility and influence. In this paper, we address the task of predicting the impact of Twitter users based only on features under their direct control, such as usage statistics and the text posted in their tweets. We approach the problem as regression and apply linear as well as nonlinear learning methods to predict a user impact score, estimated by combining the numbers of the user’s followers, followees and listings. The experimental results show that strong prediction performance is achieved, especially for models based on the Gaussian Processes framework. Hence, we can interpret various modelling components, transforming them into indirect ‘suggestions’ for impact boosting.

Rout, Dominic, Daniel Preoţiuc-Pietro, Kalina Bontcheva, and Trevor Cohn. Where's @wally: A classification approach to geolocating users based on their social ties. In HT, 2013.

This paper presents an approach to geolocating users of online social networks, based solely on their ‘friendship’ connections. We observe that users interact more regularly with those closer to themselves and hypothesise that, in many cases, a person’s social network is sufficient to reveal their location. The geolocation problem is formulated as a classification task, where the most likely city for a user without an explicit location is chosen amongst the known locations of their social ties. Our method uses an SVM classifier and a number of features that reflect different aspects and characteristics of Twitter user networks. The SVM classifier is trained and evaluated on a dataset of Twitter users with known locations. Our method outperforms a state-of-the-art method for geolocating users based on their social ties.
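
To illustrate the problem formulation only, the sketch below restricts the candidate cities to the known locations of a user's social ties and picks one of them. It is not the paper's SVM: a simple majority vote stands in for the learned classifier, and the names `predict_city`, `ties` and `known_city` are hypothetical, chosen for this example.

```python
from collections import Counter
from typing import Dict, List, Optional


def predict_city(
    user: str,
    ties: Dict[str, List[str]],
    known_city: Dict[str, str],
) -> Optional[str]:
    """Pick a city for `user` from the known cities of their social ties.

    Assumption: a majority vote replaces the SVM scoring step described
    in the abstract; only the candidate-set construction mirrors the text.
    """
    # Candidate set: cities of those ties whose location is known.
    candidates = [known_city[t] for t in ties.get(user, []) if t in known_city]
    if not candidates:
        return None  # no tie has a known location, so no prediction is made
    # Choose the most frequent candidate city.
    return Counter(candidates).most_common(1)[0][0]


# Toy data, invented for illustration.
ties = {"alice": ["bob", "carol", "dave", "erin"]}
known_city = {"bob": "Sheffield", "carol": "London", "dave": "Sheffield"}
print(predict_city("alice", ties, known_city))
```

A learned classifier would replace the vote with a score per candidate city computed from network features, but the candidate set would be built in the same way.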