Social Media

Flekova, Lucie, Eugen Ruppert, and Daniel Preoţiuc-Pietro. Analysing domain suitability of a sentiment lexicon by identifying distributionally bipolar words. In Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis (WASSA). EMNLP, 2015.

Contemporary sentiment analysis approaches rely heavily on lexicon-based methods. This is mainly due to their simplicity, although the best empirical results can be achieved by more complex techniques. We introduce a method to assess the suitability of generic sentiment lexicons for a given domain, namely to identify frequent bigrams where a polar word switches polarity. Our bigrams are scored using Lexicographer's Mutual Information, leveraging large automatically obtained corpora. Our score matches human perception of polarity and demonstrates improvements in classification results using our enhanced context-aware method. Our method enhances the assessment of lexicon-based sentiment detection algorithms and can be further used to quantify ambiguous words.
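As a rough illustration of the scoring step only, the sketch below computes Lexicographer's Mutual Information for adjacent word pairs, using the common definition of LMI as pointwise mutual information weighted by the bigram frequency. The function name `lmi_scores`, the toy sentence, and the use of the token count as the normaliser are assumptions made for this example; the abstract does not specify how the paper links these scores to polarity.

```python
import math
from collections import Counter


def lmi_scores(tokens):
    """Score adjacent word pairs with Lexicographer's Mutual Information.

    LMI(a, b) = f(a, b) * log2(f(a, b) * N / (f(a) * f(b)))
    where f(.) are corpus frequencies and N is the number of tokens
    (a simplifying assumption for this sketch).
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = len(tokens)
    scores = {}
    for (a, b), f_ab in bigrams.items():
        pmi = math.log2(f_ab * n / (unigrams[a] * unigrams[b]))
        scores[(a, b)] = f_ab * pmi
    return scores


# Hypothetical toy corpus: "cold" is usually negative, but "cold beer" is not.
tokens = "cold beer is good but cold weather is bad and cold beer is great".split()
scores = lmi_scores(tokens)
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1])[:3]:
    print(pair, round(score, 3))
```

Frequent, strongly associated pairs such as ("cold", "beer") rank above pairs seen once, which is the property that makes the score useful for finding frequent bigrams.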

Lampos, Vasileios, Daniel Preoţiuc-Pietro, Sina Samangooei, Douwe Gelling, and Trevor Cohn. Extracting socioeconomic patterns from the news: Modelling text and outlet importance jointly. In Workshop on Language Technologies and Computational Social Science (LACSS). ACL, 2014.

Information from news articles can be used to study correlations between textual discourse and socioeconomic patterns. This work focuses on the task of understanding how words contained in the news as well as the news outlets themselves may relate to a set of indicators, such as economic sentiment or unemployment rates. The bilinear nature of the applied regression model facilitates jointly learning word and outlet importance, supervised by these indicators. By evaluating the predictive ability of the extracted features, we can also assess their relevance to the target socioeconomic phenomena. Therefore, our approach can be formulated as a potential NLP tool, particularly suitable to the computational social science community, as it can be used to interpret connections between vast amounts of textual content and measurable society-driven factors.

Lampos, Vasileios, Nikolaos Aletras, Daniel Preoţiuc-Pietro, and Trevor Cohn. Predicting and characterising user impact on Twitter. EACL, 2014.

The open structure of online social networks and their uncurated nature give rise to problems of user credibility and influence. In this paper, we address the task of predicting the impact of Twitter users based only on features under their direct control, such as usage statistics and the text posted in their tweets. We approach the problem as regression and apply linear as well as nonlinear learning methods to predict a user impact score, estimated by combining the numbers of the user’s followers, followees and listings. The experimental results indicate that strong prediction performance is achieved, especially for models based on the Gaussian Processes framework. Hence, we can interpret various modelling components, transforming them into indirect ‘suggestions’ for impact boosting.

Preoţiuc-Pietro, Daniel, Justin Cranshaw, and Tae Yano. Exploring venue-based city-to-city similarity measures. In Workshop on Urban Computing (UrbComp). SIGKDD, 2013.

In this work we explore the use of incidentally generated social network data for the folksonomic characterization of cities by the types of amenities located within them. Using data collected about venue categories in various cities, we examine the effect of different granularities of spatial aggregation and data normalization when representing a city as a collection of its venues. We introduce three vector-based representations of a city, where aggregations of the venue categories are done within a grid structure, within the city’s municipal neighborhoods, and across the city as a whole. We apply our methods to a novel dataset consisting of Foursquare venue data from 17 cities across the United States, totaling over 1 million venues. Our preliminary investigation demonstrates that different assumptions in the urban perception could lead to qualitative, yet distinctive, variations in the induced city description and categorization.
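To make the idea of a vector-based city representation concrete, here is a minimal sketch of the coarsest variant mentioned above: one vector per city, built from venue category counts across the whole city and compared with cosine similarity. The category labels, the toy venue lists, the normalisation to relative frequencies, and the choice of cosine similarity are all assumptions for illustration; the abstract does not state which normalisations or similarity measures the paper uses.

```python
import math
from collections import Counter


def city_vector(venue_categories):
    """Represent a city as relative frequencies of its venue categories."""
    counts = Counter(venue_categories)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * v.get(key, 0.0) for key, value in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical venue category lists for three made-up cities.
city_a = city_vector(["bar", "bar", "cafe", "museum"])
city_b = city_vector(["bar", "cafe", "cafe", "museum"])
city_c = city_vector(["stadium", "mall", "mall", "gym"])

print(round(cosine(city_a, city_b), 3))
print(round(cosine(city_a, city_c), 3))
```

The grid-based and neighborhood-based variants would apply the same counting to each cell or neighborhood instead of the city as a whole.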

Lampos, Vasileios, Daniel Preoţiuc-Pietro, and Trevor Cohn. A user-centric model of voting intention from Social Media. ACL, 2013.

Social Media contain a multitude of user opinions which can be used to predict real-world phenomena in many domains including politics, finance and health. Most existing methods treat these problems as linear regression, learning to relate word frequencies and other simple features to a known response variable (e.g., voting intention polls or financial indicators). These techniques require very careful filtering of the input texts, as most Social Media posts are irrelevant to the task. In this paper, we present a novel approach which performs high quality filtering automatically, through modelling not just words but also users, framed as a bilinear model with a sparse regulariser. We also consider the problem of modelling groups of related output variables, using a structured multi-task regularisation method. Our experiments on voting intention prediction demonstrate strong performance over large-scale input from Twitter on two distinct case studies, outperforming competitive baselines.
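The bilinear form can be illustrated with a small sketch of the prediction step alone. It assumes a matrix `X` whose entry `X[i][j]` is the frequency of word `j` in the posts of user `i`, a user weight vector `u`, and a word weight vector `w`; these names and the toy numbers are hypothetical, and training with the sparse regulariser is not shown.

```python
def bilinear_predict(X, u, w, bias=0.0):
    """Predict a response as u^T X w + bias.

    X[i][j] is the frequency of word j for user i (assumed layout),
    u holds one weight per user and w one weight per word.
    """
    total = bias
    for i, user_weight in enumerate(u):
        if user_weight == 0.0:
            continue  # a zero user weight filters this user out entirely
        for j, word_weight in enumerate(w):
            total += user_weight * X[i][j] * word_weight
    return total


# Hypothetical toy data: two users, three words.
X = [
    [2.0, 0.0, 1.0],  # user 0
    [5.0, 9.0, 7.0],  # user 1
]
w = [0.5, -1.0, 2.0]

# With a sparse user vector, user 1 contributes nothing to the prediction.
print(bilinear_predict(X, [1.0, 0.0], w))
```

Because a sparse regulariser drives many entries of `u` and `w` to zero, irrelevant users and words drop out of the sum, which is one way to read the automatic filtering described in the abstract.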

Rout, Dominic, Daniel Preoţiuc-Pietro, Kalina Bontcheva, and Trevor Cohn. Where's @wally: A classification approach to geolocating users based on their social ties. HT, 2013.

This paper presents an approach to geolocating users of online social networks, based solely on their ‘friendship’ connections. We observe that users interact more regularly with those closer to themselves and hypothesise that, in many cases, a person’s social network is sufficient to reveal their location. The geolocation problem is formulated as a classification task, where the most likely city for a user without an explicit location is chosen amongst the known locations of their social ties. Our method uses an SVM classifier and a number of features that reflect different aspects and characteristics of Twitter user networks. The SVM classifier is trained and evaluated on a dataset of Twitter users with known locations. Our method outperforms a state-of-the-art method for geolocating users based on their social ties.

Preoţiuc-Pietro, Daniel, and Trevor Cohn. A temporal model of text periodicities using Gaussian Processes. EMNLP, 2013.

Temporal variations of text are usually ignored in NLP applications. However, text use changes with time, which can affect many applications. In this paper we model periodic distributions of words over time. Focusing on hashtag frequency in Twitter, we first automatically identify the periodic patterns. We use this for regression in order to forecast the volume of a hashtag based on past data. We use Gaussian Processes, a state-of-the-art Bayesian non-parametric model, with a novel periodic kernel. We demonstrate this in a text classification setting, assigning the tweet hashtag based on the rest of its text. This method shows significant improvements over competitive baselines.
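For readers unfamiliar with periodic covariance functions, the sketch below implements the standard periodic (exp-sine-squared) kernel used with Gaussian Processes. It is not the novel kernel introduced in the paper, whose form the abstract does not give; the function name, the 24-hour period, and the other parameter values are assumptions for illustration.

```python
import math


def periodic_kernel(t1, t2, period=24.0, length_scale=1.0, variance=1.0):
    """Standard periodic kernel:

    k(t1, t2) = variance * exp(-2 * sin^2(pi * |t1 - t2| / period) / length_scale^2)
    """
    s = math.sin(math.pi * abs(t1 - t2) / period)
    return variance * math.exp(-2.0 * s * s / length_scale ** 2)


# Time points one full period apart are treated as maximally similar,
# while points half a period apart are the least similar.
print(periodic_kernel(3.0, 27.0))
print(periodic_kernel(0.0, 12.0))
```

Under this covariance, a hashtag volume observed at a given hour is strongly correlated with the volume at the same hour on other days, which is what lets a Gaussian Process extrapolate a recurring pattern forward in time.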

Preoţiuc-Pietro, Daniel, and Trevor Cohn. Mining user behaviours: A study of check-in patterns in Location Based Social Networks. WebSci, 2013.

Understanding the patterns underlying human mobility is of essential importance to applications like recommender systems. In this paper we investigate the behaviour of around 10,000 frequent users of Location Based Social Networks (LBSNs), making use of their full movement patterns. We analyse the metadata associated with the whereabouts of the users, with emphasis on the type of places and their evolution over time. We uncover patterns across different temporal scales for venue category usage. Then, focusing on individual users, we apply this knowledge in two tasks: 1) clustering users based on their behaviour and 2) predicting users’ future movements. By this, we demonstrate both qualitatively and quantitatively that incorporating temporal regularities is beneficial for making better sense of user behaviour.

Preoţiuc-Pietro, Daniel, Sina Samangooei, Trevor Cohn, Nick Gibbins, and Mahesan Niranjan. Trendminer: an architecture for real time analysis of social media text. In Workshop on Real-Time Analysis and Mining of Social Streams (RAMSS). ICWSM, 2012.

The emergence of online social networks (OSNs) and the accompanying availability of large amounts of data pose a number of new natural language processing (NLP) and computational challenges. Data from OSNs is different to data from traditional sources (e.g. newswire). The texts are short, noisy and conversational. Another important issue is that data occurs in real-time streams, needing immediate analysis that is grounded in time and context. In this paper we describe a new open-source framework for efficient text processing of streaming OSN data (available online). Whilst researchers have made progress in adapting or creating text analysis tools for OSN data, a system to unify these tasks has yet to be built. Our system is focused on a real world scenario where fast processing and accuracy are paramount. We use the MapReduce framework for distributed computing and present running times for our system in order to show that scaling to online scenarios is feasible. We describe the components of the system and evaluate their accuracy. Our system supports easy integration of future modules in order to extend its functionality.
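The MapReduce pattern mentioned above can be sketched in a few lines of single-process Python. This is only an illustration of the programming model on a token counting task, not the framework's actual code or API; the function names, the whitespace tokenisation, and the sample posts are assumptions.

```python
from collections import defaultdict


def map_phase(post):
    """Emit a (token, 1) pair for every whitespace-separated token in a post."""
    return [(token.lower(), 1) for token in post.split()]


def shuffle(pairs):
    """Group emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)


# Hypothetical short, noisy posts standing in for a stream of OSN text.
posts = ["so tired today", "today was great", "So so happy"]

emitted = [pair for post in posts for pair in map_phase(post)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(emitted).items())
print(counts)
```

Because each post is mapped independently and each key is reduced independently, both phases can be spread across many machines, which is what makes the approach attractive for large volumes of streaming text.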