Computational social science

Preotiuc-Pietro, Daniel, Sharath Chandra Guntuku, and Lyle Ungar. Controlling Human Perception of Basic User Traits. In EMNLP, 2017.

Much of our online communication is text-mediated and, increasingly, takes place with automated agents. Unlike humans, these agents currently do not tailor their language to the type of person they are communicating with. In this pilot study, we measure the extent to which human perception of basic user trait information – gender and age – is controllable through text. Using automatic models of gender and age prediction, we estimate which tweets posted by a user are more likely to mischaracterize that user’s traits. We perform multiple controlled crowdsourcing experiments in which we show that we can reduce the human prediction accuracy of gender to almost random – a drop in accuracy of more than 20%. Our experiments show that it is practically feasible for multiple applications such as text generation, text summarization or machine translation to be tailored to specific traits and perceived as such.

Preoţiuc-Pietro, Daniel, Ye Liu, Daniel Hopkins, and Lyle Ungar. Beyond Binary Labels: Political Ideology Prediction of Twitter Users. In ACL, 2017.

Automatic political orientation prediction from social media posts has to date proven successful only in distinguishing between publicly declared liberals and conservatives in the US. This study examines users’ political ideology using a seven-point scale which enables us to identify politically moderate and neutral users – groups which are of particular interest to political scientists and pollsters. Using a novel data set with political ideology labels self-reported through surveys, our goal is two-fold: a) to characterize the groups of politically engaged users through language use on Twitter; b) to build a fine-grained model that predicts political ideology of unseen users. Our results identify differences in both political leaning and engagement and the extent to which each group tweets using political keywords. Finally, we demonstrate how to improve ideology prediction accuracy by exploiting the relationships between the user groups.

Guntuku, Sharath Chandra, Weisi Lin, Jordan Carpenter, Wee Keong Ng, Lyle Ungar, and Daniel Preotiuc-Pietro. Studying Personality through the Content of Posted and Liked Images on Twitter. In Web Science, 2017.

Interacting with images through social media has become widespread due to ubiquitous Internet access and multimedia-enabled devices. Through images, users generally present their daily activities, preferences or interests. This study aims to identify the ways in which, and the extent to which, personality differences measured using the Big Five model are related to online image posting and liking. In two experiments, the larger consisting of ~1.5 million Twitter images both posted and liked by ~4,000 users, we extract interpretable semantic concepts using large-scale image content analysis and analyze differences specific to each personality trait. Predictive results show that image content can predict personality traits, and that fusing the signal from both posted and liked images can yield a significant performance gain.

Preoţiuc-Pietro, Daniel, Jordan Carpenter, and Lyle Ungar. Personality Driven Differences in Paraphrase Preference. In Workshop on Natural Language Processing and Computational Social Science (NLP+CSS), ACL, 2017.

Personality plays a decisive role in how people behave in different scenarios, including online social media. Researchers have used such data to study how personality can be predicted from language use. In this paper, we study phrase choice as a particular stylistic linguistic difference, as opposed to the mostly topical differences identified previously. Building on previous work on demographic preferences, we quantify differences in paraphrase choice from a massive Facebook data set with posts from over 115,000 users. We quantify the predictive power of phrase choice in user profiling and use phrase choice to study psycholinguistic hypotheses. This work is relevant to future applications that aim to personalize text generation to specific personality types.

Preoţiuc-Pietro, Daniel, Wei Xu, and Lyle Ungar. Discovering User Attribute Stylistic Differences via Paraphrasing. In AAAI, 2016.

User attribute prediction from social media text has proven successful and useful for downstream tasks. In previous studies, user trait differences have been limited primarily to the presence or absence of words that indicate topical preferences. In this study, we aim to find linguistic style distinctions across three different user attributes: gender, age and occupational class. By combining paraphrases with a simple yet effective method, we capture a wide set of stylistic differences that are exempt from topic bias. We show their predictive power in user profiling, conformity with human perception and psycholinguistic hypotheses, and potential use in generating natural language tailored to specific user traits.

Fulgoni, Dean, Jordan Carpenter, Lyle Ungar, and Daniel Preoţiuc-Pietro. An Empirical Exploration of Moral Foundations Theory in Partisan News Sources. In LREC, 2016.

News sources frame issues in different ways in order to appeal to their readers or control their perception. We present a large-scale study of news articles from partisan sources in the US across a variety of different issues. We first highlight that differences between sides exist by predicting the political leaning of articles of unseen political bias. Framing can be driven by the different types of morality that each group values. We highlight differences in how news is framed, building on moral foundations theory quantified using hand-crafted lexicons. Our results show that partisan sources frame political issues differently both in terms of word usage and through the moral foundations they relate to.

Leqi, Liu, Daniel Preoţiuc-Pietro, Zahra Riahi, Mohsen E. Moghaddam, and Lyle Ungar. Analyzing Personality through Social Media Profile Picture Choice. In ICWSM, 2016.

The content of images users post to their social media is driven in part by personality. In this study, we analyze how Twitter profile images vary with the personality of the users posting them. In our main analysis, we use profile images from over 66,000 users whose personality we estimate based on their tweets. To facilitate interpretability, we focus our analysis on aesthetic and facial features and control for demographic variation in image features and personality. Our results show significant differences in profile picture choice between personality traits, and that these can be harnessed to predict personality traits with robust accuracy. For example, agreeable and conscientious users display more positive emotions in their profile pictures, while users high in openness prefer more aesthetic photos.

Cano, Amparo Elisabeth, Daniel Preoţiuc-Pietro, Danica Radovanovic, Katrin Weller, and Aba-Sah Dadzie. #Microposts2016 – 6th Workshop on ‘Making Sense of Microposts’. In WWW, 2016.

#Microposts2016, the 6th workshop on Making Sense of Microposts, is summarised by the sub-theme: big things come in small packages. The workshop serves as a forum to discuss and promote research on the generation, analysis and reuse of Microposts – small chunks of information published on social media and messaging platforms. Low effort and cost to publish Microposts gives a voice to all, across differences in expertise, socio-cultural, generational and economic spheres, covering a wide swathe of topics, posted in the moment and on the go, during events, crises and personal experiences. While the usual suspects, including Twitter, Facebook, Instagram and Pinterest, continue to dominate, especially as services are merged or shared across platforms, newer players such as WhatsApp, Vine, Meerkat and Yik Yak are growing in popularity, with increased access to fast, high capacity networks and advanced small, personal devices. #Microposts2016 solicited participation from Computer Science and other relevant fields, with a focus on interdisciplinary work. Starting in 2015, the workshop includes a track dedicated to encouraging research employing methods for analysis of Microposts in the Social Sciences.

Flekova, Lucie, Lyle Ungar, and Daniel Preoţiuc-Pietro. Exploring Stylistic Variation with Age and Income on Twitter. In ACL, 2016.

Writing style allows NLP tools to adjust to the traits of an author. In this paper, we explore the relation between stylistic and syntactic features and authors’ age and income. We confirm our hypothesis that, for numerous feature types, writing style is predictive of income even beyond age. We analyze the predictive power of writing style features in a regression task on two data sets of around 5,000 Twitter users each. Additionally, we use our validated features to study daily variations in writing style of users from distinct income groups. Temporal stylistic patterns not only provide novel psychological insight into user behavior, but are also useful for future research and applications in social media.

Flekova, Lucie, Jordan Carpenter, Salvatore Giorgi, Lyle Ungar, and Daniel Preoţiuc-Pietro. Analysing Biases in Human Perception of User Age and Gender from Text. In ACL, 2016.

User traits disclosed through written text, such as age and gender, can be used to personalize applications such as recommender systems or conversational agents. However, human perception of these traits is not perfectly aligned with reality. In this paper, we conduct a large-scale crowdsourcing experiment on guessing age and gender from tweets. We systematically analyze the quality and possible biases of these predictions. We identify the textual cues which lead to misassessments of traits or make workers more or less confident in their choice. Our study demonstrates that differences between real and perceived traits are noteworthy and elucidates inaccurately used stereotypes in human perception.