User-level prediction

Preoţiuc-Pietro, Daniel, Ye Liu, Daniel Hopkins, and Lyle Ungar. Beyond Binary Labels: Political Ideology Prediction of Twitter Users. In ACL, 2017.

Automatic political orientation prediction from social media posts has to date proven successful only in distinguishing between publicly declared liberals and conservatives in the US. This study examines users’ political ideology using a seven-point scale which enables us to identify politically moderate and neutral users – groups which are of particular interest to political scientists and pollsters. Using a novel data set with political ideology labels self-reported through surveys, our goal is two-fold: a) to characterize the groups of politically engaged users through language use on Twitter; b) to build a fine-grained model that predicts political ideology of unseen users. Our results identify differences in both political leaning and engagement and the extent to which each group tweets using political keywords. Finally, we demonstrate how to improve ideology prediction accuracy by exploiting the relationships between the user groups.

Guntuku, Sharath Chandra, Weisi Lin, Jordan Carpenter, Wee Keong Ng, Lyle Ungar, and Daniel Preotiuc-Pietro. Studying Personality through the Content of Posted and Liked Images on Twitter. In Web Science, 2017.

Interacting with images through social media has become widespread due to ubiquitous Internet access and multimedia-enabled devices. Through images, users generally present their daily activities, preferences or interests. This study aims to identify the way and extent to which personality differences, measured using the Big Five model, are related to online image posting and liking. In two experiments, the larger consisting of ~1.5 million Twitter images both posted and liked by ~4,000 users, we extract interpretable semantic concepts using large-scale image content analysis and analyze differences specific to each personality trait. Predictive results show that image content can predict personality traits, and that there can be a significant performance gain by fusing the signal from both posted and liked images.

Preotiuc-Pietro, Daniel, Sharath Chandra Guntuku, and Lyle Ungar. Controlling Human Perception of Basic User Traits. In EMNLP, 2017.

Much of our online communication is text-mediated and, lately, more common with automated agents. Unlike interacting with humans, these agents currently do not tailor their language to the type of person they are communicating with. In this pilot study, we measure the extent to which human perception of basic user trait information – gender and age – is controllable through text. Using automatic models of gender and age prediction, we estimate which tweets posted by a user are more likely to mischaracterize their traits. We perform multiple controlled crowdsourcing experiments in which we show that we can reduce the human prediction accuracy of gender to almost random – a > 20% drop in accuracy. Our experiments show that it is practically feasible for multiple applications such as text generation, text summarization or machine translation to be tailored to specific traits and perceived as such.

Preoţiuc-Pietro, Daniel, Wei Xu, and Lyle Ungar. Discovering User Attribute Stylistic Differences via Paraphrasing. In AAAI, 2016.

User attribute prediction from social media text has proven successful and useful for downstream tasks. In previous studies, user trait differences have been limited primarily to the presence or absence of words that indicate topical preferences. In this study, we aim to find linguistic style distinctions across three different user attributes: gender, age and occupational class. By combining paraphrases with a simple yet effective method, we capture a wide set of stylistic differences that are exempt from topic bias. We show their predictive power in user profiling, conformity with human perception and psycholinguistic hypotheses, and potential use in generating natural language tailored to specific user traits.

Leqi, Liu, Daniel Preoţiuc-Pietro, Zahra Riahi, Mohsen E. Moghaddam, and Lyle Ungar. Analyzing Personality through Social Media Profile Picture Choice. In ICWSM, 2016.

The content of images users post to their social media is driven in part by personality. In this study, we analyze how Twitter profile images vary with the personality of the users posting them. In our main analysis, we use profile images from over 66,000 users whose personality we estimate based on their tweets. To facilitate interpretability, we focus our analysis on aesthetic and facial features and control for demographic variation in image features and personality. Our results show significant differences in profile picture choice between personality traits, and that these can be harnessed to predict personality traits with robust accuracy. For example, agreeable and conscientious users display more positive emotions in their profile pictures, while users high in openness prefer more aesthetic photos.

Flekova, Lucie, Lyle Ungar, and Daniel Preoţiuc-Pietro. Exploring Stylistic Variation with Age and Income on Twitter. In ACL, 2016.

Writing style allows NLP tools to adjust to the traits of an author. In this paper, we explore the relation between stylistic and syntactic features and authors’ age and income. We confirm our hypothesis that for numerous feature types writing style is predictive of income even beyond age. We analyze the predictive power of writing style features in a regression task on two data sets of around 5,000 Twitter users each. Additionally, we use our validated features to study daily variations in writing style of users from distinct income groups. Temporal stylistic patterns not only provide novel psychological insight into user behavior, but are useful for future research and applications in social media.

Flekova, Lucie, Jordan Carpenter, Salvatore Giorgi, Lyle Ungar, and Daniel Preoţiuc-Pietro. Analysing Biases in Human Perception of User Age and Gender from Text. In ACL, 2016.

User traits disclosed through written text, such as age and gender, can be used to personalize applications such as recommender systems or conversational agents. However, human perception of these traits is not perfectly aligned with reality. In this paper, we conduct a large-scale crowdsourcing experiment on guessing age and gender from tweets. We systematically analyze the quality and possible biases of these predictions. We identify the textual cues which lead to misassessments of traits or make workers more or less confident in their choice. Our study demonstrates that differences between real and perceived traits are noteworthy and elucidates inaccurately used stereotypes in human perception.

Preoţiuc-Pietro, Daniel, Jordan Carpenter, Salvatore Giorgi, and Lyle Ungar. Studying the Dark Triad of Personality using Twitter Behavior. In CIKM, 2016.

Research into the darker traits of human nature is growing in interest especially in the context of increased social media usage. This allows users to express themselves to a wider online audience. We study the extent to which the standard model of dark personality – the dark triad – consisting of narcissism, psychopathy and Machiavellianism, is related to observable Twitter behavior such as platform usage, posted text and profile image choice. Our results show that we can map various behaviors to psychological theory and study new aspects related to social media usage. Finally, we build a machine learning algorithm that predicts the dark triad of personality in out-of-sample users with reliable accuracy.

Preotiuc-Pietro, Daniel, Maarten Sap, Andrew H. Schwartz, and Lyle Ungar. Mental Illness Detection at the World Well-Being Project for the CLPsych 2015 Shared Task. In Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality (CLPsych). NAACL, 2015.

This article is a system description and report on the submission of the World Well-Being Project from the University of Pennsylvania in the ‘CLPsych 2015’ shared task. The goal of the shared task was to automatically determine Twitter users who self-reported having one of two mental illnesses: post-traumatic stress disorder (PTSD) and depression. Our system employs user metadata and textual features derived from Twitter posts. To reduce the feature space and avoid data sparsity, we consider several word clustering approaches. We explore the use of linear classifiers based on different feature sets as well as their combination using a linear ensemble. This method is agnostic of illness-specific features, such as lists of medicines, thus making it readily applicable in other scenarios. Our approach ranked second in all tasks on average precision and showed the best results at a false positive rate of .1.
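As a rough illustration of the two ideas named in this abstract, the sketch below combines the scores of two classifiers with a weighted sum (one simple form of linear ensemble) and evaluates the resulting ranking with average precision. The scores, labels, equal weights and function names are all invented for illustration; the abstract does not say how the ensemble weights were obtained, so this should not be read as the authors' system.

```python
def linear_ensemble(score_lists, weights):
    """Weighted sum of per-classifier scores, one combined score per user."""
    n_users = len(score_lists[0])
    return [
        sum(w * scores[i] for w, scores in zip(weights, score_lists))
        for i in range(n_users)
    ]


def average_precision(scores, labels):
    """Mean of the precision values at the rank of each positive user."""
    # Rank users from highest to lowest combined score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


# Hypothetical scores from two classifiers built on different feature sets.
scores_a = [0.9, 0.2, 0.6, 0.4]
scores_b = [0.7, 0.4, 0.3, 0.8]
labels = [1, 0, 1, 0]  # 1 = user in the target group (invented data)

# Equal weights are an assumption made only for this example.
combined = linear_ensemble([scores_a, scores_b], weights=[0.5, 0.5])
ap = average_precision(combined, labels)
```

Here the positives land at ranks 1 and 3 of the combined ranking, so the average precision is the mean of 1/1 and 2/3.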

Preotiuc-Pietro, Daniel, Johannes Eichstaedt, Gregory Park, Maarten Sap, Laura Smith, Victoria Tobolsky, Andrew H. Schwartz, and Lyle Ungar. The Role of Personality, Age and Gender in Tweeting about Mental Illnesses. In Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality (CLPsych). NAACL, 2015.

Mental illnesses, such as depression and post traumatic stress disorder (PTSD), are highly underdiagnosed globally. Populations sharing similar demographics and personality traits are known to be more at risk than others. In this study, we characterise the language use of users disclosing their mental illness on Twitter. Language-derived personality and demographic estimates show surprisingly strong performance in distinguishing users that tweet a diagnosis of depression or PTSD from random controls, reaching an area under the receiver operating characteristic curve – AUC – of around .8 in all our binary classification tasks. In fact, when distinguishing users disclosing depression from those disclosing PTSD, the single feature of estimated age shows nearly as strong performance (AUC = .806) as using thousands of topics (AUC = .819) or tens of thousands of n-grams (AUC = .812). We also find that differential language analyses, controlled for demographics, recover many symptoms associated with the mental illnesses in the clinical literature.
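To make the single-feature AUC figures above concrete, here is a minimal sketch of how an AUC can be computed from one numeric feature. It uses the pairwise definition: the probability that a randomly chosen user from one group has a higher feature value than a randomly chosen user from the other, with ties counted as half. The ages and group names are invented for illustration and are not taken from the paper's data.

```python
def auc_from_feature(group_a_values, group_b_values):
    """AUC of a single feature: P(value from A > value from B), ties count 0.5."""
    wins = 0.0
    for a in group_a_values:
        for b in group_b_values:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(group_a_values) * len(group_b_values))


# Hypothetical estimated ages for two groups of users (invented data).
group_a_ages = [34, 41, 29, 38]
group_b_ages = [22, 25, 31, 19]

auc = auc_from_feature(group_a_ages, group_b_ages)
print(auc)  # 0.9375, since 15 of the 16 cross-group pairs are ordered a > b
```

An AUC of 0.5 would mean the feature carries no information about group membership, while 1.0 would mean it separates the two groups perfectly.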