Social Media

Preotiuc-Pietro, Daniel, Sharath Chandra Guntuku, and Lyle Ungar. Controlling Human Perception of Basic User Traits In EMNLP., 2017.

Much of our online communication is text-mediated and, increasingly, takes place with automated agents. Unlike humans, these agents currently do not tailor their language to the type of person they are communicating with. In this pilot study, we measure the extent to which human perception of basic user trait information – gender and age – is controllable through text. Using automatic models of gender and age prediction, we estimate which tweets posted by a user are more likely to mis-characterize their traits. We perform multiple controlled crowdsourcing experiments in which we show that we can reduce the human prediction accuracy of gender to almost random – a > 20% drop in accuracy. Our experiments show that it is practically feasible for multiple applications such as text generation, text summarization or machine translation to be tailored to specific traits and perceived as such.

Hagan, Courtney, Jordan Carpenter, Lyle Ungar, and Daniel Preoţiuc-Pietro. "Personality Profiles of Users Sharing Animal-related Content on Social Media." Anthrozoos (2017).

Animal preferences are thought to be linked with more salient psychological traits of people, and most research examining owner personality as a differentiating factor has obtained mixed results. The rise in usage of social networks offers a new medium in which users broadcast their preferences and activities, including about animals. In two studies, the first on Facebook status updates and the second on images shared on Twitter, we revisited the link between user Big Five personality traits and animal preference, specifically focusing on cats and dogs. We used automatic content analysis of text and images to unobtrusively measure preference for animals online using large data sets. Results from Study 1 indicated that those who mentioned ownership of a cat (by using the phrase ‘my cat’) in their status updates were more open to experience, introverted, neurotic and less conscientious when compared to the general population, while users mentioning ownership of a dog (by using ‘my dog’) were only less conscientious compared to the rest of the population. Study 2 found that users who featured either cat or dog images in their tweets were more neurotic, less conscientious and less agreeable than those who did not. In addition, posting images containing cats was specific to users higher in openness, while posting images featuring dogs was associated with users higher in extraversion. These findings taken together align with some previous findings on the relationship between owner personality and animal preference, additionally highlighting some social media specific behaviors.

Preoţiuc-Pietro, Daniel, Jordan Carpenter, and Lyle Ungar. Personality Driven Differences in Paraphrase Preference In Workshop on Natural Language Processing and Computational Social Science (NLP+CSS). ACL, 2017.

Personality plays a decisive role in how people behave in different scenarios, including online social media. Researchers have used such data to study how personality can be predicted from language use. In this paper, we study phrase choice as a particular stylistic linguistic difference, as opposed to the mostly topical differences identified previously. Building on previous work on demographic preferences, we quantify differences in paraphrase choice from a massive Facebook data set with posts from over 115,000 users. We quantify the predictive power of phrase choice in user profiling and use phrase choice to study psycholinguistic hypotheses. This work is relevant to future applications that aim to personalize text generation to specific personality types.

Guntuku, Sharath Chandra, Weisi Lin, Jordan Carpenter, Wee Keong Ng, Lyle Ungar, and Daniel Preotiuc-Pietro. Studying Personality through the Content of Posted and Liked Images on Twitter In Web Science., 2017.

Interacting with images through social media has become widespread due to ubiquitous Internet access and multimedia enabled devices. Through images, users generally present their daily activities, preferences or interests. This study aims to identify the way and extent to which personality differences measured using the Big Five model are related to online image posting and liking. In two experiments, the larger consisting of ~1.5 million Twitter images both posted and liked by ~4,000 users, we extract interpretable semantic concepts using large-scale image content analysis and analyze differences specific to each personality trait. Predictive results show that image content can predict personality traits, and that there can be significant performance gain by fusing the signal from both posted and liked images.

Preoţiuc-Pietro, Daniel, Ye Liu, Daniel Hopkins, and Lyle Ungar. Beyond Binary Labels: Political Ideology Prediction of Twitter Users In ACL., 2017.

Automatic political orientation prediction from social media posts has to date proven successful only in distinguishing between publicly declared liberals and conservatives in the US. This study examines users’ political ideology using a seven-point scale which enables us to identify politically moderate and neutral users – groups which are of particular interest to political scientists and pollsters. Using a novel data set with political ideology labels self-reported through surveys, our goal is two-fold: a) to characterize the groups of politically engaged users through language use on Twitter; b) to build a fine-grained model that predicts political ideology of unseen users. Our results identify differences in both political leaning and engagement and the extent to which each group tweets using political keywords. Finally, we demonstrate how to improve ideology prediction accuracy by exploiting the relationships between the user groups.

Srijith, PK, Kalina Bontcheva, Mark Hepple, and Daniel Preoţiuc-Pietro. "Sub-Story Detection in Twitter with Hierarchical Dirichlet Processes." Information Processing and Management (2016).

Social media has now become the de facto information source on real world events. The challenge, however, given the high volume and velocity of social media streams, is how to follow all posts pertaining to a given event over time – a task referred to as story detection. Moreover, there are often several different stories pertaining to a given event, which we refer to as sub-stories; we refer to the corresponding task of detecting them automatically as sub-story detection. This paper proposes hierarchical Dirichlet processes (HDP), a probabilistic topic model, as an effective method for automatic sub-story detection. HDP can learn sub-topics associated with sub-stories, which enables it to handle subtle variations in sub-stories. It is compared with state-of-the-art story detection approaches based on locality sensitive hashing and spectral clustering. We demonstrate the superior performance of HDP for sub-story detection on real world Twitter data sets using various evaluation measures. The ability of HDP to learn sub-topics helps it to recall the sub-stories with high precision. This has resulted in an improvement of up to 60% in the F-score performance of the HDP-based sub-story detection approach compared to standard story detection approaches. A similar performance improvement is also seen using an information theoretic evaluation measure proposed for the sub-story detection task. Another contribution of this paper is in demonstrating that considering the conversational structures within the Twitter stream can bring up to 200% improvement in sub-story detection performance.

Carpenter, Jordan, Daniel Preoţiuc-Pietro, Lucie Flekova, Salvatore Giorgi, Courtney Hagan, Margaret Kern, Anneke Buffone, Lyle Ungar, and Martin Seligman. "Real Men don’t say 'cute': Using Automatic Language Analysis to Isolate Inaccurate Aspects of Stereotypes." Social Psychological and Personality Science (2016).

People associate certain behaviors with certain social groups. These stereotypical beliefs consist of both accurate and inaccurate associations. Using large-scale, data-driven methods with social media as a context, we isolate stereotypes through verbal expression. Across four social categories - gender, age, education level, and political orientation - we identify words and phrases that lead people to incorrectly guess the social category of the writer. Although raters often correctly categorize authors, they overestimate the importance of some stereotype-congruent signals. Findings suggest that data-driven approaches might be a valuable and ecologically valid tool for identifying even subtle aspects of stereotypes and highlighting the facets that are exaggerated or misapplied.

Preoţiuc-Pietro, Daniel, Jordan Carpenter, Salvatore Giorgi, and Lyle Ungar. Studying the Dark Triad of Personality using Twitter Behavior. CIKM., 2016.

Interest in research into the darker traits of human nature is growing, especially in the context of increased social media usage, which allows users to express themselves to a wider online audience. We study the extent to which the standard model of dark personality – the dark triad – consisting of narcissism, psychopathy and Machiavellianism, is related to observable Twitter behavior such as platform usage, posted text and profile image choice. Our results show that we can map various behaviors to psychological theory and study new aspects related to social media usage. Finally, we build a machine learning algorithm that predicts the dark triad of personality in out-of-sample users with reliable accuracy.

Flekova, Lucie, Jordan Carpenter, Salvatore Giorgi, Lyle Ungar, and Daniel Preoţiuc-Pietro. Analysing Biases in Human Perception of User Age and Gender from Text. ACL., 2016.

User traits disclosed through written text, such as age and gender, can be used to personalize applications such as recommender systems or conversational agents. However, human perception of these traits is not perfectly aligned with reality. In this paper, we conduct a large-scale crowdsourcing experiment on guessing age and gender from tweets. We systematically analyze the quality and possible biases of these predictions. We identify the textual cues which lead to misassessments of traits or make workers more or less confident in their choice. Our study demonstrates that differences between real and perceived traits are noteworthy and elucidates inaccurately used stereotypes in human perception.

Flekova, Lucie, Lyle Ungar, and Daniel Preoţiuc-Pietro. Exploring Stylistic Variation with Age and Income on Twitter. ACL., 2016.

Writing style allows NLP tools to adjust to the traits of an author. In this paper, we explore the relation between stylistic and syntactic features and authors’ age and income. We confirm our hypothesis that for numerous feature types writing style is predictive of income even beyond age. We analyze the predictive power of writing style features in a regression task on two data sets of around 5,000 Twitter users each. Additionally, we use our validated features to study daily variations in writing style of users from distinct income groups. Temporal stylistic patterns not only provide novel psychological insight into user behavior, but are useful for future research and applications in social media.