Preoţiuc-Pietro, Daniel, Jordan Carpenter, and Lyle Ungar. Personality Driven Differences in Paraphrase Preference. In Workshop on Natural Language Processing and Computational Social Science (NLP+CSS). ACL, 2017.

Personality plays a decisive role in how people behave in different scenarios, including online social media. Researchers have used such data to study how personality can be predicted from language use. In this paper, we study phrase choice as a particular stylistic linguistic difference, as opposed to the mostly topical differences identified previously. Building on previous work on demographic preferences, we quantify differences in paraphrase choice from a massive Facebook data set with posts from over 115,000 users. We quantify the predictive power of phrase choice in user profiling and use phrase choice to study psycholinguistic hypotheses. This work is relevant to future applications that aim to personalize text generation to specific personality types.

Hagan, Courtney, Jordan Carpenter, Lyle Ungar, and Daniel Preoţiuc-Pietro. "Personality Profiles of Users Sharing Animal-related Content on Social Media." Anthrozoos (2017).

Animal preferences are thought to be linked with more salient psychological traits of people, and most research examining owner personality as a differentiating factor has obtained mixed results. The rise in usage of social networks offers users a new medium in which to broadcast their preferences and activities, including those about animals. In two studies, the first on Facebook status updates and the second on images shared on Twitter, we revisited the link between user Big Five personality traits and animal preference, specifically focusing on cats and dogs. We used automatic content analysis of text and images to unobtrusively measure preference for animals online using large data sets. Results from Study 1 indicated that those who mentioned ownership of a cat (by using the phrase ‘my cat’) in their status updates were more open to experience, introverted, neurotic and less conscientious when compared to the general population, while users mentioning ownership of a dog (by using ‘my dog’) were only less conscientious compared to the rest of the population. Study 2 found that users who featured either cat or dog images in their tweets were more neurotic, less conscientious and less agreeable than those who did not. In addition, posting images containing cats was specific to users higher in openness, while posting images featuring dogs was associated with users higher in extraversion. Taken together, these findings align with some previous findings on the relationship between owner personality and animal preference, and additionally highlight some social media specific behaviors.

Lampos, Vasileios, Nikolaos Aletras, Daniel Preoţiuc-Pietro, and Trevor Cohn. Predicting and characterising user impact on Twitter. In EACL, 2014.

The open structure of online social networks and their uncurated nature give rise to problems of user credibility and influence. In this paper, we address the task of predicting the impact of Twitter users based only on features under their direct control, such as usage statistics and the text posted in their tweets. We approach the problem as regression and apply linear as well as nonlinear learning methods to predict a user impact score, estimated by combining the numbers of the user’s followers, followees and listings. The experimental results show that strong prediction performance is achieved, especially for models based on the Gaussian Processes framework. Hence, we can interpret various modelling components, transforming them into indirect ‘suggestions’ for impact boosting.

Sedoc, Joao, Daniel Preoţiuc-Pietro, and Lyle Ungar. Predicting Emotional Word Ratings using Distributional Representations and Signed Clustering. In EACL, 2017.

Inferring the emotional content of words is important for text-based sentiment analysis, dialogue systems and psycholinguistics, but word ratings are expensive to collect at scale and across languages or domains. We develop a method that automatically extends word-level ratings to unrated words using signed clustering of vector space word representations along with affect ratings. We use our method to determine a word's valence and arousal, which determine its position on the circumplex model of affect, the most popular dimensional model of emotion. Our method achieves superior out-of-sample word rating prediction on both affective dimensions across three different languages when compared to state-of-the-art word similarity based methods. Our method can assist in building word ratings for new languages and improve downstream tasks such as sentiment analysis and emotion detection.

Aletras, Nikolaos, Dimitrios Tsarapatsanis, Daniel Preoţiuc-Pietro, and Vasileios Lampos. "Predicting Judicial Decisions of the European Court of Human Rights: A Natural Language Processing Perspective." PeerJ Computer Science (2016).

Recent advances in Natural Language Processing and Machine Learning provide us with the tools to build predictive models that can be used to unveil patterns driving judicial decisions. This can be useful, for both lawyers and judges, as an assisting tool to rapidly identify cases and extract patterns which lead to certain decisions. This paper presents the first systematic study on predicting the outcome of cases tried by the European Court of Human Rights based solely on textual content. We formulate a binary classification task where the input of our classifiers is the textual content extracted from a case and the target output is the actual judgment as to whether there has been a violation of an article of the convention of human rights. Textual information is represented using contiguous word sequences, i.e. N-grams, and topics. Our models can predict the court's decisions with strong accuracy (79% on average). Our empirical analysis indicates that the formal facts of a case are the most important predictive factor. This is consistent with the theory of legal realism suggesting that judicial decision-making is significantly affected by the stimulus of the facts. We also observe that the topical content of a case is another important feature in this classification task and explore this relationship further by conducting a qualitative analysis.
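
As an illustration of the "contiguous word sequences, i.e. N-grams" representation mentioned in this abstract, the following is a minimal sketch, not the authors' pipeline. It assumes lowercasing and plain whitespace tokenization, and the function names `word_ngrams` and `ngram_counts` are hypothetical, chosen only for this example.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return all contiguous word sequences of length n in text.

    Assumption: tokens are produced by lowercasing and splitting on
    whitespace; a real system would likely use a proper tokenizer.
    """
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(text, max_n=2):
    """Count every n-gram for n = 1..max_n (a bag-of-n-grams feature map)."""
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(word_ngrams(text, n))
    return counts


if __name__ == "__main__":
    # Hypothetical sentence, used only to show the shape of the output.
    print(word_ngrams("the court held that", 2))
    print(ngram_counts("the court held that", max_n=2))
```

A count map of this kind could then serve as the input features of a binary classifier; the choice of classifier and of the maximum n-gram length is left open here because the abstract does not specify them.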