Tools for mining non-stationary data - v2. Clustering models for discovery of regional and demographic variation - v2

Citation:
Preoţiuc-Pietro, Daniel, Sina Samangooei, Andrea Varga, Douwe Gelling, Trevor Cohn, and Mahesan Niranjan. Tools for mining non-stationary data - v2. Clustering models for discovery of regional and demographic variation - v2. Public Deliverable for Trendminer Project, 2014.

Report Date:

04/2014

Report Number:

D3.3.1

Abstract:

This document presents advanced research and software development work for Task 3.2, on tools for mining non-stationary data, and for Task 3.3, on clustering models that integrate regional and demographic information, with the aim of understanding streaming data. First, for modelling non-stationary data, a research experiment is presented on categorising and forecasting word frequency patterns using Gaussian Processes, with an emphasis on word periodicities. A new soft clustering method based on topic models is then introduced, which jointly learns topics and their temporal profiles. To make use of regional and demographic user information, the predictive model presented in previous work (Samangooei et al., 2013) is extended and used to identify differences in voting intention between regions of the United Kingdom and between genders. For discovering specific regional clusters, the soft clustering technique is extended to jointly learn topics together with their regional and temporal profiles. Finally, the predictive and clustering models developed on social media data are applied to a news summary dataset, where richer linguistic features are also used.
