A temporal model of text periodicities using Gaussian Processes

Preoţiuc-Pietro, Daniel, and Trevor Cohn. A temporal model of text periodicities using Gaussian Processes. EMNLP, 2013.


Temporal variations of text are usually ignored in NLP applications. However, text use changes with time, which can affect many applications. In this paper we model periodic distributions of words over time. Focusing on hashtag frequency on Twitter, we first automatically identify the periodic patterns. We then use these patterns for regression, forecasting the volume of a hashtag based on past data. We use Gaussian Processes, a state-of-the-art Bayesian non-parametric model, with a novel periodic kernel. We also demonstrate the model in a text classification setting, assigning a tweet's hashtag based on the rest of its text. This method shows significant improvements over competitive baselines.
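
The forecasting idea can be illustrated with a minimal sketch of Gaussian Process regression using a periodic covariance function. This is not the paper's novel kernel: it assumes the standard exp-sine-squared periodic kernel, a fixed (not learned) period of 24 hours, and synthetic hourly counts invented for illustration. The names `periodic_kernel` and `gp_forecast` are hypothetical helpers, not part of the released GPML or GPy code.

```python
import numpy as np

def periodic_kernel(x1, x2, period=24.0, lengthscale=1.0, variance=1.0):
    """Standard exp-sine-squared kernel between two 1-D arrays of time points."""
    diff = x1[:, None] - x2[None, :]
    return variance * np.exp(-2.0 * np.sin(np.pi * diff / period) ** 2 / lengthscale ** 2)

def gp_forecast(t_train, y_train, t_test, noise=0.1, **kernel_args):
    """Posterior mean of a GP at t_test, using the training mean as a constant mean."""
    n = len(t_train)
    # Covariance of the noisy training observations.
    K = periodic_kernel(t_train, t_train, **kernel_args) + noise ** 2 * np.eye(n)
    # Cross-covariance between test and training time points.
    K_star = periodic_kernel(t_test, t_train, **kernel_args)
    y_mean = y_train.mean()
    alpha = np.linalg.solve(K, y_train - y_mean)
    return y_mean + K_star @ alpha

# Synthetic hourly counts with a daily cycle (made-up data, for illustration only).
rng = np.random.default_rng(0)
t_train = np.arange(0.0, 72.0)  # three days of hourly observations
y_train = 10.0 + 5.0 * np.sin(2 * np.pi * t_train / 24.0) + rng.normal(0.0, 0.5, t_train.size)

# Forecast the following day from the three observed days.
t_test = np.arange(72.0, 96.0)
forecast = gp_forecast(t_train, y_train, t_test, noise=0.5, period=24.0)
```

In this sketch the period is supplied by hand; in practice the kernel hyperparameters, including the period, would typically be fitted to the data, for example by maximising the marginal likelihood.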

[Dataset] [GPML kernel] [GPy kernel] [GPy README]

PDF (484.86 KB)
Poster (2.18 MB)