Pawel Janiak

South African Ruby on Rails developer, ranter, biker.

Polarizing information presentation to increase information spread.

24 May 2013

The study

Using sentiment extraction techniques, sentiment classifiers and standardized statistical analysis models, a study by René Pfitzner, Antonios Garas and Frank Schweitzer identifies the role that emotions play in the information-spreading process on Twitter. Two key questions asked were whether emotionally charged tweets (or even those that are emotionally neutral) are more likely to be retweeted, and whether emotionally diverse tweets (those that emotionally polarize our opinions) are more or less likely to be retweeted.

As per the paper, Twitter users - and for the sake of argument, users in all information-transfer contexts - are involved mostly in two processes: creating information and distributing it. Distribution is either subsequent distribution or pure information distribution. The former means rewording a message and then distributing it; the latter means redistributing it without any alteration (retweeting, in the Twitter context). Users have a preference skew towards the former.

The study used SentiStrength to determine the positive and negative sentiment strengths of all tweets in the study. Each tweet is classified with two values, a positive sentiment value and a negative sentiment value, whose ranges are [1,5] and [-5,-1] respectively. The mean of the two gives a simplified overall sentiment score.
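To make the scoring concrete, here is a minimal Python sketch of the simplified overall score described above. It does not call SentiStrength itself: the function name `mean_sentiment` is my own, and the two strengths are assumed to have been produced already by a classifier.

```python
def mean_sentiment(positive: int, negative: int) -> float:
    """Return the simplified overall score: the mean of the two strengths.

    `positive` is expected in [1, 5] and `negative` in [-5, -1],
    matching the ranges quoted in the post.
    """
    if not 1 <= positive <= 5:
        raise ValueError("positive strength must be in [1, 5]")
    if not -5 <= negative <= -1:
        raise ValueError("negative strength must be in [-5, -1]")
    return (positive + negative) / 2


# A mildly positive tweet: strong positive words, weak negative words.
print(mean_sentiment(4, -2))  # 1.0
# A tweet with no detectable emotion in either direction.
print(mean_sentiment(1, -1))  # 0.0
```

Note that a strongly mixed tweet such as (5, -5) also averages to 0, which is one reason the mean alone hides how emotionally charged a tweet is.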

The paper centers on how much the sentiment divergence of a tweet affects its propensity to be purely redistributed (retweeted). Ultimately, the mean sentiment of a tweet - whether it is positive or negative - is not a good predictor of whether it will be retweeted or not.

In addressing the emotional polarity between tweets and retweets, there was little difference between the tweet ratio of the two. Roughly 47% of all tweets have a positive sentiment of +1 and about 20% have a negative sentiment of -1. This is in line with the Pollyanna Hypothesis that there is a universal tendency for people to have a positive word bias.

Tweets with high emotional divergence (the absolute difference or range between the positive and negative sentiment scores) are found to be more likely to be retweeted than those with low emotional divergence. The emotional divergence threshold dividing low-divergence tweets from high-divergence tweets was around 0.4, with a peak at about 0.9, after which the retweet propensity drops. In a sample of tweets, the chance of finding a tweet with an emotional divergence score of 0.9 is roughly 1.7 times higher than finding a tweet where the emotional divergence is 0 (where positive and negative sentiment are equal).
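As a rough illustration of the divergence score, the sketch below takes the range between the two strengths and divides it by 10 so that it lands on the same 0-to-1 scale as the 0.4 and 0.9 figures above. The division by 10 is my assumption, inferred from those figures, and may not match the study's exact definition; the names `emotional_divergence` and `is_high_divergence` are hypothetical.

```python
# Assumption: the raw range (2 to 10) is divided by 10 to normalize it.
ASSUMED_SCALE = 10
# Threshold between low and high divergence, as quoted in the post.
HIGH_DIVERGENCE_THRESHOLD = 0.4


def emotional_divergence(positive: int, negative: int) -> float:
    """Range between the positive and negative strengths, normalized."""
    return abs(positive - negative) / ASSUMED_SCALE


def is_high_divergence(positive: int, negative: int) -> bool:
    """True when the score exceeds the threshold quoted in the post."""
    return emotional_divergence(positive, negative) > HIGH_DIVERGENCE_THRESHOLD


# A tweet with strong positive and strong negative wording.
print(emotional_divergence(5, -4))  # 0.9
print(is_high_divergence(5, -4))    # True
# A tweet with weak wording in both directions.
print(is_high_divergence(2, -1))    # False
```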

Using this information, and a retweet probability of 0.09, the probability of a tweet with a 0.9 emotional divergence score being retweeted is 0.14. A tweet with an emotional divergence score of 0.3 has a 0.03 probability of being retweeted. That’s roughly a 1-in-7 chance of a highly emotionally polarized tweet being retweeted. A tangential fact given in the study is that the average number of tweets, in terms of arithmetic mean, is 7.4, with the median being only 2 tweets.
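The "1-in-7" figure is just the reciprocal of the 0.14 probability, rounded. A small Python sketch of that arithmetic, using only the two probabilities quoted above (the helper name `one_in_n` is my own):

```python
def one_in_n(probability: float) -> int:
    """Express a probability as the N in a rounded '1-in-N' chance."""
    if not 0 < probability <= 1:
        raise ValueError("probability must be in (0, 1]")
    return round(1 / probability)


HIGH_DIVERGENCE_P = 0.14  # divergence score of 0.9, as quoted in the post
LOW_DIVERGENCE_P = 0.03   # divergence score of 0.3, as quoted in the post

print(one_in_n(HIGH_DIVERGENCE_P))  # 7
print(one_in_n(LOW_DIVERGENCE_P))   # 33
# How many times more likely the high-divergence tweet is to be retweeted.
print(round(HIGH_DIVERGENCE_P / LOW_DIVERGENCE_P, 1))  # 4.7
```

Put side by side, the low-divergence tweet has roughly a 1-in-33 chance, so on these figures the highly polarized tweet is between four and five times more likely to be retweeted.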

Notes

This paper didn’t take into account the number of followers, and thus the initial exposure of a tweet, as a factor, so emotionally charged tweets from users with very few followers will still have a much smaller chance of “going viral” than those originating from users with high follower counts (this is my conjecture).

Another note is that there may exist some market gap for a tool that evaluates emotional divergence on the fly using tools like SentiStrength, or other machine learning libraries, that will notify users of the potential emotional polarity of their tweets. This would be a great tool for Twitter antagonists.

One other note is that tools like SentiStrength don’t take into account the semantic meaning of certain words from a cultural perspective, but I don’t have enough knowledge on this to comment with any confidence.