The Predictive Power of Social Analytics: Election 2016


Before LinkedIn users, journalists, bloggers, and writers inundate us with "Top 10 Things I Learned from the Presidential Election" posts, I wanted to share an experiment I began in the late summer of 2015.

Having just removed the cellophane from a brand-new social listening tool, I'd been using it to collect consumer insights on a variety of topics and perceptions relevant to the brands I worked with as the director of social media at a small Boston-area ad agency. As I don't have permission to mention the tool's name, and this being a somewhat politically themed post, let's call it Tool X. What made Tool X so attractive in the first place is that, even according to its competitors, it has the best natural language algorithm on the market. That is to say, it can tell the difference between the intended sentiment of "this beer sucks" and "My life sucks, I need a beer." Tool X allowed me to dig deep into sentiment drivers, emotions, phrases, conversation volume, channels, etc. It also allowed me to measure three things in a convenient quad chart: conversation volume, sentiment, and passion.

Conversation volume is simply the amount of conversation happening around a given subject. Sentiment is whether those conversations are positive or negative. Passion is how strongly positive or negative those conversations are. On August 3, 2015, I posted our first look at the landscape of the 2016 presidential race.
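Tool X's internal formulas are proprietary, so as a rough illustration only, here is a minimal sketch of how three measures like these might be derived from per-mention polarity scores. The function name, the [-1, 1] scoring scale, and the formulas themselves are my own assumptions, not Tool X's:

```python
def quad_metrics(mentions):
    """Compute quad-chart metrics from a list of polarity scores,
    one score per social post, each in [-1.0, 1.0]."""
    volume = len(mentions)  # how much conversation there is
    # Sentiment: net direction (-1 = all negative, +1 = all positive).
    sentiment = sum(mentions) / volume if volume else 0.0
    # Passion: average intensity, ignoring direction (0 = tepid, 1 = fervent).
    passion = sum(abs(m) for m in mentions) / volume if volume else 0.0
    return volume, sentiment, passion

# Hypothetical mention scores for two made-up candidates
candidate_a = [0.9, 0.8, -0.7, 0.95, 0.6]   # loud, mostly positive, intense
candidate_b = [0.1, -0.1, 0.2, -0.05]       # quieter, mixed, tepid

print(quad_metrics(candidate_a))
print(quad_metrics(candidate_b))
```

Note how the two halves of the definition separate: two candidates can have identical net sentiment while differing wildly in passion, which is exactly what the quad chart's vertical axis surfaces.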

In the chart above, the size of each circle indicates conversation volume. The horizontal axis runs from negative (left) to positive (right) sentiment, while the vertical axis indicates the passion, negative or positive, of that sentiment. Now, this post excluded Chafee, Fiorina, and O'Malley, and even has a typo, which won't be changed, to maintain the integrity of the timeline. You can see that Christie was the clear leader in the very early stages. At this point there were fewer than 100 million people talking about the candidates, a number that would increase dramatically to over 200 million.

In this chart, which includes Fiorina, we can see that the tone was set from a voter perspective, if not from a media and pundit perspective.

Throughout the primary and general campaign timeline there was an increase in conversation around the candidates…

And a consolidation of the conversation around specific candidates.

Over this time, conservative voters seemed to gravitate toward Trump; the same thing happened on the liberal side with Bernie Sanders.

While this looks like it would refute the ability of social sentiment analysis to predict an outcome, as we've learned from the DNC emails released by WikiLeaks, the Democratic nominee wasn't so much chosen by voters as by the party and the media. While the manipulation of the primary has been widely covered elsewhere, data on the Democratic candidates, when seen side by side with major media reporting and polling, showed that something was afoot.

This was the first time, to my knowledge, that social listening and social sentiment had been used to visualize the outcome of such a major event. Similar data collection could be used to look at state and local campaigns, movie or artist popularity, or any other competitive situation in which large groups of people determine the outcome. It is likely, though, that the approach will be less effective at the smaller sample sizes of state and local elections; the sample size for this experiment ended up being over 200 million.

In his 1940s series of short stories, later collected into the novel Foundation, science fiction writer and scientist Isaac Asimov developed the concept of psychohistory, an idea not dissimilar to what has been attempted here. Asimov suggests that a large sample size is required, and that the larger it is, the more accurate the predictions.

I have no doubt that Mr. Asimov, a significantly more intelligent, educated, and talented individual than I, was right: the model population must be of significant size. As we know, polls from the folks at Quinnipiac, ABC, Rasmussen, etc. use significantly smaller sample sizes:

While continuing to do social media analysis for clients, I will keep testing this and other social analytics tools on further citizen-determined contests to understand the predictive capabilities of social conversation. I would love to hear from anyone else using similar or different tools whose predictions came out right or wrong in this recent contest.

Comments and questions are welcome.
