You'll find a wide variety of super cool plots while scrolling down to know more about our project.
And guess what ? All of the plots are interactive, that means you can zoom in, select parts of the data, filter out categories, compare plots, and much more. Don't be shy, try them out !
The 2016 US elections have been a raging war on social media and especially Twitter due to an incredible amount of people (including the candidates) using this media !
You probably heard about this social media concept called "media bubble", this concept explains that you and your friends tend to have the same views on many subjects. That's one of the reasons why so many people were shocked about Donald Trump's election, they simply didn't see it coming.
So all in all, don't you want to know what the world really thought during november 2016 ? Don't you want to discover what were the trendy hashtags, what Sunshine420too has to say about these elections or simply to know who the hell Sunshine420too is ? Then this page is made for you !
We'll go through some nice data statistics providing great plot and top tweets such as
If Donald Trump takes people's anger and turns it against Muslims, Hispanics, African Americans and women, we will be his worst nightmare.
— Bernie Sanders (@SenSanders) November 10, 2016
"To all the little girls watching...never doubt that you are valuable and powerful & deserving of every chance & opportunity in the world."
— Hillary Clinton (@HillaryClinton) November 9, 2016
Find out how we did all this ! We'll also analyze a Twitter network made of user
Cool right ? And finally we'll run a sentiment analysis to find out what people's reactions were !
Thanks for reading, enjoy !
The data we are analyzing is a collection of tweets from Twitter retrieved during the period of the elections. For this, we focused on collecting tweets from the week prior to election day to the week after election day.
To retrieve the data, we came up with custom queries to collect tweets related to the elections, by using hashtags and focusing on some countries of the world that seemed relevant to study.
In total, after collecting more than 4GB of tweets, we extracted 90909 unique tweets and 68359 unique users.
Want to get a glimpse of the data we are using for yourself ? You can download a subset of it here: download link. Want more ? Get in contact with us !
A first interesting overview of the data is to see how many tweets we collected per hour during the elections:
The spike we see is on November 9, which is #ElectionDay.
To get a glimpse of the data we have, let's check out the most retweeted and favorited tweets during the elections:
TODAY WE MAKE AMERICA GREAT AGAIN!
— Donald J. Trump (@realDonaldTrump) November 8, 2016
This is an embarrassing night for America. We've let a hatemonger lead our great nation. We've let a bully set our course. I'm devastated.
— Chris Evans (@ChrisEvans) November 9, 2016
My newest horror story: Once upon a time there was a man named Donald Trump, and he ran for president. Some people wanted him to win.
— Stephen King (@StephenKing) October 21, 2016
If Donald Trump takes people's anger and turns it against Muslims, Hispanics, African Americans and women, we will be his worst nightmare.
— Bernie Sanders (@SenSanders) November 10, 2016
HOLY SHIT THERES A TRUMP PROTEST ON CLUB PENGUIN RIGHT NOW pic.twitter.com/tY33Hq2mvi
— Lourdes (@gossipgriII) November 13, 2016
You didn't expect to see a Trump protest on Club Penguin, right ?
Let's now see which users were the most retweeted:
Wow, the @nytimes is losing thousands of subscribers because of their very poor and highly inaccurate coverage of the "Trump phenomena"
— Donald J. Trump (@realDonaldTrump) November 13, 2016
#RonPaul on #US election: "The greatest thing happened is the evidence & information we got from #WikiLeaks" https://t.co/2dJU6gDBks
— WikiLeaks (@wikileaks) November 15, 2016
I hope today we defeat Donald Trump and we defeat him badly. https://t.co/8ttsSwcsnl pic.twitter.com/4Q1JDdglhV
— Bernie Sanders (@BernieSanders) November 8, 2016
"To all the little girls watching...never doubt that you are valuable and powerful & deserving of every chance & opportunity in the world."
— Hillary Clinton (@HillaryClinton) November 9, 2016
We have a president-elect who doesn't believe in climate change. Millions of people are going to have to say: Mr. Trump, you are dead wrong. pic.twitter.com/Xc77gh3Iui
— Bernie Sanders (@SenSanders) November 13, 2016
The results come mostly from candidates to the elections, with the exception of WikiLeaks that played an instrumental role in shining light on the practices of the US politicians to the public..
Now let's have a look at the distribution of tweets based on the queries we performed using hashtags:
If we only focus on #Elections2016, #ImStillWithHer, #PresidentElectTrump, #TrumpRiot, and #NotMyPresident to track a few particular hashtags, we can see some interesting results:
The last plot shows us that the #Elections2016 vanished on the election day, to give space to #PresidentElectTrump. We also see a few reactions on the election day and when there was riots.
Focusing on the last part of our data, we can see which countries were the most active during the elections:
Clearly, and logicially, we can see that apart from the US, the UK is also very active, along with Canada that tweeted a lot of November 9, Australia, India, and South Africa. This is quite logic as those countries speak primarly English or were British colonies. On November 9, South American countries also reacted vividly to the elections, with Brazil and Mexico. In Europe, France and Germany leaded the way, followed by Sweden, and Denmark.
If we now look at some countries that we selected to make the plot clearer, we can see that apart from the US, Canada, and the UK, the other countries tweet more or less the same throughout the days during the elections, with the expection of a spike on election day.
To finish our overview of data, we'll visualise which countries were the most active during the elections on a world map:
In this part we will talk about our network and all our discoveries.
The goal of this directed network is to group every user we got from the scrapped tweets and their friends (e.g people they are following) and go through different step to show :
With the help of the Twitter API we can build this graph by connecting our user with a direct link to the account they follow ! In the end we have a brand (big!) new graph that has 64728 node and 78788 edges. If we have a look at the biggest accounts in terms of follower (nodes' in degree) we run into famous names such as : wikileaks, BarackObama, realDonaldTrump or HillaryClinton.
But looking at the biggest accounts in term of friend (nodes' out degree) we have less known and average people
Now that we have our graph we can have a look at the degree distributions (e.g try to indentify if the degree distributions follow some kind of function):
Go ahead, try to zoom on the graph !
Now let's have a look at the out degree
In Figure 1 we can see that the most common in degree is 2, with more than 50k occurences, but after 6 in degree it seems we have no more nodes. However the out degree distribution is way broader. very few users have the same out degree but it goes from 0 to roughly 5000.
The in degree distributions seem to approach a power-law as we can see the long tail (Figure 3) at the begining where the number of nodes with low degrees is really high and then it decreases as a power-law (with a few fluctuations that mean we have a few high influential individuals that have a lot of followers).
We can see that the in-degrees and out-degrees do not follow the same curve. This means that we can't expect to have as many connections coming in than coming out. And this interpretation makes sense as our network was build from a social media platform : we tend to follow more people than we have followers. On the other hand, the blue curve (and the out degree distributions in general) is pretty chaotic, we can't tell if it follows any particular distribution.
The graph is way too big to be drawn, so we decide to make a smaller graph. The idea is to keep the node with a degree greater than 3 only. We have a lot of isolated accounts that are quite useless in this centrality analysis. The new graph had now 2785 nodes and and 12228 edges.
As we were exepecting, it seems like we have a few central accounts and thousands of smaller ones gravitating aroung them. To learn more about these central accounts we will now go through some centrality measures. We perform these algorithms on the reduced network as these measures are very costly. Betweeness centrality measures how central a node is. Users with a high betweenness centrality may have a good followers/following ratio. We will look into these accounts later on.
Betweeness : The 5 most central accounts are :
Eigen centrality : The 5 most central accounts are :
In degree centrality : The 5 most central accounts are :
These names are familiar but...
out degree centrality : The 5 most central accounts are :
... these ones aren't. And once again, that makes sense, famous users will have more followers, but as an average person, you can follow as many people as you want !
The results we have on centrality measures let us think that we have quite a good network. We have famous people with a lot of followers and Twitter addicts with a lot of "friends". It is also interesting to note that the betweeness centrality and the eigenvector centrality do not provide the exact same results. It will be interesting to see if we come across these names again when dealing with communities (and we'll see who these people actually are !).
To perform community detection we'll use the Louvain algorithm. The inspiration for this method of community detection is the optimization of Modularity as the algorithm progresses. Modularity is one measure between -1 and 1 of the structure of networks. It was designed to measure the strength of division of a network into modules (also called groups, clusters or communities). Networks with high modularity have dense connections between the nodes within modules but sparse connections between nodes in different modules. (Wikipédia)
We got a louvain modularity of 0.4, we can say this modularity is pretty high, which means we have good partitions in our network.
We have 7 communities and we can see once more that one group is completely disconnected from the others. That is probably due to the first graph reduction (from 64k nodes to 2k nodes). This accounts had a lot of followers but they weren't popular so they've been removed. Now we have this isolated community while the others are connected.
Let's dig into these communities.
Alright, we have a lot of informations to process here, so we'll work community by community. Next to account's name we provide the Twitter descriptions and/or a general description.
According to the in degree
And then by out degree are :
Looking at the results we have and the twitter account's descriptions we can spot two main ideas : independant and left. This community is made of people who claim to be independant, unbiased, left winged and against politicians such as Donald J. Trump. It also seems that these users (the real person behind) are quite educated (PhD, Exec Dir, BUsiness, Bill Gates, Wikileaks...)
The top ten accounts by in degree are :
The top ten accounts by out degree
This community is cleary made of Trump supporters and on a larger scale, republicans. We can see a lot of FOX News TV host, a TV channel which strongly supported Trump during the campaign (and still does). We also have different personnality from Trump familly, board, government... When looking at the top accounts by out degree we come accross strongly patriotic users (a lot of Star Spangled Banners and bald eagles on the pages) and Trump supporters. Two things to notice however :
Trump isn't in the top account, which is a odd, and we have at least one "troll" account (Trevor90666770) and we may have some more that would probably biase the sentimentThe accounts by in degree
The accounts by out degree are :
This community is now Democrat. We have a senator, Michelle Obama, Chelsea Clinton and democrat journalists such as CNN/Bloomberg reporters. Once again we have to notice that Hillary and Obama aren't in the top accounts (also uncanny)
The top ten accounts by out degree
It is really difficult to label this community. We have a lot of different users indeed. From TV host, to average users and even the star wars account... It seems this community is a big melting pot and it will be hard to get any trustworthy result, so we won't spend too much time on this one.
The accounts by in degree are :
The accounts by out degree
And here comes the show business community. We have indeed a lot of singer/podcaster and of course their little fandom. We can see this community as the trendy artists followed by young people. These artists claimed to follow Hillary Clinton during the campaign.
This community seems to be focused on business and financial news. We see huge business news papers such as Forbes or the Wall Street Journal. This community is said to be neutral.
This user is unclassifiable. After going through her feed, nothing stands out but random retweet. There's no real need to consider this community.
People love to talk about Trump. Almost more then Trump likes to talk about himself. The words below were found by calculating a frequency distribution of words found in the tweets. The top ten most popular would were plotted and who would have guessed, Trump is number one by a landslide.
This plot may be a bit misleading though, many of the words that appear in that plot were words we used in our queries. What happens we remove all the query words?
I guess it's not suprising that we found the words vote and voted to be common when tweeting about the election.
We then went a little further and calculated the TF-IDF for all the tweets and found a few things. For the uniformed, TF-IDF stands for term frequency - inverse document frequency and it is a measure of how unique a words is within a collection. TF-IDF puts less emphasis on words that appear more frequently. Some of the fun words we found are listed below.
From the TF-IDF we were able to find out that Trump was trending in India for some time as Dubai had the highest TF-IDF value.
Interesting to see Vote Trump trending in #dubai Strong social media presence or trollers @smwidubai
— Anu C (@AryanRoolz) November 9, 2016
When analyzing TF-IDF we would see that Canada was impacted by the election and felt the act of electing trump was heinous. Mexico also felt the impact but still showed some support with Hillary but commonly using the hashtag #imstillwithher. Brazil liked using the word temer and eleicoes. We'll leave the translating as an exercise.
We also took a look at combining all the text from the queries and calculating the TF-IDF for that, we summarized some interesting results below with example values.
#NotMyPresident
[(u'thoughts', 2.117053452658006), (u'stump', 2.0320841499224453), (u'claimed', 2.0320841499224453), (u'crack', 1.9735016076082625), (u'remarkable', 1.928740016689785), (u'imstillwithher', 1.908685187147373), (u'xenophobic', 1.811216782246648), (u'france', 1.8102713709277207), (u'pres', 1.7570049526821885), (u'trump', 1.7104825170104658)]
stock market OR financial OR obama
[(u'attack', 1.908685187147373), (u'blockchain', 1.8403819524871865), (u'dropping', 1.8212408475281265), (u'effect', 1.8173602329889513), (u'electionday', 1.8062030586117366), (u'uncertainty', 1.779080775844801), (u'chief', 1.765251456518164), (u'early', 1.7392008281822666), (u'virginia', 1.7304288347004635), (u'asia', 1.7169731325164517)]
#ElectionFinalThoughts
[(u'instantwingame', 2.684716305898907), (u'strategist', 2.5208883816572105), (u'kremlin', 2.226215397043603), (u'australia', 2.2052952115306517), (u'warns', 2.1921597050143062), (u'choosewisely', 2.154749103014653), (u'rant', 2.154749103014653), (u'former', 2.1512193281281813), (u'cases', 2.141413198625467), (u'debate', 2.1259473210973487)]
Suprisingly, everyone felt a little meh. Sentiment analysis takes a look at the words in the tweet and give it a happiness score. The expected results for this election? Negative. The results? Neutral. Below are some statistics from the analysis.
The average sentiment of tweet sentiment is 5.57
The stanard deviation of tweet sentiment is 0.45
Percentage of tweets that have a sentiment 86.68%
Percentage of words that have a sentiment 23.92%
The happiest tweet is 7.53
Happiest tweet text: @realDonaldTrump Congratulations For Great victory All Indian with you #electionday
The saddest tweet is 2.62
Saddest tweet text: Is this the End of Terrorism ( #Elections2016 ) and Corruption? ( #ModiFightsCorruption )
Another thing to note is that only 24% of words have sentiment. This is mainly due to the fact that tweets have such a small amount of characters and mainly consist of stopwords. Another downfall of sentiment analysis is that in cannot pick up on the context of the tweet. For example the saddest tweet contains the word terrorism and corruption but the tweet does not appear to be sad and almost optimistic by asking "is this the end of".
We can now measure the sentiment of tweets as it changes over time. These plots all called hedonometers. For the hedonometer we need to find the dates for all the tweets, we create a dictionary and save it to a file. What's particulary strange is how little flucuation there is between November 8th and 10th, the days just following the election. This may be a result of having a higher number tweets during this time period.
Next we zoom in a little bit on the election night. It looks like people can't make up their mind as the sentiment is flucuating all over the place!
words)", "xaxis": {"title": "Word"}, "yaxis": {"title": "Occurence"}}, {"linkText": "Export to plot.ly", "showLink": true})We will now calculate the average sentiment of communities found within the graph. First we need to use a new graph that contains only user that we have tweets for. Review the subsection Creating Network With Users that have Tweets in the Network section found in the explainer notebook. Below is a brief list of notable users found in the communities.
As we can see even among the republican community, community 2, the users are not overly excited. This can be due to multiple factors, for example the character limit of the tweets being set at 140 or tweets about the election generally contain very neutral words. One possiblitly when looking at the general sentiment is the very happy posts are neutralized by the very sad posts leaving us with a neutral result!
Even though the sentiment analysis as it is didn't prove to give interesting insights, we still've got one last trick under our hat: what about analyzing the sentiment through the emojis in the tweets ?
Below, we can see the distribution of emojis in our dataset. Please note that emojis are displayed based on the internal engine of your browser, and identical emoji codes can have totally different emoji mapped to them.
Now that we understand how the emojis are distributed, we can try to build an hedonometer based on an arbitrary scale of values given to emojis. For more details on that scale, please refer to the technical notebook.
This is particularly interesting, as we see that the face with a tear of joy represents the average sentiment before the previous week of the election day. Did people think all of that was a complete joke ?
On November 5 at 1pm, we can witness a sudden drop that is represented by the broken heart emoji, probably correlated to the announce of ISIS to attack voters on election day. And we can also note that only 10 hours before that, the average sentiment was at the heart emoji (the maxium), which also means we need to be careful as the data we have is a fisheye view over the elections. We can see that the sentiment varies a lot during the period of the elections, between November 6 and November 13.
Thanks for staying with us all the way down. If you enjoyed it, don't forget you can go into more details and find out about how we conducted our analysis by checking out the notebook we made.