The 2020 US Presidential Election was one of the most closely watched and contentious elections in recent history, and the development of mass media has greatly influenced the way people obtain information and express their emotions. This study aims to explore the relationship between Twitter sentiment and election results during the 2020 US Presidential Election and examine how changes in media consumption and expression affect political attitudes and beliefs.
The two datasets used in this study consist of tweets posted between October 15th and November 4th, 2020: approximately 15,000 tweets mentioning Biden and 36,000 mentioning Trump.
trump_df <- read.csv('/Users/cathy/Documents/Columbia Sem 2/5205_R Framework/Final Project/Final/trump.csv')
trump_df <- tibble::rowid_to_column(trump_df, "id")
biden_df <- read.csv('/Users/cathy/Documents/Columbia Sem 2/5205_R Framework/Final Project/Final/biden.csv')
biden_df <- tibble::rowid_to_column(biden_df, "id")
str(trump_df)
## 'data.frame': 36554 obs. of 7 variables:
## $ id : int 1 2 3 4 5 6 7 8 9 10 ...
## $ created_at: chr "2020-10-15 00:00:08" "2020-10-15 00:00:26" "2020-10-15 00:01:14" "2020-10-15 00:01:30" ...
## $ tweet : chr "You get a tie! And you get a tie! #Trump ‘s rally #Iowa https://t.co/jJalUUmh5D" "#Trump #PresidentTrump #Trump2020LandslideVictory #Trump2020 #MAGA #KAG #4MoreYears #America #AmericaFirst #All"| __truncated__ "#Trump: Nobody likes to tell you this, but some of the farmers were doing better the way I was doing it than th"| __truncated__ "@karatblood @KazePlays_JC Grab @realDonaldTrump by the balls & chuck the bastard out the door onto #Pennsyl"| __truncated__ ...
## $ tweetNew : chr "['get', 'tie', 'get', 'tie', 'trump', 'rally', 'iowa']" "['trump', 'presidenttrump', 'trump', 'landslidevictory', 'trump', 'maga', 'kag', 'moreyears', 'america', 'ameri"| __truncated__ "['trump', 'nobody', 'likes', 'tell', 'farmers', 'better', 'way', 'working', 'asses', 'check', 'totally', 'mail', 'right']" "['karatblood', 'kazeplays', 'jc', 'grab', 'realdonaldtrump', 'balls', 'amp', 'chuck', 'bastard', 'door', 'onto'"| __truncated__ ...
## $ month : int 10 10 10 10 10 10 10 10 10 10 ...
## $ day : int 15 15 15 15 15 15 15 15 15 15 ...
## $ year : int 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 ...
str(biden_df)
## 'data.frame': 15157 obs. of 7 variables:
## $ id : int 1 2 3 4 5 6 7 8 9 10 ...
## $ created_at: chr "10/15/20 0:01" "10/15/20 0:01" "10/15/20 0:03" "10/15/20 0:05" ...
## $ tweet : chr "Comments on this? \"Do Democrats Understand how Ruthless China is?\" https://t.co/QevK00yhs3 #China #HunterBide"| __truncated__ "@RealJamesWoods #BidenCrimeFamily #JoeBiden #HunterBiden #HunterBidenEmails https://t.co/ottX1yP37j" "@realDonaldTrump #TrumpIsALaughingStock @realDonaldTrump at his Iowa cult rally compared #JoeBiden to Putin, XI"| __truncated__ "Laptop computer abandoned at Delaware repair shop contains #emails between #HunterBiden & senior #Burisma #"| __truncated__ ...
## $ tweetNew : chr "['comments', 'democrats', 'understand', 'ruthless', 'china', 'china', 'hunterbiden', 'joebiden', 'bidenharris',"| __truncated__ "['realjameswoods', 'bidencrimefamily', 'joebiden', 'hunterbiden', 'hunterbidenemails']" "['realdonaldtrump', 'trumpisalaughingstock', 'realdonaldtrump', 'iowa', 'cult', 'rally', 'compared', 'joebiden'"| __truncated__ "['laptop', 'computer', 'abandoned', 'delaware', 'repair', 'shop', 'contains', 'emails', 'hunterbiden', 'amp', '"| __truncated__ ...
## $ month : int 10 10 10 10 10 10 10 10 10 10 ...
## $ day : int 15 15 15 15 15 15 15 15 15 15 ...
## $ year : int 2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 ...
Let's see which words are used most frequently in these tweets.
To do this, we employ the tidytext library, which uses a tidy data approach: each tweet is tokenized into words and pivoted to a tall format, then dplyr functions summarize, sort, and filter the top 25. We also need to remove stop words, which is accomplished through an anti_join with the list of stop words in tidytext::stop_words.
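As a minimal sketch of this tidy workflow (using a hypothetical two-tweet tibble for illustration, not the election data), tokenization and stop-word removal look like this:

```r
library(dplyr)
library(tidytext)

# Hypothetical mini-corpus for illustration only
toy <- tibble::tibble(id = 1:2,
                      text = c("The rally was energetic",
                               "Voters lined up for the rally"))

toy %>%
  unnest_tokens(output = word, input = text) %>% # one row per word, lowercased
  anti_join(stop_words, by = "word") %>%         # drops "the", "was", "for", "up"
  count(word, sort = TRUE)                       # "rally" appears twice
```

Here count(word, sort = TRUE) is shorthand for the group_by/summarize/arrange chain used on the real data below.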
library(tidytext)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
trump_df %>%
  unnest_tokens(input = tweetNew, output = word) %>%
  select(word) %>%
  anti_join(stop_words) %>%
  group_by(word) %>%
  summarize(count = n()) %>%
  ungroup() %>%
  arrange(desc(count)) %>%
  top_n(25)
## Joining, by = "word"
## Selecting by count
## # A tibble: 25 × 2
## word count
## <chr> <int>
## 1 trump 43989
## 2 election 6458
## 3 amp 5560
## 4 donaldtrump 3770
## 5 realdonaldtrump 3467
## 6 covid 3201
## 7 biden 3005
## 8 vote 2877
## 9 joebiden 2377
## 10 maga 2304
## # … with 15 more rows
biden_df %>%
  unnest_tokens(input = tweetNew, output = word) %>%
  select(word) %>%
  anti_join(stop_words) %>%
  group_by(word) %>%
  summarize(count = n()) %>%
  ungroup() %>%
  arrange(desc(count)) %>%
  top_n(25)
## Joining, by = "word"
## Selecting by count
## # A tibble: 25 × 2
## word count
## <chr> <int>
## 1 joebiden 12450
## 2 biden 9034
## 3 bidenharris 4105
## 4 amp 2220
## 5 trump 2161
## 6 vote 2114
## 7 election 1938
## 8 joe 1556
## 9 debates 1468
## 10 hunterbiden 982
## # … with 15 more rows
One of the simplest approaches to natural language processing is to categorize words based on their meaning. Words may be categorized by valence (positive or negative) or by emotion (e.g., happy, sad). This can be done conveniently using a relevant lexicon.
We will begin by examining lexicons that classify tokens into two categories based on valence, usually positive or negative. The AFINN lexicon assigns each word a score between -5 and +5 according to the extent to which it is positive or negative.
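To make the scoring concrete, here is a minimal sketch on a single hypothetical sentence (assuming the AFINN lexicon is available via get_sentiments('afinn'), which may prompt to download it through the textdata package on first use):

```r
library(dplyr)
library(tidytext)

# Hypothetical one-sentence example, not part of the election data
example <- tibble::tibble(id = 1,
                          text = "a great win after a terrible week")

example %>%
  unnest_tokens(output = word, input = text) %>%
  inner_join(get_sentiments('afinn'), by = "word") %>% # keeps only scored words
  summarize(sentiment = mean(value))
# "great" and "win" score positive, "terrible" scores negative, so the mean
# is positive but pulled down by the negative word
```

The inner_join is what makes unmatched words (here "a", "after", "week") drop out before averaging, which is the same behavior used on the tweets below.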
afinn = read.csv('/Users/cathy/Documents/Columbia Sem 2/5205_R Framework/Final Project/Final/afinn.csv')
as.data.frame(get_sentiments('afinn'))[1:20,]
## word value
## 1 abandon -2
## 2 abandoned -2
## 3 abandons -2
## 4 abducted -2
## 5 abduction -2
## 6 abductions -2
## 7 abhor -3
## 8 abhorred -3
## 9 abhorrent -3
## 10 abhors -3
## 11 abilities 2
## 12 ability 2
## 13 aboard 1
## 14 absentee -1
## 15 absentees -1
## 16 absolve 2
## 17 absolved 2
## 18 absolves 2
## 19 absolving 2
## 20 absorbed 1
We match the words in the lexicon with those in the tweets to determine a sentiment score for each tweet.
trump_df_1 <- trump_df %>%
  select(id, tweetNew) %>%
  group_by(id) %>%
  unnest_tokens(output = word, input = tweetNew) %>%
  inner_join(afinn) %>%
  summarize(reviewSentiment = mean(value)) %>%
  ungroup()
## Joining, by = "word"
trump_sentiment <- trump_df_1 %>%
  inner_join(trump_df)
## Joining, by = "id"
We visualize the distribution of the sentiment scores.
library(ggplot2)
trump_sentiment %>%
  select(id, tweetNew) %>%
  group_by(id) %>%
  unnest_tokens(output = word, input = tweetNew) %>%
  inner_join(afinn) %>%
  summarize(reviewSentiment = mean(value)) %>%
  ungroup() %>%
  ggplot(aes(x = reviewSentiment, fill = reviewSentiment > 0)) +
  geom_histogram(binwidth = 0.1) +
  scale_x_continuous(breaks = seq(-5, 5, 1)) +
  scale_fill_manual(values = c('tomato', 'seagreen')) +
  guides(fill = "none")
## Joining, by = "word"
We match the words in the lexicon with those in the tweets to determine a sentiment score for each tweet.
biden_df_1 <- biden_df %>%
  select(id, tweetNew) %>%
  group_by(id) %>%
  unnest_tokens(output = word, input = tweetNew) %>%
  inner_join(afinn) %>%
  summarize(reviewSentiment = mean(value)) %>%
  ungroup()
## Joining, by = "word"
biden_sentiment <- biden_df_1 %>%
  inner_join(biden_df)
## Joining, by = "id"
We visualize the distribution of the sentiment scores.
biden_sentiment %>%
  select(id, tweetNew) %>%
  group_by(id) %>%
  unnest_tokens(output = word, input = tweetNew) %>%
  inner_join(afinn) %>%
  summarize(reviewSentiment = mean(value)) %>%
  ungroup() %>%
  ggplot(aes(x = reviewSentiment, fill = reviewSentiment > 0)) +
  geom_histogram(binwidth = 0.1) +
  scale_x_continuous(breaks = seq(-5, 5, 1)) +
  scale_fill_manual(values = c('tomato', 'seagreen')) +
  guides(fill = "none") +
  ylim(0, 3000)
## Joining, by = "word"
Tweets about Biden and the election were slightly more positive than negative, while tweets about Trump and the election were relatively polarized, with a comparable distribution on both the positive and negative sides of the scale.
Next we turn to the "bing" lexicon, which classifies tokens into two valence categories, positive or negative. The lexicon is included with the tidytext library and can be accessed by calling get_sentiments('bing'). Here are the first twenty words.
bing = read.csv('/Users/cathy/Documents/Columbia Sem 2/5205_R Framework/Final Project/Final/bing.csv')
as.data.frame(get_sentiments('bing'))[1:20,]
## word sentiment
## 1 2-faces negative
## 2 abnormal negative
## 3 abolish negative
## 4 abominable negative
## 5 abominably negative
## 6 abominate negative
## 7 abomination negative
## 8 abort negative
## 9 aborted negative
## 10 aborts negative
## 11 abound positive
## 12 abounds positive
## 13 abrade negative
## 14 abrasive negative
## 15 abrupt negative
## 16 abruptly negative
## 17 abscond negative
## 18 absence negative
## 19 absent-minded negative
## 20 absentee negative
We match the words in the lexicon with those in the tweets to determine valence.
trump_sentiment %>%
  group_by(id) %>%
  unnest_tokens(output = word, input = tweetNew) %>%
  inner_join(get_sentiments('bing')) %>%
  group_by(sentiment)
## Joining, by = "word"
## # A tibble: 77,675 × 3
## # Groups: sentiment [2]
## id word sentiment
## <int> <chr> <chr>
## 1 2 trump positive
## 2 2 trump positive
## 3 2 trump positive
## 4 2 winning positive
## 5 3 trump positive
## 6 3 likes positive
## 7 3 better positive
## 8 3 right positive
## 9 4 bastard negative
## 10 5 trump positive
## # … with 77,665 more rows
head(trump_sentiment)
## # A tibble: 6 × 8
## id reviewSentiment created_at tweet tweet…¹ month day year
## <int> <dbl> <chr> <chr> <chr> <int> <int> <int>
## 1 2 4 2020-10-15 00:00:26 "#Trump #… ['trum… 10 15 2020
## 2 3 2 2020-10-15 00:01:14 "#Trump: … ['trum… 10 15 2020
## 3 4 -2.33 2020-10-15 00:01:30 "@karatbl… ['kara… 10 15 2020
## 4 5 -0.714 2020-10-15 00:01:53 "#TheWeek… ['thew… 10 15 2020
## 5 6 -3 2020-10-15 00:02:14 "I have l… ['lost… 10 15 2020
## 6 8 1 2020-10-15 00:03:32 "#Trump: … ['trum… 10 15 2020
## # … with abbreviated variable name ¹tweetNew
We visualize the counts of positive and negative words.
trump_sentiment %>%
  group_by(id) %>%
  unnest_tokens(output = word, input = tweetNew) %>%
  inner_join(get_sentiments('bing')) %>%
  group_by(sentiment) %>%
  count() %>%
  ggplot(aes(x = sentiment, y = n, fill = sentiment)) +
  geom_col() +
  guides(fill = "none") +
  coord_flip()
## Joining, by = "word"
biden_sentiment %>%
  group_by(id) %>%
  unnest_tokens(output = word, input = tweetNew) %>%
  inner_join(get_sentiments('bing')) %>%
  group_by(sentiment)
## Joining, by = "word"
## # A tibble: 15,202 × 3
## # Groups: sentiment [2]
## id word sentiment
## <int> <chr> <chr>
## 1 3 sharp positive
## 2 6 misleading negative
## 3 8 threatening negative
## 4 10 right positive
## 5 10 benefit positive
## 6 11 worry negative
## 7 16 scandal negative
## 8 16 lying negative
## 9 18 supported positive
## 10 18 upset negative
## # … with 15,192 more rows
biden_sentiment %>%
  group_by(id) %>%
  unnest_tokens(output = word, input = tweetNew) %>%
  inner_join(get_sentiments('bing')) %>%
  group_by(sentiment) %>%
  count() %>%
  ggplot(aes(x = sentiment, y = n, fill = sentiment)) +
  geom_col() +
  guides(fill = "none") +
  coord_flip() +
  ylim(0, 45000)
## Joining, by = "word"
The result depicts the disparity in the volume of tweets about Biden and about Trump. Tweets about Biden show relatively equal counts of positive and negative words, adding up to nearly 20,000 in total, while tweets about Trump are far greater in volume, with positive words outnumbering negative ones by nearly 20,000.
To examine the trend of tweet sentiment leading up to the election, we plotted line graphs of daily positive and negative word counts over the period from October 15th to November 3rd, 2020.
# Plot line chart of daily positive/negative sentiment for Trump
trump_sentiment_daily <- trump_sentiment %>%
  mutate(date = as.Date(paste(year, month, day, sep = "-"))) %>%
  unnest_tokens(output = word, input = tweetNew) %>%
  inner_join(get_sentiments('bing')) %>%
  group_by(date, sentiment) %>%
  count() %>%
  ggplot(aes(x = date, y = n, color = sentiment)) +
  geom_line(linewidth = 1.1) + # Set line thickness
  scale_color_manual(values = c("positive" = "#74C476", "negative" = "#F15854")) +
  labs(x = "Date", y = "Count", title = "Positive and Negative Sentiments of Trump by Date") +
  theme_minimal() +
  theme(text = element_text(face = "bold"),
        plot.title = element_text(face = "bold", size = 14))
## Joining, by = "word"
trump_sentiment_daily
# Plot line chart of daily positive/negative sentiment for Biden
biden_sentiment_daily <- biden_sentiment %>%
  mutate(date = as.Date(paste(year, month, day, sep = "-"))) %>%
  unnest_tokens(output = word, input = tweetNew) %>%
  inner_join(get_sentiments('bing')) %>%
  group_by(date, sentiment) %>%
  count() %>%
  ggplot(aes(x = date, y = n, color = sentiment)) +
  geom_line(linewidth = 1.1) + # Set line thickness
  scale_color_manual(values = c("positive" = "#74C476", "negative" = "#F15854")) +
  labs(x = "Date", y = "Count", title = "Positive and Negative Sentiments of Biden by Date") +
  theme_minimal() +
  ylim(0, 4000) +
  theme(text = element_text(face = "bold"),
        plot.title = element_text(face = "bold", size = 14))
## Joining, by = "word"
biden_sentiment_daily
We noticed two abrupt spikes in tweet counts for both Biden and Trump, on October 15th and October 22nd. These spikes corresponded to the presidential debate, originally scheduled for October 15th but postponed to October 22nd over health safety concerns during the pandemic. The fluctuations in positive and negative word counts were dramatic and large for Trump, while counts for Biden were generally low and fluctuated only slightly. Furthermore, tweets about Trump were consistently more positive than negative throughout the entire two-week period leading up to the election. This was rather unexpected, indicating that more positive tweet sentiment did not align with the final election result. As the campaign neared election day, November 3rd, both graphs rose, demonstrating intensifying sentiment and increasing engagement on both sides of the race.
The NRC Emotion Lexicon is a list of 5,636 English words and their associations with eight basic emotions (anger, fear, anticipation, trust, surprise, sadness, joy, and disgust) and two sentiments (negative and positive).
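Because a single word can carry several associations, an NRC lookup returns multiple rows per word. As a quick sketch (assuming the lexicon is available via get_sentiments('nrc'), which may prompt to download it through the textdata package on first use):

```r
library(dplyr)
library(tidytext)

# One word can match several emotion categories at once
get_sentiments('nrc') %>%
  filter(word == "abandon")
# per the head(nrc) output below: fear, negative, and sadness
```

This one-to-many structure is why the joins against nrc below produce more rows than there are words in the tweets.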
nrc = read.csv("/Users/cathy/Documents/Columbia Sem 2/5205_R Framework/Final Project/Final/nrc.csv")
head(nrc)
## word sentiment
## 1 abacus trust
## 2 abandon fear
## 3 abandon negative
## 4 abandon sadness
## 5 abandoned anger
## 6 abandoned fear
trump_sentiment %>%
  group_by(id) %>%
  unnest_tokens(output = word, input = tweetNew) %>%
  left_join(nrc, by = "word") %>%
  group_by(sentiment) %>%
  count() %>%
  filter(!is.na(sentiment)) %>% # Filter out words with no sentiment match in the nrc lexicon
  arrange(desc(n))
## # A tibble: 10 × 2
## # Groups: sentiment [10]
## sentiment n
## <chr> <int>
## 1 surprise 35973
## 2 negative 31595
## 3 positive 28571
## 4 trust 21117
## 5 fear 16903
## 6 sadness 16328
## 7 anger 16113
## 8 anticipation 15507
## 9 joy 11544
## 10 disgust 10849
# Graph it
trump_sentiment %>%
  group_by(id) %>%
  unnest_tokens(output = word, input = tweetNew) %>%
  left_join(nrc, by = "word") %>%
  group_by(sentiment) %>%
  count() %>%
  filter(!is.na(sentiment)) %>%
  ggplot(aes(x = reorder(sentiment, X = n), y = n, fill = sentiment)) +
  geom_col() +
  guides(fill = "none") +
  coord_flip()
biden_sentiment %>%
  group_by(id) %>%
  unnest_tokens(output = word, input = tweetNew) %>%
  left_join(nrc, by = "word") %>%
  group_by(sentiment) %>%
  count() %>%
  filter(!is.na(sentiment)) %>% # Filter out words with no sentiment match in the nrc lexicon
  arrange(desc(n))
## # A tibble: 10 × 2
## # Groups: sentiment [10]
## sentiment n
## <chr> <int>
## 1 positive 10726
## 2 negative 8867
## 3 trust 7914
## 4 anticipation 6020
## 5 sadness 5000
## 6 joy 4828
## 7 fear 4785
## 8 surprise 4719
## 9 anger 4621
## 10 disgust 2408
# Graph it
biden_sentiment %>%
  group_by(id) %>%
  unnest_tokens(output = word, input = tweetNew) %>%
  left_join(nrc, by = "word") %>%
  group_by(sentiment) %>%
  count() %>%
  filter(!is.na(sentiment)) %>%
  ggplot(aes(x = reorder(sentiment, X = n), y = n, fill = sentiment)) +
  geom_col() +
  guides(fill = "none") +
  coord_flip() +
  ylim(0, 30000)
The NRC lexicon lets us analyze tweet sentiment with more nuance, categorizing words into a variety of emotions instead of only binary positive and negative sentiment. The graphs display the overall count of each sentiment category. Tweets about Biden mostly expressed positive sentiment, with over 10,000 matched words, followed by negative and then trust. Tweets about Trump, on the other hand, revealed surprise as the dominant sentiment, with over 35,000 matched words, followed by negative and then positive. Once again, we observed a significantly larger tweet volume for Trump than for Biden.
To inspect the NRC sentiments further, we plotted the average daily count of each sentiment over two periods, 10/15-10/22 and 10/23-11/2, and isolated election day, 11/3, as its own unit.
# Create a function to extract the date from month and day columns
get_date <- function(month, day) {
  as.Date(paste("2020", month, day, sep = "-"))
}
# Use mutate to create a new column with date
trump_sentiment <- trump_sentiment %>%
  mutate(date = get_date(month, day))
head(trump_sentiment)
## # A tibble: 6 × 9
## id reviewSentiment created_at tweet tweet…¹ month day year date
## <int> <dbl> <chr> <chr> <chr> <int> <int> <int> <date>
## 1 2 4 2020-10-15 0… "#Tr… ['trum… 10 15 2020 2020-10-15
## 2 3 2 2020-10-15 0… "#Tr… ['trum… 10 15 2020 2020-10-15
## 3 4 -2.33 2020-10-15 0… "@ka… ['kara… 10 15 2020 2020-10-15
## 4 5 -0.714 2020-10-15 0… "#Th… ['thew… 10 15 2020 2020-10-15
## 5 6 -3 2020-10-15 0… "I h… ['lost… 10 15 2020 2020-10-15
## 6 8 1 2020-10-15 0… "#Tr… ['trum… 10 15 2020 2020-10-15
## # … with abbreviated variable name ¹tweetNew
# Use case_when to create a new column with period
trump_sentiment <- trump_sentiment %>%
  mutate(period = case_when(
    date >= as.Date("2020-10-15") & date <= as.Date("2020-10-22") ~ "10/15-10/22",
    date >= as.Date("2020-10-23") & date <= as.Date("2020-11-02") ~ "10/23-11/2",
    date == as.Date("2020-11-03") ~ "11/3"
  ))
head(trump_sentiment)
## # A tibble: 6 × 10
## id reviewSentim…¹ creat…² tweet tweet…³ month day year date period
## <int> <dbl> <chr> <chr> <chr> <int> <int> <int> <date> <chr>
## 1 2 4 2020-1… "#Tr… ['trum… 10 15 2020 2020-10-15 10/15…
## 2 3 2 2020-1… "#Tr… ['trum… 10 15 2020 2020-10-15 10/15…
## 3 4 -2.33 2020-1… "@ka… ['kara… 10 15 2020 2020-10-15 10/15…
## 4 5 -0.714 2020-1… "#Th… ['thew… 10 15 2020 2020-10-15 10/15…
## 5 6 -3 2020-1… "I h… ['lost… 10 15 2020 2020-10-15 10/15…
## 6 8 1 2020-1… "#Tr… ['trum… 10 15 2020 2020-10-15 10/15…
## # … with abbreviated variable names ¹reviewSentiment, ²created_at, ³tweetNew
# Group by period and sentiment
sentiment_by_period_trump <- trump_sentiment %>%
  unnest_tokens(output = word, input = tweetNew) %>%
  left_join(nrc, by = "word") %>%
  group_by(period, sentiment) %>%
  summarize(total_count = n(),
            distinct_dates = n_distinct(date),
            .groups = "drop") %>%
  mutate(avg_count = if_else(distinct_dates > 1,
                             as.double(total_count) / distinct_dates,
                             as.double(total_count))) %>%
  mutate(avg_count = round(avg_count)) %>%
  filter(!is.na(sentiment), !is.na(period))
head(sentiment_by_period_trump)
## # A tibble: 6 × 5
## period sentiment total_count distinct_dates avg_count
## <chr> <chr> <int> <int> <dbl>
## 1 10/15-10/22 anger 5116 8 640
## 2 10/15-10/22 anticipation 4909 8 614
## 3 10/15-10/22 disgust 3783 8 473
## 4 10/15-10/22 fear 5835 8 729
## 5 10/15-10/22 joy 3535 8 442
## 6 10/15-10/22 negative 10789 8 1349
ggplot(sentiment_by_period_trump, aes(x = period, y = avg_count, fill = sentiment)) +
  geom_col() +
  coord_flip() +
  guides(fill = "none") +
  facet_wrap(~sentiment, scales = "free_y") +
  theme_minimal() +
  labs(title = "Average count of sentiment for Trump by week")
# Create a function to extract the date from month and day columns
get_date <- function(month, day) {
  as.Date(paste("2020", month, day, sep = "-"))
}
# Use mutate to create a new column with date
biden_sentiment <- biden_sentiment %>%
  mutate(date = get_date(month, day))
# Use case_when to create a new column with period
biden_sentiment <- biden_sentiment %>%
  mutate(period = case_when(
    date >= as.Date("2020-10-15") & date <= as.Date("2020-10-22") ~ "10/15-10/22",
    date >= as.Date("2020-10-23") & date <= as.Date("2020-11-02") ~ "10/23-11/2",
    date == as.Date("2020-11-03") ~ "11/3"
  ))
# Group by period and sentiment
sentiment_by_period_biden <- biden_sentiment %>%
  unnest_tokens(output = word, input = tweetNew) %>%
  left_join(nrc, by = "word") %>%
  group_by(period, sentiment) %>%
  summarize(total_count = n(),
            distinct_dates = n_distinct(date),
            .groups = "drop") %>%
  mutate(avg_count = if_else(distinct_dates > 1,
                             as.double(total_count) / distinct_dates,
                             as.double(total_count))) %>%
  mutate(avg_count = round(avg_count)) %>%
  filter(!is.na(sentiment), !is.na(period))
# Create a bar chart with sentiment average count by period
ggplot(sentiment_by_period_biden, aes(x = period, y = avg_count, fill = sentiment)) +
  geom_col() +
  coord_flip() +
  ylim(0, 2500) +
  guides(fill = "none") +
  facet_wrap(~sentiment, scales = "free_y") +
  theme_minimal() +
  labs(title = "Average count of sentiment for Biden by week")
For tweets about both Biden and Trump, the average count of words in each sentiment category increased from period to period; however, the increase was much more substantial and distinct for tweets about Trump.