TWEET ANALYSIS & SNP500

My original plan was to pull Mr. Donald Trump’s tweets through the Twitter API and check stock market and FX volatility against the words posted on his official Twitter account. Despite my efforts to connect to the Twitter API, Twitter’s privacy rules kept me from obtaining the historical data. I then tried to scrape a web page, FactBa.se, using the scroller code provided in test.py. That was not very successful either: I got detected after every 1000th iteration, at which point the website would stop loading new content and leave the script stuck in an infinite loop. Finally, I managed to find raw data for Trump’s tweets on a web page. For SNP500 closing prices, I used the Yahoo Finance API, which is accessible to anyone willing to look under the rock!
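The infinite loop described above is the kind of thing a stall counter can guard against. The snippet below is only a minimal sketch, not the code in test.py: `scroll_once` is a hypothetical callable that scrolls the page once and returns how many items are loaded so far, and `max_stalls` and `max_iterations` are assumed parameter names.

```python
def scroll_until_stalled(scroll_once, max_stalls=3, max_iterations=10_000):
    """Keep scrolling until the page stops growing.

    scroll_once: hypothetical callable that performs one scroll and
    returns the total number of items currently loaded.
    Returns the largest item count seen.
    """
    seen = 0
    stalls = 0
    for _ in range(max_iterations):  # hard cap, so the loop always ends
        count = scroll_once()
        if count > seen:
            # New content arrived: remember it and reset the stall counter.
            seen = count
            stalls = 0
        else:
            # Nothing new loaded: give up after max_stalls tries in a row.
            stalls += 1
            if stalls >= max_stalls:
                break
    return seen
```

With a real browser driver, `scroll_once` would wrap whatever scroll-and-count logic the scraper already uses. The loop itself only needs the returned count.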

This analysis involves:

- Collecting tweets

- Cleaning the data

- Categorizing each sliced word using Excel

- Putting this data together to obtain the continuous frequency of each category's appearance on Mr. Donald Trump's profile, including the retweets and quotes, over a 21-day timeframe

- Visually comparing this to the continuous change of SNP500 closing prices by pulling the data from Yahoo Finance API

- Univariate analysis of each variable in our large, clean dataset

- Multivariate analysis.
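To make the frequency step concrete, here is a minimal sketch in plain Python. The analysis itself was done in R and Excel, so this is an illustration, not the code used. It assumes one reading of "continuous frequency over a 21-day timeframe": for each day, count how often a category appeared in the trailing 21 days. The function name, the `(day, category)` record layout, and the sample categories are all hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta


def rolling_category_counts(records, window_days=21):
    """Trailing-window counts per category.

    records: list of (day, category) pairs, one per categorized word
    (assumed layout). Returns {category: [(day, count), ...]} with one
    entry for every calendar day between the first and last record.
    """
    if not records:
        return {}

    # Daily counts for each category.
    daily = defaultdict(lambda: defaultdict(int))
    for day, category in records:
        daily[category][day] += 1

    first = min(day for day, _ in records)
    last = max(day for day, _ in records)
    n_days = (last - first).days + 1

    result = {}
    for category, counts in daily.items():
        series = []
        running = 0
        for offset in range(n_days):
            day = first + timedelta(days=offset)
            running += counts.get(day, 0)
            # Drop the day that just fell out of the trailing window.
            running -= counts.get(day - timedelta(days=window_days), 0)
            series.append((day, running))
        result[category] = series
    return result


# Made-up sample records, for illustration only.
sample = [
    (date(2020, 1, 1), "trade"),
    (date(2020, 1, 1), "trade"),
    (date(2020, 1, 10), "economy"),
    (date(2020, 1, 22), "trade"),
]
counts = rolling_category_counts(sample)
```

Each category's series can then be plotted on the same date axis as the SNP500 closing prices for the visual comparison.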

 

The results are displayed on the HTML page linked below.

This exploratory data analysis report was created to satisfy the requirements of my data science class, STAT 301-1: Data Science 1.

Turn-in Date: December 2020

Data Sources: Yahoo! Finance, Twitter

Language: R, Python (for the scraping scroll algorithm)
