Data Background

Happiest States Data

Data from WalletHub’s study on 2017’s Happiest States in America [3] was used as the ground-truth measure of happiness for the United States population by state. It provides the baseline happiness level against which the Twitter sentiment analysis is compared.

“To determine the happiest states in America, WalletHub’s analysts compared the 50 states across three key dimensions: 1) Emotional & Physical Well-Being, 2) Work Environment and 3) Community & Environment … evaluated … using 28 relevant metrics… Each metric was graded on a 100-point scale, with a score of 100 representing maximum happiness… [Analysts] determined each state’s weighted average across all metrics to calculate its total score and used the resulting scores to rank-order our sample.” [3] The specific attributes for this data are shown in the table below.
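The scoring approach quoted above (a weighted average of 0 to 100 metric scores, followed by a rank-ordering) can be illustrated with a minimal sketch. The metric names, weights, and state scores below are hypothetical placeholders, not WalletHub's actual metrics, weights, or data, and the function names are introduced here only for illustration.

```python
from typing import Dict, List, Tuple


def total_score(metric_scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Return the weighted average of a state's metric scores (each 0-100)."""
    total_weight = sum(weights[name] for name in metric_scores)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    weighted_sum = sum(score * weights[name] for name, score in metric_scores.items())
    return weighted_sum / total_weight


def rank_states(
    scores_by_state: Dict[str, Dict[str, float]], weights: Dict[str, float]
) -> List[Tuple[str, float]]:
    """Return (state, total score) pairs sorted from highest to lowest score."""
    totals = {state: total_score(m, weights) for state, m in scores_by_state.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical example values (assumptions for illustration only).
EXAMPLE_WEIGHTS = {"metric_a": 2.0, "metric_b": 1.0}
EXAMPLE_SCORES = {
    "State X": {"metric_a": 80.0, "metric_b": 50.0},
    "State Y": {"metric_a": 60.0, "metric_b": 60.0},
}

ranking = rank_states(EXAMPLE_SCORES, EXAMPLE_WEIGHTS)
```

Here "State X" scores (80 × 2 + 50 × 1) / 3 = 70 and "State Y" scores (60 × 2 + 60 × 1) / 3 = 60, so "State X" ranks first.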

Twitter Data

Twitter data was gathered to measure the sentiment of the United States population, with a specific focus on happiness level. The specific attributes for this data are shown in the table below.

Data Collection Procedures

Automated scripts were developed in Python 3 to collect the Happiest States and Twitter data from their respective sources on the web and write it to CSV for use in later analysis. The Python code used for this can be found in the Reference Materials section of this site.
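As a rough illustration of the final step described above (writing collected records to CSV), the following is a minimal sketch using only the Python standard library. It is not the project's actual collection script: the function name `write_rows_to_csv` and the field names are assumptions, and the download or API step is deliberately left out.

```python
import csv
import io
from typing import Dict, Iterable, List, TextIO


def write_rows_to_csv(
    rows: Iterable[Dict[str, object]], fieldnames: List[str], out: TextIO
) -> int:
    """Write dict records to `out` as CSV with a header row; return rows written."""
    # extrasaction="ignore" drops keys that are not listed in fieldnames.
    writer = csv.DictWriter(out, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count


# Hypothetical records standing in for collected data.
example_rows = [
    {"state": "State X", "total_score": 70.0},
    {"state": "State Y", "total_score": 60.0},
]

# An in-memory buffer keeps the example self-contained.
buffer = io.StringIO()
written = write_rows_to_csv(example_rows, ["state", "total_score"], buffer)
```

To write to disk instead, the same function can be passed a file opened with `open(path, "w", newline="", encoding="utf-8")`, where `newline=""` is what the `csv` module documentation recommends for file objects.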

After pulling the Happiest States and Twitter data, the raw data contained a total of 28 attributes across the two datasets: 6 in the Happiest States data and 22 in the Twitter data. Prior to cleaning, the two datasets together contained roughly 87,000 rows.
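Counts like these can be obtained by reading each CSV header and tallying the data rows. The sketch below shows one way to do that with the standard library; the helper name `summarize_csv` and the tiny in-memory samples are assumptions for illustration and do not reflect the real datasets.

```python
import csv
import io
from typing import TextIO, Tuple


def summarize_csv(source: TextIO) -> Tuple[int, int]:
    """Return (number of attributes, number of data rows) for a CSV stream."""
    reader = csv.reader(source)
    header = next(reader, [])  # first row holds the attribute names
    row_count = sum(1 for _ in reader)  # remaining rows are data
    return len(header), row_count


# Hypothetical miniature datasets (not the real data).
states_sample = io.StringIO("state,total_score\nState X,70.0\nState Y,60.0\n")
tweets_sample = io.StringIO("id,text,state\n1,hello,State X\n")

states_attrs, states_rows = summarize_csv(states_sample)
tweets_attrs, tweets_rows = summarize_csv(tweets_sample)

# Totals across both datasets.
total_attributes = states_attrs + tweets_attrs
total_rows = states_rows + tweets_rows
```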

To download a copy of the raw and cleaned datasets, refer to the Reference Materials page.