Association Analysis

Association Rules and Frequent Itemset Mining Analysis

Association rules were mined from the tweet content of the Twitter data using the Apriori algorithm to determine which words most frequently occur together. We began by identifying the rules that met a minimum support (MinSup) of 0.0025 and a minimum confidence (MinConf) of 0.70.
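The Apriori procedure can be sketched as follows. This is a minimal Python illustration, not the implementation used in this study (which is not named here). It assumes each tweet is tokenized into a set of lowercase words by whitespace splitting, that support is the fraction of tweets containing an itemset, and that the confidence of a rule X → Y is supp(X ∪ Y) / supp(X). The function name `apriori` and the toy tweets are hypothetical, and the thresholds are raised to 0.5 because four tweets cannot meaningfully exercise a MinSup of 0.0025.

```python
from itertools import combinations


def apriori(transactions, min_sup, min_conf):
    """Minimal Apriori sketch: return (frequent itemsets, association rules).

    transactions -- list of sets of words (one set per tweet)
    min_sup      -- minimum support, as a fraction of all transactions
    min_conf     -- minimum confidence for a rule X -> Y
    """
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions containing every word in `itemset`
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single words
    words = {w for t in transactions for w in t}
    level = {frozenset([w]) for w in words if support(frozenset([w])) >= min_sup}
    frequent = {}
    k = 1
    while level:
        for itemset in level:
            frequent[itemset] = support(itemset)
        k += 1
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        candidates = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        }
        level = {c for c in candidates if support(c) >= min_sup}

    # Rule generation: split each frequent itemset into antecedent X and consequent Y
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for x in map(frozenset, combinations(itemset, r)):
                conf = sup / frequent[x]  # supp(X | Y) / supp(X)
                if conf >= min_conf:
                    rules.append((x, itemset - x, sup, conf))
    return frequent, rules


# Hypothetical toy tweets, tokenized by whitespace
tweets = [
    "happy birthday to you",
    "happy birthday friend",
    "new york weather",
    "happy day in new york",
]
transactions = [set(t.split()) for t in tweets]
frequent, rules = apriori(transactions, min_sup=0.5, min_conf=0.7)

for x, y, sup, conf in sorted(rules, key=lambda r: (sorted(r[0]), sorted(r[1]))):
    print(sorted(x), "->", sorted(y), f"support={sup:.2f} confidence={conf:.2f}")
```

On this toy data the sketch keeps birthday → happy, new → york, and york → new, but rejects happy → birthday, whose confidence is 0.50 / 0.75 ≈ 0.67. The prune step relies on the fact that every subset of a frequent itemset must itself be frequent; this is what separates Apriori from brute-force enumeration of word combinations, and it matters more as MinSup is lowered, since lower thresholds admit more candidates.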

The criteria were then relaxed to a MinSup of 0.002 and a MinConf of 0.7, yielding frequent itemsets drawn from {wind, pressure, humidity, current, weather} whose rules all had a confidence of at least 0.7. The new association rules were as follows:

When MinSup was adjusted to 0.004, with MinConf kept at 0.7, one additional association rule was identified.

 

Based on these results, certain phrases such as “happy birthday”, “New York”, and “Puerto Rico” clearly occur frequently in tweet content. The frequent co-occurrence of “wind” and “humidity” was more surprising to us, but it can potentially be explained by the severe hurricane season underway during the period in which the tweets were collected. Lastly, the additional rule pairing “got” and “ta” was the most surprising result, but it may stem from the colloquial “gotta” being split into two tokens during preprocessing.
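The tokenization explanation for “got” and “ta” can be illustrated with a small sketch. It assumes, hypothetically, a preprocessing step that follows Penn Treebank contraction conventions, under which “gotta” becomes “got” + “ta” (NLTK's `word_tokenize` behaves this way). The tokenizer actually used in this analysis is not stated, and the `tokenize` and `confidence` helpers and the example tweets are made up for illustration.

```python
import re

# Hypothetical preprocessing rule (an assumption, not the study's documented
# pipeline): split "gotta" into "got" + "ta", as Penn Treebank-style tokenizers do.
GOTTA = re.compile(r"(?i)\b(got)(ta)\b")


def tokenize(text):
    """Split 'gotta', then lowercase and split on whitespace into a set of words."""
    return set(GOTTA.sub(r"\1 \2", text).lower().split())


def confidence(x, y, transactions):
    """conf(X -> Y) = supp(X | Y) / supp(X), computed from raw counts."""
    with_x = [t for t in transactions if x <= t]
    return sum(1 for t in with_x if y <= t) / len(with_x)


# Made-up example tweets
tweets = ["I gotta go", "You gotta see this", "Got my coffee"]
transactions = [tokenize(t) for t in tweets]

print(sorted(transactions[0]))  # ['go', 'got', 'i', 'ta']
print(confidence({"ta"}, {"got"}, transactions))  # 1.0
print(round(confidence({"got"}, {"ta"}, transactions), 2))  # 0.67
```

Under this assumption, “ta” appears only as the second half of “gotta”, so the rule ta → got has a confidence of 1.0 no matter how often “got” occurs on its own.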