How to go beyond AFINN for highly specialized lexicons.
If you've heard of sentiment analysis, you've heard of AFINN...
For a game like MtG, the AFINN model, frankly speaking, will probably suck. To illustrate, let me provide an example:
Notice that "fattiest" and "fatty" here just refer to a creature's power/toughness, and are positive characteristics. The same goes for plenty of other MtG slang.
AFINN is especially weak in niche areas whose vocabulary is full of jargon and buzzwords, because those words are either missing from its word list or scored by their everyday meaning. Companies and organizations working in niche areas typically start their analysis by building stronger models than AFINN.
So how can we build a better model for MtG? By using sentiment analysis, actually...
A training set
If you need to build a more refined model, you need to incorporate words in this niche into your corpus.
For our project, we use 50,000 tweets about MtG. We gather these from the hashtags #MtG, #Theros, #PTTheros, and #PTTHS. And now we use AFINN on these. Wait. What? I thought that AFINN was bad?!
AFINN provides a first approximation for our niche words. With 50,000 tweets, we can look at words that appear more than a few times and add their sentiment scores (built from the average AFINN sentiment) to our model. We call this model AFINN-MTG to reflect its bias towards MtG lingo.
Note: It is important to remove any Theros card names from this model so that we don't accidentally skew our later results. We could do this either by removing these words from the model, or by removing the tweets that contain these card names from the training set.
The next step in any machine learning application is to run the model on a test set. This is where we will use our card name tweets. We run this set through the new model to build our first primitive sentiment scores.
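To make the idea concrete, here is a minimal sketch of one way to read "AFINN avg sentiment": score every tweet with the base word list, then give each new word the average score of the tweets it appears in. The names `TOY_AFINN`, `build_niche_lexicon`, and the `min_count` threshold are my own placeholders for illustration, and the tiny word list stands in for the real AFINN file.

```python
import re
from collections import defaultdict

# Toy stand-in for the AFINN word list (word -> score). Illustrative values only;
# in practice you would load the real AFINN file here.
TOY_AFINN = {"love": 3, "awesome": 4, "win": 4, "bad": -3, "hate": -3}


def tokenize(text):
    """Lowercase the text and pull out simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def tweet_score(tokens, lexicon):
    """AFINN-style score: sum the scores of all known words in the tweet."""
    return sum(lexicon.get(tok, 0) for tok in tokens)


def build_niche_lexicon(tweets, base_lexicon, min_count=3):
    """Extend base_lexicon with words seen in at least min_count tweets.

    Each new word gets the average base-lexicon score of the tweets that
    contain it. min_count is an assumed cutoff for "more than a few times".
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for text in tweets:
        tokens = tokenize(text)
        score = tweet_score(tokens, base_lexicon)
        for tok in set(tokens):  # count each word once per tweet
            totals[tok] += score
            counts[tok] += 1

    extended = dict(base_lexicon)
    for tok, n in counts.items():
        if n >= min_count and tok not in base_lexicon:
            extended[tok] = totals[tok] / n
    return extended
```

With this sketch, a word like "fatty" that keeps showing up next to positive words ends up with a positive score of its own, which is exactly the behavior we want from AFINN-MTG. Common filler words will also pick up scores this way, so a stop-word filter is probably worth adding.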
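Both options from the note can be sketched in a few lines. The function names and the two card names used in the usage example are placeholders I picked for illustration, and the matching is deliberately naive (lowercased words and substrings).

```python
def drop_card_words(lexicon, card_names):
    """Option 1: remove every word that is part of a card name from the model."""
    card_words = {word for name in card_names for word in name.lower().split()}
    return {word: score for word, score in lexicon.items() if word not in card_words}


def drop_card_tweets(tweets, card_names):
    """Option 2: remove training tweets that mention any card name."""
    lowered_names = [name.lower() for name in card_names]
    return [
        text for text in tweets
        if not any(name in text.lower() for name in lowered_names)
    ]


# Example usage with placeholder card names.
cards = ["Thoughtseize", "Stormbreath Dragon"]
model = {"thoughtseize": 2.0, "fatty": 5.0, "dragon": 1.0}
print(drop_card_words(model, cards))
print(drop_card_tweets(["Thoughtseize is great", "love this fatty"], cards))
```

Note the trade-off: option 1 also throws away generic words like "dragon" that happen to be part of a card name, while option 2 keeps those words but shrinks the training set.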
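As a rough sketch of that test-set step: score each card-name tweet with the extended model and average the scores per card. The function `card_sentiment` and the substring matching of card names are assumptions made for illustration, not a description of the final pipeline.

```python
import re
from collections import defaultdict


def card_sentiment(tweets, card_names, lexicon):
    """Average lexicon score of the tweets that mention each card name."""
    per_card = defaultdict(list)
    for text in tweets:
        lowered = text.lower()
        tokens = re.findall(r"[a-z']+", lowered)
        score = sum(lexicon.get(tok, 0) for tok in tokens)
        for name in card_names:
            if name.lower() in lowered:  # naive substring match on the card name
                per_card[name].append(score)
    # Cards with no matching tweets are simply left out of the result.
    return {name: sum(scores) / len(scores) for name, scores in per_card.items()}
```

Here `lexicon` would be the AFINN-MTG model with card names already removed, so a card's score comes from the words around its name rather than from the name itself.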
What about sarcasm? Or negative words? These are common issues for sentiment analysis, and we'll talk about those later, but one additional improvement is to look at word bigrams. We'll see next time what kind of word bigrams appear and are noteworthy.
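In the meantime, counting word bigrams is simple enough to sketch. This is a bare-bones version using only the standard library; `count_bigrams` is a name I made up, and the tokenizer is the same naive one you would swap out for something smarter.

```python
import re
from collections import Counter


def count_bigrams(tweets):
    """Count adjacent word pairs across all tweets."""
    counts = Counter()
    for text in tweets:
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(zip(tokens, tokens[1:]))  # pairs (w1, w2), (w2, w3), ...
    return counts


# Example usage: list the most frequent pairs.
bigrams = count_bigrams(["not good at all", "not good, not bad"])
print(bigrams.most_common(3))
```

Pairs like ("not", "good") are the reason bigrams help: the single word "good" scores as positive, while the pair clearly is not.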