Friday, October 25, 2013

Clever Ad...

Sorry, it's hard to resist an opportunity to post that clip.

Today I saw one of the best advertising techniques I've seen all year.

What's the big deal?

I couldn't give a single shit about the insurance, because in reality, it's not that exciting. A funny idea and all, but whatever.

Now look at this picture:

Did you see it? 

That's how you claim the freebie! By posting on Facebook. 

This simple warranty policy is a very elegant advertising mechanism for a company like Betabrand, which seems to be directly targeting social media advertising.

So, Betabrand, kudos to you!

Tuesday, October 22, 2013

Noncommutative Webforms

A story:

Today I was filling out a webform. It asked for some address information.

Pretty normal. 

First it asks for state, with a blank text field. I type Kansas.

Still fine.

Then it asks for zip, with a blank text field. I type mine.

First alert goes off.

Then it asks for country, with a drop down. I select USA. It changes the state field to a drop down, erasing my previous entry!

Second alert goes off. Third alert goes off.

This webform is for a job doing NLP for a major tech company.

At this point, I nearly stopped applying.

Why does this matter?

The first paper I read when I started getting interested in data science was by DJ Patil. I'll admit, I was skeptical as hell, and I don't mean in the haute Cathy O'Neil way.

One of my favorite points from his paper, and what convinced me that smart people were doing data science, was his little observation about building smarter webforms for data entry.

So, this company isn't even using the basic advice from one of the most popular data science papers of all time?


This is a perfect example of noncommutativity. It is NOT the same to:

  • ask first for a state, then ask for the country which changes the entry field for the state
  • ask first for the country which changes the entry field for the state, then ask for the state

So which is better?

The astute reader (or even a beginner web programmer) will say... zip code.
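To make the noncommutativity concrete, here is a minimal sketch in Python. The form is modeled as a plain dict, and the two function names are my own hypothetical stand-ins for the two user actions. The only behavior taken from the story is that picking a country swaps the state field to a drop-down and wipes whatever was there.

```python
# Hypothetical model of the webform: a dict of field values.
EMPTY_FORM = {"state": None, "state_widget": "text", "country": None}


def enter_state(form, text):
    """The user fills in the state field; returns a new form dict."""
    return {**form, "state": text}


def select_country(form, country):
    """Picking a country swaps the state field to a drop-down and
    erases any previous entry (the behavior described in the story)."""
    return {**form, "country": country, "state_widget": "dropdown", "state": None}


# Same two actions, applied in the two possible orders.
state_then_country = select_country(enter_state(EMPTY_FORM, "Kansas"), "USA")
country_then_state = enter_state(select_country(EMPTY_FORM, "USA"), "Kansas")

print(state_then_country["state"])  # the entry was erased
print(country_then_state["state"])  # the entry survives
```

The two results differ only in the state field, which is exactly the complaint: the order in which the form asks its questions changes what ends up stored.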

Friday, October 11, 2013

Evolutive Sentiment Analysis

How to go beyond AFINN for highly specialized lexicons.

If you've heard of sentiment analysis, you've heard of AFINN...

For a game like MtG, the AFINN model, frankly speaking, will probably suck. To illustrate, let me provide an example:

Notice here that fattiest and fatty are just referring to the power/toughness, and are positive characteristics. Etc...

In niche areas, the AFINN model is going to be especially weak for lexicons with lots of buzzwords. Companies and organizations working in niche areas start their analysis by building stronger models than AFINN.

For MtG, how can we build a better model? Using sentiment analysis, actually...

A training set

If you need to build a more refined model, you need to incorporate words in this niche into your corpus.

For our project, we use 50,000 tweets about MtG. We gather these from the hashtags #MtG, #Theros, #PTTheros, and #PTTHS. And now we use AFINN on these. Wait. What? I thought that AFINN was bad?!

AFINN provides a first approximation to our niche words. With 50,000 tweets, we can look at words that appear more than a few times, and add their sentiment scores (built using the average AFINN sentiment) to our model. We call this model AFINN-MTG to reflect its bias towards MtG lingo.
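Here is a rough sketch of that bootstrapping step in Python. A few things are assumptions on my part: the tiny `BASE_LEXICON` is a toy stand-in for the real AFINN list, the tokenizer is deliberately crude, `min_count` is a hypothetical cutoff for "more than a few times", and I read "AFINN avg sentiment" as the mean AFINN score of the tweets a word shows up in.

```python
from collections import defaultdict

# Toy stand-in for AFINN (assumption: the real list is far larger).
BASE_LEXICON = {"good": 3, "love": 3, "bad": -3, "terrible": -3}


def tokenize(text):
    """Crude tokenizer: split on whitespace, strip punctuation, lowercase."""
    words = (w.strip(".,!?#").lower() for w in text.split())
    return [w for w in words if w]


def tweet_score(tokens, lexicon):
    """Mean lexicon score over the scored words in a tweet (0.0 if none)."""
    hits = [lexicon[t] for t in tokens if t in lexicon]
    return sum(hits) / len(hits) if hits else 0.0


def extend_lexicon(tweets, base, min_count=2):
    """Give each unknown word the average score of the tweets containing it."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for tweet in tweets:
        tokens = tokenize(tweet)
        score = tweet_score(tokens, base)
        for word in set(tokens):
            if word not in base:
                totals[word] += score
                counts[word] += 1
    extended = dict(base)
    for word, n in counts.items():
        if n >= min_count:  # keep only words seen often enough
            extended[word] = totals[word] / n
    return extended


sample_tweets = [
    "love this fatty",
    "that fatty is good",
    "bad mulligan",
    "terrible mulligan again",
]
afinn_mtg = extend_lexicon(sample_tweets, BASE_LEXICON)
print(afinn_mtg["fatty"], afinn_mtg["mulligan"])
```

On the toy tweets, "fatty" inherits a positive score and "mulligan" a negative one, while words seen only once are left out.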

Note: It is important to remove any Theros card names from this model so that we don't accidentally skew our later results. We could do this just by removing these words, or by removing tweets from the training set which contain these card names.
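Both options are easy to sketch. The helper names below are hypothetical, and the card list passed in is whatever list of Theros card names you have on hand. Note that the word-level version is blunt: it drops every word that appears in any card name, including generic ones.

```python
def drop_card_words(lexicon, card_names):
    """Option 1: remove every word that is part of a card name from the model."""
    card_words = {w for name in card_names for w in name.lower().split()}
    return {w: s for w, s in lexicon.items() if w not in card_words}


def drop_card_tweets(tweets, card_names):
    """Option 2: remove training tweets that mention any card name."""
    names = [n.lower() for n in card_names]
    return [t for t in tweets if not any(n in t.lower() for n in names)]


# Illustrative sample only; swap in the full card list.
cards = ["Thoughtseize", "Stormbreath Dragon"]

print(drop_card_words({"dragon": 1.0, "fatty": 3.0}, cards))
print(drop_card_tweets(["Thoughtseize is great", "nice draft"], cards))
```

Dropping tweets keeps the lexicon cleaner but shrinks the training set, so which option is better probably depends on how many tweets mention cards.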

Test set

The next step in any machine learning application is to run on a test set. Here is where we will use our card name tweets. This set will be run with the new model to build our first primitive sentiment scores.
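As a sketch of what that run might look like, the snippet below averages tweet scores per card. The function names are hypothetical, a tweet's score is assumed to be the sum of its word scores, and a tweet is matched to a card by a simple case-insensitive substring check.

```python
def score_tweet(text, lexicon):
    """Sum the lexicon scores of the words in a tweet (unknown words count 0)."""
    words = [w.strip(".,!?#").lower() for w in text.split()]
    return sum(lexicon.get(w, 0.0) for w in words)


def card_sentiment(tweets, card_names, lexicon):
    """Average tweet score for each card that is mentioned at least once."""
    scores = {}
    for name in card_names:
        hits = [score_tweet(t, lexicon) for t in tweets if name.lower() in t.lower()]
        if hits:
            scores[name] = sum(hits) / len(hits)
    return scores


# Toy model and tweets, for illustration only.
toy_model = {"fatty": 3.0, "bad": -3.0}
test_tweets = [
    "Stormbreath Dragon is a fatty",
    "Stormbreath Dragon is bad",
    "Thoughtseize is bad",
]
card_scores = card_sentiment(test_tweets, ["Stormbreath Dragon", "Thoughtseize"], toy_model)
print(card_scores)
```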

Doing better

What about sarcasm? Or negative words? These are common issues for sentiment analysis, and we'll talk about those later, but one additional improvement is to look at word bigrams. We'll see next time what kind of word bigrams appear and are noteworthy.
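For reference, counting word bigrams takes only a few lines. This is a minimal sketch with a crude whitespace tokenizer of my own; the function names are hypothetical.

```python
from collections import Counter


def word_bigrams(text):
    """Return the adjacent word pairs in a piece of text."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    words = [w for w in words if w]
    return list(zip(words, words[1:]))


def bigram_counts(tweets):
    """Count how often each word bigram appears across all tweets."""
    counts = Counter()
    for tweet in tweets:
        counts.update(word_bigrams(tweet))
    return counts


counts = bigram_counts(["not good", "Not good at all"])
print(counts.most_common(3))
```

A pair like ("not", "good") is the kind of thing a single-word model misses, since "good" alone scores positive.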