How can we design a data collection algorithm that can search for misspellings of made-up words?
When querying Twitter for tweets containing the names of new cards, we also need to consider what SEO people refer to as "generated misspellings". (In SEO, these are used to generate web keywords that artificially boost a page's ranking when searchers misspell their query.)
But what does this have to do with medicine...
Like a corpus of tweets, these medical documents lack a comprehensive vocabulary, so a standard spell corrector is inappropriate for the task. Put simply, we don't want to download all tweets and spell-correct them; we want to search for tweets that use one of our generated erroneous terms.
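To make the "search, don't correct" idea concrete, here is a minimal sketch that packs a list of already-generated erroneous terms into search strings. It assumes the search endpoint accepts quoted phrases joined with OR; the function name build_queries and the 500-character limit are made up for illustration and should be checked against whatever API is actually used.

```python
def build_queries(terms, max_len=500):
    """Group quoted terms into OR queries of at most max_len characters.

    max_len is an assumed limit, not a documented one.
    A single term longer than max_len still gets its own query.
    """
    queries = []
    current = ""
    for term in sorted(terms):
        quoted = '"' + term + '"'
        candidate = quoted if not current else current + " OR " + quoted
        if current and len(candidate) > max_len:
            # The current query is full: store it and start a new one.
            queries.append(current)
            current = quoted
        else:
            current = candidate
    if current:
        queries.append(current)
    return queries


if __name__ == "__main__":
    for query in build_queries(["counterfux", "bolivia voldaren"]):
        print(query)
```

Each resulting string can then be sent as one search request, so the number of requests grows with the size of the misspelling library rather than with the volume of tweets.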
Ways to err
Here are three obvious kinds of misspelling (see approximate string matching):
1) insertion, e.g. Bolivia Voldaren (for Olivia Voldaren)
2) deletion, e.g. Counterfux (for Counterflux)
3) substitution, e.g. Knightly Vapor (for Knightly Valor)
Each of these is straightforward to generate algorithmically from the starting phrase.
One additional misspelling comes from bigram transposition (swapping two adjacent characters), e.g. Angle of Despair (for Angel of Despair).
These four methods should generate a rich library of misspellings for a single Magic card name, but to what degree should we consider double mistakes? More on this later.
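The four single-mistake generators can be sketched in a few lines of Python. This is only an illustration under some assumptions: names are lowercased first, only the letters a to z are inserted or substituted, and spaces are treated like any other character. The function names are made up for this example.

```python
import string

# Assumption: only lowercase ASCII letters are used for insertions
# and substitutions.
ALPHABET = string.ascii_lowercase


def insertions(name, alphabet=ALPHABET):
    """Insert one extra letter at every possible position."""
    return {name[:i] + c + name[i:]
            for i in range(len(name) + 1) for c in alphabet}


def deletions(name):
    """Drop one character at every possible position."""
    return {name[:i] + name[i + 1:] for i in range(len(name))}


def substitutions(name, alphabet=ALPHABET):
    """Replace one character with a different letter."""
    return {name[:i] + c + name[i + 1:]
            for i in range(len(name)) for c in alphabet if c != name[i]}


def transpositions(name):
    """Swap each pair of adjacent, non-identical characters."""
    return {name[:i] + name[i + 1] + name[i] + name[i + 2:]
            for i in range(len(name) - 1) if name[i] != name[i + 1]}


def misspellings(name):
    """All variants exactly one mistake away from the lowercased name."""
    name = name.lower()
    return (insertions(name) | deletions(name)
            | substitutions(name) | transpositions(name))


if __name__ == "__main__":
    print("counterfux" in misspellings("Counterflux"))
    print(len(misspellings("Counterflux")))
```

Using sets removes the duplicates that different edits can produce (deleting either "l" in a double "ll" gives the same string, for instance). The library still grows quickly with the length of the name, which is worth keeping in mind when deciding how far to go with double mistakes.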