Elasticsearch Analyzers

There are many analyzers available in Elasticsearch, each with its own use cases. For our purposes, we use an n-gram analyzer on the item name search terms and department names. This analyzer includes a lowercase filter to allow case-insensitive searches and a gram range of 3 to 10. We chose these values based on an analysis of the item and department names we had, and of course we encourage you to do your own analysis before picking the values that will work best for your needs. For example, our smallest relevant token was a three-letter word like “rum” or a number like the “151” in “Bacardi 151”, while we also had some longer terms. The wider your gram range is, the more memory it consumes when used as an index analyzer, but a wider range can also boost the relevancy score of the most relevant results.
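To make the gram range concrete, here is a small illustrative sketch (plain Python, not Elasticsearch code) of how an n-gram tokenizer with `min_gram` 3 and `max_gram` 10 slices a single token into grams. A three-letter token like “rum” produces exactly one gram, while longer tokens produce many overlapping grams:

```python
# Illustrative sketch of n-gram generation with min_gram=3, max_gram=10.
# This mimics what an n-gram tokenizer does to one token; it is not
# Elasticsearch's actual implementation.
def ngrams(token, min_gram=3, max_gram=10):
    grams = []
    for n in range(min_gram, max_gram + 1):
        # Slide a window of length n across the token.
        for i in range(len(token) - n + 1):
            grams.append(token[i:i + n])
    return grams

print(ngrams("rum"))      # ['rum'] -- a 3-letter token yields a single gram
print(ngrams("bacardi"))  # 15 overlapping grams, from 'bac' up to 'bacardi'
```

Every extra gram is another term stored in the index, which is why widening the range trades memory for scoring granularity.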

Here is a snippet of how we defined this analyzer:

                      settings: {
                        analysis: {
                          tokenizer: {
                            item_name_ngram_tokenizer: {
                              type: "nGram",
                              min_gram: "3",
                              max_gram: "10",
                              token_chars: [ "letter", "digit" ]
                            }
                          },
                          analyzer: {
                            item_name_ngram_analyzer: {
                              filter: ["lowercase"],
                              tokenizer: "item_name_ngram_tokenizer"
                            }
                          }
                        }
                      }
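For readers working from Python, the same settings could be expressed as a dict and paired with a field mapping so that searches against that field use the custom analyzer. This is a hedged sketch under stated assumptions: the `item_name` field name and the mapping shape are illustrative, not taken from the original post.

```python
# Sketch: the analyzer settings above as a Python dict, paired with a
# hypothetical mapping that applies the analyzer to an item-name field.
# The field name "item_name" is an assumption for illustration.
index_body = {
    "settings": {
        "analysis": {
            "tokenizer": {
                "item_name_ngram_tokenizer": {
                    "type": "nGram",
                    "min_gram": "3",
                    "max_gram": "10",
                    "token_chars": ["letter", "digit"],
                }
            },
            "analyzer": {
                "item_name_ngram_analyzer": {
                    "filter": ["lowercase"],
                    "tokenizer": "item_name_ngram_tokenizer",
                }
            },
        }
    },
    "mappings": {
        "properties": {
            "item_name": {
                "type": "text",
                "analyzer": "item_name_ngram_analyzer",
            }
        }
    },
}
```

A body like this would typically be sent when creating the index (for example via the create-index API), after which both indexing and search on the mapped field run through the n-gram analyzer.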

Stay tuned for more ES and other tips and tricks in the future!

