Partial search in Elasticsearch

A common question on the Internet, with very few answers, is how to implement partial word search (n-gram search, in search engine vocabulary) with Elasticsearch 5.x. Many of the articles on the subject are out of date, so here is a short example for Elasticsearch 5.x using Ruby, my preferred programming language.
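To see why prefix matching works, here is a rough pure-Ruby approximation of what an edge n-gram tokenizer (configured in Step 4 with `min_gram` 2 and `max_gram` 20) stores for each word. It is only an illustration: the regex split stands in for the tokenizer's `token_chars: letter, digit` rule, and `edge_ngrams` is a hypothetical helper, not an Elasticsearch API.

```ruby
# Rough approximation of Elasticsearch's edge_ngram tokenizer followed by a
# lowercase filter. edge_ngrams is a hypothetical helper for illustration only.
def edge_ngrams(text, min_gram: 2, max_gram: 20)
  # Split on anything that is not a letter or digit (like token_chars: letter, digit)
  text.scan(/[[:alnum:]]+/).flat_map do |word|
    word  = word.downcase
    upper = [max_gram, word.length].min
    # Every prefix from min_gram up to max_gram characters;
    # words shorter than min_gram produce no tokens at all
    (min_gram..upper).map { |n| word[0, n] }
  end
end

tokens = edge_ngrams("Elastic search")
# tokens: "el", "ela", "elas", ..., "elastic", "se", "sea", ..., "search"

# At query time the input is NOT n-grammed (search_analyzer: 'standard'),
# so the partial word "Elas" becomes "elas" and matches a stored prefix.
puts tokens.include?("elas") # prints true
```

Indexing prefixes once, rather than n-gramming the query, keeps search-time work small at the cost of a larger index.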

Step 1: Add the Elasticsearch gems

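# Gemfile
gem 'elasticsearch-rails'
gem 'elasticsearch-model'
gem 'sidekiq'  # background indexing jobs (Step 2)
gem 'kaminari' # .page/.per pagination of search results (Step 6)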

Step 2: Add an async indexing job (Sidekiq)

# app/workers/elasticsearch_index_worker.rb
class ElasticsearchIndexWorker
  include Sidekiq::Worker
  # Sidekiq only processes the queues it is started with, e.g. sidekiq -q indexer -q default
  sidekiq_options queue: :indexer

  # NOTE: jobs are not deduplicated, so the same record may be processed more than once
  def perform(klass, id)
    obj    = klass.constantize.find_by(id: id)
    action = "update"
    unless obj
      # The row is gone: build an unsaved instance just to address its document
      obj    = klass.constantize.new(id: id)
      action = "remove"
    end

    update_index(obj, action)
  end

  def update_index(obj, action)
    if action.to_s == "remove"
      begin
        obj.__elasticsearch__.delete_document
      rescue Elasticsearch::Transport::Transport::Errors::NotFound
        # Document was never indexed or is already deleted; nothing to do
      end
    else
      res = obj.__elasticsearch__.index_document
      raise StandardError, res["error"].to_s if res["error"]
    end
  end
end
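The worker re-indexes the record if the row still exists and removes the document otherwise, so running the same job twice leaves the index in the same state. That is why duplicate jobs (the uniqueness note in the worker) cost extra work but do not corrupt the index. Below is a sketch of that decision with hypothetical in-memory stand-ins (`FakeIndex`, and a plain hash as the database table) instead of ActiveRecord and Elasticsearch:

```ruby
# Hypothetical in-memory stand-in for an Elasticsearch index (not a real API).
class FakeIndex
  attr_reader :docs

  def initialize
    @docs = {}
  end

  def index_document(id, source)
    @docs[id] = source
  end

  def delete_document(id)
    @docs.delete(id) # deleting a missing document is simply a no-op here
  end
end

# Mirrors the worker's perform: update the document if the row exists,
# otherwise remove it. `table` is a plain Hash standing in for the database.
def sync_document(table, index, id)
  record = table[id]
  if record
    index.index_document(id, record)
    "update"
  else
    index.delete_document(id)
    "remove"
  end
end

table = { 1 => { name: "Partial search" } }
index = FakeIndex.new

2.times { sync_document(table, index, 1) } # duplicate job: same end state
table.delete(1)                            # row destroyed
2.times { sync_document(table, index, 1) } # both runs remove; still consistent
puts index.docs.empty?                     # prints true
```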

Step 3: Create a concern for the async job

# app/models/concerns/elasticsearch/model/async_callbacks.rb
module Elasticsearch
  module Model
    module AsyncCallbacks
      extend ActiveSupport::Concern

      included do
        # after_commit: the job must only run once the change is visible in the database
        after_commit :async_elastic_save_index,   on: [:create, :update]
        after_commit :async_elastic_delete_index, on: :destroy
        after_touch  :async_elastic_save_index
      end

      def async_elastic_save_index
        async_elastic_callback
      end

      def async_elastic_delete_index
        async_elastic_callback
      end

      # The worker decides between "update" and "remove" by looking the record up again
      def async_elastic_callback
        ElasticsearchIndexWorker.perform_async(self.class.name, id)
      end
    end
  end
end
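Sidekiq stores job arguments as JSON, which is why the concern enqueues the class name and the id rather than the record itself, and why the worker turns the name back into a class with `constantize`. A stdlib-only sketch of that round trip, assuming a hypothetical empty `SomeModel` class and using `Object.const_get` in place of ActiveSupport's `constantize`:

```ruby
require "json"

# Hypothetical model class, defined only so its name can be resolved below.
class SomeModel; end

# What the concern enqueues: the class name and the record id (simple types)
args = ["SomeModel", 42]

# Sidekiq serializes job arguments to JSON; this round trip mimics that
payload = JSON.generate(args)
klass_name, id = JSON.parse(payload)

# The worker turns the name back into a class. ActiveSupport's
# String#constantize does this in Rails; Object.const_get is a close stdlib equivalent.
klass = Object.const_get(klass_name)
puts "#{klass}##{id}" # prints SomeModel#42
```

Passing only the id also means the worker reads the row as it is when the job runs, not as it was when the job was enqueued.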

Step 4: Integrate with Rails models

# app/models/some_model.rb
class SomeModel < ActiveRecord::Base
  include Elasticsearch::Model
  include Elasticsearch::Model::AsyncCallbacks

  # Elasticsearch settings (custom analyzers) and mapping
  settings analysis: {
    tokenizer: {
      shingles_tokenizer: { # full-word matches should get a higher priority
        type: 'whitespace'
      },
      edge_ngram_tokenizer: {
        type:        'edge_ngram', # we need the beginnings of words
        min_gram:    2,
        max_gram:    20,
        token_chars: ['letter', 'digit']
      }
    },
    analyzer: {
      shingle_analyzer: {
        type:      'custom',
        tokenizer: 'shingles_tokenizer',
        filter:    ['shingle', 'lowercase', 'asciifolding']
      },
      edge_ngram_analyzer: {
        type:      'custom',
        tokenizer: 'edge_ngram_tokenizer',
        filter:    ['lowercase']
      }
    }
  } do
    mapping do
      indexes :published, type: 'boolean'

      indexes :name, type: 'text', analyzer: 'edge_ngram_analyzer', search_analyzer: 'standard', boost: 120, fields: {
        'shingle' => { # shingle matches get a higher priority
          type:            'text',
          analyzer:        'shingle_analyzer',
          search_analyzer: 'standard',
          boost:           240
        },
        'raw' => { # original, unanalyzed value (e.g. for sorting)
          type: 'keyword'
        }
      }

      indexes :descr, type: 'text', analyzer: 'edge_ngram_analyzer', search_analyzer: 'standard', boost: 60, fields: {
        'shingle' => {
          type:            'text',
          analyzer:        'shingle_analyzer',
          search_analyzer: 'standard',
          boost:           120
        }
      }
      # ....
    end
  end
end
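The `shingle_analyzer` indexes whole words and pairs of adjacent words, but never word fragments, so the `shingle` sub-fields only match complete words; that is why their higher boost favors full-word matches over the edge n-gram prefixes. A rough pure-Ruby approximation of its output, assuming the `shingle` filter's defaults (two-word shingles, single words kept); `shingles` is a hypothetical helper and `asciifolding` is left out:

```ruby
# Rough approximation of shingle_analyzer: whitespace tokenizer, shingle
# filter (defaults: 2-word shingles plus single words), lowercase.
# asciifolding is intentionally left out of this sketch.
def shingles(text, max_shingle_size: 2)
  words = text.split.map(&:downcase) # split with no argument splits on whitespace
  words.each_index.flat_map do |i|
    (1..max_shingle_size).map do |size|
      group = words[i, size]
      group.join(" ") if group.size == size # skip incomplete shingles at the end
    end.compact
  end
end

tokens = shingles("Quick brown fox")
puts tokens.inspect         # prints ["quick", "quick brown", "brown", "brown fox", "fox"]
puts tokens.include?("bro") # prints false: fragments never reach the shingle field
```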
Step 5: Use the console to create the index for a model

# console
SomeModel.__elasticsearch__.create_index!(force: true)
SomeModel.find_each{|o| o.__elasticsearch__.index_document}
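`find_each` sends one index request per record. elasticsearch-model also provides a bulk `import` class method (`SomeModel.import`), which batches records through Elasticsearch's bulk API. For reference, here is a sketch of the newline-delimited JSON body the bulk API expects; the index name `some_models` and type `some_model` are assumptions based on elasticsearch-model's naming defaults, and `bulk_index_body` is a hypothetical helper:

```ruby
require "json"

# Builds a bulk request body: one action line plus one source line per
# document, each terminated by a newline (the bulk API requires the final one).
# Index and type names are assumed defaults for a SomeModel class.
def bulk_index_body(records, index: "some_models", type: "some_model")
  records.map do |record|
    action = { index: { _index: index, _type: type, _id: record[:id] } }
    "#{JSON.generate(action)}\n#{JSON.generate(record)}\n"
  end.join
end

body = bulk_index_body([{ id: 1, name: "Partial search", published: true }])
puts body
# {"index":{"_index":"some_models","_type":"some_model","_id":1}}
# {"id":1,"name":"Partial search","published":true}
```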

Step 6: Let’s implement search for a previously indexed model

# app/models/some_model.rb (inside class SomeModel)
def self.search(params, is_admin)
  order         = params[:order].presence
  search_string = params[:query].presence
  search_string = nil if search_string == "*"

  # Field definitions from the mapping declared in Step 4
  properties    = SomeModel.__elasticsearch__.mapping.to_hash[:some_model][:properties]
  # Search every field that has sub-fields, both directly and via its "shingle" sub-field
  search_fields = properties.each_with_object([]) { |(k, v), a| a.push(k.to_s, "#{k}.shingle") if v[:fields] }

  definition = {
    # Return only the document ids;
    # remove this line to get the whole document source from Elasticsearch
    _source: false,
    query: {
      bool: {
        filter: [
          (!is_admin    && { term:  { published: true } }).presence,
          # "terms" (plural) accepts an array of ids
          (params[:ids] && { terms: { id: Array(params[:ids]) } }).presence
        ].compact
      }
    }
  }

  if search_string
    definition[:query][:bool][:should] = [
      { query_string: { query: search_string } },
      { multi_match: {
        fields:      search_fields,
        query:       search_string,
        operator:    :and,
        analyzer:    :standard,
        tie_breaker: 0.3
      } }
    ]
    # With a filter present, should clauses are optional; require at least one match
    definition[:query][:bool][:minimum_should_match] = 1
  end

  # Sort only on fields that have a "raw" keyword sub-field (see Step 4)
  if order && properties.dig(order.to_sym, :fields, 'raw')
    definition[:sort] = { "#{order}.raw" => 'asc' }
  end

  SomeModel.__elasticsearch__.search(definition).page(params[:page]).per(params[:per])
end
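Because the search definition is a plain Ruby hash, it can be built and inspected without a running cluster. The sketch below extracts the search fields from a hard-coded hash shaped like what `mapping.to_hash[:some_model][:properties]` returns for the Step 4 mapping, then builds the bool query. `PROPERTIES` and `build_definition` are hypothetical; ActiveSupport's `presence` is replaced by plain Ruby, `terms` is used so `ids` may be an array, and sorting is only added when the mapping defines a `raw` sub-field:

```ruby
require "json"

# Hypothetical literal shaped like mapping.to_hash[:some_model][:properties]
PROPERTIES = {
  published: { type: "boolean" },
  name:      { type: "text", fields: { "shingle" => { type: "text" }, "raw" => { type: "keyword" } } },
  descr:     { type: "text", fields: { "shingle" => { type: "text" } } }
}.freeze

# Every field with sub-fields is searched both directly and via ".shingle"
def search_fields(properties)
  properties.each_with_object([]) do |(name, options), acc|
    acc.push(name.to_s, "#{name}.shingle") if options[:fields]
  end
end

def build_definition(query:, is_admin:, ids: nil, order: nil, properties: PROPERTIES)
  filter = []
  filter << { term: { published: true } } unless is_admin
  filter << { terms: { id: Array(ids) } } if ids
  definition = { _source: false, query: { bool: { filter: filter } } }

  if query && !query.empty? && query != "*"
    definition[:query][:bool][:should] = [
      { query_string: { query: query } },
      { multi_match: { fields: search_fields(properties), query: query,
                       operator: :and, analyzer: :standard, tie_breaker: 0.3 } }
    ]
    definition[:query][:bool][:minimum_should_match] = 1
  end

  # Sort only on fields that actually have a "raw" keyword sub-field
  if order && properties.dig(order.to_sym, :fields, "raw")
    definition[:sort] = { "#{order}.raw" => "asc" }
  end
  definition
end

puts JSON.pretty_generate(build_definition(query: "elas", is_admin: false, order: "name"))
```

Printing the hash this way is a quick check of what will be sent to Elasticsearch before wiring it into the model.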

And that's all. Please leave a comment if you have any questions or suggestions on how to make this better. Next time we'll look at how to search across multiple models and a single model at the same time.

Vladimir Krõlov
