Partial search in Elasticsearch
A common question online, with very few good answers, is how to implement partial word search (n-gram search, in search engine vocabulary) with Elasticsearch 5.x. Many of the articles out there are out of date, so here is a short example for Elasticsearch 5.x using Ruby, my preferred programming language.
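To make the idea concrete before touching Elasticsearch, here is a minimal pure-Ruby sketch of what an edge n-gram tokenizer produces for a single word. The helper names `edge_ngrams` and `partial_match?` are made up for this illustration, and the defaults simply mirror the `min_gram` and `max_gram` values used later in this post; the real tokenization happens inside Elasticsearch.

```ruby
# Illustrative only: mimics the prefixes an edge n-gram tokenizer emits for one word.
# edge_ngrams and partial_match? are hypothetical helpers, not part of any gem.
def edge_ngrams(word, min_gram: 2, max_gram: 20)
  upper = [word.length, max_gram].min
  (min_gram..upper).map { |len| word[0, len] }
end

# A typed fragment "matches" when it equals one of the indexed prefixes.
def partial_match?(indexed_word, typed)
  edge_ngrams(indexed_word.downcase).include?(typed.downcase)
end

p edge_ngrams("search")                     # => ["se", "sea", "sear", "searc", "search"]
p partial_match?("Elasticsearch", "elas")   # => true
p partial_match?("Elasticsearch", "search") # => false, only word beginnings are indexed
```

This is why the search side can use a plain analyzer: the prefixes are generated once at index time, and the typed text only has to equal one of them.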
Step 1: Add the Elasticsearch gems
gem 'elasticsearch-rails'
gem 'elasticsearch-model'
Step 2: Add an async job (Sidekiq)
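class ElasticsearchIndexWorker
  include ::Sidekiq::Worker
  sidekiq_options queue: :indexer

  # Need to check for uniqueness
  def perform(klass, id)
    # Sidekiq serializes arguments, so klass arrives as a String
    obj = klass.constantize.find_by(id: id)
    action = "update"
    unless obj
      # The record is gone: build a stub with the same id so its document can be removed
      obj = klass.constantize.new(id: id)
      action = "remove"
    end
    update_index(obj, action)
  end

  def update_index(obj, action)
    if action.to_s == "remove"
      # Ignore failures, e.g. when the document is already absent from the index
      obj.__elasticsearch__.delete_document rescue false
    else
      res = obj.__elasticsearch__.index_document
      # Raise StandardError rather than Exception so generic rescue handlers still see it
      raise StandardError.new(res["error"]) if res["error"]
    end
  end
end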
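The worker's core decision (index the record if it still exists, otherwise remove its document) can be tried out without Sidekiq, Rails, or Elasticsearch. The sketch below uses in-memory stand-ins; `FakeIndex` and `reindex` are hypothetical names invented for this illustration and are not part of any gem.

```ruby
# In-memory stand-in for an Elasticsearch index (hypothetical, for illustration only).
class FakeIndex
  attr_reader :docs

  def initialize
    @docs = {}
  end

  def index_document(id, body)
    @docs[id] = body
  end

  def delete_document(id)
    @docs.delete(id)
  end
end

# records is a Hash of id => attributes, standing in for the database table.
# Returns the action taken, mirroring the "update" / "remove" branch in the worker.
def reindex(records, index, id)
  record = records[id]
  if record
    index.index_document(id, record)
    :update
  else
    index.delete_document(id)
    :remove
  end
end

records = { 1 => { name: "Widget" } }
index = FakeIndex.new
p reindex(records, index, 1) # record exists, so it is indexed
records.delete(1)
p reindex(records, index, 1) # record is gone, so the document is removed
```

Because the job only carries a class name and an id, the same job works for saves and deletes: whatever state the database is in when the job runs decides the action.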
Step 3: Create a concern for the async job
module Elasticsearch
  module Model
    module AsyncCallbacks
      extend ActiveSupport::Concern

      included do
        after_commit :async_elastic_save_index
        after_touch :async_elastic_save_index
        before_destroy :async_elastic_delete_index

        def async_elastic_save_index
          async_elastic_callback
        end

        def async_elastic_delete_index
          async_elastic_callback
        end

        def async_elastic_callback
          ElasticsearchIndexWorker.perform_async(self.class.to_s, self.id)
        end
      end
    end
  end
end
Step 4: Integration with Rails models
# Inside the model class (for example SomeModel):
  include Elasticsearch::Model
  include Elasticsearch::Model::AsyncCallbacks

  # Elasticsearch settings and mapping go here
  settings analysis: {
    tokenizer: {
      shingles_tokenizer: { # used for full-word matches, which should get higher priority
        type: 'whitespace'
      },
      edge_ngram_tokenizer: {
        type: "edgeNGram", # we need the beginnings of words
        min_gram: "2",
        max_gram: "20",
        token_chars: ["letter","digit"],
        filter: ["lowercase"]
      }
    },
    analyzer: {
      shingle_analyzer: {
        type: 'custom',
        tokenizer: 'shingles_tokenizer',
        filter: ['shingle', 'lowercase', 'asciifolding']
      },
      edge_ngram_analyzer: {
        tokenizer: "edge_ngram_tokenizer",
        filter: ["lowercase"]
      }
    }
  } do
    mapping do
      indexes :published, type: "boolean", index: :not_analyzed
      indexes :name, type: 'text', analyzer: "edge_ngram_analyzer", search_analyzer: 'standard', boost: 120, fields: {
        'shingle' => { # the shingle sub-field gets higher priority
          type: 'text',
          analyzer: 'shingle_analyzer',
          search_analyzer: 'standard',
          boost: 240
        },
        'raw' => { # use it if you need the original value, e.g. for sorting
          type: 'keyword',
          index: :not_analyzed
        }
      }
      indexes :descr, type: 'text', analyzer: "edge_ngram_analyzer", search_analyzer: 'standard', boost: 60, fields: {
        'shingle' => {
          type: 'text',
          analyzer: 'shingle_analyzer',
          search_analyzer: 'standard',
          boost: 120
        }
      ....
    end
  end
end
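For readers new to shingles: they are n-grams of whole words rather than of characters. The pure-Ruby sketch below approximates what the `shingle` token filter emits with its default settings (two-word shingles plus the original single words). The `shingles` helper is a made-up name for this illustration, and the exact token order inside Elasticsearch is an assumption.

```ruby
# Illustrative only: word-level shingles of the given size, plus the single words.
# shingles is a hypothetical helper, not an Elasticsearch or gem API.
def shingles(text, size: 2)
  words = text.downcase.split
  words.each_index.flat_map do |i|
    gram = words[i, size]
    # Emit the word itself, and the multi-word shingle when enough words remain
    gram.length == size ? [words[i], gram.join(" ")] : [words[i]]
  end
end

p shingles("Quick brown fox")
# => ["quick", "quick brown", "brown", "brown fox", "fox"]
```

Whole words indexed this way score alongside the edge n-gram prefixes, which is why the mapping gives the `shingle` sub-field a larger boost than its parent field.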
Step 5: Let’s use the console to create an index for some model
# console
SomeModel.__elasticsearch__.create_index!(force: true)
SomeModel.find_each{|o| o.__elasticsearch__.index_document}
Step 6: Let’s implement search for a previously indexed model
def self.search(params, is_admin)
  order = params[:order].presence
  # Treat a blank query or "*" as "no text search, filters only"
  search_string = params[:query].presence
  search_string = nil if search_string.eql?("*")
  # Search every field that defines sub-fields, together with its shingle sub-field
  mapping = SomeModel.__elasticsearch__.mapping.to_hash[:some_model][:properties]
  search_fields = mapping.each_with_object([]) { |(k, v), a| a.push(k.to_s, "#{k}.shingle") if v[:fields] }
  definition = {
    # Return only the ids.
    # Remove this line if you want the full document source from Elasticsearch
    _source: false,
    query: {
      bool: {
        filter: [
          (!is_admin && { term: { published: true } }).presence,
          # terms (plural) accepts a list of values; term accepts only one
          (params[:ids] && { terms: { id: Array(params[:ids]) } }).presence
        ].compact
      }
    }
  }
  if search_string
    definition[:query][:bool][:should] = [
      { query_string: { query: search_string } },
      { multi_match: {
        fields: search_fields,
        query: search_string,
        operator: :and,
        analyzer: :standard,
        tie_breaker: 0.3
      } }
    ]
    definition[:query][:bool][:minimum_should_match] = 1
  end
  # Sort on the keyword sub-field; this assumes the chosen column has a 'raw' sub-field
  definition[:sort] = { "#{order}.raw" => 'asc' } if order.in?(SomeModel.column_names)
  SomeModel.__elasticsearch__.search(definition).page(params[:page]).per(params[:per])
end
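To inspect the query that ends up being sent, it can help to build the hash outside of Rails. The following is a rough sketch under stated assumptions: `search_fields_from` and `build_definition` are hypothetical helpers written for this illustration, the sample mapping is invented, and the `ids` filter and sorting are left out to keep it short.

```ruby
require "json"

# Hypothetical helper: picks fields that define sub-fields, plus their shingle sub-field.
def search_fields_from(properties)
  properties.each_with_object([]) do |(name, config), fields|
    fields.push(name.to_s, "#{name}.shingle") if config[:fields]
  end
end

# Hypothetical helper: builds roughly the same query hash as the search method,
# without Rails or an Elasticsearch connection.
def build_definition(query:, fields:, published_only: true)
  filters = []
  filters << { term: { published: true } } if published_only

  definition = { _source: false, query: { bool: { filter: filters } } }
  text = query.to_s.strip
  return definition if text.empty? || text == "*" # filters only, no text search

  definition[:query][:bool][:should] = [
    { query_string: { query: text } },
    { multi_match: { fields: fields, query: text, operator: :and,
                     analyzer: :standard, tie_breaker: 0.3 } }
  ]
  definition[:query][:bool][:minimum_should_match] = 1
  definition
end

# Invented sample mapping: only :name has sub-fields, so only it is searched.
sample_properties = {
  published: { type: "boolean" },
  name: { type: "text", fields: { shingle: { type: "text" }, raw: { type: "keyword" } } }
}
fields = search_fields_from(sample_properties)
puts JSON.pretty_generate(build_definition(query: "elas", fields: fields))
```

Printing the hash as JSON gives something that can be pasted into a `_search` request to check how the filters and the two `should` clauses interact.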
And that’s all. Please leave a comment if you have any questions or suggestions on how to make this better. Next time we’ll see how to search across multiple models and a single model at the same time.