Saravanan Selvamohan

Associate Software Engineer@Tekclan

Elasticsearch is an open source full-text search engine. When a document is stored, it is indexed and becomes available for search within about a second, so Elasticsearch can return search results in near real time. This naturally raises a question: what makes Elasticsearch so fast? Let us understand the science behind it.

Elasticsearch is built on top of the Apache Lucene library, which provides its indexing and search features.

Let us understand with some practical examples:

Assume three simple documents get indexed in Elasticsearch.

  PUT /twitter/_doc/1
  {
    "tweet": "What is happening there"
  }

  PUT /twitter/_doc/2
  {
    "tweet": "What is your name"
  }

  PUT /twitter/_doc/3
  {
    "tweet": "My name is Saravanan Selvamohan"
  }
 

When these documents are indexed in Elasticsearch, the data is stored in an inverted index. Let us have a look at the inverted index structure after indexing.

Term         Document IDs
happening    1
is           1, 2, 3
my           3
name         2, 3
saravanan    3
selvamohan   3
there        1
what         1, 2
your         2

Well, now comes the most interesting part!

The inverted index stores the terms sorted alphabetically. For each term, it records the IDs of the documents that contain it. When a user searches for a term, Elasticsearch looks the term up in the inverted index and fetches the documents associated with it, without reading the documents themselves.

For example, if a user searches for the word 'name', Elasticsearch directly fetches Document 2 and Document 3 as results.
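
The lookup described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Lucene is implemented: the helper names build_inverted_index and search are hypothetical, and splitting on whitespace and lowercasing is an assumed stand-in for a real Elasticsearch analyzer.

```python
from collections import defaultdict

# The three example tweets, keyed by document ID.
documents = {
    1: "What is happening there",
    2: "What is your name",
    3: "My name is Saravanan Selvamohan",
}


def build_inverted_index(docs):
    """Map each lowercased term to the sorted list of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Assumption: whitespace split + lowercase stands in for a real analyzer.
        for term in text.lower().split():
            index[term].add(doc_id)
    # Keep the terms in alphabetical order, like the term table in the article.
    return {term: sorted(ids) for term, ids in sorted(index.items())}


def search(index, term):
    """Return the document IDs for a term with a single dictionary lookup."""
    return index.get(term.lower(), [])


inverted_index = build_inverted_index(documents)
print(search(inverted_index, "name"))  # [2, 3]
```

Searching for a term here never touches the tweet text; it only reads the entry for that term, which is the reason the lookup stays cheap as the number of documents grows.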

Without an inverted index, we would need to go through every document and check whether it contains the search term. This hugely impacts performance when we work with millions of documents.
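
For contrast, here is a rough sketch of the scan-everything approach, again in Python. The function name scan_search and the examined counter are hypothetical, added only to make the cost visible: every document is inspected for every query, so the work grows with the size of the collection.

```python
# The same three example tweets, keyed by document ID.
documents = {
    1: "What is happening there",
    2: "What is your name",
    3: "My name is Saravanan Selvamohan",
}


def scan_search(docs, term):
    """Check every document for the term and count how many were examined."""
    term = term.lower()
    matches = []
    examined = 0
    for doc_id, text in docs.items():
        examined += 1  # every document is read, whether it matches or not
        if term in text.lower().split():
            matches.append(doc_id)
    return matches, examined


matches, examined = scan_search(documents, "name")
print(matches, examined)  # [2, 3] 3
```

With three documents the difference is invisible, but the examined count always equals the total number of documents, which is exactly the cost an inverted index avoids.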