Using ElasticSearch to drive extremely fast Search applications

More and more IT applications, both transactional systems and reporting and analytics systems, must deal with textual data at a volume and velocity that push traditional text search technologies beyond their breaking point. A variety of solutions, both commercial and open-source, have emerged to address the growing need for high-performance, scalable, yet affordable text search.

ElasticSearch seems to be the clear front-runner among the open-source options available currently.

This article examines the architecture and features of ElasticSearch and outlines some use cases for which it seems particularly well suited.

ElasticSearch - Genesis, Architecture & Features

ElasticSearch is an open-source project, hosted on GitHub, that builds a distributed search server on top of Lucene, Apache’s very popular Java library for full-text search. It does not depend on Hadoop or HDFS, although it shares the Big Data characteristics often associated with Hadoop, including horizontal scalability and fault tolerance.

As with Hadoop, a typical ElasticSearch implementation consists of a cluster of nodes. Incoming text is converted to JSON objects for ingestion into ES storage; where the source data already lives in HDFS, MapReduce jobs can perform that conversion.

Features of ElasticSearch:

HORIZONTAL SCALING

Simply add more nodes to your ElasticSearch cluster to scale out.
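
ElasticSearch scales out by splitting each index into shards and spreading those shards across the nodes of the cluster. The toy model below is only a sketch of that idea: the crc32 hash, the round-robin placement and the node names are assumptions made for illustration, not ElasticSearch's actual routing hash or allocation algorithm.

```python
import zlib
from typing import Dict, List


def shard_for(doc_id: str, num_primary_shards: int) -> int:
    """Pick a shard from a hash of the document id (crc32 is a stand-in hash)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


def place_shards(num_primary_shards: int, nodes: List[str]) -> Dict[str, List[int]]:
    """Spread shard numbers over the nodes round-robin (a simplification)."""
    placement: Dict[str, List[int]] = {node: [] for node in nodes}
    for shard in range(num_primary_shards):
        placement[nodes[shard % len(nodes)]].append(shard)
    return placement


# The same six shards, first on two nodes and then on three.
two_nodes = place_shards(6, ["node-1", "node-2"])
three_nodes = place_shards(6, ["node-1", "node-2", "node-3"])
print(two_nodes)
print(three_nodes)
```

In this model each node holds three shards at first and two after the third node joins, which is the sense in which adding servers spreads the load.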

HIGH PERFORMANCE

Response times can be as low as a few milliseconds on searches across terabytes of data, depending on the query and on how the cluster is sized.

REST API

Robust, full-featured, intuitive API to support query applications built on top of your ElasticSearch cluster. Requests and responses are JSON over HTTP, which allows for language-independent querying.
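
As a rough illustration of what a call to the REST API can look like, the sketch below builds a JSON search request using only the Python standard library. The host and port (a local node on the default HTTP port 9200), the index name "articles" and the field name "title" are assumptions for the example, and the request is built but not sent.

```python
import json
import urllib.request

# Assumed values for illustration: a local node on the default HTTP port
# and a hypothetical index called "articles".
BASE_URL = "http://localhost:9200"
INDEX = "articles"


def build_search_request(field, text):
    """Build (but do not send) a JSON full-text search request."""
    body = {"query": {"match": {field: text}}}
    return urllib.request.Request(
        url=f"{BASE_URL}/{INDEX}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_search_request("title", "distributed search")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# Against a running cluster, the request could be sent with:
# with urllib.request.urlopen(request) as response:
#     hits = json.load(response)["hits"]["hits"]
```

Because the request is plain HTTP with a JSON body, the same call could be made from any language with an HTTP client.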

FLEXIBLE QUERY CAPABILITY

Support for queries involving geographic bounding, wildcards, phrases, etc.
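
The three query types named above can be sketched as JSON request bodies. The field names ("transcript", "product", "location") are hypothetical, the geographic field is assumed to be mapped as a geo_point, and the syntax follows the Query DSL of recent releases, so details may differ on older versions.

```python
import json


def phrase_query(field, phrase):
    """Match documents containing the exact phrase in the given field."""
    return {"query": {"match_phrase": {field: phrase}}}


def wildcard_query(field, pattern):
    """Match terms against a pattern: * is any run of characters, ? is one."""
    return {"query": {"wildcard": {field: {"value": pattern}}}}


def geo_box_query(field, top_left, bottom_right):
    """Restrict hits to a bounding box; corners are (lat, lon) tuples."""
    return {
        "query": {
            "bool": {
                "filter": {
                    "geo_bounding_box": {
                        field: {
                            "top_left": {"lat": top_left[0], "lon": top_left[1]},
                            "bottom_right": {"lat": bottom_right[0], "lon": bottom_right[1]},
                        }
                    }
                }
            }
        }
    }


queries = [
    phrase_query("transcript", "cancel my subscription"),
    wildcard_query("product", "router-*"),
    geo_box_query("location", (40.9, -74.3), (40.5, -73.7)),
]
for query in queries:
    print(json.dumps(query))
```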

FLEXIBLE SCHEMA

Support for flexible and heterogeneous schemas facilitates storage of content in native formats.
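
One way to picture the flexible schema is that field types are inferred from the JSON documents themselves rather than declared up front. The sketch below imitates that inference in a very simplified form; the sample documents are made up, and the type names are a rough approximation of what ElasticSearch's dynamic mapping would choose, not its real rules.

```python
def infer_field_types(doc):
    """Guess a field type for each top-level value in a JSON-like document."""
    types = {}
    for name, value in doc.items():
        if isinstance(value, bool):  # check bool first: bool is a subclass of int
            types[name] = "boolean"
        elif isinstance(value, int):
            types[name] = "long"
        elif isinstance(value, float):
            types[name] = "float"
        elif isinstance(value, str):
            types[name] = "text"
        elif isinstance(value, dict):
            types[name] = "object"
        else:
            types[name] = "unknown"
    return types


# Two differently shaped (hypothetical) documents.
call_log = {"agent": "a17", "duration_sec": 312, "transcript": "my router keeps rebooting"}
product = {"sku": "A-1", "price": 9.99, "in_stock": True}

# The combined mapping simply grows as new fields are seen.
mapping = {}
for doc in (call_log, product):
    mapping.update(infer_field_types(doc))
print(mapping)
```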

Salient Use Cases

Call Center Log Analysis
  • A typical call center handles X calls per minute per person, producing X megabytes of text from audio conversion and terabytes of logs per month.
  • can implement very useful search and text analytics apps on top of call center logs
  • e.g. finding products/issues generating most calls, using ES support for geographic queries to identify support call hot spots
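
The two analyses mentioned in the last bullet could be expressed as aggregation requests along the following lines. This is a sketch only: the field names "product" and "caller_location" are hypothetical, the location field is assumed to be mapped as a geo_point, and the syntax is that of the aggregations in recent releases.

```python
import json


def top_call_drivers(field="product", size=10):
    """Count calls per product/issue and return the most frequent values."""
    return {
        "size": 0,  # only the aggregation is wanted, not the matching calls
        "aggs": {"top_call_drivers": {"terms": {"field": field, "size": size}}},
    }


def call_hot_spots(field="caller_location", precision=4):
    """Bucket calls into geohash grid cells to reveal geographic hot spots."""
    return {
        "size": 0,
        "aggs": {"hot_spots": {"geohash_grid": {"field": field, "precision": precision}}},
    }


drivers = top_call_drivers()
hot_spots = call_hot_spots()
print(json.dumps(drivers))
print(json.dumps(hot_spots))
```
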
Web Server Log Analysis
  • powerful and scalable web log analysis
  • Visualization and slice/dice of web log data in ES using Kibana (http://www.elasticsearch.org/overview/kibana/)
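
Before web logs can be searched they have to be turned into JSON documents. The sketch below parses one line in the Apache Common Log Format into such a document; the output field names (including "@timestamp") are assumptions, and real deployments often use a dedicated log shipper for this step.

```python
import json
import re
from datetime import datetime

# Common Log Format: host ident user [time] "request" status bytes
LOG_PATTERN = re.compile(
    r'(?P<client>\S+) \S+ (?P<user>\S+) \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def parse_log_line(line):
    """Convert one log line into a JSON-ready dict, or None if it does not parse."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    when = datetime.strptime(fields["timestamp"], "%d/%b/%Y:%H:%M:%S %z")
    return {
        "@timestamp": when.isoformat(),
        "client": fields["client"],
        "user": fields["user"],
        "method": fields["method"],
        "path": fields["path"],
        "protocol": fields["protocol"],
        "status": int(fields["status"]),
        "bytes": 0 if fields["bytes"] == "-" else int(fields["bytes"]),
    }


sample = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
doc = parse_log_line(sample)
print(json.dumps(doc))
```
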
E-Commerce Site – Product Search
  • Enabling full-text search across product descriptions on tens of thousands of SKUs on an e-commerce site
  • improved customer experience and conversion rates
  • faceted search
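
A product search with facets might be sketched as a single request that combines a full-text query with per-field counts. The field names ("name", "description", "brand", "category") and the boost on "name" are assumptions for the example, and "brand" and "category" are assumed to be indexed as exact, non-analyzed values.

```python
import json


def product_search(text, brand=None, size=20):
    """Full-text search over products, with facet counts by brand and category."""
    query = {
        "bool": {
            "must": [
                # "name^2" weights matches in the product name twice as heavily.
                {"multi_match": {"query": text, "fields": ["name^2", "description"]}}
            ]
        }
    }
    if brand is not None:
        # A facet the shopper has clicked on becomes an exact-match filter.
        query["bool"]["filter"] = [{"term": {"brand": brand}}]
    return {
        "size": size,
        "query": query,
        "aggs": {
            "by_brand": {"terms": {"field": "brand"}},
            "by_category": {"terms": {"field": "category"}},
        },
    }


unfiltered = product_search("wireless headphones")
filtered = product_search("wireless headphones", brand="acme")
print(json.dumps(filtered, indent=2))
```
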
Media Applications
  • Highly scalable search apps for media companies
  • Access to heretofore inaccessible historical content
Social Media Filtering
  • Stream social media feeds into ES and enable fast searches for mentions of specific brands and keywords across terabytes of data
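
Streaming a feed into ES is typically done in batches. The sketch below formats a batch of posts as the newline-delimited body expected by the bulk endpoint and builds a query for brand mentions; the index name "social", the field name "text" and the sample posts are made up for illustration.

```python
import json


def bulk_body(index, docs):
    """Format documents as a bulk request body: an action line, then the document."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the body must end with a newline


def mentions_query(field, brands):
    """Match posts that mention at least one of the given brands or keywords."""
    return {
        "query": {
            "bool": {
                "should": [{"match_phrase": {field: brand}} for brand in brands],
                "minimum_should_match": 1,
            }
        }
    }


posts = [
    {"user": "u1", "text": "Loving my new Acme Phone"},
    {"user": "u2", "text": "Is anyone else waiting on a delivery?"},
]
body = bulk_body("social", posts)
query = mentions_query("text", ["Acme Phone", "Acme Tablet"])
print(body, end="")
print(json.dumps(query))
```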