Elasticsearch Series: Text Analysis and Mappings

Introduction

In this article we will look at two important Elasticsearch concepts: mappings and text analysis.

Let’s start with mappings!

Mappings

Mapping is the process of defining how a document, and the fields it contains, are stored and indexed. For instance, use mappings to define:

which string fields should be treated as full text fields.
which fields contain numbers, dates, or geolocations.
the format of date values.
custom rules to control the mapping for dynamically added fields.

A mapping contains:

the names of the fields
the data types of the fields
how each field should be indexed and stored by Lucene

In many use cases, you will need to define your own mappings. You can do this when you create an index, or later by adding fields to the mapping of an existing index.

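If no mapping is specified during index creation, Elasticsearch creates one automatically: the first time a new field appears in an indexed document, its type is inferred from the value (dynamic mapping).

Example: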

PUT my_logs
{
   "mappings": {
      "properties": {
         "status_code": {
             "type": "short"
         }
      }
   }
}
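
As a sketch of the other option, adding to the mapping of an existing index, the request below adds a field to my_logs through the mapping API. The response_time field and its float type are assumptions chosen for illustration; the second request simply reads the mapping back to check the result.

```console
# Add a new field to the existing my_logs index
# (response_time is a hypothetical field used for illustration)
PUT my_logs/_mapping
{
  "properties": {
    "response_time": {
      "type": "float"
    }
  }
}

# Read the mapping back to verify the change
GET my_logs/_mapping
```

Fields that already exist, such as status_code, are left unchanged by this request.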

Here are some types you can use: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-types.html

Notes:

You can’t change the mapping of an existing field (its type, for example) without reindexing your documents.
You should optimize your mappings so Elasticsearch can index and query your data most effectively.
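
Because an existing field can't be remapped in place, a common workaround is to create a new index with the desired mapping and copy the documents across with the reindex API. A minimal sketch, assuming a hypothetical target index my_logs_v2 in which status_code becomes an integer:

```console
# 1. Create the target index with the corrected mapping
#    (my_logs_v2 is a hypothetical name)
PUT my_logs_v2
{
  "mappings": {
    "properties": {
      "status_code": {
        "type": "integer"
      }
    }
  }
}

# 2. Copy every document from the old index into the new one
POST _reindex
{
  "source": { "index": "my_logs" },
  "dest":   { "index": "my_logs_v2" }
}
```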

Custom Mappings

Let’s see how we can customize our mappings with different parameters: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-params.html

Date

You can change the date format with a custom mapping:

PUT my_index
{
  "mappings": {
    "properties": {
      "date": {
        "type":   "date",
        "format": "yyyy-MM-dd"
       }
    }
  }
}
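
To see the effect of the format, here is a sketch that indexes a document into the index above and filters on the field with a range query. The document ID and the values are illustrative. A value that doesn't match yyyy-MM-dd, such as 2015/01/01, should be rejected at index time with a parsing error.

```console
# Index a document whose date matches the yyyy-MM-dd format
PUT my_index/_doc/1
{
  "date": "2015-01-01"
}

# Find documents dated within 2015
GET my_index/_search
{
  "query": {
    "range": {
      "date": {
        "gte": "2015-01-01",
        "lte": "2015-12-31"
      }
    }
  }
}
```

A field can accept several formats if they are separated with ||, for example "yyyy-MM-dd||epoch_millis".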

Copy_to

The copy_to parameter allows you to copy the values of multiple fields into a group field, which can then be queried as a single field. For instance, the first_name and last_name fields can be copied to the full_name field as follows:

PUT my_index
{
  "mappings": {
    "properties": {
      "first_name": {
        "type": "text",
        "copy_to": "full_name" 
      },
      "last_name": {
        "type": "text",
        "copy_to": "full_name" 
      },
      "full_name": {
        "type": "text"
      }
    }
  }
}

PUT my_index/_doc/1
{
  "first_name": "John",
  "last_name": "Smith"
}

GET my_index/_search
{
  "query": {
    "match": {
      "full_name": { 
        "query": "John Smith",
        "operator": "and"
      }
    }
  }
}

Text Analysis

Elasticsearch is a great tool for text analysis, especially on fields that are indexed as full text. By default, the analysis is done by the standard analyzer, which can be replaced by another one (language-specific, pattern, stop, etc.).
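
The analyzer is chosen per text field in the mapping. A minimal sketch, assuming a hypothetical my_articles index with a content field that should use the built-in english analyzer instead of the standard one:

```console
# my_articles and content are hypothetical names
PUT my_articles
{
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "analyzer": "english"
      }
    }
  }
}
```

Text fields that don't set an analyzer keep using the index default, which is the standard analyzer unless configured otherwise.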

Example:

GET logs_server*/_search
{
   "query": {
      "match": {
         "geoip.country_name": "united states"
      }
   }
}

In this case, we search every index matching logs_server* for all documents whose geoip.country_name field contains “united” or “states”.
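
To check how a piece of text is split into terms, you can send it to the _analyze API. This sketch runs the standard analyzer on the search text used above. The response should list two lowercased tokens, united and states, which is why a document containing either word can match. The second request is a variant of the same search that requires both words, using the match query's operator option.

```console
# Show the terms produced by the standard analyzer
POST _analyze
{
  "analyzer": "standard",
  "text": "United States"
}

# Require both terms instead of either one
GET logs_server*/_search
{
  "query": {
    "match": {
      "geoip.country_name": {
        "query": "united states",
        "operator": "and"
      }
    }
  }
}
```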

String data types:

text fields: for full-text search. They are analyzed.
keyword fields: for aggregations, sorting and exact-match searches. They are not analyzed.

By default, every string gets dynamically mapped twice: as a text field and as a keyword multi-field. Indexing a string twice slows down indexing and takes up more disk space.
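
With the default dynamic mapping, the keyword version of a string is available as a sub-field named keyword. The sketch below assumes geoip.country_name was mapped dynamically this way. It uses the sub-field first for an exact match and then for a terms aggregation (the aggregation name countries is arbitrary).

```console
# Exact match on the keyword sub-field
GET logs_server*/_search
{
  "query": {
    "term": {
      "geoip.country_name.keyword": "United States"
    }
  }
}

# Count documents per country, returning no hits
GET logs_server*/_search
{
  "size": 0,
  "aggs": {
    "countries": {
      "terms": {
        "field": "geoip.country_name.keyword"
      }
    }
  }
}
```

Unlike the match query, the term query does not analyze its input, so the value must equal the stored string exactly, including case.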

This double mapping is not always useful, so Elastic recommends optimizing the mapping to support your use case.
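
As a sketch of such an optimization: a string that is only ever filtered, sorted, or aggregated on can be mapped as keyword alone, and one that is only searched as full text can be mapped as text alone. The second request shows a dynamic template that maps every new string field as keyword. All index, field, and template names below are hypothetical.

```console
# Explicit mapping: one keyword-only field, one text-only field
PUT my_optimized_logs
{
  "mappings": {
    "properties": {
      "country_code": { "type": "keyword" },
      "message":      { "type": "text" }
    }
  }
}

# Dynamic template: map every new string field as keyword only
PUT my_keyword_logs
{
  "mappings": {
    "dynamic_templates": [
      {
        "strings_as_keywords": {
          "match_mapping_type": "string",
          "mapping": {
            "type": "keyword"
          }
        }
      }
    ]
  }
}
```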

Sources:

https://www.elastic.co
