# Part 5 Understanding Mapping with Elasticsearch and Kibana

## Review from Previous Workshops

### Indexing a Document

The following request indexes a document into the index you specify.

Syntax:

POST Enter-name-of-the-index/_doc
{
"field": "value"
}


Example:

POST temp_index/_doc
{
"name": "Pineapple",
"botanical_name": "Ananas comosus",
"produce_type": "Fruit",
"country_of_origin": "New Zealand",
"date_purchased": "2020-06-02T12:15:35",
"quantity": 200,
"unit_price": 3.11,
"description": "a large juicy tropical fruit consisting of aromatic edible yellow flesh surrounded by a tough segmented skin and topped with a tuft of stiff leaves.These pineapples are sourced from New Zealand.",
"vendor_details": {
"vendor": "Tropical Fruit Growers of New Zealand",
"main_contact": "Hugh Rose",
"vendor_location": "Whangarei, New Zealand",
"preferred_vendor": true
}
}


Expected response from Elasticsearch:

Elasticsearch will confirm that this document has been successfully indexed into the temp_index.

## Mapping Explained

Mapping determines how a document and its fields are indexed and stored by defining the type of each field.

It contains a list of the names and types of fields in an index. Depending on its type, each field is indexed and stored differently in Elasticsearch.

### Dynamic Mapping

When a user does not define mapping in advance, Elasticsearch creates or updates the mapping as needed by default. This is known as dynamic mapping.

With dynamic mapping, Elasticsearch looks at each field and tries to infer the data type from the field content. Then, it assigns a type to each field and creates a list of field names and types known as mapping.

Depending on the assigned field type, each field is indexed and primed for different types of requests (full-text search, aggregations, sorting). This is why mapping plays an important role in how Elasticsearch stores and searches for data.
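To make the inference step concrete, here is a minimal Python sketch of how field types could be guessed from a JSON document. It is a simplified illustration and not Elasticsearch's actual implementation: the function name `infer_mapping` and the date pattern are assumptions made for this example, and the real dynamic mapping rules cover more cases.

```python
import re

# Assumption: only strings shaped like 2020-06-02 or 2020-06-02T12:15:35 count as dates.
# Elasticsearch's real date detection supports more formats.
ISO_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}:\d{2})?$")


def infer_mapping(document):
    """Guess a field type for every field, loosely imitating dynamic mapping."""
    properties = {}
    for field, value in document.items():
        if isinstance(value, bool):  # check bool first: bool is a subclass of int
            properties[field] = {"type": "boolean"}
        elif isinstance(value, int):
            properties[field] = {"type": "long"}
        elif isinstance(value, float):
            properties[field] = {"type": "float"}
        elif isinstance(value, dict):
            properties[field] = infer_mapping(value)  # nested object
        elif isinstance(value, str) and ISO_DATE.match(value):
            properties[field] = {"type": "date"}
        elif isinstance(value, str):
            # Strings are mapped twice: text plus a keyword multi-field.
            properties[field] = {
                "type": "text",
                "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
            }
    return {"properties": properties}


# A shortened version of the sample document used in this workshop.
sample = {
    "name": "Pineapple",
    "date_purchased": "2020-06-02T12:15:35",
    "quantity": 200,
    "unit_price": 3.11,
    "vendor_details": {"preferred_vendor": True},
}
mapping = infer_mapping(sample)
for field, definition in sorted(mapping["properties"].items()):
    print(field, definition)
```

The point of the sketch is only that the type comes from the first value Elasticsearch sees, which is why the sample document you index should contain representative values.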

### View the Mapping

Syntax:

GET Enter_name_of_the_index_here/_mapping


Example:

GET temp_index/_mapping


Expected response from Elasticsearch:

Elasticsearch returns the mapping of temp_index. It lists all the fields of the document in alphabetical order, along with the type of each field (text, keyword, long, float, date, boolean, and so on).

### Indexing Strings

There are two kinds of string field types:

1. Text
2. Keyword

By default, every string is mapped twice: as a text field and as a keyword multi-field. Each field type is primed for different types of requests.

The text field type is designed for full-text searches.

The keyword field type is designed for exact searches, aggregations, and sorting.

You can customize your mapping by assigning the field type as either text or keyword or both!
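As a rough illustration of that choice, the Python sketch below picks a mapping fragment for a string field from how the field will be used. The helper name `string_field_mapping` and its two flags are hypothetical and exist only for this example; the returned fragments follow the same shape as the mappings shown in this workshop.

```python
import json


def string_field_mapping(full_text_search, exact_match):
    """Return a mapping fragment for a string field based on its intended use.

    full_text_search: the field is searched with full-text queries.
    exact_match: the field is used for exact searches, aggregations, or sorting.
    """
    if full_text_search and exact_match:
        # text for full-text search, plus a keyword multi-field for the rest
        return {"type": "text", "fields": {"keyword": {"type": "keyword"}}}
    if full_text_search:
        return {"type": "text"}
    if exact_match:
        return {"type": "keyword"}
    raise ValueError("choose at least one use for the field")


# Example: a field that is both searched and aggregated on.
print(json.dumps({"country_of_origin": string_field_mapping(True, True)}, indent=2))
```

Deciding per field in this way is what keeps the mapping from carrying data structures that no request will ever use.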

#### Text Field Type

##### Text Analysis

Ever notice that when you search in Elasticsearch, case and punctuation do not seem to matter? This is because text analysis occurs when your fields are indexed.

By default, a string is analyzed when it is indexed. The string is broken up into individual words, also known as tokens. The analyzer then lowercases each token and removes punctuation.

Inverted Index

Once the string is analyzed, the individual tokens are stored in a sorted list known as the inverted index. Each unique token is stored in the inverted index along with the IDs of the documents that contain it.

The same process occurs every time you index a new document.
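The analysis and lookup steps described above can be sketched in a few lines of Python. This is a toy model under stated assumptions: the `analyze` function is a crude stand-in for an Elasticsearch analyzer (it splits on non-word characters and lowercases), the third document is made up for the example, and all names are hypothetical.

```python
import re
from collections import defaultdict


def analyze(text):
    """Toy analyzer: split on non-word characters and lowercase each token."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def build_inverted_index(documents):
    """Map each unique token to the sorted IDs of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in analyze(text):
            index[token].add(doc_id)
    return {token: sorted(ids) for token, ids in sorted(index.items())}


documents = {
    1: "These pineapples are sourced from New Zealand.",
    2: "Mango Arumanis is best eaten when ripe.",
    3: "Ripe pineapples, sourced daily!",  # made-up text for the example
}
inverted_index = build_inverted_index(documents)


def search(term):
    """Analyze the search term the same way, then look each token up."""
    hits = set()
    for token in analyze(term):
        hits.update(inverted_index.get(token, []))
    return sorted(hits)


print(search("PINEAPPLES"))
```

Because the search term goes through the same analysis as the indexed text, an upper-case query still matches the lower-cased tokens in the index.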

#### Keyword Field Type

The keyword field type is used for aggregations, sorting, and exact searches. These actions look up the document ID to find the values it has in its fields.

A keyword field is suited to these actions because it uses a data structure called doc values to store data.

For each document, the document ID along with the field value (the original string) is added to a table. This data structure (doc values) is designed for actions that require looking up the document ID to find the values it has in its fields.
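A small Python sketch can show why a table keyed by document ID suits aggregations and sorting. It is only a toy: real doc values are an on-disk columnar structure, the third row is made up for the example, and the function names are hypothetical.

```python
from collections import Counter

# Toy doc values for one keyword field: document ID -> original, unanalyzed string.
country_doc_values = {
    1: "New Zealand",
    2: "Indonesia",
    3: "New Zealand",  # made-up extra row for the example
}


def terms_aggregation(doc_values):
    """Count how often each exact value occurs, most frequent first."""
    return Counter(doc_values.values()).most_common()


def sort_doc_ids(doc_values):
    """Return document IDs ordered by their field value."""
    return sorted(doc_values, key=lambda doc_id: doc_values[doc_id])


print(terms_aggregation(country_doc_values))
print(sort_doc_ids(country_doc_values))
```

Note that the values are compared as whole strings: "New Zealand" is one value here, whereas a text field would have split it into two tokens.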

When Elasticsearch dynamically creates a mapping for you, it does not know what you want to use a string for so it maps all strings to both field types.

In cases where you do not need both field types, the default setting is wasteful. Since both field types require creating either an inverted index or doc values, creating both field types for unnecessary fields will slow down indexing and take up more disk space.

This is why we define our own mapping as it helps us store and search data more efficiently.

### Mapping Exercise

Project: Build an app for a client who manages a produce warehouse

This app must enable users to:

1. search for produce name, country of origin and description
2. identify top countries of origin with the most frequent purchase history
3. sort produce by produce type (Fruit or Vegetable)
4. get the summary of monthly expense

Sample data

{
"name": "Pineapple",
"botanical_name": "Ananas comosus",
"produce_type": "Fruit",
"country_of_origin": "New Zealand",
"date_purchased": "2020-06-02T12:15:35",
"quantity": 200,
"unit_price": 3.11,
"description": "a large juicy tropical fruit consisting of aromatic edible yellow flesh surrounded by a tough segmented skin and topped with a tuft of stiff leaves.These pineapples are sourced from New Zealand.",
"vendor_details": {
"vendor": "Tropical Fruit Growers of New Zealand",
"main_contact": "Hugh Rose",
"vendor_location": "Whangarei, New Zealand",
"preferred_vendor": true
}
}


Plan of Action

Rules

1. If you do not define a mapping ahead of time, Elasticsearch dynamically creates the mapping for you.
2. If you do decide to define your own mapping, you can do so at index creation.
3. ONE mapping is defined per index. Once the index has been created, we can only add new fields to a mapping. We CANNOT change the mapping of an existing field.
4. If you must change the type of an existing field, you must create a new index with the desired mapping, then reindex all documents into the new index.

Step 1: Index a sample document into a test index.

The sample document must contain the fields that you want to define. These fields must also contain values that map closely to the field types you want.

Syntax:

POST Name-of-test-index/_doc
{
"field": "value"
}


Example:

POST test_index/_doc
{
"name": "Pineapple",
"botanical_name": "Ananas comosus",
"produce_type": "Fruit",
"country_of_origin": "New Zealand",
"date_purchased": "2020-06-02T12:15:35",
"quantity": 200,
"unit_price": 3.11,
"description": "a large juicy tropical fruit consisting of aromatic edible yellow flesh surrounded by a tough segmented skin and topped with a tuft of stiff leaves.These pineapples are sourced from New Zealand.",
"vendor_details": {
"vendor": "Tropical Fruit Growers of New Zealand",
"main_contact": "Hugh Rose",
"vendor_location": "Whangarei, New Zealand",
"preferred_vendor": true
}
}


Expected response from Elasticsearch:

The test_index is successfully created.

Step 2: View the dynamic mapping

Syntax:

GET Name-the-index-whose-mapping-you-want-to-view/_mapping


Example:

GET test_index/_mapping


Expected response from Elasticsearch:

Elasticsearch will display the mapping it has created. It lists the fields in alphabetical order. Because this document is identical to the one we indexed into temp_index, the mapping is the same as before. To save space, the screenshots of the mapping have not been included here.

Step 3: Edit the mapping

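Copy and paste the mapping from step 2 into the Kibana console. From the pasted results, remove "test_index" along with its opening and closing curly braces. Then, edit the mapping to satisfy the requirements outlined in the figure below. In short: fields used only for full-text search (name, description) stay text, produce_type is only sorted on so it becomes keyword, country_of_origin is both searched and aggregated on so it keeps both types, and fields the app never queries (botanical_name, vendor_details) are disabled.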
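If you prefer to script the unwrapping instead of editing by hand, the idea can be sketched in Python as follows. The `response` dictionary is an abbreviated, hand-written stand-in for what `GET test_index/_mapping` returns, and `unwrap_mapping` is a hypothetical helper, not part of any Elasticsearch client.

```python
import copy
import json


def unwrap_mapping(mapping_response, index_name):
    """Drop the outer index-name key so the body can be reused to create a new index."""
    return copy.deepcopy(mapping_response[index_name])


# Abbreviated stand-in for the response to GET test_index/_mapping.
response = {
    "test_index": {
        "mappings": {
            "properties": {
                "produce_type": {
                    "type": "text",
                    "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
                },
                "quantity": {"type": "long"},
            }
        }
    }
}

body = unwrap_mapping(response, "test_index")

# Example edit: produce_type is only sorted on, so keep it as keyword only.
body["mappings"]["properties"]["produce_type"] = {"type": "keyword"}

print(json.dumps(body, indent=2))
```

The printed JSON starts at "mappings", which is the shape the index creation request in step 4 expects.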

The optimized mapping should look like the following:

{
"mappings": {
"properties": {
"botanical_name": {
"enabled": false
},
"country_of_origin": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
},
"date_purchased": {
"type": "date"
},
"description": {
"type": "text"
},
"name": {
"type": "text"
},
"produce_type": {
"type": "keyword"
},
"quantity": {
"type": "long"
},
"unit_price": {
"type": "float"
},
"vendor_details": {
"enabled": false
}
}
}
}


Step 4: Create a new index with the optimized mapping from step 3.

Syntax:

PUT Name-of-your-final-index
{
copy and paste your edited mapping here
}


Example:

PUT produce_index
{
"mappings": {
"properties": {
"botanical_name": {
"enabled": false
},
"country_of_origin": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
},
"date_purchased": {
"type": "date"
},
"description": {
"type": "text"
},
"name": {
"type": "text"
},
"produce_type": {
"type": "keyword"
},
"quantity": {
"type": "long"
},
"unit_price": {
"type": "float"
},
"vendor_details": {
"enabled": false
}
}
}
}


Expected response from Elasticsearch:

Elasticsearch creates a produce_index with the customized mapping we defined above!

Step 5: Check the mapping of the new index to make sure all the fields have been mapped correctly

Syntax:

GET Name-of-your-final-index/_mapping


Example:

GET produce_index/_mapping


Expected response from Elasticsearch:

Compared to the dynamic mapping, our optimized mapping looks simpler and more concise! The current mapping satisfies the requirements that are marked with green check marks.

Step 6: Index your dataset into the new index

For simplicity's sake, we will index two documents.

Index the first document

POST produce_index/_doc
{
"name": "Pineapple",
"botanical_name": "Ananas comosus",
"produce_type": "Fruit",
"country_of_origin": "New Zealand",
"date_purchased": "2020-06-02T12:15:35",
"quantity": 200,
"unit_price": 3.11,
"description": "a large juicy tropical fruit consisting of aromatic edible yellow flesh surrounded by a tough segmented skin and topped with a tuft of stiff leaves.These pineapples are sourced from New Zealand.",
"vendor_details": {
"vendor": "Tropical Fruit Growers of New Zealand",
"main_contact": "Hugh Rose",
"vendor_location": "Whangarei, New Zealand",
"preferred_vendor": true
}
}


Expected response from Elasticsearch:

Elasticsearch successfully indexes the first document.

Index the second document

The second document has almost the same fields as the first document, except that it has an extra field called "organic" set to true!

POST produce_index/_doc
{
"name": "Mango",
"botanical_name": "Harum Manis",
"produce_type": "Fruit",
"country_of_origin": "Indonesia",
"organic": true,
"date_purchased": "2020-05-02T07:15:35",
"quantity": 500,
"unit_price": 1.5,
"description": "Mango Arumanis or Harum Manis is originated from East Java. Arumanis means harum dan manis or fragrant and sweet just like its taste. The ripe Mango Arumanis has dark green skin coated with thin grayish natural wax. The flesh is deep yellow, thick, and soft with little to no fiber. Mango Arumanis is best eaten when ripe.",
"vendor_details": {
"main_contact": "Suharto",
"vendor_location": "Binjai, Indonesia",
"preferred_vendor": true
}
}


Expected response from Elasticsearch:

Elasticsearch successfully indexes the second document.

Let's see what happens to the mapping by sending this request below:

GET produce_index/_mapping


Expected response from Elasticsearch:

The new field ("organic") and its field type (boolean) have been added to the mapping. This is in line with the rules of mapping we discussed earlier: you can add new fields to the mapping. You just cannot change the mapping of an existing field!
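The rule can be pictured with a toy Python model. This is a deliberate simplification and not Elasticsearch's merge logic: in reality a small set of mapping parameters can still be updated on an existing field, which this sketch ignores. The function name `add_fields` is hypothetical.

```python
def add_fields(mapping, new_fields):
    """Toy model of the rule: new fields may be added, existing ones may not change."""
    updated = dict(mapping)
    for field, definition in new_fields.items():
        if field in updated and updated[field] != definition:
            raise ValueError(f"cannot change the mapping of existing field '{field}'")
        updated[field] = definition
    return updated


mapping = {"produce_type": {"type": "keyword"}, "quantity": {"type": "long"}}

# Adding a field that does not exist yet is allowed.
mapping = add_fields(mapping, {"organic": {"type": "boolean"}})

# Changing the type of an existing field is rejected.
try:
    add_fields(mapping, {"produce_type": {"type": "text"}})
    change_rejected = False
except ValueError:
    change_rejected = True

print(sorted(mapping), change_rejected)
```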

#### What if you do need to make changes to the mapping of an existing field?

Let's say your client changed his mind. He wants to run full-text search (and nothing else) on the field "botanical_name", which we disabled earlier.

Remember, you CANNOT change the mapping of an existing field. If you do need to make changes to an existing field, you must create a new index with the desired mapping, then reindex all documents into the new index.

STEP 1: Create a new index (produce_v2) with the latest mapping.

We removed the "enabled" parameter from the field "botanical_name" and changed its type to "text".

Example:

PUT produce_v2
{
"mappings": {
"properties": {
"botanical_name": {
"type": "text"
},
"country_of_origin": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"date_purchased": {
"type": "date"
},
"description": {
"type": "text"
},
"name": {
"type": "text"
},
"organic": {
"type": "boolean"
},
"produce_type": {
"type": "keyword"
},
"quantity": {
"type": "long"
},
"unit_price": {
"type": "float"
},
"vendor_details": {
"type": "object",
"enabled": false
}
}
}
}


Expected response from Elasticsearch:

Elasticsearch creates a new index (produce_v2) with the latest mapping.

If you check the mapping, you will see that the field "botanical_name" has been typed as text.

View the mapping of produce_v2:

GET produce_v2/_mapping


Expected response from Elasticsearch:

STEP 2: Reindex the data from the original index (produce_index) to the one you just created (produce_v2).

POST _reindex
{
"source": {
"index": "produce_index"
},
"dest": {
"index": "produce_v2"
}
}


Expected response from Elasticsearch:

This request reindexes data from the produce_index to the produce_v2 index. The produce_v2 index can now be used to run the requests that the client has specified.
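Why is copying the documents necessary at all? A toy Python model may help: the inverted index for a field is built when a document is indexed, so a field that was not indexed in the old index only becomes searchable once the original documents are indexed again under the new mapping. Everything below (the `ToyIndex` class, its methods, and the `reindex` function) is hypothetical and greatly simplified.

```python
import re


class ToyIndex:
    """Toy index: keeps each original document and an inverted index for text fields."""

    def __init__(self, text_fields):
        self.text_fields = set(text_fields)
        self.sources = []   # original documents, in insertion order
        self.inverted = {}  # (field, token) -> set of document positions

    def index(self, document):
        doc_id = len(self.sources)
        self.sources.append(document)
        for field in self.text_fields:
            if field in document:
                for token in re.findall(r"\w+", document[field].lower()):
                    self.inverted.setdefault((field, token), set()).add(doc_id)

    def search(self, field, term):
        return sorted(self.inverted.get((field, term.lower()), set()))


def reindex(source, dest):
    """Copy every stored document into dest, which analyzes it with its own mapping."""
    for document in source.sources:
        dest.index(document)


produce_index = ToyIndex(text_fields=["name"])                # botanical_name not indexed
produce_v2 = ToyIndex(text_fields=["name", "botanical_name"])  # botanical_name as text

produce_index.index({"name": "Pineapple", "botanical_name": "Ananas comosus"})
produce_index.index({"name": "Mango", "botanical_name": "Harum Manis"})
reindex(produce_index, produce_v2)

print(produce_index.search("botanical_name", "ananas"))
print(produce_v2.search("botanical_name", "ananas"))
```

In the toy, the old index finds nothing for "ananas" while the new one finds the pineapple document, even though both hold exactly the same original documents.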

#### Runtime Field

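The last requirement (the summary of monthly expense) needs the total cost of each purchase, which is unit_price multiplied by quantity. No field in our documents stores that value. A runtime field lets us compute it at query time.

Step 1: Create a runtime field and add it to the mapping of the existing index.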

Syntax:

PUT Enter-name-of-index/_mapping
{
"runtime": {
"Name-your-runtime-field-here": {
"type": "Specify-field-type-here",
"script": {
"source": "Specify the formula you want executed"
}
}
}
}


Example:

PUT produce_v2/_mapping
{
"runtime": {
"total": {
"type": "double",
"script": {
"source": "emit(doc['unit_price'].value* doc['quantity'].value)"
}
}
}
}


Expected response from Elasticsearch:

Elasticsearch successfully adds the runtime field to the mapping.

Step 2: Check the mapping:

GET produce_v2/_mapping


Expected response from Elasticsearch:

Elasticsearch adds the runtime field to the mapping (red box).

Note that the runtime field is not listed under the "properties" object, which holds the fields in our documents. It appears under a separate "runtime" object because the runtime field "total" is not indexed!

Step 3: Run a request on the runtime field to see it perform its magic!

Please note that the following request does not aggregate the monthly expense. We are running a simple aggregation request to demonstrate how a runtime field works!

The following request runs a sum aggregation against the runtime field total of all documents in our index.

Syntax:

GET Enter_name_of_the_index_here/_search
{
"size": 0,
"aggs": {
"Name your aggregation here": {
"Specify the aggregation type here": {
"field": "Name the field you want to aggregate on here"
}
}
}
}


Example:

GET produce_v2/_search
{
"size": 0,
"aggs": {
"total_expense": {
"sum": {
"field": "total"
}
}
}
}


Expected response from Elasticsearch:

When this request is sent, a runtime field called "total" is created and calculated for the documents within the scope of our request (the entire index). Then, the sum aggregation is run on the field "total" over all documents in our index.

The runtime field is only created and calculated when a request made on the runtime field is being executed. Runtime fields are not indexed, so they do not take up disk space.
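The compute-on-read idea can be imitated in plain Python. This sketch is an analogy only: it applies the same formula as the runtime field to the two documents from this exercise, and the monthly grouping at the end is a hypothetical extension that was not part of the request above. Elasticsearch reads unit_price at the precision of its mapped type, so the figures in a real response may differ in the trailing digits.

```python
from collections import defaultdict

# The two documents from this exercise, reduced to the fields the formula needs.
documents = [
    {"name": "Pineapple", "date_purchased": "2020-06-02T12:15:35",
     "quantity": 200, "unit_price": 3.11},
    {"name": "Mango", "date_purchased": "2020-05-02T07:15:35",
     "quantity": 500, "unit_price": 1.5},
]


def total(document):
    """Same formula as the runtime field: unit_price * quantity, computed on demand."""
    return document["unit_price"] * document["quantity"]


# Sum aggregation over the computed value; nothing is written back to the documents.
total_expense = sum(total(document) for document in documents)

# Hypothetical extension: group the same computed value by purchase month (YYYY-MM).
monthly_expense = defaultdict(float)
for document in documents:
    monthly_expense[document["date_purchased"][:7]] += total(document)

print(total_expense)
print(dict(monthly_expense))
```

The documents themselves never gain a "total" key, which mirrors how a runtime field leaves the indexed data untouched.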

We also did not have to reindex in order to add a new field to existing documents. For more information on runtime fields, check out this blog!