Exercises 13 - 18

Introduction

In these exercises, we’ll finish getting the data into a clean state in a new index.

For exercises 01 - 12, which include a link to the data set we’re using, see the previous post.

A video version of this round is also available on YouTube.

Exercises

Exercise 13

Reindex the data in the olympic-events index into the new olympic-events-fixed index created in exercise 12 using the split_games pipeline created in exercise 10.
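
As a rough sketch of what this request could look like, the snippet below builds a body for `POST _reindex` as a plain Python dictionary. The index and pipeline names come from the exercise; the variable name is only illustrative, and sending the request (via Kibana Dev Tools, curl or a client library) is left out.

```python
import json

# Body for POST _reindex: copy every document from the source index into
# the destination index, passing each one through an ingest pipeline.
reindex_body = {
    "source": {"index": "olympic-events"},
    "dest": {
        "index": "olympic-events-fixed",
        "pipeline": "split_games",
    },
}

print(json.dumps(reindex_body, indent=2))
```

The printed JSON can be pasted as the request body in Dev Tools.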

Exercise 14

Look at the mapping for the olympic-events-fixed index. Notice that Elasticsearch has added new fields to it. We created the mapping for this index with the same field names as before, but in lowercase. Field names are case sensitive, so Elasticsearch treats Age and age as distinct fields, and the capitalised fields in the source documents were added to the mapping as new fields.

Also notice that the new mapping uses athleteId instead of ID, athleteName instead of Name and gender instead of Sex.
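
To make the mismatch concrete, here is a small Python illustration (not Elasticsearch itself) that compares a few source field names against the names in the new mapping using a case-sensitive check. Only four fields are shown; the variable names are made up for the example.

```python
# A few field names from the new mapping, as described above.
mapped_fields = {"age", "athleteId", "athleteName", "gender"}

# The corresponding field names as they appear in the source documents.
document_fields = ["Age", "ID", "Name", "Sex"]

# A case-sensitive comparison finds none of the source names in the mapping,
# which is why each one ends up as a new field.
unmapped = [name for name in document_fields if name not in mapped_fields]

print(unmapped)  # ['Age', 'ID', 'Name', 'Sex']
```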

We’ll need to correct this by tearing down the new index and reindexing with an additional pipeline to use the correct field names. To save us constantly having to recreate the index with the right mappings, we can leverage index templates.

Create an index template called olympic-events for new indices with a name beginning with olympic-events-. Use the mapping and settings we defined in exercise 12 and configure the mapping so Elasticsearch will throw an exception if a document contains a field not defined in the mapping.
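
One possible shape for the template body, sketched in Python and assuming the composable index template API (`PUT _index_template/olympic-events`). The settings and field definitions from exercise 12 are not repeated here, so the two empty dictionaries are placeholders to fill in from the previous post.

```python
import json

# Placeholders: substitute the settings and field definitions from exercise 12.
exercise_12_settings = {}
exercise_12_properties = {}

index_template_body = {
    # Applies to any new index whose name starts with "olympic-events-".
    "index_patterns": ["olympic-events-*"],
    "template": {
        "settings": exercise_12_settings,
        "mappings": {
            # "strict" rejects documents containing fields not in the mapping.
            "dynamic": "strict",
            "properties": exercise_12_properties,
        },
    },
}

print(json.dumps(index_template_body, indent=2))
```

Because the pattern includes the trailing hyphen, it should not match the original olympic-events index, only indices such as olympic-events-fixed.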

Exercise 15

Create a new ingest pipeline called reconcile_fields that renames every field to its correct field name (except for the Games field) and then executes the split_games pipeline.
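
A minimal sketch of such a pipeline body (for `PUT _ingest/pipeline/reconcile_fields`), built in Python from `rename` processors followed by a `pipeline` processor. The three explicit renames come from exercise 14; the remaining lowercase targets are an assumption based on the field names in the exercise 16 test document, so check them against your own mapping.

```python
import json

# Source field name -> target field name. Games is deliberately left out,
# since the split_games pipeline handles it.
renames = {
    "ID": "athleteId",
    "Name": "athleteName",
    "Sex": "gender",
    "Age": "age",
    "Height": "height",
    "Weight": "weight",
    "Team": "team",
    "NOC": "noc",
    "City": "city",
    "Sport": "sport",
    "Event": "event",
    "Medal": "medal",
}

# One rename processor per field, in the order listed above.
processors = [
    {"rename": {"field": source, "target_field": target}}
    for source, target in renames.items()
]

# Finally, hand the document over to the existing split_games pipeline.
processors.append({"pipeline": {"name": "split_games"}})

reconcile_fields_body = {
    "description": "Rename fields to match the mapping, then run split_games",
    "processors": processors,
}

print(json.dumps(reconcile_fields_body, indent=2))
```

Keeping the renames in a single dictionary makes it easy to spot a missing field if the strict mapping later rejects a document.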

Exercise 16

Test your new pipeline with the following document:

{
  "NOC": "ARG",
  "Sex": "M",
  "City": "Los Angeles",
  "Weight": "98",
  "Name": "Ernesto Arturo Alas",
  "Sport": "Shooting",
  "Games": "1984 Summer",
  "Event": "Shooting Men's Free Pistol, 50 metres",
  "Height": "186",
  "Team": "Argentina",
  "ID": 2224,
  "Medal": "NA",
  "Age": "54"
}
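
One way to run this test is the simulate endpoint, `POST _ingest/pipeline/reconcile_fields/_simulate`, which expects the document wrapped in a `docs` array under `_source`. The Python sketch below only builds that request body; the variable names are illustrative.

```python
import json

# The test document from this exercise.
test_document = {
    "NOC": "ARG",
    "Sex": "M",
    "City": "Los Angeles",
    "Weight": "98",
    "Name": "Ernesto Arturo Alas",
    "Sport": "Shooting",
    "Games": "1984 Summer",
    "Event": "Shooting Men's Free Pistol, 50 metres",
    "Height": "186",
    "Team": "Argentina",
    "ID": 2224,
    "Medal": "NA",
    "Age": "54",
}

# The simulate API takes a list of documents, each under a "_source" key.
simulate_body = {"docs": [{"_source": test_document}]}

print(json.dumps(simulate_body, indent=2))
```

In the response, the simulated document should contain only the corrected field names plus whatever fields split_games produces.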

Exercise 17

Delete the olympic-events-fixed index.

Exercise 18

Reindex the data in the olympic-events index into a new olympic-events-fixed index using the reconcile_fields pipeline. If Elasticsearch throws any exceptions, you may have missed a field in your pipeline.

Next steps

Part three of the exercises can be found here.
