Introduction
In these exercises, we’ll finish getting the data into a clean state in a new index.
For exercises 01 - 12, which include a link to the data set we’re using, see the previous post.
A video version of this round is also available on YouTube.
Exercises
Exercise 13
Reindex the data in the olympic-events
index into the new olympic-events-fixed
index created in exercise 12 using the split_games
pipeline created in exercise 10.
Exercise 14
Look at the mapping for the olympic-events-fixed
index. Notice how Elasticsearch has created new fields. We created the mapping for this index with the same field names as before but we put all the field names in lowercase. Field names are case sensitive, so Age
and age
are different, distinct fields to Elasticsearch.
Also notice that the new mapping uses athleteId
instead of ID
, athleteName
instead of Name
and gender
instead of Sex
.
We’ll need to correct this by tearing down the new index and reindexing with an additional pipeline to use the correct field names. To save us constantly having to recreate the index with the right mappings, we can leverage index templates.
Create an index template called olympic-events
for new indices with a name beginning with olympic-events-
. Use the mapping and settings we defined in exercise 12 and configure the mapping so Elasticsearch will throw an exception if a document contains a field not defined in the mapping.
Exercise 15
Create a new ingest pipeline called reconcile_fields
to replace all fields with their correct field names (except for the Games
field), then also execute the split_games
pipeline.
Exercise 16
Test your new pipeline with the following document:
{
"NOC": "ARG",
"Sex": "M",
"City": "Los Angeles",
"Weight": "98",
"Name": "Ernesto Arturo Alas",
"Sport": "Shooting",
"Games": "1984 Summer",
"Event": "Shooting Men's Free Pistol, 50 metres",
"Height": "186",
"Team": "Argentina",
"ID": 2224,
"Medal": "NA",
"Age": "54"
}
Exercise 17
Delete the olympic-events-fixed
index.
Exercise 18
Reindex the data in the olympic-events
index into a new olympic-events-fixed
index using the reconcile_fields
pipeline. If Elasticsearch throws any exceptions, you may have missed a field in your pipeline.
Next steps
Part three of the exercises can be found here.