Exercises 01 - 12

Introduction

This is the first round of Elasticsearch exercises. In this set, we will load in the data and get the index ready to start cleaning up the documents.

Please get in touch if you have any questions or feedback.

Topics covered

  • Creating indices
  • Defining mappings
  • Reindexing
  • Ingest pipelines
  • Delete by query
  • Data Visualizer

Exercises

Exercise 01

Configure Elasticsearch with the following criteria and start Elasticsearch:

Property Value
Cluster name lab-cluster
Node name node01
Heap size 2g

Exercise 02

Configure Kibana to point to your Elasticsearch node and start Kibana.

Exercise 03

Download the dataset from here and use Kibana’s Data Visualizer to upload the file into a new index called olympic-events.

Exercise 04

Validate that the data was imported correctly by using a single API call to show the index name, index health, number of documents, and the size of the primary store. The details in the response must be in that order, with headers, and for the new index only.

Exercise 05

The cluster health is yellow. Use a cluster API that can explain the problem.

Exercise 06

Change the cluster or index settings as required to get the cluster to a green status.

Exercise 07

Look at how Elasticsearch has applied very general-purpose mappings to the data. Why has it chosen to use a keyword type for the Age field? Find all unique values for the Age field; there are less than 100 unique values for the Age field. Look for any suspicious values.

Exercise 08

We will be deleting data in the next exercise; making a backup is always prudent. Without making any changes to the data, reindex the olympic-events index into a new index called olympic-events-backup.

Exercise 09

The Height and Weight fields suffer from the same problem as the Age field. Later exercises will require numeric-type queries for these fields so we want to exclude any document we can’t use in our analyses. In a single request, delete all documents from the olympic-events index that have a value of NA for either the Age, Height or Weight field.

Exercise 10

Notice how the Games field contains both the Olympic year and season. Create an ingest pipeline called split_games that will split this field into two new fields - year and season - and remove the original Games field.

Exercise 11

Ensure your new pipeline is working correctly by simulating it with these values:

  • 1998 Summer
  • 2014 Winter

Exercise 12

We’ll now start to clean up the mappings. Create a new index called olympic-events-fixed with 1 shard, 0 replicas, and the following mapping:

Field Type
athleteId integer
age short
height short
weight short
athleteName text + keyword
gender keyword
team keyword
noc keyword
year short
season keyword
city text + keyword
sport keyword
event text + keyword
medal keyword
Built with Hugo
Theme based on Stack originally designed by Jimmy, forked by George Bridgeman