## The problem

A question on the Elasticsearch Slack today received an interesting answer that used a trick I haven’t noticed before. The trick is noteworthy as it has some handy use-cases.

The question boiled down to finding the number of documents where a field value is `true`

, then using that count in a `bucket_script`

pipeline aggregation.

Ben Trent - an ML Engineer at Elastic - replied with a trick involving the `sum`

aggregation type. When applied to a `boolean`

field, `sum`

will treat values of `false`

as `0`

and `true`

as `1`

. This results in it returning the number of documents with a value of `true`

.

The Gist Ben provided gave a great example of how this can be used to address the OP’s problem. I’ll go over how it works but change the mapping to be a bit more specific.

## The solution

### Setup

I’ll create the mapping and index some documents first:

```
PUT products/
{
"settings": {
"number_of_shards": 1,
"number_of_replicas": 0
},
"mappings": {
"properties": {
"productCategory": {
"type": "keyword"
},
"inStock": {
"type": "boolean"
}
}
}
}
PUT products/_doc/1
{
"productCategory": "book",
"inStock": true
}
PUT products/_doc/2
{
"productCategory": "book",
"inStock": false
}
PUT products/_doc/3
{
"productCategory": "book",
"inStock": true
}
PUT products/_doc/4
{
"productCategory": "toy",
"inStock": true
}
PUT products/_doc/5
{
"productCategory": "toy",
"inStock": true
}
PUT products/_doc/6
{
"productCategory": "toy",
"inStock": false
}
PUT products/_doc/7
{
"productCategory": "toy",
"inStock": false
}
```

### Start at the top

I’ll start by analysing the products by category.

```
GET products/_search
{
"size": 0,
"aggs": {
"categories": {
"terms": {
"field": "productCategory"
}
}
}
}
```

This creates a bucket per category - I’m removing the metadata and `hits`

blocks from all results for brevity:

```
{
"aggregations" : {
"categories" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "toy",
"doc_count" : 4
},
{
"key" : "book",
"doc_count" : 3
}
]
}
}
}
```

We need to know the following in order to work out the percentage of products that are in stock:

- Number of products in the category with a value of
`true`

for the`inStock`

field - Total number of products in the category

### Counting in stock products

Finding the number of in stock products is the more tricky of the two but this is where the `sum`

aggregation will make life much easier. To see how it works, use a `terms`

aggregation on the `inStock`

field:

```
GET products/_search
{
"size": 0,
"aggs": {
"categories": {
"terms": {
"field": "productCategory"
},
"aggs": {
"inStock": {
"terms": {
"field": "inStock"
}
}
}
}
}
}
```

The bucket keys in the results show how Elasticsearch is treating the values:

```
{
"aggregations" : {
"categories" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "toy",
"doc_count" : 4,
"inStock" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 0,
"key_as_string" : "false",
"doc_count" : 2
},
{
"key" : 1,
"key_as_string" : "true",
"doc_count" : 2
}
]
}
},
{
"key" : "book",
"doc_count" : 3,
"inStock" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 1,
"key_as_string" : "true",
"doc_count" : 2
},
{
"key" : 0,
"key_as_string" : "false",
"doc_count" : 1
}
]
}
}
]
}
}
}
```

Switching the `terms`

to a `sum`

will give us the number we need straight away, as Elasticsearch adds up all the ones (in stock) and zeroes (not in stock):

```
GET products/_search
{
"size": 0,
"aggs": {
"categories": {
"terms": {
"field": "productCategory"
},
"aggs": {
"numInStock": {
"sum": {
"field": "inStock"
}
}
}
}
}
}
```

```
{
"aggregations" : {
"categories" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "toy",
"doc_count" : 4,
"numInStock" : {
"value" : 2.0,
"value_as_string" : "true"
}
},
{
"key" : "book",
"doc_count" : 3,
"numInStock" : {
"value" : 2.0,
"value_as_string" : "true"
}
}
]
}
}
}
```

### Counting all products

Ben used a `value_count`

aggregation to count the number of documents in the `categories`

buckets. This is an unnecessary step; the document count is already available for us to use in a pipeline aggregation.

### Calculating the percentage

Here’s the pipeline aggregation to calculate the percentage of in stock products:

```
GET products/_search
{
"size": 0,
"aggs": {
"categories": {
"terms": {
"field": "productCategory"
},
"aggs": {
"numInStock": {
"sum": {
"field": "inStock"
}
},
"percInStock": {
"bucket_script": {
"buckets_path": {
"inStock": "numInStock",
"total": "_count"
},
"script": "(params.inStock / params.total) * 100"
}
}
}
}
}
}
```

The `buckets_path`

section may look a bit strange.

A `bucket_script`

pipeline aggregation is a **parent** pipeline aggregation, meaning it can compute new values from the output of its parent aggregation. In this case, the parent is the `categories`

aggregation. `_count`

is a special bucket path that will use the document count as its input. This is the number of products in the category bucket.

`numInStock`

is a metric aggregation, so using it in a `buckets_path`

will use the metric aggregation value.

The script simply does the calculation to find the percentage and the output is just what we need:

```
{
"aggregations" : {
"categories" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "toy",
"doc_count" : 4,
"numInStock" : {
"value" : 2.0,
"value_as_string" : "true"
},
"percInStock" : {
"value" : 50.0
}
},
{
"key" : "book",
"doc_count" : 3,
"numInStock" : {
"value" : 2.0,
"value_as_string" : "true"
},
"percInStock" : {
"value" : 66.66666666666666
}
}
]
}
}
}
```

## Summary

Using `sum`

aggregations on `boolean`

fields is a really useful trick for finding the number of documents where that field value is `true`

.

Despite pipeline aggregations being removed from the Elastic Certified Engineer exam, they’re an incredibly useful feature and I will be writing more about them in the future.