Don't try using Elasticsearch like an RDBMS

I’ve worked with Elasticsearch for several years and have seen it used both well and badly. Some of the bad patterns keep turning up at client after client.

The biggest issue I see is people reasoning about Elasticsearch the way they would about a row-based relational database. This can lead to insidious problems that are difficult and time-consuming to remedy.

Update operations are one key difference between Elasticsearch and an RDBMS, both in how they perform and in what they actually do. An update is not a single in-place operation; it’s really a read, a merge, a reindex of the whole document, and a delete of the old version, so it performs poorly when used frequently.
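To make those steps concrete, here is a toy model in Python of an append-only document store. It is only a sketch: `ToySegmentStore` and its methods are names I made up for illustration, and it is not how Elasticsearch is implemented internally. It just shows why every update costs a full document rewrite plus a leftover deleted copy.

```python
class ToySegmentStore:
    """Toy append-only store (hypothetical, not the real Elasticsearch internals)."""

    def __init__(self):
        self.entries = []     # append-only list of (doc_id, source) tuples
        self.deleted = set()  # positions of entries that are marked as deleted
        self.live = {}        # doc_id -> position of the current version

    def index(self, doc_id, source):
        # Existing entries are never modified: the old version is only
        # marked as deleted, and the new version is appended at the end.
        old_position = self.live.get(doc_id)
        if old_position is not None:
            self.deleted.add(old_position)
        self.entries.append((doc_id, dict(source)))
        self.live[doc_id] = len(self.entries) - 1

    def get(self, doc_id):
        return dict(self.entries[self.live[doc_id]][1])

    def update(self, doc_id, partial):
        current = self.get(doc_id)   # 1. read the whole document
        current.update(partial)      # 2. merge the partial change into it
        self.index(doc_id, current)  # 3. reindex it, 4. mark the old copy deleted


store = ToySegmentStore()
store.index("1", {"title": "Widget", "stock": 10})
store.update("1", {"stock": 9})
store.update("1", {"stock": 8})

print(store.get("1"))
print(len(store.entries), "entries written,", len(store.deleted), "marked deleted")
```

Two one-field updates leave three full copies of the document behind, two of them dead weight until something cleans them up.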

Another problem is that the result of an update or index operation isn’t visible as soon as the request completes. A refresh needs to occur before the changes to the document are reflected in search results. Refreshes are necessary, but quite expensive. This alone is almost enough reason not to use Elasticsearch as a transactional data store.
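The visibility gap is easier to see in a small simulation. Again, this is a sketch under my own assumptions: `ToyIndex` is a hypothetical class that keeps freshly indexed documents in a buffer that search cannot see until `refresh()` is called. It imitates the behaviour described above, not the real engine.

```python
class ToyIndex:
    """Toy index with a write buffer (hypothetical model of refresh behaviour)."""

    def __init__(self):
        self.buffer = {}      # indexed, but not yet visible to search
        self.searchable = {}  # visible to search after a refresh

    def index(self, doc_id, source):
        # The write is accepted, but only lands in the buffer.
        self.buffer[doc_id] = dict(source)

    def refresh(self):
        # Make everything in the buffer visible to search.
        self.searchable.update(self.buffer)
        self.buffer.clear()

    def search(self, field, value):
        return sorted(
            doc_id
            for doc_id, source in self.searchable.items()
            if source.get(field) == value
        )


idx = ToyIndex()
idx.index("1", {"status": "paid"})

before = idx.search("status", "paid")  # the write succeeded, search still misses it
idx.refresh()
after = idx.search("status", "paid")   # now the document is found

print("before refresh:", before)
print("after refresh:", after)
```

In a real cluster, how often refreshes happen is governed by the `refresh_interval` index setting, and a single write can ask to wait for visibility with the `refresh=wait_for` request parameter. Both trade throughput for freshness, which is exactly the cost a transactional workload can’t afford.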

Modeling data is another pain point for a lot of newcomers to Elasticsearch. An index is not similar to a database table, no matter how many tutorials tell you otherwise. An index is more similar to the result of a SQL query joining multiple tables together to build a particular projection of your data. Elasticsearch mappings should really be built around this thinking: what are your queries going to look like?
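As a rough illustration of that "index as a projection" idea, the sketch below takes three relational-style tables and flattens them into one document per order, shaped for a query like "find orders containing a given product". The table contents, field names, and the `build_order_documents` helper are all invented for this example.

```python
# Relational-style source data (made-up example rows).
customers = {1: {"name": "Ada"}}
orders = [{"id": 100, "customer_id": 1}]
order_lines = [
    {"order_id": 100, "product": "keyboard", "qty": 1},
    {"order_id": 100, "product": "mouse", "qty": 2},
]


def build_order_documents(customers, orders, order_lines):
    """Join the three tables up front and emit one denormalized document per order."""
    docs = []
    for order in orders:
        lines = [line for line in order_lines if line["order_id"] == order["id"]]
        docs.append({
            "order_id": order["id"],
            # Customer data is copied into the document instead of joined at query time.
            "customer_name": customers[order["customer_id"]]["name"],
            "products": [line["product"] for line in lines],
            "total_items": sum(line["qty"] for line in lines),
        })
    return docs


docs = build_order_documents(customers, orders, order_lines)
print(docs[0])
```

The join happens once, at indexing time, and the resulting document already looks like the answer to the query. A different query pattern may well call for a different projection, and so a different index.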

I’ve just published a video on when not to use Elasticsearch, or - at least - things to consider when designing your cluster. It feels great to get this off my chest!