by Bozho | Dec 2, 2019 | Aggregated, Developer tips, elasticsearch, indexing
Choosing your indexing strategy is hard. The Elasticsearch documentation does have some general recommendations, and there are some tips from other companies, but it also depends on the particular usecase. In the typical scenario you have a database as the source of truth, and you have an index that makes things searchable. And you can have the following strategies: Index as data comes – you insert in the database and index at the same time. It makes sense if there isn’t too much data; otherwise indexing becomes very inefficient. Store in database, index with scheduled job – this is probably the most common approach and is also easy to implement. However, it can have issues if there’s a lot of data to index, as it has to be precisely fetched with (from, to) criteria from the database, and your index lags behind the actual data with the number of seconds (or minutes) between scheduled job runs Push to a message queue and write an indexing consumer – you can run something like RabbitMQ and have multiple consumers that poll data and index it. This is not straightforward to implement because you have to poll multiple items in order to leverage batch indexing, and then only mark them as consumed upon successful batch execution – somewhat transactional behaviour. Queue items in memory and flush them regularly – this may be good and efficient, but you may lose data if a node dies, so you have to have some sort of healthcheck based on the data in the database Hybrid – do a combination of the above; for example if you...
Recent Comments