Not to mention that Elasticsearch is excellent for non-text search.
One application I worked on indexes a Postgres database into Elasticsearch for live front-end queries. We index every single field, sometimes hundreds of fields in a single index. ES does this easily. Thanks to Lucene's quasi-columnar/quasi-LSM tree storage, new indexed fields aren't very expensive, and searches -- even fairly complicated ones -- are very fast.
ES is also extremely fast at aggregations. Even complex multi-level aggregations (e.g. group by date, then multiple nested buckets by different fields with "top k" results for each) take just a few hundred milliseconds for large million-document datasets.
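To make the shape of such a multi-level aggregation concrete, here is a rough sketch of a search body that buckets by date, then by several fields, with the top k documents per bucket. It only builds the JSON body, no client call is made. The field names (`created_at`, `category`, `author_id`) and the helper name are made up for illustration; `date_histogram`, `terms` and `top_hits` are the standard aggregation types, and `calendar_interval` assumes ES 7 or later.

```python
import json

def build_nested_agg_query(date_field, bucket_fields, top_k=3):
    """Build a search body: date histogram -> one terms agg per field -> top_hits."""
    per_field_aggs = {}
    for field in bucket_fields:
        per_field_aggs[f"by_{field}"] = {
            # Bucket by the distinct values of this field (hypothetical field name).
            "terms": {"field": field, "size": 10},
            # Within each bucket, keep the top k matching documents.
            "aggs": {"top_docs": {"top_hits": {"size": top_k}}},
        }
    return {
        "size": 0,  # we only want aggregation results, not search hits
        "aggs": {
            "by_date": {
                "date_histogram": {"field": date_field, "calendar_interval": "day"},
                "aggs": per_field_aggs,
            }
        },
    }

body = build_nested_agg_query("created_at", ["category", "author_id"])
print(json.dumps(body, indent=2))
```

The resulting dict is what you would send as the body of a `_search` request against the index.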
Where ES has problems is in areas like replication, consistency and memory usage. It's very hard to tune ES; due to JVM GC and caches, it's basically impossible to predict how much RAM ES will need, and OOMs are common. There's also still no way to ask for a consistent index on query; the best you can do is use "refresh=wait_for" on indexing, which is the wrong time for it. I'd love a consistent Raft-based ES.
Could you talk about the use case here? This is very interesting from a DB query tuning perspective. What kind of queries work well in scenarios like this?
I thought search engines were only useful for ranking-based searches... so you accept a degree of error margin compared to databases.
Any non-joining OLTP query will perform very well with ES. It is particularly effective with low-cardinality fields, where a traditional relational database would not benefit from a B-tree index and something like Postgres would typically revert to a sequential scan over the entire table. Column intersections in Lucene are extremely efficient, basically streaming sorted vectors of document IDs from RAM.
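As a toy illustration of why intersecting sorted document ID lists is cheap, here is a minimal two-pointer merge. This is a simplification and not Lucene's actual implementation (which works on compressed postings and can skip ahead); the function name and the sample ID lists are invented for the example.

```python
def intersect_sorted(a, b):
    """Intersect two ascending lists of doc IDs in one linear pass."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            # Doc ID present in both lists: it matches both filters.
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1  # advance whichever list is behind
        else:
            j += 1
    return out

# Hypothetical postings: docs matching status=active and docs matching country=NO.
status_active = [1, 3, 4, 7, 9, 12]
country_no = [2, 3, 7, 8, 12, 15]
print(intersect_sorted(status_active, country_no))  # [3, 7, 12]
```

Each list is read once, front to back, so the cost is linear in the list lengths and the access pattern is sequential.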
Where ES is not optimal is when you need joins. That said, doing left outer joins -- which is typical in web workloads where you may have something like an "articles" table that you want to query with filters and then join against "authors" and "categories" without filters to fetch connected data -- on the client side with some basic parallelization is surprisingly effective. Currently doing that in some apps where we get <100 millisecond performance even when fetching maybe 5-6 related objects per result.
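A rough sketch of what the client-side left outer join described above can look like: collect the foreign keys from the filtered results, fetch the related objects in parallel, then stitch them together. The in-memory `AUTHORS` and `CATEGORIES` dicts and `fetch_by_ids` are stand-ins for whatever batched lookup you actually use; none of these names come from a real API.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in data stores; in practice these would be remote lookups.
AUTHORS = {1: {"name": "Ada"}, 2: {"name": "Linus"}}
CATEGORIES = {10: {"title": "Databases"}}

def fetch_by_ids(store, ids):
    """Hypothetical batched lookup: returns only the IDs that exist."""
    return {i: store[i] for i in ids if i in store}

def left_outer_join(articles):
    # Collect distinct foreign keys so each related object is fetched once.
    author_ids = {a["author_id"] for a in articles if a.get("author_id") is not None}
    category_ids = {a["category_id"] for a in articles if a.get("category_id") is not None}
    # Fetch both related sets in parallel.
    with ThreadPoolExecutor(max_workers=2) as pool:
        authors_f = pool.submit(fetch_by_ids, AUTHORS, author_ids)
        categories_f = pool.submit(fetch_by_ids, CATEGORIES, category_ids)
        authors, categories = authors_f.result(), categories_f.result()
    # Left outer semantics: every article is kept; missing relations become None.
    return [
        dict(a,
             author=authors.get(a.get("author_id")),
             category=categories.get(a.get("category_id")))
        for a in articles
    ]

articles = [
    {"id": 100, "author_id": 1, "category_id": 10},
    {"id": 101, "author_id": 2, "category_id": 99},  # category 99 does not exist
]
joined = left_outer_join(articles)
print(joined)
```

Deduplicating the IDs before fetching is what keeps the number of lookups small even when many results share the same related object.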
Do you do the left outer join in Elasticsearch... or do you do it in the client code? I'm trying to figure out if Elasticsearch supports these query types. It's something I never thought about.