
Elasticsearch itself should be very good now since they have moved to Lucene 4.0, which brought a lot of improvements in memory usage.

I evaluated Elasticsearch for RT analytics. It works wonders for point queries, where your result set is going to be small. It didn't work well for aggregate queries that need to scan a lot of data. The biggest problem was the field cache in Lucene. Almost all our queries needed to do faceting, which had a big impact on the field cache.
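
To make the point-query vs. faceting difference concrete, here's a toy sketch in plain Python. It is not the Elasticsearch or Lucene API; the documents, field names, and function names are all made up for illustration. The idea, as I understand it, is that a point query only touches one posting list, while faceting needs a per-document value array (roughly what the field cache holds) covering every document in the index.

```python
from collections import Counter

# Hypothetical toy documents; the position in the list is the doc ID.
docs = [
    {"user": "alice", "country": "US"},
    {"user": "bob", "country": "DE"},
    {"user": "alice", "country": "US"},
    {"user": "carol", "country": "IN"},
    {"user": "bob", "country": "US"},
]


def build_inverted_index(docs, field):
    """Map each term to the list of doc IDs containing it."""
    index = {}
    for doc_id, doc in enumerate(docs):
        index.setdefault(doc[field], []).append(doc_id)
    return index


def build_field_cache(docs, field):
    """Un-inverted view: one value per doc ID, held in memory for ALL docs."""
    return [doc[field] for doc in docs]


def point_query(index, term):
    """Touches a single posting list, so cost tracks the result size."""
    return index.get(term, [])


def facet_counts(matching_doc_ids, field_cache):
    """Counts field values over every matching doc via the per-doc array."""
    return Counter(field_cache[doc_id] for doc_id in matching_doc_ids)


user_index = build_inverted_index(docs, "user")
country_cache = build_field_cache(docs, "country")

alice_docs = point_query(user_index, "alice")
all_countries = facet_counts(range(len(docs)), country_cache)
```

The array built by `build_field_cache` grows with the total number of documents, not with the size of the result set, which is why faceting on many fields gets expensive in memory.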

Also, I don't know about Riak, but in ES the joins you can do are very limited.



I'll do extensive testing, but I need to scan a lot of data (aggregate, basically). I'd be comfortable even with an index several times the size of the data if it delivered RT queries. Have you evaluated anything else?


We also checked MongoDB. We dropped it mainly because the index size was getting too big.

If your data is read-only then Cloudera Impala is worth a try. It's really fast.


I was looking at Impala (Cassandra) as well as keeping an eye on Drill's progress. My data is only written in the ETL stage, so it seems it could be the right way. Lots of testing ahead! Thanks.



