I use ELK for Kubernetes and network device logs, and I'm very much with you -- full text search is great, but it sure can be slow, even when running on $1000/month of AWS hardware.
The conclusion that I've reached is that the whole Lucene model for logs is kind of outdated. Why am I tuning Java GC params to run "grep foo /logs"? I think computers today can do fine with sharded flat files, a minimal index ("which node contains logs from pod foo-2387438-2384738 at 12:34AM"), and then just scale horizontally over (log messages, searches).
I hope my friends over at Tailscale are doing that and I can just move off ES entirely ;)
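To make the flat-file idea concrete, here is a rough single-machine sketch of what I mean. It is a toy, not how any real product works: the class name `FlatFileLogStore`, the one-file-per-(pod, hour) layout, and the in-memory dict index are all assumptions made up for illustration. In a real deployment the index would map to nodes rather than local paths.

```python
import os
import tempfile
from datetime import datetime


class FlatFileLogStore:
    """Toy store: one flat file per (pod, hour) plus a tiny in-memory index."""

    def __init__(self, root):
        self.root = root
        # Minimal index: (pod, hour bucket) -> path of the file holding those lines.
        self.index = {}

    @staticmethod
    def _bucket(ts):
        # Hour-granularity bucket, e.g. "20200101T00".
        return ts.strftime("%Y%m%dT%H")

    def append(self, pod, ts, message):
        bucket = self._bucket(ts)
        default_path = os.path.join(self.root, f"{pod}-{bucket}.log")
        path = self.index.setdefault((pod, bucket), default_path)
        with open(path, "a", encoding="utf-8") as f:
            f.write(f"{ts.isoformat()} {message}\n")

    def grep(self, pod, ts, needle):
        # The index narrows the search to one file; the rest is a plain scan.
        path = self.index.get((pod, self._bucket(ts)))
        if path is None:
            return []
        with open(path, encoding="utf-8") as f:
            return [line.rstrip("\n") for line in f if needle in line]


if __name__ == "__main__":
    store = FlatFileLogStore(tempfile.mkdtemp())
    when = datetime(2020, 1, 1, 0, 34)
    store.append("foo-2387438-2384738", when, "connection refused")
    print(store.grep("foo-2387438-2384738", when, "refused"))
```

The point is that the index only answers "which file", and the search itself is a linear scan, so scaling out means adding more files and more scanners.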
I believe Loki [1] is intended to basically run "grep foo" at scale (plus some extra niceties like labels). I haven't used it, but it seems interesting.
Same here, fluentd is much better performance-wise.
But then I had to give ES more RAM because it couldn't take the hammering.
In fact, increasing the throughput to ES was causing some pretty spectacular crashes, with the /var/log partition at 100% because of the verbosity of the dumps.
Logstash sucks from both an operational and a development perspective. I replaced it everywhere I could by sending structured logs directly from the app or by using the newer integrated Beats features.
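For anyone wondering what "structured logs directly from the app" can look like, here is a minimal sketch using only Python's standard `logging` module. The `JsonFormatter` class, the `make_logger` helper, and the `fields` key are names I made up for illustration, and it assumes whatever ships the logs can ingest one JSON object per line.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record):
        payload = {
            "ts": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
        }
        # Optional structured fields passed via extra={"fields": {...}}.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)


def make_logger(name, stream=sys.stdout):
    logger = logging.getLogger(name)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate lines via the root logger
    return logger


if __name__ == "__main__":
    log = make_logger("checkout")
    log.info("order placed", extra={"fields": {"order_id": 42}})
```

Since the app already emits parsed fields, there is no grok-style parsing step left for a Logstash pipeline to do.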