Hacker News | buryat's comments

I just wrote a tool for reducing logs for LLM analysis (https://github.com/ascii766164696D/log-mcp)

Lots of logs contain uninteresting information, so they easily pollute the context. My approach instead uses a TF-IDF classifier plus a BERT model on GPU to further classify log lines and reduce the number of logs that then need to be fed to an LLM. The total size of the models is 50MB, and the classifier is written in Rust, which lets it classify >1M lines/sec. It also finds interesting cases that can be missed by simple grepping.
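For the curious, the filtering idea can be sketched in a few lines. This is a toy unsupervised variant in Python (the actual tool uses trained TF-IDF + BERT models in Rust; the sample logs and the `tfidf_filter` helper below are made up for illustration):

```python
import math
import re
from collections import Counter

def tfidf_filter(lines, keep_ratio=0.3):
    """Score each log line by the average IDF of its tokens and keep the
    highest-scoring fraction. Rare tokens (stack traces, odd errors) score
    high; boilerplate that repeats in every line scores low."""
    tokenize = lambda line: re.findall(r"[A-Za-z]+", line.lower())
    docs = [set(tokenize(l)) for l in lines]
    df = Counter(tok for doc in docs for tok in doc)
    n = len(lines)

    def score(doc):
        if not doc:
            return 0.0
        return sum(math.log(n / df[t]) for t in doc) / len(doc)

    ranked = sorted(range(n), key=lambda i: score(docs[i]), reverse=True)
    keep = set(ranked[: max(1, int(n * keep_ratio))])
    return [lines[i] for i in sorted(keep)]  # preserve original order

logs = [
    "INFO request served in 12ms",
    "INFO request served in 9ms",
    "INFO request served in 15ms",
    "ERROR connection refused to upstream shard-7",
    "INFO request served in 11ms",
]
print(tfidf_filter(logs, keep_ratio=0.2))
# keeps only the rare ERROR line
```

A real classifier trained on labeled data will catch "interesting" INFO lines too, which pure rarity ranking misses.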

I trained it on ~90GB of logs and provide scripts to retrain the models (https://github.com/ascii766164696D/log-mcp/tree/main/scripts)

It's meant to be used with the Claude Code CLI, so that it uses these tools instead of trying to read the log files directly.


Mendral co-founder here and author of the post.

This is an interesting approach. I definitely agree with the problem statement: if the LLM has to filter by error/fatal because of context window constraints, it will miss crucial information.

We took a different approach: we have a main agent (Opus 4.6) dispatching "log research" jobs to sub-agents (Haiku 4.5, which is fast/cheap). The sub-agent reads a whole bunch of logs and returns only the relevant parts to the parent agent.

This is exactly how coding agents (e.g. Claude Code) do it as well, except that instead of having sub-agents use grep/read/tail, ours use plain SQL.
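As a toy illustration of the "sub-agents use plain SQL" idea, here's the SQL equivalent of a scoped grep/tail, with stdlib sqlite3 standing in for whatever log store the real system uses (the table schema and data are invented):

```python
import sqlite3

# Toy stand-in for the real log store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (ts TEXT, job TEXT, level TEXT, msg TEXT)")
conn.executemany(
    "INSERT INTO logs VALUES (?, ?, ?, ?)",
    [
        ("2024-01-01T10:00", "build", "INFO", "compiling"),
        ("2024-01-01T10:01", "build", "ERROR", "linker failed: undefined symbol"),
        ("2024-01-01T10:01", "test", "INFO", "42 passed"),
    ],
)
# The SQL equivalent of `grep ERROR | tail`, scoped to one job:
rows = conn.execute(
    "SELECT ts, msg FROM logs WHERE job = ? AND level = 'ERROR' "
    "ORDER BY ts DESC LIMIT 10",
    ("build",),
).fetchall()
print(rows)
```

The query is short enough for a human to sanity-check, which is a big part of the appeal.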


Yeah, I saw Claude Code doing lots of grepping/find and was curious whether that approach might miss something in the log lines, or whether loading a small portion of interesting log lines into the context could help. I frequently find that just looking at ERROR/WARN lines is not enough, since some might not actually be errors and some of the skipped log lines might have something worth looking into.

And I just wanted to try MCP tooling, to be honest. Took me two days to create this.


From our experience running this, we're seeing patterns like these:

- Opus agent wakes up when we detect an incident (e.g. CI broke on main)

- It looks at the big picture (e.g. which job broke) and makes a plan to investigate

- It dispatches narrowly focused tasks to Haiku sub agents (e.g. "extract the failing log patterns from commit XXX on job YYY ...")

- Sub agents use the equivalent of "tail", "grep", etc. (via SQL) on a very narrow subset of logs (as directed by Opus) and return only relevant data (so they can interpret INFO logs as actually being the problem)

- Parent Opus agent correlates between sub agents. Can decide to spawn more sub agents to continue the investigation

It's no different than what I would do as a human, really. If there are terabytes of logs, I'm not going to read all of them: I'll make a plan, open a bunch of tabs and surface interesting bits.
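The loop above, reduced to a hypothetical Python skeleton. The `plan_tasks` and `run_subagent` stubs stand in for the actual Opus and Haiku calls, which aren't shown in the post; everything here is illustrative:

```python
from concurrent.futures import ThreadPoolExecutor

def plan_tasks(incident):
    """Parent agent: turn an incident into narrowly scoped research tasks.
    (In the real system this would be an Opus call producing a plan.)"""
    return [
        {"job": incident["job"], "commit": incident["commit"], "level": lvl}
        for lvl in ("ERROR", "WARN", "INFO")
    ]

def run_subagent(task, logs):
    """Sub-agent: scan only its slice of logs, return just the relevant lines.
    (In the real system this would be a Haiku call with SQL tools.)"""
    return [l for l in logs if l.startswith(task["level"])]

def investigate(incident, logs):
    tasks = plan_tasks(incident)
    # Sub-agents run in parallel, like opening a bunch of tabs.
    with ThreadPoolExecutor() as pool:
        findings = list(pool.map(lambda t: run_subagent(t, logs), tasks))
    # Parent correlates: flatten the findings, severity-first.
    return [line for group in findings for line in group]

logs = ["INFO build started", "WARN cache miss", "ERROR linker failed"]
print(investigate({"job": "build", "commit": "abc123"}, logs))
```

The parent never sees raw logs, only what the sub-agents chose to return, which is what keeps the context small.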


I have an agent system that analyzes time series data periodically. What I've landed on is having the tools themselves pre-process the time series data, giving it more semantic meaning: converting timestamps to human dates, and enriching it with statistical analysis, such as calculating the current window's min/mean/max for the series as well as the same for a trailing window and surfacing those in the data. I also add a volatility score, collapse runs of similar series that aren't particularly interesting from a volatility perspective, and generally try to highlight anomalous series in the window in various ways.
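A minimal sketch of that kind of pre-processing, assuming plain numeric samples. The window size and the coefficient-of-variation volatility score are illustrative choices, not necessarily what the system above uses:

```python
import statistics

def summarize_series(values, window=5):
    """Summarize a numeric series for an LLM: stats for the current window
    and a trailing window, plus a crude volatility score, so the model
    sees semantics instead of raw samples."""
    cur = values[-window:]
    prev = values[-2 * window : -window]

    def stats(xs):
        return {"min": min(xs), "mean": round(statistics.mean(xs), 2), "max": max(xs)}

    summary = {"current": stats(cur), "trailing": stats(prev)}
    # Volatility as coefficient of variation: stdev relative to the mean.
    mean = statistics.mean(cur)
    summary["volatility"] = round(statistics.stdev(cur) / mean, 3) if mean else 0.0
    return summary

series = [10, 11, 10, 12, 11, 10, 11, 50, 48, 52]
print(summarize_series(series, window=5))
# current window's mean/max jump vs the trailing window flags the anomaly
```

Comparing "current" against "trailing" is what lets the model say "this series just tripled" without reading every sample.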

This isn't anything new. It's not particularly technical or novel in any way, but it seems to work pretty well for identifying anomalies and comparing series over time horizons. It's even less token-efficient on small windows than piping in a bunch of JSON, but it seems to be more effective from an analysis point of view.

The strange thing about it is that it involves fairly deterministic analysis before we even send the data to the LLM, so one might ask: what's the point if you're already doing analysis? The answer is that LLMs can actually find interesting patterns across a lot of well-presented data, and they can pick up on patterns in a way that feels like they are cross-referencing many different time series and correlating signals in interesting ways. That's where general-purpose LLMs are helpful, in my experience.

Breaking out analysis into sub-agents is a logical next step; we just haven't gotten there yet.

And yeah, the goal is to approximate those engineers among us who are good at RCAs in the moment, who have instincts about the system and can juggle a bunch of tabs and cross-reference the signals in them.


This was my approach when using agents to analyze HVAC IoT data for anomaly detection / investigations, and it similarly worked very well. Mixing that with some context like install location, geographic features, info on seasonality (like ASHRAE values for the regions), and some classification (residential / commercial), the bot was quite able to deliver actual insights into problems instead of creating a bunch of excess noise.

We also mixed in some GSA (https://arxiv.org/abs/2503.04104) steps during the analysis in the sub agents to further reduce hallucinations


Glad to hear this. I actually went down this path based on guidance from multiple LLMs (Anthropic, OpenAI, etc.), so I wasn't sure if it was just some kind of weird shared hallucination or if they were regurgitating a very small amount of knowledge on this topic, because it was kind of hard to find stories where people had success with these strategies. Thank you for the link to the paper; I will definitely be reading it.

So how can this be a company when it’s just what Claude code already does?

You may also want to have your agents write small scripts that auto-flag future logs.

Have an array of scripts to run against each log (probably just Rust code, for speed) and have them flag for performance, errors, intrusions, etc.


Did you create the sub-agent yourself? Claude's agent never called Haiku in my case.

Do you think it could do anything interesting with a highly compressed representation? CLP can apparently achieve a 169x compression ratio:

https://github.com/y-scope/clp

https://www.uber.com/blog/reducing-logging-cost-by-two-order...


Interesting approach, thanks for the pointer!

Since the classifier would need access to the whole log message, I was looking into how search is organized for CLP compression, and I see that:

> First, recall that CLP-compressed logs are searchable–a user query will first be directed to dictionary searches, and only matching log messages will be decompressed.

so then, yeah, it can be combined with a classifier as messages get decompressed, to get a filtered view of only the log lines that should be interesting.

The toughest part is still figuring out what "interesting" actually means in this context; without domain knowledge of the logs it would be difficult to capture everything. But I think it's still better than going through all the logs after searching.


I like the idea of SQL as the "common tongue" because, provided the query is reasonably terse, it's easy for the human to verify and reason about, there's shitloads of it in the LLM's training set, and (usually) the database doesn't lie. So you've mitigated some major LLM drawbacks that way.

Another thing SQL has in its favor is the ability, with tools like Trino or DataFusion, to basically turn "everything" into a table.

EDIT: thinking on it some more, though, at what point do you just know off the top of your head the small handful of SQL queries you regularly use and just skip the expensive LLM step altogether? Like... that's the thing that underwhelms me about all the "natural language query" excitement. We already have a very good, natural language for queries: SQL.


> small handful of SQL queries you regularly use

Give those queries to the LLM and enjoy your sleep while the agent works.


hell yeah, give it the ssh keys too and sleep all the time

https://github.com/dx-tooling/platform-problem-monitoring-co... could have a useful approach, too: it finds patterns in log lines and gives you a summary in the sense of "these 500 lines are all technically different, but they are all saying the same thing".
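The collapsing trick that tool describes can be approximated by masking out the variable parts of each line and counting the resulting templates. A rough sketch, not the linked project's actual algorithm:

```python
import re
from collections import Counter

def collapse(lines):
    """Group log lines that differ only in variable parts (numbers, hex ids)
    into one template each, counting how many lines each template covers."""
    def template(line):
        line = re.sub(r"0x[0-9a-fA-F]+", "<HEX>", line)
        return re.sub(r"\d+", "<N>", line)
    return Counter(template(l) for l in lines)

lines = [
    "timeout after 500 ms on request 42",
    "timeout after 1200 ms on request 77",
    "worker 3 crashed at 0xdeadbeef",
]
print(collapse(lines).most_common())
# the two timeout lines collapse into one template with count 2
```

Diffing the template counts between runs is a cheap way to spot "this message is new" or "this message got 100x more frequent".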

The pattern matcher is interesting, also for collapsing log lines and comparing them between runs. Thank you!

In my tool I was going more off the premise that it's frequently difficult to even say what you're looking for, so I wanted to have some step after reading the logs that says what should actually be analyzed further, which naturally requires having some model.


Very interesting; curious if there is any downside to running this at scale (compute?)

I'd assume it probably depends on how large and varied your logs are?

But my guess is that an algorithm like that could be very fast. It's basically just doing a form of compression, so I'm thinking, ballpark, a similar amount of compute to just zipping the log.

It can't be anything CLOSE to the compute cost of running any part of the file through an LLM haha


claude /logout -> claude /login -> claude /remote-control

If you'd read the entire issue, you'd see that not only is that solution mentioned multiple times, it's not working for some people.

it worked for me

If it worked for everybody that issue would already be closed.

I have B&W 700 series speakers and a Devialet Phantom (108 dB); I was quite surprised at the quality that Devialet delivers.


How much value was created using products created by Jony Ive and his team?


Humans will probably be able to orbit the sun 2000+ times in the not-too-distant future.


it's like cancer, it keeps replicating


dug too deep and awakened balrog


MSCHF is pretty close to what you're describing


Wow, what a gem, thanks for recommending this! https://mschfhotels.com/products/flipped-flop


Those high heeled slides are kinda cool.


Emotions aside, how large would his equity stake be if he had stayed at OpenAI?


If you're rich and can travel faster, why not? I hope they make a coupe version


Perhaps for a Jared Isaacman type of billionaire.

Rich people prefer a Rolls or Bentley when riding as a passenger; sport/performance vehicles are only fun if you are driving. I would expect G650/800-style jets to be the preferred planes, even if slower, when you can travel in style and with your entourage.

Range would also be a consideration for this type of jet in passenger travel. Travel time makes a difference only for long-distance, over-the-ocean flights, and these jets tend to be quite short-ranged.

The XB-1 is designed for only 1000 nm at Mach 2.2, compared to the 7000 nm of the G650 at a cruise speed of Mach 0.92. Basically, the XB-1 can fly for about 40 minutes at a time at its cruise (top?) speed of Mach 2.2.


On the road, you have to go the speed limit whether you're in a Rolls or a Lambo. You might prefer a Lambo if it could get there twice as fast.


Lambo (and sports car) owners tend to take speed limits as suggestions, so one might actually go twice as fast, if not legally. Still, people being driven around would prefer a Rolls.

Rolls and Bentleys are normally driven more slowly than even regular cars like high-end sedans; they are big and unwieldy, and their passengers really care about ride quality. The scene from Parasite about the coffee cup comes to mind.

If you had the money to own either car, you'd be rich enough that whenever you reach the destination is on time; other people work on your schedule, so speed becomes less important.

