
LLMs are only 3-5 years old (NLP is much older, of course); for all we know they'll end up a dead end in research, like LSTMs are today - LLMs/multimodal models just look super hot right now. "Attention Is All You Need" was released in 2017 and it took 5 years to prove it was useful; for all we know the next hot thing has already been published and LLMs are obsolete - Google might have been right to wait.

Besides, I don't think the top people at Google's DeepMind - and I can only "infer" this from watching them speak online - actually think LLMs are "the one".


Is the goal AGI, or to add as much as possible to the market cap? I was specifically talking about the latter, and why they got leapfrogged by OpenAI, which has been valued at a substantial fraction of Google's overall value despite having a fraction of the revenue. If Google had managed to generate this much value for themselves they'd be respected differently, but for now it looks like they missed today's AI tech stack and will be playing catch-up for something like the next 5 years, regardless of where AI evolves later.


Larry's original goal for Google was always to be a revenue vehicle to reach AGI although I don't think Sundar is interested in anything except revenue/profit.

Note also that many of Google's previous attempts with LLMs generated significant press controversy, and it was in Google's interest to let other groups take the heat for a while as the Overton window shifted.


>1Tn+ USD created

>a dead end

Sure, pal.


It's Cloudflare trying to enshittify the internet with microtransactions[0] and take their N% cut (of course it will start at something like 2%, but ask any Uber driver how that's going).

The problem is that the arguments they make for why this should happen are quite compelling, especially to those running sites (you'll see plenty of complaints on this forum about it), but there's also a large group of people who think information/code/data should be "free" (see open source code/maps/anything you can think of). So really it's just a moral debate that will be lost in the interest of profit (which is, you know, good and bad; if AI companies did more caching we probably wouldn't need this, but here we are).

[0] https://blog.cloudflare.com/introducing-ai-crawl-control/


I got rage-baited by this so hard; I can't comprehend thinking this way.

I've hung out with PhDs, economists, bankers, trust fund kids, scientists, and artists - maybe they weren't top tier enough, but none of them thought this way.

Literally the weirdest take on a forum filled with dreamers, but every take is valid.


It's not comfortable, but this seems to be what the priors point to. I suspect that pure mathematics is one of the most intelligence-dependent fields; one where hard work, practical problem-solving, and a large knowledge base are less of a substitute.


> one where hard work, practical problem-solving, and a large knowledge base are less of a substitute

Collaboration remains an important skill – I had an REU mentor who said that, given the explosion of mathematics that one had to learn to do cutting-edge work in a field, she had to end up "pooling experience..."


> one where hard work, practical problem-solving, and a large knowledge base are less of a substitute.

I have seen this first hand. When I was in university doing my math major, there was this one older lady (she seemed about 40, and very attractive too) who had decided, for some reason or other, that she wanted to do a major in mathematics. Not for a job or anything, just to do it.

Whereas the rest of us, let’s face it, we just wanted a good job in STEM.

Bless this lady, she was so determined and hard working. She would show up to every lecture, first in, last out, and she would show up to every study session and give it her all.

But unfortunately, she was not good at grasping the concepts nor solving the problems. It was shocking how little she grokked the introductory concepts for the amount of effort she put in. She worked harder than anyone in our group.

I don’t think any of us had the heart to tell her that maybe a math major was not in the cards.

I never saw her on campus from my 3rd year on, so I imagine she dropped out.

But I was rooting for her.


I think what happens is: IQ is a sort of speed. They teach the class assuming they can get most people through it at whatever pace they are used to teaching at. If you're quick, you can keep up with below-average effort, and vice versa.

As you get higher and higher up in the stratosphere, the balance between "we need enough students" and "we need to go faster" ends up favouring the few super intelligent people, along with the people who can arrange their lives to put in the hours.

That's not to say you can't learn something if you are slow. You just can't learn it at the pace they are teaching, and you might not have the wherewithal to learn it at your own pace.

So to you it looks like this lady would never learn it, but I would guess if she had a personal tutor they would be able to pace it.


Same lol. By the OP's logic, every student pursuing this field in a university as an undergrad/graduate student should be taking an IQ test before proceeding to the upper-level math courses covering these topics. Anything less than the threshold would mean they have to focus on something different.


I was going to argue "LLMs need code samples to do well on a language, and if we're honest C# is a language mostly held in private repos", but GitHub's 2024 report[0] says it's the 5th most used language (I'm too lazy to check whether the report includes private repos, but I'll assume it doesn't).

So kinda neat to see this paper!

[0]https://github.blog/news-insights/octoverse/octoverse-2024/#...


The big labs are almost certainly using compiler/REPL output for generated code as an oracle for RL. I doubt they have C# in the mix.


Why do you doubt that? It's a widely used language. And there is even an open source C# REPL.


Because RL time is expensive, and I don't think it's worth bumping batches from the languages that are more popular than C# to make room for it.


But C# is a typical enterprise language, used by exactly the people who are willing to pay a lot of money for AI.

We’re just guessing and the fact of the matter is that we don’t know what inputs they use for their models.


5th most used language, based on private repos that the group making the report has exclusive direct access to.

I don't see that contradicting your assumption


"In this year’s Octoverse report, we study how public and open source activity on GitHub..."


I'm no Ruby expert, so forgive my ignorance, but it looks like a small "NER model" packaged as a string convenience wrapper named `filter` that tries to filter out "sensitive info" from input strings.

I assume the NER model is small enough to run on CPU at less than ~1s per pass, at the trade-off of storage per instance (1s is fast enough in dev; in prod with long convos that's a lot of inference time). Generally a neat idea though.

Couple of questions:

- NER often doesn't perform well in different domains - how accurate is the model?

- How do you actually allocate compute/storage for inference on the NER model?

- Are you batching these `filter` calls, or is it just sequential one-by-one calls?


> - NER often doesn't perform well in different domains - how accurate is the model?

https://github.com/mit-nlp/MITIE/wiki/Evaluation

The page was last updated nearly 10 years ago.


RAG is taking a bunch of docs, chunking them into text blocks of a certain length (how best to do this is up for debate), and creating a search API that takes a query (like a Google search) and compares it to the document chunks (very much how you're describing it). Take the returned chunks, ignore the score from the vector search, feed those chunks into a re-ranker along with the original query (this step is important - vector search mostly sucks), filter the re-ranked results down to the top 1-2, and then format a prompt like:

The user asked 'long query'. We fetched some docs (see below); answer the query based on the docs (reference the docs if you feel like it).

Doc1.pdf - Chunk N: Eat cheese

Doc2.pdf - Chunk Y: Don't eat cheese

You then expose the search API as a "tool" for the LLM to call, slightly reformatting the prompt above into a multi-turn convo, and suddenly you're in ze money.

But once your users are happy with those results, they'll want something dumb like the latest football scores, and then you need a web tool - and then it never ends.

To be fair though, it's pretty powerful once you've got it in place.
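
If it helps, here's roughly what that retrieve-then-rerank flow looks like in Python. The library and model choices are just assumptions to make the sketch concrete (sentence-transformers for embeddings, a cross-encoder for reranking), not a claim about any particular stack:

    # Sketch of the retrieve -> rerank -> prompt flow described above.
    # Assumes `chunks` is a list of dicts ({"source", "id", "text"}) and
    # `chunk_vectors` is a pre-computed, normalized (N, dim) numpy matrix.
    import numpy as np
    from sentence_transformers import SentenceTransformer, CrossEncoder

    embedder = SentenceTransformer("all-MiniLM-L6-v2")
    reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

    def search(query, chunks, chunk_vectors, top_k=20, keep=2):
        # 1. vector search: cosine similarity against the pre-embedded chunks
        q = embedder.encode([query], normalize_embeddings=True)[0]
        candidates = [chunks[i] for i in np.argsort(-(chunk_vectors @ q))[:top_k]]
        # 2. rerank with a cross-encoder and throw the vector scores away
        scores = reranker.predict([(query, c["text"]) for c in candidates])
        ranked = sorted(zip(scores, candidates), key=lambda pair: -pair[0])
        return [c for _, c in ranked[:keep]]

    def build_prompt(query, hits):
        docs = "\n\n".join(f"{h['source']} - Chunk {h['id']}: {h['text']}" for h in hits)
        return (f"The user asked '{query}'. We fetched some docs (see below); "
                f"answer the query based on the docs.\n\n{docs}")

The reranker is the part that usually moves the needle; the vector search is just there to get a cheap candidate set.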


Or you find your users search for ID strings like k1231o to find reference docs, and you end up needing keyword search and reranking.


Sorry for my lack of knowledge, but I've been wondering: what if you ask the RAG a question where the answer is not close in embedding space to the embedded question? Won't that limit the quality of the result? Or how does RAG handle that? I guess maybe the multi-turn convo you mentioned helps in this regard?

The way I see it, RAG is basically some sort of semantic search, where the query needs to be similar in embedding space to whatever you are searching for in order to get good results.


I think the trick is called "query expansion". You use an LLM to rewrite the query into a more verbose form, which can also include text from the chat context, and then you use that as the basis for the RAG lookup. Basically you use an LLM to give the RAG a better chance of having the query be similar to the resources.
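
Roughly, in Python (the OpenAI client and model name here are just assumptions; any LLM call would do):

    # Query expansion: rewrite a terse query into a verbose, self-contained one
    # before embedding it, so it lands closer to the relevant chunks.
    from openai import OpenAI

    client = OpenAI()

    def expand_query(query, chat_context=""):
        prompt = (
            "Rewrite the following search query as a verbose, self-contained question, "
            "using the conversation context if it's relevant.\n\n"
            f"Context: {chat_context}\nQuery: {query}"
        )
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model name
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

    # The expanded text is what you embed and hand to the retriever.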


Thanks for the answer! I think you are right. I've also heard of HyDE (hypothetical document embeddings), where an LLM generates a guess at the answer and that guess is embedded as the query, which may also improve the results.


Is RAG how I would process my 20+ year old bug list for a piece of software I work on?

I've been thinking about this because it would be nice to have a fuzzier search.


Yes and no. For human search it's kinda neat: you might find some duplicates, or some nearest-neighbour bugs that help you solve a whole class of issues.

But the cool kids? They'd do something worse:

They'd define some complicated agentic setup that clones your code base into containers firewalled off from the world, and give prompts like:

You're an expert software dev in MY_FAVE_LANG. Here's a bug description: 'LONG BUG DESCRIPTION'. Explore the code and write a solution. Here are some tools (read_file, write_file, etc.)

You'd then spawn as many of these as you can, per task, and have them all generate pull requests for their tasks. Review them with an LLM, then manually, and accept the PRs you want. Now you're in the ultra money.

You'd use RAG to guide an untuned LLM on your code base for style and how to write code. You'd write docs like "how to write an API", "how to write a DB migration", etc. and give those as a tool to the agents writing the code.

With time and effort, you can make agents specific to your code base through fine-tuning, but who's got that kind of money?
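
The core of each of those agents is just a tool-dispatch loop. A bare-bones sketch (the `call_llm` function, the tool-call format, and the paths are placeholders, not any specific framework):

    import json, pathlib

    REPO = pathlib.Path("/sandbox/repo")  # the firewalled clone

    def read_file(path):
        return (REPO / path).read_text()

    def write_file(path, content):
        (REPO / path).write_text(content)
        return "ok"

    TOOLS = {"read_file": read_file, "write_file": write_file}

    def run_agent(bug_description, call_llm, max_turns=20):
        # call_llm is a placeholder: given the message history it returns either
        # a tool call like {"tool": "read_file", "args": {...}} or a final answer.
        messages = [{"role": "user", "content":
                     "You're an expert software dev. Here's a bug description: "
                     f"'{bug_description}'. Explore the code and write a solution."}]
        for _ in range(max_turns):
            reply = call_llm(messages)
            if reply.get("tool"):
                result = TOOLS[reply["tool"]](**reply["args"])
                messages.append({"role": "tool", "content": json.dumps({"result": str(result)})})
            else:
                return reply["content"]  # the proposed fix / PR description

You'd run one of these per bug, collect the outputs as PRs, and review from there.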


You'd be surprised how many people are actually doing this exact kind of solutioning.

It's also not that costly to do if you think about the problem correctly.

If you continue down the brute-forcing route you can do mischievous things like signing up for thousands and thousands of free accounts to LLM APIs across numerous network connections and plugging away.


I feel called out, lmao. I’m building an agentic framework for automated pentesting as part of an internal AppSec R&D initiative. My company’s letting me run wild with infrastructure and Bedrock usage (bless their optimism). I’ve been throwing together some admittedly questionable prototypes to see what sticks.

The setup is pretty basic: S3 for docs and code base, pgvector on RDS for embeddings, Claude/Titan for retrieval and reasoning. It works in the sense that data flows through and responses come out… but the agents themselves are kind of a mess.

They think they’ve found a bug, usually something like a permissive IAM policy or a questionable API call, and just latch onto it. They tunnel hard, write up something that sounds plausible, and stop there. No lateral exploration, no attempt to validate anything in a dev environment despite having MCP tools to access internal resources, and definitely no real exploitation logic.

I’ve tried giving them tools like CodeQL, semgrep and Joern, but that’s been pretty disappointing. They can run basic queries, but all they surface are noisy false positives, and they can’t reason their way out of why it might be a false positive early on. There’s no actual taint analysis or path tracing, just surface-level matching and overconfident summaries. I feel like I’m duct-taping GPT-4 to a security scanner and hoping for insight.

I’ve experimented with splitting agents into roles (finder, validator, PoC author, code auditor, super uber hacker man), giving them memory, injecting skepticism, etc., but it still feels like I’m missing something fundamental.

If cost isn’t an issue, how would you structure this differently? How do you actually get agents to do persistent, skeptical, multi-stage analysis, especially in security contexts where you need depth and proof, not just plausible-sounding guesses and long ass reports on false positives?


Seems like you need a way to dictate structured workflows, in lieu of actually being able to train them up as SOC analysts. Sounds like a fun problem!


You could try just exporting it as one text or XML file and seeing if it fits in Gemini's context.


I don't think it will. Gemini Pro has a context window of 2 million tokens, which they say translates to around 1.5 million words. We have on the order of 100,000 logged issues and a typical issue description is around 500 words, so roughly 50 million words in total - well beyond the window.


You bought a house that had a murder in it X years ago and are wondering if you're guilty of the murder. Probably not, as long as you don't do more murders in it.

I suppose real life is more interesting though: the guy who registered the domain that stopped the global ransomware crisis was picked up after DEF CON, if memory serves.

Ironically, you're probably at more risk from the GDPR for leaking, via your blog post, the IP addresses that connected to the box.

I'm not a lawyer/solicitor though, don't take my advice.


The guy (Marcus Hutchins) wasn't arrested for registering that domain; he was arrested for allegedly creating an unrelated piece of malware.


I think it's more like you buy an abandoned house where people used to go buy drugs

you buy the house and people are still coming knocking on your door asking you if you have any drugs to sell

you're not doing anything wrong, but if the police notice people constantly coming to your house to buy drugs they may do something about it


Other perspective: It's more like you reopen a public place where people were known to publicly harm copyright owners and you provide technical help so they can do it again.


> I suppose real life is more interesting though, the guy who picked up the domain to stop the global ransomware crisis was picked up after Defcon if memory serves

That dude developed and sold banking malware; that's why he got arrested.


This guy didn't just buy the haunted house that previously had signs directing serial killers to where the victims are, he also reinstalled the signs and opened it back up to the public knowing that the serial killers were still around and reading the signs.

I mean, it's a bit absurd to compare copyright infringement to murder, but that's where your analogy started. He didn't just buy the domain and do something innocent; he actually started running the software that helps people pirate things, strongly suspecting that pirates would use it to help them pirate things... and then when he observed that was the reality he (smartly IMO) shut it down.


Attempting to train this on a real workload I converted over the weekend: ~8M "steps" so far, and it rarely scores above 5% (most are 0%), though it did score 60% once, ~7M steps ago.

Adding more than 1 GPU didn't improve speed, but that's pretty standard since we don't have a fancy interconnect. Bit annoying they didn't use TensorBoard for logging, but overall it seems like a pretty cool lib - I'll leave it running a few days and see if it can learn (no other algo has, so I don't have much hope).


> I'm interested to know if anyone is using fine-tuning to train a model on proprietary or in-house codebases and documentation.

I've done it. Half the team thought it was great 20% of the time; half the team hated it from day 0. I used roughly 500K lines of code.

> How much effort is required to turn code into something one can use for fine-tuning?

Very little to moderate: less than 200 lines of Python, Qwen FIM, HF, llama.cpp, and the llama.cpp code extension.

> RAG solutions seem to have their limitations, and fine-tuning might be a more effective approach.

The only problem either way is keeping the information up to date; RAG just adds more cost to the inference process (which at my dev speed is pretty important).

> How much effort is required to turn code into something one can use for fine-tuning?

Fine tuning "fill in the middle" process is the process of taking a file, cutting out a some text in the middle and asking AI to guess what was there - there is a hugging face example that will have you doing it in an hour or less - your OPs team saying "No you cant litreally copy all code to a single folder" is probably the biggest hurdle (advise them you'll do it in CI and then they can stand up a FIM training endpoint that accepts a csv, pretty easy)


Oh fill in the middle is definitely smart especially for codebases!!


Love Unsloth btw, use it for some other stuff at work; the GRPO stuff was fun :)

I know it's coming, but "mUlTi GpU PlZ" :pleading: <3

