Hacker News | dfhg's comments

Would you kindly elaborate a bit on the difference between training on your own documents vs. analyzing documents for answers?


The word "training" implies creating a new model by fine-tuning an existing model on new documents.

As several other comments in this thread have already indicated: this is almost always the wrong direction. Which is confusing because it's the direction everyone always assumes they should go in at first.

The approach that does work is surprisingly simple: take the user's question, search for snippets of your documents that appear to be about that question, then paste all of those snippets into the prompt along with the user's question and see what answer you get.

This is known as RAG: Retrieval Augmented Generation. It's a very powerful approach.
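A minimal sketch of that loop (the `search` and `llm` functions here are toy stand-ins for whatever search index and model API you actually use):

```python
def answer_with_rag(question, search, llm):
    """Retrieval Augmented Generation in its simplest form: fetch
    relevant snippets, then ask the model to answer using only
    those snippets as context."""
    snippets = search(question, limit=5)
    context = "\n\n".join(snippets)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
    return llm(prompt)

# Toy stand-ins so the sketch runs end to end.
docs = ["RAG pastes retrieved snippets into the prompt.",
        "Fine-tuning creates a new model from an existing one."]

def toy_search(q, limit=5):
    # Naive keyword overlap in place of a real search index.
    words = set(q.lower().split())
    scored = sorted(docs, key=lambda d: -len(words & set(d.lower().split())))
    return scored[:limit]

def toy_llm(prompt):
    # A real call would hit an LLM API; here, echo the top snippet.
    return "Based on the context: " + prompt.splitlines()[3]

print(answer_with_rag("How does RAG work?", toy_search, toy_llm))
```

Swap `toy_search` for your index and `toy_llm` for a real model call and the structure stays the same.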


> take the user's question, search for snippets of your documents that appear to be about that question, then paste all of those snippets into the prompt along with the user's question and see what answer you get.

We use RAG at my job, but we don’t do any preprocessing on the message from the user, so the results are not always great for us.

Do any of you have experience using a small local model just for extracting keywords from messages which you then use for the retrieval? And then feed the search result and your prompt into OpenAI or whatever as normal.


I've been trying out an interesting embedding model that knows how to treat text as either a question or as a phrase about the world, and embeds the question such that it's likely to end up close to phrases that might answer that question: https://til.simonwillison.net/llms/embed-paragraphs

Embedding and chunking large amounts of documents is expensive though, in both compute and storage.

The other trick I've been planning to explore is using an LLM to turn the user's question into a small number of normal FTS search queries and then run those to try and get context data.
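That trick might look something like this sketch — the "LLM" and the FTS backend are both stubbed out, since the point is just the shape of the loop (rewrite the question into a few keyword queries, run each, pool and dedupe the hits):

```python
def question_to_queries(question, llm):
    """Ask the model to rewrite a natural-language question as a
    handful of short full-text-search queries."""
    prompt = (
        "Rewrite this question as up to 3 short keyword search "
        "queries, one per line, no numbering:\n" + question
    )
    return [q.strip() for q in llm(prompt).splitlines() if q.strip()]

def gather_context(question, llm, fts_search, per_query=3):
    seen, snippets = set(), []
    for query in question_to_queries(question, llm):
        for hit in fts_search(query)[:per_query]:
            if hit not in seen:  # dedupe across queries
                seen.add(hit)
                snippets.append(hit)
    return snippets

# Stand-ins: a canned "LLM" and a keyword scan over a tiny corpus.
docs = ["SQLite FTS5 supports prefix queries.",
        "Vector databases store embeddings.",
        "BM25 ranks documents by term frequency."]

def toy_llm(prompt):
    return "FTS5 prefix\nBM25 ranking"

def toy_fts(query):
    terms = query.lower().split()
    return [d for d in docs if any(t in d.lower() for t in terms)]

print(gather_context("How does full-text search ranking work?", toy_llm, toy_fts))
```

In a real setup `toy_fts` would be e.g. a SQLite FTS5 `MATCH` query, and the pooled snippets would go into the generation prompt as usual.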


> The other trick I've been planning to explore is using an LLM to turn the user's question into a small number of normal FTS search queries and then run those to try and get context data.

I have also been working on this. I still fail to see why this approach isn't the default frankly. There's little benefit to vector databases.


https://docs.llamaindex.ai/en/stable/examples/retrievers/bm2...

Also maybe try to include tags or categories when you index and then you can filter on those when doing the vector search. Might get a similar effect from BM25.
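The tag-filtering idea is independent of any particular library — the point is to restrict the candidate set by metadata *before* ranking by similarity. A generic sketch with made-up 2-d "embeddings" standing in for real ones:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def filtered_search(query_vec, index, tag, top_k=2):
    """Filter by tag first, then rank the survivors by cosine
    similarity, so off-topic chunks can't crowd the results."""
    candidates = [e for e in index if tag in e["tags"]]
    candidates.sort(key=lambda e: -cosine(query_vec, e["vec"]))
    return [e["text"] for e in candidates[:top_k]]

# Toy index: tiny vectors with category tags.
index = [
    {"text": "Reset your password",   "tags": ["support"], "vec": [1.0, 0.1]},
    {"text": "Q3 revenue grew 8%",    "tags": ["finance"], "vec": [0.9, 0.2]},
    {"text": "Contact the help desk", "tags": ["support"], "vec": [0.2, 1.0]},
]

print(filtered_search([1.0, 0.0], index, tag="support"))
```

Note the finance chunk never makes it into the ranking even though its vector is close to the query — that's the effect the tags buy you.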

Also, LlamaIndex does RAG better than some other solutions.


How do RAG implementations work with generic prompts vs. specific prompts? Meaning, there are prompts that could easily be answered by the base model itself and don't require RAG, but some prompts might involve questions about something proprietary where RAG is actually useful.

So is the default to just run the RAG search index on every prompt, and if it returns nothing you get the plain answer from the base model, otherwise you get the augmented answer?
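That's one common way to do it — run retrieval on every prompt and only augment when something scores above a threshold. A sketch with stubbed-in search and model functions (the threshold value is an arbitrary choice here):

```python
def answer(question, search, llm, min_score=0.5):
    """Run retrieval on every prompt; only augment the prompt when
    the index actually returned something relevant."""
    hits = search(question)  # list of (score, snippet) pairs
    relevant = [s for score, s in hits if score >= min_score]
    if relevant:
        context = "\n".join(relevant)
        return llm(f"Context:\n{context}\n\nQuestion: {question}")
    return llm(question)  # plain answer from the base model

# Toy stand-ins to show both branches firing.
def toy_search(q):
    if "proprietary" in q:
        return [(0.9, "Internal doc snippet")]
    return [(0.1, "noise")]

def toy_llm(prompt):
    return "augmented" if prompt.startswith("Context:") else "plain"

print(answer("What is 2+2?", toy_search, toy_llm))
print(answer("Explain our proprietary widget", toy_search, toy_llm))
```

The generic math question falls through to the base model; the proprietary one picks up the retrieved context.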


Another question, which one is preferred, LlamaIndex or Langchain, for RAG? Thanks in advance for your insights.


You basically don't use langchain for anything besides 30 minute demos that you copied from someone else's github. It has a completely spaghettified API, is not performant, and forces you into excessive mental contortions to reason about otherwise simple tasks.

LlamaIndex is pretty good.


Yea, discovered this with LangChain last week. Was great for a demo, then I started to push it harder and spent ages trawling Reddit, Discord, and GitHub trying to find solutions to issues, only to discover that what was supposed to be supported was deprecated. Got a massive headache for what should have been a simple change. Moved on now.


Yeah +1

We originally started out building features with LangChain (loading chains from YAML sounded good—it felt like it would be easy to get non-engineers to help with prompt development) but in practice it’s just way too complicated. Nice idea, but the execution feels lacking.

It also doesn’t help that LangChain is evolving so rapidly. When we first started using it, a lot of code samples on the internet couldn’t be copy/pasted because of import paths changing, and at one point we had to bump by ~60 patch versions to get a bug fix, which was painful because it broke all kinds of stuff.


Echoing others’ sentiments, I was frustrated with the bloat and obscurity of existing tools. This led me to start building Langroid with an agent-oriented paradigm 8 months ago (https://github.com/langroid/langroid). We have companies using it in production for various use cases. They especially like our RAG and multi-agent orchestration. See my other comment for details.


what's the "groid"? isn't that a slur?


language android i imagine..


You got it


If you think that's bad, you're gonna hate Scunthorpe.


that's not offensive


s'not?


s'not


LlamaIndex is mainly focused on RAG. LangChain does a ton of other stuff too. I'd focus on LlamaIndex first.


Haystack [1] is another good option. It’s modular, doesn’t get in your way and is particularly strong at retrieval. People like the documentation too.

Disclaimer: I work at deepset

[1] https://github.com/deepset-ai/haystack


Besides the other comments in this thread, I'd really recommend first looking at the (relatively new) "Managed index" in LlamaIndex: https://docs.llamaindex.ai/en/stable/community/integrations/... . These handle combining the retrieval with the generative side. I've seen a lot of users get frustrated and get bad results by trying to write their own glue to string together various components of retrieval and generation, and these are much easier to get started with.


Are there public examples of working products using RAG, compared with fine-tuning or training from scratch?


The OpenAI Assistants API is an implementation of a RAG pipeline. It performs RAG both on any documents you upload and on any conversation you have with it that exceeds the context window.



Not public, but internally I wrote a tool to help us respond to RFPs. You pass in a question from a new RFP and it outputs surprisingly great answers most of the time. It's writing 75%+ of our RFP responses now (naturally we review and adjust as needed). And best of all, it was very quickly hacked together and it's actually useful. I copied questions/answers from all previous RFPs into a doc, and am using the OpenAI embeddings API + a FAISS vector DB + GPT-4 to chunk the doc, store the embeddings, and process the retrieved chunks.
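The core retrieval step of a tool like that can be sketched in a few lines. Here a bag-of-words vector stands in for the OpenAI embeddings API and a brute-force scan stands in for FAISS, just to show the shape: embed the previous RFP questions, find the nearest ones to the new question, and hand their answers to the generation step.

```python
import math
from collections import Counter

def embed(text):
    """Stand-in for a real embeddings API: a bag-of-words vector.
    A production version would call the embeddings endpoint."""
    return Counter(text.lower().split())

def similarity(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def draft_answer(new_question, qa_pairs, top_k=2):
    """Find the most similar previously answered RFP questions and
    return their answers as context for the generation step."""
    qvec = embed(new_question)
    ranked = sorted(qa_pairs, key=lambda qa: -similarity(qvec, embed(qa[0])))
    return [ans for _, ans in ranked[:top_k]]

# Toy history of previously answered RFP questions.
qa_pairs = [
    ("Do you support single sign-on?", "Yes, via SAML and OIDC."),
    ("Where is customer data stored?", "In region-local data centers."),
]

print(draft_answer("Does your product support single sign-on?", qa_pairs, top_k=1))
```

With FAISS, the `sorted(...)` scan becomes an index lookup, but the surrounding logic is the same.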


Amazon Q is (at least partially) a RAG implementation.

