> LLM's know when they get information from certain places Nah, while the compan...

> LLM's know when they get information from certain places

Nah, while the companies running scrapers may log what their regular programs scraped for training data, the LLM, the document-get-bigger algorithm, isn't that type of logical system. It isn't made to have a reliable concept of fact-attribution. It can't even track which parts of an ongoing "conversation" document are supposed to be "itself" as a self-insert character, which is why prompt injection has been a recurring intractable problem.

The LLM will emit text of "X is true, and I got that fact from Y" with the same rigor that it emits text like "I am Sherlock Holmes, and I know Santa Claus was murdered by Dracula via the following deductions..."

If the LLM is used as a adapter/frontend to a search-engine, helping to craft queries, then I suppose the not-an-LLM parts of the system would "know" the what results they're serving up. However the moment you try to "summarize" the mix of all top 10 results, we're back into unreliable stochastic-bullshitting territory again.