Can you elaborate what kind of system you built? I'm curious what specific promp...

barrell · 2026-05-10T16:38:26 1778431106

Linguistics, specifically as it pertains to language learning

Edit: Whoops, read your question wrong. I do a bunch of NLP on different languages, and use LLMs to pad out and interpret the data: asking for things like translations, alternatives, and transliterations; associating and validating data; transferring data from one language to another; segmentation and cross-lingual alignment; the list goes on.

I did manage to get higher quality in the end, so it’s not entirely a regression. But older LLMs needed much less prompting to interpret disparate data and tie it together.

Most of the work I do does not really have a “right answer,” just a lot of wrong ones, which I think is what trips up LLMs. If I turn on reasoning for any step in my pipeline, the token count goes up 100-fold and the quality gets cut in half.

Edit 2: I did have to move off of GPT, though, to get the improvements mentioned. Go Mistral!

gaflo · 2026-05-11T08:09:45 1778486985

What kind of data are you interpreting? Do you mean document extraction from different languages? I have only used GPT5.5 for agentic coding, which did get significantly better in my experience, although that does align with your conjecture that their focus is on improving this. I haven't noticed a regression when interacting with it in different languages, though (specifically German and Russian). I have done data extraction from documents in different languages, but only with locally hosted LLMs (mainly Qwen3.5-397b), as I cannot legally use cloud-based solutions. My local solution was more than sufficient, so I would be surprised if a frontier model failed at that.