Isn't it quite possible they replaced that Flash model with a distilled version, saving money rather than increasing quality? This just speaks to the value of open-weights more than anything.
I checked this; the whole conversation was about 1,000 tokens.
I suspect the Ollama version might ship with incorrect default settings, such as the conversation delimiters (chat template). The experience with Gemma 3 in AI Studio is completely different.
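For anyone who wants to check this themselves: Ollama can print the Modelfile it uses for a local model, including the TEMPLATE that inserts the turn delimiters. A rough sketch (the model tag `gemma3` and the exact template text are assumptions; compare against the delimiters documented for Gemma-family models):

```shell
# Print the Modelfile Ollama applies to the local Gemma 3 model,
# including the TEMPLATE that wraps each conversation turn.
ollama show gemma3 --modelfile

# Gemma-family models expect turns delimited roughly like:
#   <start_of_turn>user
#   ...prompt...
#   <end_of_turn>
#   <start_of_turn>model
# If the printed template doesn't match, you can build a corrected
# variant from your own Modelfile:
ollama create gemma3-fixed -f ./Modelfile
```

If the template differs from what the model was trained on, output quality can degrade noticeably even though the weights are identical, which would explain the gap versus AI Studio.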
I thought my over-the-top response made it clear it was satire, a comment on the absurdity of someone thinking they're "actually friends with Brad Pitt and he needs my money."
Good point, it’s already practically impossible to distinguish between ChatGPT o1 output and, say, the median Substack essay. At least for someone who only has a few minutes to spare for second-guessing.
https://docs.rs/unicode-segmentation/latest/unicode_segmenta...