OpenAI's and Anthropic's real moat is hardware. For local LLMs, context length and hardware performance are the limiting factors. Qwen3 4B with a 32,768-token context window is great, right up until the window starts filling and performance drops off quickly.
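As a rough illustration, here is a minimal sketch of watching that window fill, assuming a local Ollama instance serving a `qwen3:4b` tag at the default endpoint (both are assumptions, not a prescription):

```python
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # default Ollama endpoint (assumed running locally)

def ask(prompt: str, num_ctx: int = 32768) -> dict:
    """Send a prompt to a local model with an explicit context window."""
    resp = requests.post(OLLAMA_URL, json={
        "model": "qwen3:4b",          # assumed local model tag
        "prompt": prompt,
        "stream": False,
        "options": {"num_ctx": num_ctx},
    })
    resp.raise_for_status()
    return resp.json()

result = ask("Summarize the tradeoffs of long context windows.")
# prompt_eval_count reports how many tokens the prompt consumed;
# watch this creep toward num_ctx as a conversation grows.
used = result.get("prompt_eval_count", 0)
print(f"{used} / 32768 context tokens used ({used / 32768:.1%})")
```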
I use local models when possible. MCPs work well, but the amount of context their tool definitions inject into every request makes switching to an online provider the no-brainer.
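To see why that injection hurts, here's a rough sketch that counts the tokens a set of MCP tool schemas would consume on every request. The schema below is hypothetical, and tiktoken's `cl100k_base` encoding is only an approximation of a local model's tokenizer:

```python
import json
import tiktoken  # OpenAI's tokenizer, used here as a rough approximation

# Hypothetical MCP tool schema; a real server often exposes dozens of these,
# and each one rides along in every single request.
TOOL_SCHEMAS = [
    {
        "name": "search_files",
        "description": "Search the workspace for files matching a glob pattern.",
        "inputSchema": {
            "type": "object",
            "properties": {
                "pattern": {"type": "string", "description": "Glob pattern to match."},
                "max_results": {"type": "integer", "description": "Cap on results returned."},
            },
            "required": ["pattern"],
        },
    },
    # ...imagine 30 more tools like this...
]

enc = tiktoken.get_encoding("cl100k_base")
overhead = sum(len(enc.encode(json.dumps(schema))) for schema in TOOL_SCHEMAS)

# With a 32,768-token window, a few thousand tokens of tool definitions
# is context you never get back for the actual conversation.
print(f"{overhead} tokens of tool definitions per request "
      f"({overhead / 32768:.1%} of a 32k window)")
```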