They use the ECI - EpochAI Capability Index. Measured by the DCI the Chinese AI ...

wiseowise · 2026-01-08T18:42:48 1767897768

Interesting, does it mean they're optimizing for cases where China is disconnected from the internet completely?

torginus · 2026-01-08T19:03:08 1767898988

Almost all Chinese models are open weight research models.

My theory is that these models serve the purpose of being relatively easy to run/tweak for researchers, and mainly serve to demonstrate the effectiveness of new techniques in training and inference, as well as the strength of AI labs that created them.

They are not designed to be state of the art commercial models.

By choosing bigger model sizes, running more training epochs, and drilling the models a bit more on benchmarking questions, I'm sure the Chinese could close the gap, but that would delay these models, make them more expensive and harder to run without showing any tangible research benefit.

Also my 2c: I was perfectly happy with Sonnet 3.7 as of a year ago, if the Chinese have a model really as good as that (not only one that benchmarks as well), I'd definitely like to try it.

benxh · 2026-01-08T20:20:21 1767903621

It is arguable that the new Minimax M2.1 and GLM4.7 are drastically above Sonnet 3.7 in capabilities.

torginus · 2026-01-08T21:45:34 1767908734

Could you share some impressions of using them? How do they feel like compared to OAI models or Claude?

benxh · 2026-01-08T23:36:43 1767915403

Minimax has been great for super high speed web/js/ts related work. It compares in my experience to Claude Sonnet, and at times gets stuff similar to Opus. Design wise it produces some of the most beautiful AI generated page I've seen.

GLM-4.7 like a mix of Sonnet 4.5 and GPT-5 (the first version not the later ones). It has deep deep knowledge, but it's often just not as good in execution.

They're very cheap to try out, so you should see how your mileage varies.

Ofcourse for the hardest possible tasks that GPT 5.2 only approaches, they're not up to scratch. And for the hard-ish tasks in C++ for example that Opus 4.5 tackles Minimax feels closer, but just doesn't "grok" the problem space good enough.

Tiberium · 2026-01-08T20:20:53 1767903653

No, for example Alibaba has huge proprietary Qwen models, like Qwen 3 Max. You just never hear about them because that space in western LLM discussions is occupied with the US labs.

marcosdumay · 2026-01-08T18:48:55 1767898135

No, there are datacenters in China they can still use if their internet is down.

dust42 · 2026-01-08T19:39:25 1767901165

Well, my answer was partially tongue in cheek. However, under the circumstances I mentioned I had better results with a local model. But then up to today I really like Deepseek for dev - for me and my use cases it still outperforms Opus. The problem is we don't have good and universal benchmarks, inststead we have benchmaxxing. For my use cases Chinese models may be perfect but maybe for yours American one's may outperform them by a huge margin? So I urge you to use YCI - Your Capability Index.

litbear2022 · 2026-01-09T01:19:13 1767921553

Let's see who disconnected from the internet completely :)