
Yeah, but are we all just speculating, or is it accepted knowledge that this is actually happening?




Speculation, I think, because for one, those supposed proxy providers would have to offer some kind of pricing advantage over the original provider. Maybe I missed them, but where are the X0% cheaper SOTA model proxies?

Number two, I'm not sure that random samples collected across even a moderately large number of users make a great base of training examples for distillation. I would expect they need more focused samples over very specific areas to achieve good results.


Thanks. In that case, my conclusion is that all the people saying these models are "distilling SOTA models" are, by extension, also speculating. How can you distill what you don't have?

The only way I can think of is paying to synthesize training data using SOTA models yourself. But yeah, I'm not aware of anyone publicly sharing that they did, so that's also speculation.

The economics probably work out, though: collecting, cleaning, and preparing original datasets is very cumbersome.
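
For the curious, a minimal sketch of what "synthesizing training data yourself" could look like, using the OpenAI Python client. The model name, seed topics, prompt, and output file are placeholders I made up, not anyone's actual pipeline:

    import json
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # Hypothetical seed topics; a real pipeline would use far more,
    # focused on the domains you want the student model to learn.
    seed_topics = ["binary search edge cases", "SQL window functions"]

    with open("synthetic_pairs.jsonl", "w") as f:
        for topic in seed_topics:
            resp = client.chat.completions.create(
                model="gpt-4o",  # placeholder for whichever SOTA model you pay for
                messages=[{
                    "role": "user",
                    "content": f"Write a hard question about {topic}, then answer it.",
                }],
            )
            f.write(json.dumps({
                "topic": topic,
                "text": resp.choices[0].message.content,
            }) + "\n")

The appeal is that you pay ordinary per-token API rates instead of the labor of collecting and cleaning an original dataset, which is presumably why the economics work out.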

What we do know for sure is that the SOTA providers distill their own models; I remember reading about this at least for Gemini (Flash is distilled) and Meta.
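
For anyone unfamiliar: distillation in the Hinton et al. (2015) sense trains a small student to match a large teacher's softened output distribution. A minimal sketch of the standard loss in PyTorch (the function name and temperature are my choices; the providers' actual pipelines are of course not public):

    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, temperature=2.0):
        # Soften both distributions with a temperature, then push the
        # student toward the teacher with KL divergence. Scaling by T^2
        # keeps gradient magnitudes comparable across temperatures.
        soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
        log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
        return F.kl_div(log_soft_student, soft_teacher,
                        reduction="batchmean") * temperature ** 2

Note that this needs the teacher's logits (or at least its sampled outputs), which is exactly why a provider can cheaply distill its own models while outsiders would have to make do with API-visible text.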


OpenAI implemented ID verification for their API at some point, and I think they stated that deterring distillation was the reason.


