Fascinating idea: that LLM performance might improve simply by changing the inference path through existing layers, rather than by retraining weights. It’s interesting to think of transformer stacks developing something like functional “circuits,” similar to brain regions.
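One way to picture “changing the inference path” is a toy model where frozen layers are composed in different orders or subsets at inference time. A minimal sketch, with purely illustrative names and no real framework assumed:

```python
# Hypothetical sketch: the same frozen "layers" traversed along
# different paths at inference time, with no weight updates.

def make_layer(w):
    """A frozen 'layer': multiply the input by a fixed weight."""
    return lambda x: w * x

layers = [make_layer(w) for w in (2, 3, 5)]  # fixed, never retrained

def run(x, path):
    """Run input x through the layer indices listed in `path`."""
    for i in path:
        x = layers[i](x)
    return x

# Same weights, different inference paths, different outputs:
full = run(1, [0, 1, 2])  # uses all three layers
skip = run(1, [0, 2])     # skips the middle layer
```

The point of the toy is only that behavior can change while every weight stays fixed; real proposals (layer skipping, early exit, routing) are far more involved.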
Interesting perspective from LeCun. The debate between scaling LLMs versus building systems that understand the physical world seems like one of the big open questions in AI right now. It will be fascinating to see whether “world models” end up complementing LLMs or eventually replacing parts of them.