So this one is 3x the size but only 7% better on MMLU? Given Moores law is mostly dead, this trend is going to make for even more extremely expensive compute for next gen AI models.
True, I was in too distracting an environment to do that calculation, but it still feels like its a logarithmic return on extra compute. How long before the oceans start to boil? (figuratively that is).