Is that generally faster than having dedicated RAM alongside the ASIC? Or are the unit economics not worth it, and unified memory systems are simply the dominant design right now?
Input is way smaller than output, so memory performance considerations there probably don't even register in the larger scheme of things (having to write the decompressed frame to RAM and read it again to scan it out to the display).
Say, 4-8 KiB per frame on input leads to a 4 MiB frame on output.
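To put rough numbers on that: with those figures the decoded frame is 512 to 1024 times larger than the compressed input. Here is a quick back-of-the-envelope sketch in Python. The function name `frame_traffic` is made up for illustration, and it assumes the decoded frame crosses the memory bus exactly twice (one write by the decoder, one read for scan-out), which ignores reference-frame reads and caching.

```python
KIB = 1024
MIB = 1024 * KIB

def frame_traffic(compressed_bytes, decoded_bytes):
    """Return (total bytes moved per frame, fraction that is compressed input).

    Assumption: the compressed input is read once, and the decoded frame is
    written once by the decoder and read once for scan-out.
    """
    decoded_traffic = 2 * decoded_bytes  # one write + one read
    total = compressed_bytes + decoded_traffic
    return total, compressed_bytes / total

for compressed in (4 * KIB, 8 * KIB):
    total, share = frame_traffic(compressed, 4 * MIB)
    print(f"{compressed // KIB} KiB in: input is {share:.3%} "
          f"of {total / MIB:.3f} MiB moved per frame")
```

Under these assumptions the input side stays below a tenth of a percent of the per-frame memory traffic.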
I admit I'm not familiar with H.264; I thought the motion vectors and such were applied to the decompressed reference image. At least that's how we implemented the pseudo-MPEG1 encoder/decoder in class.
Not having to decompress the reference frame for every decoded frame seems like a win.
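For anyone following along, this is roughly what I mean by applying motion vectors to the decompressed reference image. It is only a toy sketch in the spirit of that class project, not how H.264 or any real decoder does it: whole-pixel vectors only, no sub-pixel interpolation, no bounds handling at frame edges, and `motion_compensate` and its parameters are names I made up.

```python
def motion_compensate(reference, mv, top, left, size, residual):
    """Rebuild one size x size block from an already decoded reference frame.

    reference: 2D list of pixel values (the decoded reference frame)
    mv:        (dy, dx) motion vector in whole pixels (assumption: integer-pel)
    top, left: position of the block in the frame being decoded
    residual:  size x size 2D list of prediction errors from the bitstream
    """
    dy, dx = mv
    block = []
    for y in range(size):
        row = []
        for x in range(size):
            # Fetch the predicted pixel from the displaced reference position.
            predicted = reference[top + dy + y][left + dx + x]
            # Add the residual and clamp to the 8-bit pixel range.
            row.append(max(0, min(255, predicted + residual[y][x])))
        block.append(row)
    return block

# Tiny 8x8 reference frame with a recognisable gradient.
reference = [[16 * y + x for x in range(8)] for y in range(8)]
residual = [[1, 0], [0, -1]]
block = motion_compensate(reference, (1, 2), top=2, left=2, size=2,
                          residual=residual)
print(block)
```

The point is that every predicted block reads pixels out of the decoded reference, so that frame has to sit somewhere in memory in decompressed form.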