I'm speaking more in terms of the goals the markup dialects had, irrespective of the ultimate implementation. I think we can all agree that those suffered from misguided engineering choices (bloaty XML culture).
Responsive images could have been an XHTML module with a JavaScript implementation. The browser vendors could catch up and provide native implementations in their own time, but that would not postpone immediate usage.
If it were done right, anyone could have defined a markup module/schema with parsing rules and scripting. The evolution of those extensions would have been pretty damned fast due to forking, quick vetting/optimization, etc. It would have been well timed with the recent JavaScript renaissance, if it had happened. It might have meant browser vendor independence at the level of the developer.
HTML should really have been modular, with an efficient, lightweight core spec. It should also have paid a lot of attention to being semantic so that others could compete with Google on search. I am still curious whether that's why Google got involved in the WHATWG. I'm rambling about things I don't know about though...
> Responsive images could have been an XHTML module with a JavaScript implementation. The browser vendors could catch up and provide native implementations in their own time, but that would not postpone immediate usage.
This is exactly what happened, except without the XHTML nonsense. JavaScript polyfills of the picture element were created and in use before native implementations eventually caught up. (And native implementations are very necessary in this case, because they need to hook into the preload scanner, which is not exposed to JS.)
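To make the polyfill idea concrete, here is a rough sketch of the kind of selection logic a responsive-image polyfill has to run in script. It is a simplification under my own assumptions, not the actual code of any polyfill nor the spec's selection algorithm: the `Candidate` type and the `parseCandidates` and `pickCandidate` helpers are hypothetical names, and only width (`w`) descriptors are handled.

```typescript
// Hypothetical shape for one srcset-style candidate: a URL plus its intrinsic width.
interface Candidate {
  url: string;
  width: number; // intrinsic width in pixels (the "w" descriptor)
}

// Parse a string like "small.jpg 480w, large.jpg 1200w" into candidates.
// Entries without a well-formed width descriptor are ignored in this sketch.
function parseCandidates(srcset: string): Candidate[] {
  return srcset
    .split(",")
    .map((part) => part.trim().split(/\s+/))
    .filter((tokens) => tokens.length === 2 && /^\d+w$/.test(tokens[1]))
    .map(([url, descriptor]) => ({ url, width: parseInt(descriptor, 10) }));
}

// Pick the smallest candidate that still covers the needed device pixels,
// falling back to the largest one when nothing is big enough.
function pickCandidate(
  candidates: Candidate[],
  slotWidth: number, // CSS pixels the image will occupy (assumed known)
  dpr: number // device pixel ratio
): Candidate | undefined {
  const needed = slotWidth * dpr;
  const sorted = [...candidates].sort((a, b) => a.width - b.width);
  return sorted.find((c) => c.width >= needed) ?? sorted[sorted.length - 1];
}
```

In a browser you would presumably feed this `window.devicePixelRatio` and a slot width taken from layout, then set the chosen URL on an `img`. That only happens once the script runs, which is the point about the preload scanner: a native implementation can start the right download before any script executes.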
More generally, custom elements and the extensible web principles enable all of this. Again, without XML being involved.