Except that would not show lower energy usage. To preserve backward compatibility, the cores would need to keep the logic that analyses data dependencies so that legacy code continues to perform well. To make any difference they would need to both do what you say and also define some protocol that instructs the processor to disable the dataflow analysis unit entirely (to save energy). But that protocol would be invasive, because you need to re-activate the unit at the first instruction that is not annotated, and upon faults, branches, etc. The logic to coordinate this protocol becomes a new energy expenditure on its own!
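To make the trade-off above concrete, here is a toy sketch of such an enable/disable protocol. Everything in it is an assumption for illustration: the `Instr` type, the `simulate` function and the three cost constants are hypothetical, and the numbers are arbitrary units rather than measurements of any real core. It only shows how the switching cost can swamp the savings when annotated and unannotated instructions are interleaved.

```python
from dataclasses import dataclass

# Hypothetical relative energy costs in arbitrary units (NOT measurements).
COST_ANALYSIS_ON = 10   # per instruction with the dependency analysis unit running
COST_ANALYSIS_OFF = 6   # per instruction when annotations replace the analysis
COST_SWITCH = 50        # per enable/disable transition of the unit


@dataclass(frozen=True)
class Instr:
    annotated: bool
    is_branch: bool = False
    faults: bool = False


def simulate(stream):
    """Return (total_cost, switches) for a toy instruction stream."""
    unit_on = True  # legacy-safe default: the analysis unit starts enabled
    total = 0
    switches = 0
    for ins in stream:
        # The unit may stay off only for annotated instructions that
        # neither branch nor fault; anything else re-activates it.
        want_on = (not ins.annotated) or ins.is_branch or ins.faults
        if want_on != unit_on:
            unit_on = want_on
            switches += 1
            total += COST_SWITCH
        total += COST_ANALYSIS_ON if unit_on else COST_ANALYSIS_OFF
    return total, switches


def legacy_cost(stream):
    """Cost of the same stream with the analysis unit always on."""
    return COST_ANALYSIS_ON * len(stream)


long_run = [Instr(annotated=True)] * 100
mixed = [Instr(annotated=True), Instr(annotated=False)] * 50

print(simulate(long_run), legacy_cost(long_run))
print(simulate(mixed), legacy_cost(mixed))
```

With these made-up costs, a long fully annotated run comes out cheaper than the always-on baseline, while the interleaved stream comes out far more expensive because the unit toggles on every instruction. Whether real hardware would behave anything like this depends entirely on the actual switching cost, which this sketch does not know.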
Really the way forward would be to extend x86 completely, with a "mode" where all instructions are annotated and go through a different pipeline front-end than legacy x86 code. But Intel already tried that with IA64, and it burned them very hard. I am not sure they are willing to do it again.
IA64 was an entirely new instruction set on an entirely new architecture, having nothing whatever to do with x86. Compatibility for x86 was added later to try and improve sales.
AMD64 / x64 pretty much hops into a different mode and goes on executing from there. Given how many modes and instructions these chips support, I don't see why adding another would easily upset people.
Yes, you are right: if that were marketed like the introduction of x64, it might work.
However there is a big difference: the move to 64-bit words was something that was in demand when it was introduced. There was a market for it, with a very clear value proposition.
In contrast, a new "mode" with the same computational power plus dataflow annotations would be a tough sell: larger code size, and better performance / watt only for some applications.
(Also, as far as I know AMD64 / x64 on Intel cores uses the same decode and issue unit, just with different microcode. Circuit-wise there is a lot in common with x86.
Here we would be talking about a new mode and also a new instruction+data path in the processor. The core would become larger and more expensive. Not sure how that plays.)
Is there something about dataflow-annotated instructions that would require lots of internal changes? I mean past the decoding step. Because Intel and AMD add new instructions and modes all the time, frequently stuff that is basically totally orthogonal to whatever's gone before.