Yeah, I realised afterwards that at these high clock speeds maybe they do need some extra cycles to do the correction. I still don't see why it has to be done in hardware to avoid timing side-channels, though. You just need to provide a constant latency, i.e. the CPU microcode executes some NOPs when no correction is necessary.
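
To make the constant-latency idea concrete, here is a toy cycle-count model in Python. It is only a sketch: the cycle counts and function names are made up for illustration and do not describe any real CPU's microcode.

```python
# Toy model of the idea above. All numbers are hypothetical.
BASE_CYCLES = 4        # assumed cost of the operation without correction
CORRECTION_CYCLES = 3  # assumed extra cost when a correction is needed


def variable_latency(needs_correction: bool) -> int:
    """Naive version: total latency reveals whether a correction ran."""
    return BASE_CYCLES + (CORRECTION_CYCLES if needs_correction else 0)


def constant_latency(needs_correction: bool) -> int:
    """Padded version: spend NOP cycles when no correction is needed."""
    work = CORRECTION_CYCLES if needs_correction else 0
    nops = CORRECTION_CYCLES - work  # padding so both paths match
    return BASE_CYCLES + work + nops


if __name__ == "__main__":
    for flag in (False, True):
        print(flag, variable_latency(flag), constant_latency(flag))
```

In this model the padded version always pays the worst-case cost, so an observer timing the operation cannot tell the two cases apart. That is the trade-off: the fast path gets slower in exchange for hiding the difference.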