It should actually be used as a justification to persuade: you can usually write x86 assembly in any way that comes to mind and as long as data access (load/store) is the same across these variants the CPU will take the same number of cycles to execute them due to the crazy amount of optimization in the modern architectures.
Unless I'm misunderstanding what you are saying, this is not true. Instruction selection and data dependencies play a big role in the performance of a routine.
What is true is that the x86(-64) ISA alone does not give you enough information to predict performance accurately. Furthermore, because of the interactions between the different subsystems (e.g., caches, out-of-order buffers, etc.), it is essentially impossible to determine performance with cycle accuracy. But for a given microarchitecture it is still possible to get a first-order approximation of what is going on underneath the CISC mask.
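To make the point about data dependencies concrete, here is a minimal sketch in C. The function names are made up for illustration. Both routines perform exactly the same loads, but the first carries a single dependency chain through the accumulator, while the second splits the work across four independent chains. Assuming the compiler is not allowed to reassociate floating-point adds (i.e., no -ffast-math or similar), the second version usually runs faster on out-of-order x86 cores. How much faster depends on the microarchitecture, so measure rather than trust this blindly.

```c
#include <assert.h>
#include <stddef.h>

/* One accumulator: every add has to wait for the result of the previous
   add, so the loop tends to be limited by floating-point add latency. */
double sum_one_chain(const double *a, size_t n)
{
    assert(a != NULL || n == 0);
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Four accumulators: the same loads, but four independent dependency
   chains that an out-of-order core can overlap. */
double sum_four_chains(const double *a, size_t n)
{
    assert(a != NULL || n == 0);
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; i++)      /* leftover elements when n is not a multiple of 4 */
        s0 += a[i];
    return (s0 + s1) + (s2 + s3);
}
```

Note that the two versions add the elements in a different order, so for general floating-point input the results can differ in the last bits. That is also why a compiler will not do this transformation for you by default.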