> you are ~95% away from just running it on the GPU anyway
Vector code with lots of branches absolutely exists. You can run it on a GPU, but because GPUs don't dedicate transistors to out-of-order execution, branch prediction, and good prefetchers, that code won't run very well.
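
To make the branch cost concrete, here is a toy cost model in Python. It is only a sketch, not a GPU simulator: it assumes lanes are grouped into warps of 32 that share one instruction stream, and that a warp has to execute every branch path that any of its lanes takes, one after another. The names `warp_steps`, `total_steps`, and the per-path costs are made up for illustration.

```python
# Toy cost model of SIMT branch divergence (illustrative assumption, not a GPU simulator).
WARP_SIZE = 32  # lanes that share one instruction stream in this model


def warp_steps(lane_paths, path_cost):
    """Steps for one warp: every distinct path taken by any lane runs serially."""
    return sum(path_cost[p] for p in set(lane_paths))


def total_steps(paths, path_cost, warp_size=WARP_SIZE):
    """Sum warp_steps over consecutive groups of warp_size lanes."""
    return sum(
        warp_steps(paths[i:i + warp_size], path_cost)
        for i in range(0, len(paths), warp_size)
    )


# Hypothetical instruction counts for four branch targets.
path_cost = {0: 10, 1: 10, 2: 10, 3: 10}

uniform = [0] * 64                      # every lane takes the same branch
divergent = [i % 4 for i in range(64)]  # lanes in each warp split across 4 branches

uniform_steps = total_steps(uniform, path_cost)
divergent_steps = total_steps(divergent, path_cost)
print(uniform_steps, divergent_steps)
```

In this model the divergent input costs four times as many steps as the uniform one for the same amount of useful work per lane, and that is before counting anything the hardware lacks for hiding memory latency on irregular accesses.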