I use intrinsics by hand all the time. It's very easy to make a problem too comp...

cmrdporcupine · on Nov 28, 2022

I have definitely done the same.

BTW, I really like the https://github.com/vectorclass/ library. I wish I had something similar for Rust, and for AArch64/NEON.

kolbe · on Nov 28, 2022

I use Agner's library as well. I started my own version in Rust, specifically targeting AVX-512 [1], but I've hit enough snags that I think I'll abandon it. It's very green at the moment, and I haven't published it to crates.io. If I'm going to dedicate time to it, it needs to work for my purposes, and right now a thread-parallelism problem makes it unusable for me.

[1] https://github.com/matthewkolbe/lit_math

Cloudef · on Nov 29, 2022

Note that simply using a SIMD vector class library does not automatically make code "go faster". It can even make things worse (due to latency). What you usually need is a problem, and an algorithm for it, that parallelizes well.
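To make the "parallelizes well" point concrete, here is a minimal sketch (my own illustration, not code from either library; `saxpy` and `recurrence` are hypothetical names). In the first loop every element is independent, so 4 or 8 of them can share one SIMD instruction. In the second, each step needs the previous result, so the loop is bound by the latency of one multiply-add per element however wide the registers are.

```cpp
#include <cstddef>
#include <vector>

// Independent iterations: y[i] depends only on x[i] and y[i], so a compiler
// or a SIMD vector class can process several elements per instruction.
// Assumes x.size() >= y.size().
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    for (std::size_t i = 0; i < y.size(); ++i) {
        y[i] = a * x[i] + y[i];
    }
}

// Loop-carried dependency: acc at step i needs acc from step i-1, so each
// iteration waits for the previous multiply-add to finish. Wider registers
// do not help unless the algorithm itself is changed.
std::vector<float> recurrence(float a, const std::vector<float>& b) {
    std::vector<float> out(b.size());
    float acc = 0.0f;
    for (std::size_t i = 0; i < b.size(); ++i) {
        acc = a * acc + b[i];
        out[i] = acc;
    }
    return out;
}
```

Vectorizing the second loop means switching to a different algorithm (for example a parallel prefix scan), not just a different type.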

kolbe · on Nov 29, 2022

Can you explain? In my experience you don't need much. I've benchmarked Agner's exp: if you have 4 calculations to do, calling it with AVX2 is 4x faster than calling std::exp four times.
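The reason four independent `exp` calls batch so well is that a vectorized `exp` runs the same branch-free instruction sequence on every lane. Below is a portable sketch of that structure, assuming a common approach (range reduction by ln 2 plus a polynomial); it is not Agner's actual implementation. `Lanes4` and `exp4` are hypothetical names, and `std::array<double, 4>` stands in for one 256-bit AVX2 register of doubles. Whether a compiler turns the lane loop into real AVX2 instructions depends on the compiler and flags, and this sketch does not reproduce the 4x figure above.

```cpp
#include <array>
#include <cmath>
#include <cstdint>
#include <cstring>

// Four doubles: the same width as one AVX2 register (hypothetical type).
using Lanes4 = std::array<double, 4>;

// Sketch only: assumes -700 <= x <= 700 and does no NaN/inf/overflow handling.
Lanes4 exp4(const Lanes4& x) {
    // ln(2) split into a high part and a small correction (fdlibm's constants),
    // so the range reduction stays accurate for large |x|.
    constexpr double ln2_hi = 6.93147180369123816490e-01;
    constexpr double ln2_lo = 1.90821492927058770002e-10;
    constexpr double inv_ln2 = 1.44269504088896338700e+00;

    Lanes4 out{};
    for (int i = 0; i < 4; ++i) {  // identical straight-line code in every lane
        // Range reduction: x = k*ln2 + r with |r| <= ln2/2.
        double k = std::round(x[i] * inv_ln2);
        double r = (x[i] - k * ln2_hi) - k * ln2_lo;

        // exp(r) by a degree-13 Taylor polynomial in Horner form:
        // 1 + r*(1 + r/2*(1 + r/3*(... (1 + r/13))))
        double p = 1.0;
        for (int j = 13; j >= 1; --j) {
            p = 1.0 + p * r / j;
        }

        // Build 2^k from its IEEE-754 bit pattern (valid for -1022 <= k <= 1023).
        std::int64_t bits = (static_cast<std::int64_t>(k) + 1023) << 52;
        double scale;
        std::memcpy(&scale, &bits, sizeof scale);

        out[i] = p * scale;
    }
    return out;
}
```

Calling `exp4` once on four inputs replaces four scalar calls. A real SIMD version would do each step above with one vector instruction across all lanes instead of a loop.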