
I use intrinsics by hand all the time. It's very easy to make a problem too complicated to autovectorize. And even if you do get it to autovectorize, it's not exactly future-proof against compiler changes.
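A minimal sketch of what "by hand" can look like, for anyone who hasn't tried it. `sum_floats` is a made-up name, and the SSE path is only compiled when the compiler defines `__SSE__`; otherwise the scalar loop does all the work. A float sum is one common case compilers usually won't vectorize on their own, because reordering the additions can change the result unless you opt in with something like `-ffast-math`. With intrinsics you pick the reordering yourself.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__SSE__)
#include <xmmintrin.h>  // SSE intrinsics (__m128, _mm_add_ps, ...)
#endif

// Hypothetical helper: sums n floats, four lanes at a time where SSE is available.
float sum_floats(const float* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    std::size_t i = 0;
    float total = 0.0f;
#if defined(__SSE__)
    __m128 acc = _mm_setzero_ps();                      // four partial sums
    for (; i + 4 <= n; i += 4) {
        acc = _mm_add_ps(acc, _mm_loadu_ps(data + i));  // unaligned load of 4 floats
    }
    float lanes[4];
    _mm_storeu_ps(lanes, acc);                          // spill the partial sums
    total = (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
#endif
    for (; i < n; ++i) {
        total += data[i];  // scalar tail, or the whole array without SSE
    }
    return total;
}
```

Note the summation order differs from a plain left-to-right loop, so results can differ in the last bits for general inputs.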


I have definitely done the same.

BTW, I really like this library (https://github.com/vectorclass/). I wish I had a similar thing for Rust, and for AArch64/NEON.


I use Agner's library as well. I started up my own version for Rust specifically targeting avx512[1], but I've been hitting enough snags that I think I'll abandon it. It's super green at the moment, and I haven't published it to crates.io. But if I'm going to dedicate time to it, then I need it to work for my purposes, and there's a thread-parallel problem that makes it unusable for me at the moment.

[1] https://github.com/matthewkolbe/lit_math


Note that simply using a SIMD vector class library does not make your code "go faster". In fact, it can make things worse (due to latency). What you usually need is a problem, and then a solution (algorithm), that parallelizes well.
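To make the "parallelizes well" part concrete, here is a rough sketch in plain C++ (both function names are made up for illustration). The first loop has independent iterations, so SIMD lanes map onto it directly. The second carries a dependency from each iteration to the next, so every step waits on the previous result, and swapping in a vector class doesn't help unless the algorithm itself is restructured.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Independent iterations: out[i] depends only on in[i].
// This is the shape that maps cleanly onto SIMD lanes.
std::vector<float> scale_and_shift(const std::vector<float>& in, float a, float b) {
    std::vector<float> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = a * in[i] + b;
    }
    return out;
}

// Loop-carried dependency: out[i] needs out[i - 1] first.
// Each step is bounded by the latency of the previous multiply-add.
std::vector<float> running_filter(const std::vector<float>& in, float a) {
    assert(a > -1.0f && a < 1.0f);  // assumed precondition: keeps the recurrence stable
    std::vector<float> out(in.size());
    float prev = 0.0f;
    for (std::size_t i = 0; i < in.size(); ++i) {
        prev = a * prev + in[i];
        out[i] = prev;
    }
    return out;
}
```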


Explain? My experience has been that you don't need much. I've benchmarked Agner's exp, and if you have 4 calculations to do, then calling it with avx2 will be 4x faster than calling std::exp four times.



