
You can get to most of the special-purpose x86 vector instructions in GCC without dropping down to assembly.

http://gcc.gnu.org/onlinedocs/gcc-4.6.0/gcc/Vector-Extension...

http://gcc.gnu.org/onlinedocs/gcc-4.6.0/gcc/X86-Built_002din...



Vector extensions are terrible: they're too high-level an abstraction because they don't expose individual instructions, and SIMD instructions are "weird" enough that you can't get reasonable performance by expecting the compiler to magically do the right thing. There's no gcc vector equivalent of psadbw, for example, and I doubt the compiler is even capable of generating it.

Intrinsics are passable, but still have two core problems. First of all, the compiler is atrocious at register allocation: it will often spill far more than necessary, and even when it doesn't, it will usually spill the wrong things. Secondly, they're way harder to write than straight assembly, far less flexible, and are nigh-unreadable, so why bother?


I'm not sure you looked at what I posted. __builtin_ia32_psadbw is right there on the list of builtins. I've used __builtin_ia32_psadbw128 in GCC myself. It compiles directly to PSADBW instructions. Perhaps you confused what I was talking about with GCC's auto-vectorization?

edit: Just realized that you're the x264 guy and it's unlikely you misunderstood me. Still I think my point about psadbw stands.
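
For anyone following along, here is a minimal sketch of the intrinsic route to PSADBW. It uses the `_mm_sad_epu8` intrinsic from `<emmintrin.h>` rather than the raw `__builtin_ia32_psadbw128` builtin named above; the helper name `sad16` and the scalar fallback for non-SSE2 targets are assumptions added for illustration, not something from the thread.

```c
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics, including _mm_sad_epu8 */
#endif

/* Sum of absolute differences over 16 unsigned bytes.
 * sad16 is a hypothetical helper name used only for this sketch. */
static uint32_t sad16(const uint8_t *a, const uint8_t *b)
{
#if defined(__SSE2__)
    /* Unaligned loads, so the caller needs no special alignment. */
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);

    /* PSADBW yields two partial sums, one in the low 16 bits
     * of each 64-bit half of the result. */
    __m128i sad = _mm_sad_epu8(va, vb);

    uint32_t lo = (uint32_t)_mm_cvtsi128_si32(sad);
    uint32_t hi = (uint32_t)_mm_cvtsi128_si32(_mm_srli_si128(sad, 8));
    return lo + hi;
#else
    /* Scalar fallback (assumption: added so the sketch builds on non-x86). */
    uint32_t sum = 0;
    for (int i = 0; i < 16; i++)
        sum += (a[i] > b[i]) ? (uint32_t)(a[i] - b[i])
                             : (uint32_t)(b[i] - a[i]);
    return sum;
#endif
}
```

Calling `sad16(a, b)` on two 16-byte buffers returns their total absolute difference. Whether the surrounding code gets good register allocation is a separate question that this sketch does not address.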


> I'm not sure you looked at what I posted. __builtin_ia32_psadbw is right there on the list of builtins. I've used __builtin_ia32_psadbw128 in GCC myself. It compiles directly to PSADBW instructions. Perhaps you confused what I was talking about with GCC's auto-vectorization?

Those aren't gcc vectors, those are intrinsics. Vectors use something like this:

    typedef uint32_t v4si __attribute__((vector_size (16)));
    v4si v16 = {v,v,v,v};

__builtin functions that act on __m128 values are separate from "GCC vector instructions".
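
To make the distinction concrete, here is a small sketch of the vector-extension style, assuming GCC or Clang. The type name `v4su` and the helper `add_splat` are hypothetical names chosen for this example. Only generic operators are used, so the compiler decides which instructions to emit, and nothing here names a specific instruction such as PSADBW.

```c
#include <stdint.h>

/* Four 32-bit unsigned lanes in a 16-byte vector (GCC/Clang vector extension). */
typedef uint32_t v4su __attribute__((vector_size(16)));

/* Broadcast v to every lane, then add lane-wise.
 * add_splat is a hypothetical helper used only for illustration. */
static v4su add_splat(v4su x, uint32_t v)
{
    v4su splat = {v, v, v, v};
    /* Generic '+' on vector types: the compiler selects the instruction. */
    return x + splat;
}
```

With `v4su x = {1, 2, 3, 4};`, the result of `add_splat(x, 10)` can be read back lane by lane with subscripts such as `r[0]`. Contrast this with the `__builtin_ia32_*` functions, each of which corresponds to one particular instruction.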



