> few reasons to use AVX-512 on a CPU, Memcpy and memset are massively parallel ...

stephencanon · on Nov 28, 2022

Icelake and later CPUs have a REP MOVS / REP STOS implementation that is generally optimal for memcpy and memset, so there’s no reason to use AVX512 for those except in very specific cases.

dragontamer · on Nov 28, 2022

Does AMD support enhanced REP MOVS?

I know when I use GCC to compile with AVX512 flags, it seems to output memcpy as AVX registers / ZMMs and stuff...

Auto vectorization usually sucks for most code. But very simple setting of structures / memcpy / memset like code is ideal for AVX512. It's a pretty common use case (think a C++ vector<SomeClass> where the default constructor sets the 128 byte structure to some defaults)

stephencanon · on Nov 29, 2022

AVX512 doesn't itself imply Icelake+; the actual feature is FSRM (fast short rep movs), which is distinct from AVX512. In particular, Skylake Xeon and Cannon Lake, Cascade Lake, and Cooper Lake all have AVX512 but not FSRM, but my expectation is that all future architectures will have support, so I would expect memcpy and memset implementations tuned for Icelake and onwards to take advantage of it.