Some of the AVX512 instructions are nice additions though, even if you only use it on YMM or XMM registers. All the masked instructions make it much simpler to vectorize loops over variable amounts of data without special treatment for the head/tail of the data.
This is exactly right—-the 512b width is the least interesting thing about AVX512. All the new instructions on 128, 256, and 512b vectors, (finally) good support for unsigned integers, static rounding modes for floating point, and predication are much more exciting.
I don't mind head/tail stuff so much but the masked instructions and vcompress make a lot of "do some branchy different stuff to different data or filter a list" easier and faster compared to AVX2, so I'm a big fan