If somebody gives me a ThreadRipper I promised to do some performance investigations. I keep meaning to update my fractal program (Fractal eXtreme) to support more than 64 threads and that would give me a good excuse (processor group support is needed).
I'm really curious about the machine it's running on. 896 physical cores is an odd number - 32 x 28, 16 x 56 or 8 x 112 are the likely combinations. The picture identifies it as a Xeon Platinum 8180 which is a 28C/56T CPU. Are there systems that support 32 Intel CPUs in one host? I thought quad socket was the practical limit these days.
Thread affinities are tied to 64-bit numbers. So processor virtual cores are lumped into groups of at most 64. Threads can only be assigned to a single group at a time, and by default all threads in a process are locked to one group.
Well, each thread being only able to be scheduled on some of 64 cores hasn't been a huge issue so far. Usually you want your threads to stay in the same NUMA region anyways because cross socket communication is expensive.
Annoying if a processor group spans two NUMA regions leaving just a few processors to other side...
Well, I wouldn't advocate using Windows in such a setting...
I/O layer overhead in Windows is considerable. As any Windows kernel driver developer knows, passing IRPs (I/O request packets) through a long stack does not come for free. Not just drivers for filesystems, networking stacks, etc. and devices, but there are usually also filter drivers. IRPs go through the stack and then bubble back up.
Starting threads and especially processes is also rather sluggish. As is opening files.
There's no completely unified I/O API in Windows. You can't consider SOCKETs (partially userland objects!) as I/O subsystem HANDLEs in all scenarios and anonymous pipes for process stdin/stdout redirection are always synchronous (no "overlapped I/O" or IOCP possible).
For compute Windows is fine, all this overhead doesn't matter much. But I don't understand why some insist using Windows as a server.
But when someone pays me for making Windows dance, so be it. :-) You can usually work around most issues with some creativity and a lot of working hours.
The arguments I’ve heard are IO completion ports are less “brain dead” then epoll or select/poll and visual studio is a great IDE. Otherwise I’m not sure either.
IOCP is great, just annoying you can't use it with process stdin/stdout/stderr HANDLEs, at least if you do things "by the book". Thick driver sandwiches in Windows... not so great.
Visual Studio a great IDE... well, the debugger isn't amazing unless you're dealing with a self-contained bug (often find myself using windbg instead). Basic operations (typing, moving caret, changing tab, find, etc.) are slow at least on my Xeon gold workstation.
> I remember Linus playing around with a Xeon Phi CPU with a few hundred threads. The task manager was all percent signs.
The last Sun chips (Niagara / Ultrasparc Tx) also had pretty high count, IIRC they had 64 threads / socket, and were of course multi-socket systems. At 1.6GHz they were clocked pretty low for 2007 though.
https://youtu.be/fBxtS9BpVWs?t=200
Looks like Microsoft has already got 1000+ cores on Windows: https://techcommunity.microsoft.com/t5/Windows-Kernel-Intern...
Can we get Bruce Dawson[0] one of those? I wonder how many more bugs he'll run into.
[0] https://randomascii.wordpress.com/