
I remember Linus playing around with a Xeon Phi CPU with a few hundred threads. The task manager was all percent signs.

https://youtu.be/fBxtS9BpVWs?t=200

Looks like Microsoft has already got 1000+ cores on Windows: https://techcommunity.microsoft.com/t5/Windows-Kernel-Intern...

Can we get Bruce Dawson[0] one of those? I wonder how many more bugs he'll run into.

[0] https://randomascii.wordpress.com/



If somebody gives me a Threadripper, I promise to do some performance investigations. I keep meaning to update my fractal program (Fractal eXtreme) to support more than 64 threads, and that would give me a good excuse (processor group support is needed).

Anyone? Got a spare one lying around?


I would write to AMD marketing. They will likely loan you a machine, or perhaps even give you one if you give them tangible reasons.


I'm really curious about the machine it's running on. 896 physical cores is an odd number - 32 x 28, 16 x 56 or 8 x 112 are the likely combinations. The picture identifies it as a Xeon Platinum 8180 which is a 28C/56T CPU. Are there systems that support 32 Intel CPUs in one host? I thought quad socket was the practical limit these days.
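A quick sketch to check that arithmetic. It assumes every socket holds an identical CPU and only tries power-of-two socket counts; the function name is made up for illustration.

```python
def socket_layouts(total_cores, socket_counts=(2, 4, 8, 16, 32, 64)):
    """Return (sockets, cores_per_socket) pairs that multiply to total_cores.

    Assumption: all sockets are populated with the same CPU model.
    """
    return [(s, total_cores // s) for s in socket_counts if total_cores % s == 0]


layouts = socket_layouts(896)
for sockets, cores in layouts:
    print(f"{sockets} sockets x {cores} cores")
```

Among the layouts it lists, only 32 x 28 matches a 28-core part like the Xeon Platinum 8180.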


Here's one, "HPE Superdome" https://h20195.www2.hpe.com/V2/getpdf.aspx/A00036491ENW.pdf

See the diagram on page 6, they have a custom routing chip to link up 8 boards over UPI.


It says right there, Xeon Phi 7210.

Knights Landing supports 4-wide threading per core so you get 256 threads, which is exactly what it shows in task manager under "logical processors".


I was talking about the Microsoft article, not the LTT video; the two use different CPUs.

The HP 32 socket chassis (8x4 socket boards) seems to be the answer.


> Can we get Bruce Dawson[0] one of those? I wonder how many more bugs he'll run into.

Oh yeah, especially because core affinities in Windows get all wonky once you go above 64 threads.


> Oh yeah, especially because core affinities in Windows get all wonky once you go above 64 threads.

Can you elaborate? I haven't noticed any particular "wonkiness" happening?


Thread affinities are stored as 64-bit masks, so logical processors are lumped into groups of at most 64. A thread can only be assigned to a single group at a time, and by default all threads in a process are locked to one group.

https://bitsum.com/general/the-64-core-threshold-processor-g...

https://docs.microsoft.com/en-us/windows/win32/procthread/pr...
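To make the group idea concrete, here is a minimal model of it in Python. It is not the Windows API: the helper names are hypothetical, and it assumes groups are simply filled in order with 64 logical processors each, which a real system need not do.

```python
GROUP_SIZE = 64  # an affinity mask is a 64-bit integer, one bit per logical processor


def locate(cpu_index, group_size=GROUP_SIZE):
    """Map a flat logical-processor index to (group number, mask within group).

    Assumption: groups are packed in index order, group_size processors each.
    """
    group, bit = divmod(cpu_index, group_size)
    return group, 1 << bit


def group_count(logical_processors, group_size=GROUP_SIZE):
    """Number of groups needed to cover all logical processors."""
    return -(-logical_processors // group_size)  # ceiling division


group, mask = locate(200)
print(f"CPU 200 -> group {group}, mask {mask:#018x}")
print(f"256 logical processors need {group_count(256)} groups")
```

Under this model a thread's affinity is one (group, mask) pair, so a single thread can never be allowed to run on, say, both CPU 10 and CPU 200.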


Well, each thread only being schedulable on a set of at most 64 cores hasn't been a huge issue so far. Usually you want your threads to stay in the same NUMA region anyways because cross-socket communication is expensive.

Annoying if a processor group spans two NUMA regions, leaving just a few processors on the other side...
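A rough illustration of that last case, assuming groups are filled naively in order without regard to node boundaries. The real assignment is done by the OS, so treat the packing rule and the node sizes here as assumptions.

```python
def pack_groups(numa_node_sizes, group_size=64):
    """Fill groups of group_size in order, ignoring NUMA node boundaries.

    Returns one dict per group: {node_index: processors taken from that node}.
    Assumption: naive in-order packing, not the OS's actual policy.
    """
    groups = [{}]
    room = group_size
    for node, remaining in enumerate(numa_node_sizes):
        while remaining > 0:
            if room == 0:
                # Current group is full, start a new one.
                groups.append({})
                room = group_size
            take = min(room, remaining)
            groups[-1][node] = groups[-1].get(node, 0) + take
            room -= take
            remaining -= take
    return groups


groups = pack_groups([48, 48])
for i, members in enumerate(groups):
    print(f"group {i}: {members}")
```

With two 48-processor nodes, the first group ends up with all 48 processors of node 0 plus 16 from node 1, and the second group holds the remaining 32 from node 1. A thread pinned to the first group can land on either side of the socket boundary.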


Huge issue for whom?

Hypervisor developers are livid about this across all cloud providers.

In practical terms, the baseline cost of a Windows OS is rather high, so if you have high-density compute throughput then you get more for your money.

I.e., you pay a license cost per CPU, and you pay for at minimum 1 physical CPU core and 1 GiB of memory per Windows machine.


Well, I wouldn't advocate using Windows in such a setting...

I/O layer overhead in Windows is considerable. As any Windows kernel driver developer knows, passing IRPs (I/O request packets) through a long stack does not come for free. There are not just drivers for filesystems, networking stacks, devices, etc., but usually also filter drivers. IRPs go through the stack and then bubble back up.

Starting threads and especially processes is also rather sluggish. As is opening files.

There's no completely unified I/O API in Windows. You can't consider SOCKETs (partially userland objects!) as I/O subsystem HANDLEs in all scenarios and anonymous pipes for process stdin/stdout redirection are always synchronous (no "overlapped I/O" or IOCP possible).

For compute Windows is fine, all this overhead doesn't matter much. But I don't understand why some insist on using Windows as a server.

But when someone pays me for making Windows dance, so be it. :-) You can usually work around most issues with some creativity and a lot of working hours.


The arguments I’ve heard are that I/O completion ports are less “brain dead” than epoll or select/poll, and that Visual Studio is a great IDE. Otherwise I’m not sure either.


IOCP is great, just annoying you can't use it with process stdin/stdout/stderr HANDLEs, at least if you do things "by the book". Thick driver sandwiches in Windows... not so great.

Visual Studio a great IDE... well, the debugger isn't amazing unless you're dealing with a self-contained bug (I often find myself using WinDbg instead). Basic operations (typing, moving the caret, changing tabs, find, etc.) are slow, at least on my Xeon Gold workstation.


> I remember Linus playing around with a Xeon Phi CPU with a few hundred threads. The task manager was all percent signs.

The last Sun chips (Niagara / UltraSPARC Tx) also had pretty high thread counts. IIRC they had 64 threads per socket, and of course went into multi-socket systems. At 1.6GHz they were clocked pretty low for 2007 though.



