The TLDR is that llama.cpp’s NUMA support is suboptimal, which hurts performance relative to what this machine should deliver. Until that is fixed, a single-socket machine would likely perform better; once it is fixed, a dual-socket machine would likely run at the same speed as a single-socket one.
If someone implemented a GEMV that scales with NUMA nodes (i.e. PBLAS, but for the data types used in inference), it might be possible to get higher performance from a dual-socket machine than from a single-socket machine.
https://github.com/ggerganov/llama.cpp/issues/11333