So how many hardware systems does Apple silicon have for doing matrix multiplies now?

1. CPU, via SIMD/NEON instructions (just dot products)

2. CPU, via AMX coprocessor (entire matrix multiplies, M1-M3)

3. CPU, via SME (M4)

4. GPU, via Metal (compute shaders + simdgroup_matrix + MPS matrix kernels)

5. Neural Engine, via CoreML (advisory only; CoreML decides where ops actually run)

Apple also appears to be adding a “Neural Accelerator” to each core on the M5?
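
For concreteness, here's a minimal sketch of exercising two of the paths above from one Python process. It assumes a macOS/arm64 environment where NumPy is linked against Apple's Accelerate BLAS (the usual route to the AMX/SME units) and a PyTorch build with the MPS backend enabled; both are assumptions about the setup, not a definitive mapping.

    # Sketch: one matmul, two execution paths (assumes Accelerate-backed
    # NumPy and a PyTorch build with the MPS/Metal backend).
    import numpy as np
    import torch

    a = np.random.rand(1024, 1024).astype(np.float32)
    b = np.random.rand(1024, 1024).astype(np.float32)

    # Paths 2/3: CPU-side matmul. With Accelerate's BLAS this is widely
    # reported to run on the AMX coprocessor (M1-M3) or the SME units
    # (M4), rather than as plain NEON loops.
    c_cpu = a @ b

    # Path 4: GPU matmul via Metal, through PyTorch's MPS backend.
    if torch.backends.mps.is_available():
        ta = torch.from_numpy(a).to("mps")
        tb = torch.from_numpy(b).to("mps")
        c_gpu = (ta @ tb).cpu().numpy()
        print(np.allclose(c_cpu, c_gpu, atol=1e-2))  # fp32 results should agree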



Doesn’t that make sense, though? Each of these sits at a different layer of the memory hierarchy, letting the programmer control the latency and throughput trade-offs. I see it as a good thing.


Oh, I’m not complaining; I appreciate having so many knobs to tweak performance.


Is this really strange? Matmul is just a specialized kind of primitive compute, one that is seeing an explosion in practical uses.

A Mac Quadra in 1994 probably had floating point compute all over the place, despite the 1984 Mac having none.


Thankfully, I think libraries like PyTorch abstract this stuff away. But it seems very convoluted if you're building something from the ground up.


Does PyTorch support other acceleration? I thought it just supported Metal.


You can convert a PyTorch model to an ONNX model that can use CoreML (or, in some cases, just convert it to a CoreML model directly).
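
A minimal sketch of the direct route, assuming coremltools is installed (the toy Linear model and file name are just for illustration):

    # Sketch: convert a PyTorch model directly to Core ML via coremltools.
    # Core ML then decides at runtime whether ops land on the CPU, GPU,
    # or Neural Engine; the compute-unit setting is only a request.
    import torch
    import coremltools as ct

    model = torch.nn.Linear(64, 64).eval()    # toy model for illustration
    example = torch.rand(1, 64)
    traced = torch.jit.trace(model, example)  # Core ML converts TorchScript

    mlmodel = ct.convert(
        traced,
        inputs=[ct.TensorType(shape=example.shape)],
        convert_to="mlprogram",
        compute_units=ct.ComputeUnit.ALL,     # allow the ANE if eligible
    )
    mlmodel.save("toy_linear.mlpackage")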


I wonder if some Apple-made software, like Final Cut, makes use of all of those "duplicated" instructions at the same time to get better performance...

I know the multitasking nature of the OS probably makes this happen across different programs anyway, but it would nonetheless be pretty cool!


Would it be possible to use all of them at the same time? Not necessarily in a practical way, just for fun. And could the different ways of doing this on the CPU be run, to some extent, by one core at the same time, given that it's superscalar?


This is a very old answer about the M1, but yes what you’re saying is possible: https://stackoverflow.com/a/67590869/230778
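
For fun, here's a toy sketch along those lines: an Accelerate-backed NumPy matmul spinning on a CPU thread while PyTorch's MPS backend queues the same problem on the GPU. It only demonstrates that the APIs can be driven concurrently; whether the units actually overlap in silicon is up to the scheduler, and the library assumptions are the same as in the sketch further up.

    # Sketch: drive the CPU matmul units and the GPU at the same time.
    import threading
    import numpy as np
    import torch

    a = np.random.rand(2048, 2048).astype(np.float32)
    b = np.random.rand(2048, 2048).astype(np.float32)

    def cpu_matmul():
        for _ in range(10):
            a @ b  # BLAS releases the GIL, so this can genuinely overlap

    t = threading.Thread(target=cpu_matmul)
    t.start()  # CPU-side matmul (AMX/SME via Accelerate, assumed) in the background

    ta = torch.from_numpy(a).to("mps")
    tb = torch.from_numpy(b).to("mps")
    for _ in range(10):
        ta @ tb               # queued asynchronously on the Metal GPU
    torch.mps.synchronize()   # wait for the GPU command queue to drain
    t.join()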


Apple's clearly betting big on on-device AI workflows becoming the norm.


>Apple also appears to be adding a “Neural Accelerator” to each core on the M5?

The "neural accelerator" is per GPU core, and is matmul. e.g. "Tensor cores".


Adding CPUs and GPUs on top of your CPUs and GPUs... sounds like we've got the spiritual successor to the Sega Saturn.


I inferred that by "neural accelerators" they meant the Neural Engine cores, but it could also be a bigger/different AMX (which really should become a standard, btw).



