
No RAM. Instead of having a general-purpose multiplier that multiplies an input by a weight stored in RAM, just have a multiplier that hardcodes the weight. In some sense, replace each weight with a specialized multiplier and wire them together with accumulators and activation functions in between, plus some registers for pipelining. If one goes for four-bit quantization, one could have sixteen optimized multiplier designs, one for each possible weight value, and then one just selects and connects them according to the model weights and structure.

Example. If you have a neuron with 16 inputs, each 8 bits wide, and a 4-bit weight per input, you will have 16 specialized multipliers, each scaling its input by the corresponding weight. The 16 scaled inputs then feed into an adder tree and finally an activation function.
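
A rough software sketch of that neuron, in case it helps make the idea concrete. Everything beyond the structure described above is an assumption for illustration: signed 4-bit weights in [-8, 7], unsigned 8-bit inputs, ReLU as the activation, and all the function names. Each "multiplier" is built once from a constant and uses only shifts and adds, which is roughly what a hardcoded multiplier reduces to in hardware.

```python
def make_const_multiplier(weight):
    """Build a multiplier for one fixed weight (assumed signed 4-bit, -8..7).

    The weight is baked in as a list of shift amounts, so evaluating the
    multiplier needs only shifts and adds, never a general multiply by a
    stored weight.
    """
    if not -8 <= weight <= 7:
        raise ValueError("weight must fit in signed 4 bits")
    shifts = [i for i in range(4) if (abs(weight) >> i) & 1]
    negate = weight < 0

    def multiply(x):
        total = sum(x << s for s in shifts)
        return -total if negate else total

    return multiply


# The sixteen possible multiplier designs, one per 4-bit weight value.
MULTIPLIERS = {w: make_const_multiplier(w) for w in range(-8, 8)}


def adder_tree(values):
    """Sum values pairwise, level by level, like a hardware adder tree."""
    values = list(values)
    while len(values) > 1:
        if len(values) % 2:
            values.append(0)  # pad odd levels with a zero input
        values = [values[i] + values[i + 1] for i in range(0, len(values), 2)]
    return values[0] if values else 0


def relu(x):
    # Assumed activation; the comment above does not name one.
    return max(0, x)


def build_neuron(weights):
    """'Wire up' a neuron: pick one fixed multiplier per input lane."""
    lanes = [MULTIPLIERS[w] for w in weights]

    def neuron(inputs):
        if len(inputs) != len(lanes):
            raise ValueError("wrong number of inputs")
        return relu(adder_tree(m(x) for m, x in zip(lanes, inputs)))

    return neuron
```

Usage would be something like `neuron = build_neuron(weights)` once, then `neuron(inputs)` per evaluation. The weights only matter at build time, which is the software analogue of fixing them in the wiring.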




That sounds like wiring the RAM contents into a number of transistors of the same order of magnitude. A modern CPU has (quick googling) 184B transistors. If each of those were one bit, that's 23GB. But presumably a model bit needs more than one transistor to represent how it acts as part of a neuron with its interactions.
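
The back-of-envelope arithmetic, as a small sketch. The 184B figure is the one quoted above. The transistors-per-weight parameter is purely hypothetical, since the real cost of a hardwired multiplier plus its share of adders and registers isn't given here.

```python
TRANSISTORS = 184_000_000_000  # figure quoted above


def bits_to_gigabytes(bits):
    """Convert a bit count to decimal gigabytes (8 bits per byte, 1e9 bytes per GB)."""
    return bits / 8 / 1e9


def hardwired_weights(transistors, transistors_per_weight):
    """How many weights fit if each one costs a fixed number of transistors.

    transistors_per_weight is a hypothetical parameter, not a measured value.
    """
    return transistors // transistors_per_weight


print(bits_to_gigabytes(TRANSISTORS))  # 23.0
```

Calling `hardwired_weights(TRANSISTORS, n)` with whatever per-weight cost `n` one believes shows how quickly the budget shrinks once a weight needs more than one transistor.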

Then there's the current inference speedup that comes from restricting which subset of the model is used, which is not a "swap in" that would work with hard-wired neurons.

But I dunno. Maybe. I'm just guessing.



