Hacker News | santaboom's comments

Yes, bandwidth within a chip is much higher than on a bus.


Lol get out of the echo chamber

Edit: to make this helpful, look at Broadcom interconnect, switching technology, and co-packaged optics.


Perhaps instead you are referring to the program graphs, and whether they are better represented as hypergraphs [0] with certain regular edges within a 2D layer and hyperedges between 2D layers? If so, good question, and I am not sure of the answer. I agree that it would probably be a helpful abstraction layer to assist with certain data locality issues and reduce data movement + congestion. [0] https://lasttheory.com/article/what-is-a-hypergraph-in-wolfr...
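To make the idea concrete, here is a minimal sketch of a program graph as a hypergraph. All names (`Hypergraph`, `layer0_mesh`, `cross_layer`, the node labels) are hypothetical illustrations, not from any real toolchain: regular edges connect nodes within one 2D layer, while a single hyperedge can span a whole group of nodes across layers.

```python
# Hypothetical sketch: a hypergraph where each edge is a named set of
# member nodes, so a hyperedge can connect more than two nodes at once.
from collections import defaultdict

class Hypergraph:
    def __init__(self):
        self.edges = defaultdict(set)  # edge name -> set of member nodes

    def add_edge(self, name, nodes):
        self.edges[name] |= set(nodes)

    def incident(self, node):
        """All edges touching a node, i.e. every communication group it joins."""
        return {e for e, members in self.edges.items() if node in members}

hg = Hypergraph()
hg.add_edge("layer0_mesh", ["a0", "a1"])        # regular edge within layer 0
hg.add_edge("cross_layer", ["a0", "b0", "b1"])  # hyperedge spanning two layers

print(sorted(hg.incident("a0")))
```

A scheduler could use `incident()` to see at a glance which inter-layer transfers a node participates in, which is the data-locality benefit alluded to above.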


Curious what you mean by 3D, and also how 3D is used. Assuming something like [0], I can pretty confidently tell you that it is not 3D, as it is a low-power microcontroller and this technology is mostly used in large, expensive HPC/AI chips. Also, afaik 3D stacking of logic dies is not really a thing (if anyone knows counterexamples, please provide them); it is much more common for stacking memory dies, e.g. HBM. As for your proposed coprocessor, that actually might benefit from 3D integration with the traditional CPU, as a much greater number of interconnect/memory channels could allow you to route data more directly to processing elements. Something like this is proposed with "2.5D" stacking in [1], where HBM (3D) is connected to an FPGA with 128 channels.

[0] https://resources.pcb.cadence.com/blog/2023-2-5d-vs-3d-packa...

[1] page 6: https://bu-icsg.github.io/publications/2024/fhe_parallelized...



See 'Apple-Core', Microgrids, SVP, UTLEON3

In theory, at least; in practice this seems to have failed, but still...

Also related: https://www.microsoft.com/en-us/research/project/emips/


Yes, but we are also talking about energy efficiency.


See figure 1 in [0]: operation execution is just 0.1 pJ of the 70 pJ required for a 32-bit int addition… how do you think I was applying Amdahl's law? Edit: I was using Amdahl's law as it applies to parallel processors. However, in terms of energy usage …

See Amdahl's law and use your imagination. [0] https://semiengineering.com/more-data-drives-focus-on-ic-ene...
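The Amdahl's-law argument above can be worked through numerically. This is a sketch using the approximate figure-1 numbers quoted in this thread (0.1 pJ for the arithmetic itself out of 70 pJ total for a 32-bit int add); the variable names are mine, not from the article.

```python
# Amdahl's law applied to energy: if only the compute portion of an
# operation is optimized, total savings are bounded by the fraction
# that portion represents of the whole.
total_pj = 70.0    # full energy of a 32-bit int add, incl. data movement
compute_pj = 0.1   # energy of the arithmetic operation itself

f = compute_pj / total_pj      # the optimizable fraction
max_gain = 1.0 / (1.0 - f)     # best case: compute becomes free

print(f"optimizable fraction: {f:.4%}")
print(f"best possible energy reduction: {max_gain:.5f}x")
# Even an ALU that costs zero energy barely moves the total:
# data movement dominates, which is the point being made above.
```

This is why the thread keeps returning to interconnect and memory: shaving the 0.1 pJ to nothing buys roughly a 0.14% improvement, while the other 69.9 pJ is movement.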


All very informative, though I had some quibbles.

While it is true that cheap and expensive FPGAs exist, an FPGA system to replace a TPU would not use a $0.50 or even $100 FPGA; it would use a Versal or UltraScale+ FPGA that costs thousands, compared to the (rough guess) $100/die you might spend for the largest chip on the most advanced process. Furthermore, the overhead of an FPGA means every single one may support only a few million logic gates (maybe 2-5x that if you use hardened blocks), compared to billions of transistors on the largest chips in the most advanced node, so the cost per chip to buy is much, much higher.
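A quick back-of-envelope calculation makes the gap vivid. Every number here is an illustrative guess in the spirit of the rough figures above (FPGA price, logic-cell count, die cost, transistor count, transistors-per-gate), not measured data:

```python
# Back-of-envelope cost-per-logic-element comparison, FPGA vs. ASIC.
# All inputs are rough illustrative guesses, not vendor figures.
fpga_price = 5000.0         # high-end Versal/UltraScale+-class part, USD
fpga_logic_cells = 4e6      # "a few million" usable logic elements

asic_die_cost = 100.0       # rough guess per large leading-edge die, USD
asic_transistors = 50e9     # "billions of transistors"
gates_per_transistor = 1 / 4.0  # crude: ~4 transistors per NAND gate

fpga_cost_per_cell = fpga_price / fpga_logic_cells
asic_cost_per_gate = asic_die_cost / (asic_transistors * gates_per_transistor)

print(f"FPGA: ~${fpga_cost_per_cell:.6f} per logic cell")
print(f"ASIC: ~${asic_cost_per_gate:.9f} per gate")
print(f"ratio: ~{fpga_cost_per_cell / asic_cost_per_gate:,.0f}x")
```

Even if every input is off by an order of magnitude, the per-gate cost gap remains enormous, which is the "much, much higher" claim above in numbers.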

To the second point, afaik leading-edge Versal FPGAs are on 7nm: not ancient, but also not the cutting-edge node used for ASICs (N3).


Good questions; below I attempt to respond to each point, then wrap up. TLDR: even if the TPU is good (and it is good for Google), it wouldn't be "almost as good a business as every other part of their company", because the value add isn't FROM Google in the form of a good chip design (the TPU). Instead, the value add is TO Google, in the form of specific compute that is cheap and fast, FROM relatively simple ASICs (TPU chips) stitched together into massively complex systems (TPU superpods).

If interested in further details:

1) TPUs are a serious competitor to Nvidia chips for Google's needs; per the article, they are not nearly as flexible as a GPU (dependence on precompiled workloads, high utilization of PEs in the systolic array). Thus, for broad ML-market usage, they may not be competitive with Nvidia GPUs/racks/clusters.

2) Chip makers with the best chips are not valued at $1-3.5T; per other comments to the OC, only Nvidia and Broadcom are worth this much. These are not just "chip makers": they are (the best) "system makers", driving designs for the chips and interconnect required to go from a diced piece of silicon to a data center consuming MWs. This part is much harder; this is why Google (who design the TPU) still has to work with Broadcom to integrate their solution. Indeed, every hyperscaler is designing chips and software for their needs, but every hyperscaler works with companies like Broadcom or Marvell to actually create a complete, competitive system. Side note: Marvell has deals with Amazon, Microsoft, and Meta to mostly design these systems, and they are worth "only" $66B. So you can't just design chips to be valuable; you have to design systems. The complete systems have to be the best, wanted by everyone (Nvidia, Broadcom), in order to be valued in the trillions; otherwise you're in the billions (Marvell).

4) I see two problems with selling TPUs: customers and margins. If you want to sell someone a product, it needs to match their use case; currently the use case only matches Google's needs, so who are the customers? Maybe you want to capture hyperscalers / big AI labs, but their use case is likely similar to Google's. If so, margins would have to be thin; otherwise they would just work directly with Broadcom/Marvell (and they all do). If Google wants everyone using CUDA/Nvidia as a customer, then you massively change the purpose of the TPU, and even of Google.

To wrap up: as in the TLDR, even if the TPU is good for Google, the value add isn't FROM Google in the form of a good chip design; it is TO Google, in the form of cheap, fast compute from relatively simple ASICs stitched together into massively complex systems.

Sorry that got a bit long-winded; hope it's helpful!


This also all assumes that there is excess foundry capacity in the world for Google to expand into, which is not obvious. One would need exceptionally good operations to compete here and that has never been Google's forte.

https://www.tomshardware.com/tech-industry/artificial-intell...

"Nvidia to consume 77% of wafers used for AI processors in 2025: Report...AWS, AMD, and Google lose wafer share."


Not sure what you mean. Who do you think fabs Broadcom and Google chips?


Ah, I didn't realize Broadcom was fabless and only helping with design.

