Considering I'm someone who knows very little about hardware and almost nothing about its manufacturing process, I have a question.
I assume what this article means is that only 55% of the chips produced on this process node are considered completely working, which I suppose means that these 55% are the chips that passed all the tests.
So that leaves 45% of chips where at least one defect was found. But here's my question: with Apple's M2 Ultra chip having 134 billion transistors, do the tests really cover all of the non-redundant transistors, which I assume are a substantial portion of them? And also, even for those transistors that are tested, how reliable are these tests in detecting defects?
And assuming that not all transistors are tested, given that 45% is such a high failure rate, isn't there a very high chance that even those chips that passed all the tests also contain plenty of bad transistors, but they simply weren't tested (or even if they were tested, the tests may have not reliably detected a fault)?
Even considering that these chips contain some redundancy / fault tolerance, it just seems kind of amazing to me that they are as reliable as they are given all the unknown faults that they might have (and I'm only focusing on the manufacturing process, but some of the same reasoning also applies to the design).
> do the tests really cover all of the non-redundant transistors, which I assume are a substantial portion of them?
In the limit, yes! "Design for test" (DFT) tooling inserts extra logic into the gate-level netlist, below the RTL, which allows the chip to be put into a test mode where the flops are chained together into giant shift registers that can be driven and read externally. Automatic test pattern generation (ATPG) tooling then computes patterns to clock into these scan chains such that every gate in the design is exercised. For large chips the test patterns may run to multiple gigabytes (though they are highly compressible).
Of course, this is "in the limit", and in practice there are frequently reasons (density, routing, timing, etc.) why DFT engineers choose to leave some logic out of the scan chains. In those cases, more involved functional tests are needed to exhaustively verify those regions.
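To make the scan-chain idea concrete, here is a toy Python sketch (all names and the 8-flop XOR "design" are hypothetical, purely for illustration): flops are chained into one shift register, a pattern is shifted in bit by bit, one functional clock captures the combinational response, and the result is shifted out for comparison against a defect-free simulation.

```python
# Toy model of a DFT scan chain. Assumptions (not from the original post):
# a single chain of 8 flops, whose combinational logic is a cyclic XOR of
# neighbouring flops. Real chains have thousands of flops and real ATPG
# picks the patterns; this only illustrates the shift/capture mechanism.

def shift_in(chain, pattern):
    """Shift a test pattern into the scan chain, one bit per test clock."""
    for bit in pattern:
        chain.pop()           # the last flop's value falls off the end
        chain.insert(0, bit)  # a new bit enters the first flop
    return chain

def capture(chain):
    """One functional clock: each flop captures its combinational input
    (here, the XOR of itself and its neighbour, cyclically)."""
    n = len(chain)
    return [chain[i] ^ chain[(i + 1) % n] for i in range(n)]

chain = [0] * 8
shift_in(chain, [1, 0, 1, 1, 0, 0, 1, 0])
response = capture(chain)
# A tester would now shift `response` back out and compare it against the
# value simulated from a known-good netlist; any mismatch flags a fault
# (e.g. a stuck-at defect) somewhere in the exercised gates.
```

The point of the structure is that the tester never needs functional access to internal nodes: shifting plus one capture clock observes every gate the pattern sensitizes.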
Yield percentages from Samsung and TSMC refer to different things. And knowing Samsung and TSMC, the former may be overstated while the latter is likely a conservative number.
At the end of the day, Apple (and thus we as customers) always pays.
It doesn't matter whether Apple pays only for 'good' chips (in which case TSMC raises per-chip prices) or pays for all chips and throws a bunch away; either way overall prices go up. TSMC won't just eat the cost completely either, and wafer prices have been rising steadily.
Sometimes it makes sense to pay even failed things because your overall cost is lower.
Say hypothetically that TSMC offered $30 per working chip, or $8 per chip plus $2 to test. Taking the latter option would result in lower overall costs at current yields.
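Working through that hypothetical pricing (the $30 / $8 / $2 figures are the made-up numbers from above, and 55% is the yield quoted earlier in the thread):

```python
# Hypothetical comparison: pay per known-good chip, or pay per chip
# (good or bad) plus a test fee, at a 55% yield.

yield_rate = 0.55
price_per_good = 30.0   # option A: $30 per chip that passes
price_per_chip = 8.0    # option B: $8 per chip regardless of outcome
test_cost = 2.0         # option B: plus $2 to test each chip

# Option A: the cost per working chip is just the quoted price.
cost_a = price_per_good

# Option B: every chip costs $8 + $2, but only 55% of them work,
# so divide the per-chip spend by the yield.
cost_b = (price_per_chip + test_cost) / yield_rate
# cost_b comes to about $18.18 per working chip, well under option A's $30.
```

Option B stays cheaper until yield drops below $10 / $30 ≈ 33%, which is why the break-even point depends entirely on who bears the yield risk.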
Typically for this kind of work the manufacturer wants their yield improvements to lead to reduced cost to them so only paying for what works is standard to my knowledge.
In contrast, it is bespoke work where the arrangement sometimes shifts, because there is no possibility of recovering failures through things like down-binning (disabling parts of a chip and selling it for less).
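Down-binning matters because it can recover a large slice of otherwise-scrapped parts. A hedged illustration (the 8-core chip, the 7% per-core defect rate, and the independence assumption are all invented for the sketch, not figures from the thread):

```python
# Hypothetical down-binning math: assume an 8-core chip where each core
# independently has a 7% chance of a fatal defect, and a chip with exactly
# one bad core can still be sold as a cut-down 7-core SKU.

from math import comb

p = 0.07  # assumed per-core defect probability (illustrative only)
n = 8     # assumed core count

# Fraction of chips with all cores good (sellable as the full SKU).
p_all_good = (1 - p) ** n

# Fraction with exactly one bad core (sellable as the down-binned SKU).
p_one_bad = comb(n, 1) * p * (1 - p) ** (n - 1)

# Total sellable fraction: down-binning recovers the p_one_bad slice
# that would otherwise be scrapped.
sellable = p_all_good + p_one_bad
```

Under these made-up numbers, only about 56% of chips are fully good, but binning lifts the sellable fraction to roughly 90%, which is exactly the recovery lever the comment says bespoke work lacks.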
It’s just a calculation; surely TSMC knows these rates ahead of time. In my opinion it’s just a way for TSMC to give themselves a big bonus if they manage to improve yields.