
Call me cynical, but I am getting very skeptical of a lot of well-established "truths" in software engineering: the cone of uncertainty, the orders-of-magnitude difference in programmer productivity, the effectiveness of TDD, ...

Most of these well-established claims simply don't have enough empirical data to support them. They're mostly bloggers' hand-waving and empty assertions.



"One of these isn't like the others..."

Things like the Cone, the rising-cost-of-defects curve, or the 10x claim have been kicking around for decades.

The evidence for or against TDD is, admittedly, inconclusive, but it's more recent and of a better academic caliber. There have been a lot of studies. Most of these studies aren't any good - but at least someone is trying.

There's a deeper question, which is "granted that all the empirical evidence we have so far for claims in software engineering isn't all that good, how can we get good empirical evidence?"

I suspect that the answer is going to involve changing the very questions we ask. "Does TDD work?" is too fuzzy and ill-defined, and there's no way you can test it in a blinded, randomized experiment. People's biases about TDD (the subjects' or the experimenters') are going to contaminate the evidence.

Instead, we need to ask questions that aren't susceptible to this kind of bias and contamination. For instance, we might want to unobtrusively study actual programmers working on actual projects, and record what causes them to write defects.


I fear the problems with this empirical approach run deeper.

The main problem is that software metrics are imprecise and non-objective. Lines of code, function points, code coverage, counting code paths... none of these metrics is accepted by everyone, and all of them have big flaws. Metrics are the basis of any reliable analysis; if we can't trust them, we can't trust anything built on them.

The second problem is that it is very hard to isolate the thing under examination. How can we analyse TDD without taking into account the developer's grasp of good design (coupling and cohesion), dependency injection and inversion of control, refactoring techniques and tools, and so on?

Software engineering is a lot harder to study empirically because it is much more akin to the fuzzy social sciences (e.g. economics, sociology, management) than to the hard sciences (e.g. computer science).


To be fair, the metrics you mentioned are objective; it's just debatable how relevant they are. Other metrics in use include measures of coupling and cohesion, lines per method, methods per class, and so on. None of them is perfect, but together they help paint a picture that adds to our understanding.

To address your second point, it's actually not too difficult to isolate factors like TDD. The standard way is to have a control group and a test group. With a large enough sample size, you can determine statistical significance with standard tests.
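
For what it's worth, "standard tests" can be as simple as a two-sample t-test on whatever outcome measure the experiment collects. A minimal sketch in Python, with made-up defect counts for a hypothetical control group and TDD group (the numbers and the choice of Welch's t-test are mine, purely for illustration):

    # Hypothetical defect counts per subject for two groups (made-up numbers).
    from scipy import stats

    control = [14, 9, 11, 17, 12, 15, 10, 13]
    tdd     = [8, 12, 7, 10, 9, 11, 6, 13]

    # Welch's two-sample t-test (doesn't assume equal variances).
    t, p = stats.ttest_ind(control, tdd, equal_var=False)
    print(f"t = {t:.2f}, p = {p:.3f}")   # a small p suggests a real group difference

In practice the hard part isn't the arithmetic but the design: non-normal data, effect sizes, and the confounds discussed above.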

Unfortunately, the test subjects are often university students, who are less experienced than professionals. The fact that data collected on students might not generalize to professionals is a threat to external validity, which most papers make explicit. Companies generally aren't very happy about letting researchers use their engineers for experiments on the company's dime, but it does happen, so there are some papers out there reporting results with professionals.


> The evidence for or against TDD is, admittedly, inconclusive, but it's more recent and of a better academic caliber. There have been a lot of studies. Most of these studies aren't any good - but at least someone is trying.

Can you point me to some of those studies? Every time I look I only find the same two studies everyone quotes (and they aren't very good).


Sure, you can grab my Agile bib file here: https://github.com/Morendil/referentiel.institut-agile.fr/bl...

There are 48 papers tagged with "tdd".


> the orders-of-magnitude difference in programmer productivity

I am skeptical of this too. It makes more sense to expect huge swings in ability, not productivity. I.e., a poor programmer won't take 10x as long to code a given feature; he will just hit a ceiling of ability and not be able to do it at all.


Consider the difference between knowing exactly which library to use to solve a particular problem vs. believing that you need to write new code to do it. That can easily account for a 10x productivity difference.


I'm inclined to believe it's less about the time required to code a specific feature... and more about the time required to:

1. Figure out what features should be implemented (ie, will implementing this feature shoot us in the foot later)

2. Figure out the correct implementation

3. Be able to handle future feature requests

Sure, 1 and 2 will vary by skill and experience. However, the skill and experience with which 1 and 2 are handled can severely impact 3, causing it to easily take 10x longer, if it can be done at all. As you move on to 4 and down the line, this becomes more and more pronounced.


> I.e., a poor programmer won't take 10x as long to code a given feature; he will just hit a ceiling of ability and not be able to do it at all.

10x longer is not a poor programmer; that's an incompetent programmer. If someone needs ten days to code something that can be done in one day, something is very seriously wrong.


Are you surprised by that? The 10x-longer claim doesn't surprise me at all. If you ask someone, "Please implement a Java class that has the following methods and behaves like this," you might expect competent programmers to finish within ~3x of each other.

But given some more nebulous task, where architectural decisions must be made and serious research and testing needs to be done, it's not surprising at all. For example, if you asked someone, "Please write me a library so that I can send and receive XMPP messages," I would expect a large number of otherwise competent programmers to make a significant number of false starts and poor decisions, and generally take much more time than the guy who has experience writing libraries and interpreting text protocols. For example, consider the case of Ron Jeffries and Peter Norvig writing a sudoku solver[1] (this example is a perennial favorite of mine in all sorts of discussions).

And I don't think anything is "very seriously wrong" with this situation. Different skill sets and competence levels produce drastically different results. I think this is true of any profession that is largely about creative problem-solving: some can do it efficiently, some cannot. Programming is just a unique case because demand is so high that many people keep trying it without being deterred by poor performance.

[1] http://ravimohan.blogspot.com/2007/04/learning-from-sudoku-s...


> 10x longer is not a poor programmer; that's an incompetent programmer. If someone needs ten days to code something that can be done in one day, something is very seriously wrong.

That's just not true; it's all relative. Linus Torvalds supposedly coded git to the point where it was self-hosting in one day. Even a very good programmer could take more than ten days to do that, and an average (but not incompetent) programmer could take months.


Linus guessed it took him about two weeks to get git to the point where it was self-hosting.

http://www.spinics.net/lists/git/msg24132.html

It's still impressive, though.

Edit: Should have read further. It looks like Linus got git self-hosting in two days.

https://lkml.org/lkml/2005/4/8/9


I think it's a bit of both, and varies significantly with the domain.

Quick, how do you write an SMTP server, and what are the challenges of making it scale?

Most developers won't know how, to start with. That's fine; that's beside the point. So they need to look it up.

Here the performance gap starts, even if you're dealing with people who share the same lack of knowledge of the relevant RFCs.

In my experience, there's a vast difference in developers' ability to read even a relatively simple spec and ensure they develop something that follows it. I mentioned SMTP because it genuinely is a simple standard compared to many of the alternatives. But it has enough edge cases that you'll see a big gap right out of the gate between the people who have trouble mentally picturing what the spec describes and those who can easily and systematically map it out.

Secondly, in this case you'd start to see experience gaps. Even assuming most people won't have written an SMTP server before, you will start seeing a gap between the developers who at least have in-depth knowledge of part of the domain or type of service and those who don't. That alone accounts for a very substantial difference.

In this case, understanding how to write efficient, scalable network services makes the difference between the developer who does horribly inefficient things, like read()'ing one byte at a time to get a line from the client, and the one who knows better. (I mention this because the MySQL C client libraries did exactly that for years, instead of the vastly more efficient approach of doing larger non-blocking reads into a temporary buffer to avoid the context switches, so it's not something that only rent-a-coders with no experience will do.)
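
To make the read() point concrete, here's a rough Python sketch of both approaches for a line-oriented protocol like SMTP (names are illustrative, error handling omitted):

    # Inefficient: one recv() syscall (and potential context switch) per byte.
    def readline_byte_at_a_time(sock):
        line = b""
        while not line.endswith(b"\r\n"):
            byte = sock.recv(1)
            if not byte:                   # peer closed the connection
                break
            line += byte
        return line

    # Better: pull large chunks into a buffer and slice complete lines out of it.
    class BufferedLineReader:
        def __init__(self, sock):
            self.sock = sock
            self.buf = b""

        def readline(self):
            while b"\r\n" not in self.buf:
                chunk = self.sock.recv(4096)   # one syscall covers many lines
                if not chunk:
                    break
                self.buf += chunk
            line, sep, self.buf = self.buf.partition(b"\r\n")
            return line + sep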

Just the gap between those who understand the tradeoffs of context switches, and of threads vs. processes vs. multiplexing connections, will account for a fairly substantial factor in many types of problems like this.
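
For illustration, this is roughly what the multiplexing option looks like with Python's standard selectors module; the reply is a placeholder rather than real SMTP, and a real server would also buffer its writes:

    import selectors
    import socket

    sel = selectors.DefaultSelector()

    def accept(server):
        conn, _ = server.accept()
        conn.setblocking(False)
        sel.register(conn, selectors.EVENT_READ, handle)

    def handle(conn):
        data = conn.recv(4096)            # socket is readable, so this won't block
        if not data:                      # client went away
            sel.unregister(conn)
            conn.close()
            return
        conn.sendall(b"250 OK\r\n")       # placeholder reply, not a real SMTP state machine

    server = socket.socket()
    server.bind(("0.0.0.0", 2525))
    server.listen()
    server.setblocking(False)
    sel.register(server, selectors.EVENT_READ, accept)

    while True:                           # one process, many connections
        for key, _ in sel.select():
            key.data(key.fileobj)         # dispatch to accept() or handle()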

Then comes the thorny issue of queues. Most otherwise relatively competent developers will struggle to get this right in a way that is neither slow nor full of race conditions, and most never have to deal with really optimizing disk I/O. Witness the wildly different approaches and performance of established mail servers to see that doing queueing well is hard, and those are the good ones.

That does not mean they won't be able to figure out how to do it well enough for typical use cases.
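
One pattern that at least gets the basic races and durability right (the same idea behind maildir-style queues) is write-to-temp, fsync, then atomic rename. A rough sketch, with illustrative paths:

    import os
    import uuid

    QUEUE_DIR = "/var/spool/toy-mta"      # illustrative; tmp/ and new/ must already exist

    def enqueue(message: bytes) -> str:
        name = uuid.uuid4().hex
        tmp_path = os.path.join(QUEUE_DIR, "tmp", name)
        new_path = os.path.join(QUEUE_DIR, "new", name)
        with open(tmp_path, "wb") as f:
            f.write(message)
            f.flush()
            os.fsync(f.fileno())          # make sure the bytes are on disk
        os.rename(tmp_path, new_path)     # atomic on POSIX: readers never see a partial file
        return new_path

Really careful implementations also fsync the queue directory itself, and that's before you get to batching, retries, and the disk-I/O tuning mentioned above.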

(I used this example, because I've written several SMTP servers, and managed teams working on mail software, so it's an area where I know the tradeoffs and problem points particularly well)

Then again, when writing your typical cookie-cutter web app, the difference probably won't be 10x because so much more of the time will be spent mediating stakeholder requests vs. solving hard problems.


The origin of this 10x meme was (if I remember correctly) measuring productivity on a single task. It is not inconceivable that the difference between the worst and the best is 10x on a single task, especially for students. That doesn't tell us what the difference is between the average and the worst/best, which would be more interesting. Nor does it tell us whether the best developer is consistently 10x faster, or the worst-performing developer just made a mistake on this particular task.

If you compare a developer solving his first task in an unfamiliar language/platform/framework to a developer with deep experience, you will easily see a difference of this magnitude. But that difference will not stay consistent.


I don't know, there may be some truth to it. Based on the information I can gather, it seems Bellard's LTE implementation (linked yesterday) was completed over the course of about a year in his spare time. I don't know many programmers that can keep that kind of pace.

If it encompasses more than just the time spent writing code, it becomes even more believable. E.g., a great programmer will take a day to implement a given feature and it will be relatively bug-free; an average programmer will take a day to implement the same feature and then another nine days to work out the bugs he introduced.



