I agree with the author that historical information about how a codebase has evolved is important. I would also argue that code comments are not always the best place for this historical information (if you don’t know about the deep past of a bit of code, then why would you want to see a code comment describing some change to it?).
I suggest we take a step back and ask if modern version control is the best way to store historical information. Modern version control systems (git, mercurial, etc.) were built within the last decade or so, but they were built with the same constraints as the original version control systems of the 1980s. They are optimized to be disk efficient (and don’t get me started about their command line interfaces). This is crazy!
We should store much more about the programming process than the data gathered if and when a developer chooses to commit. We should record it all: every keystroke. No human-generated source of data is ever going to fill up our hard drives or the cloud. Don’t optimize for the disk!
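Recording every keystroke only pays off if the log can be replayed. A minimal sketch of what such a log might look like, assuming a hypothetical `Edit` record (time, character offset, inserted text, number of deleted characters); this is an illustration, not Storyteller's actual format. Replaying the log up to a given time reconstructs the file as it was at that moment:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Edit:
    """One recorded edit. Hypothetical format, for illustration only."""
    time: float          # seconds since the session started
    pos: int             # character offset in the file where the edit happens
    inserted: str = ""   # text typed at `pos`
    deleted: int = 0     # number of characters removed starting at `pos`

def replay(edits, until=float("inf")):
    """Rebuild the file as it looked at time `until` by re-applying edits in time order."""
    text = ""
    for ed in sorted(edits, key=lambda item: item.time):
        if ed.time > until:
            break
        # Remove `deleted` characters at `pos`, then insert the typed text there.
        text = text[:ed.pos] + ed.inserted + text[ed.pos + ed.deleted:]
    return text

session = [
    Edit(0.0, 0, inserted="x = 1"),
    Edit(4.2, 4, deleted=1),        # backspace over the "1"
    Edit(4.5, 4, inserted="42"),
]
print(replay(session, until=1.0))  # x = 1
print(replay(session))             # x = 42
```

Because every intermediate state can be rebuilt this way, nothing about the session has to be decided at commit time; a viewer can pick any moment to inspect.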
This data can be used to replay programming sessions so that others can learn exactly how the code evolved. Developers could then comment on the evolution of their code; think of this as a modern commit message. I am working on a project that attempts to do this: http://www.storytellersoftware.com
I don't see it as optimizing for disk; I see it as optimizing for time, by presenting the relevant events rather than every irrelevant detail. Much like a movie or novel doesn't usually show its characters going to the bathroom, my coworkers shouldn't have to sift through my misspellings, dumb decisions, and irrelevant debugging.
I agree that we could benefit from saving more (e.g. relevant exploratory sessions, though those tend to happen in the REPL, not in the code editor), but I disagree with an indiscriminate approach.
Storyteller has ways to filter out your misspellings, dumb decisions, and irrelevant debugging.
While the process for filtering out dumb decisions and irrelevant debugging is a little convoluted at present, filtering out misspellings is really easy if you fix them within X seconds (where you choose X). Because Storyteller currently only works inside an IDE (specifically Eclipse, with Visual Studio support hopefully coming soon), you should notice those misspellings quickly.
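One way such a time-threshold filter could work, sketched under stated assumptions (character-level events and a hypothetical `x_seconds` parameter; this is not Storyteller's actual implementation): replay the log once to find characters deleted within X seconds of being typed, then replay it again, skipping the frames that only touch those characters and hiding them from the frames that remain.

```python
def filtered_frames(events, x_seconds):
    """Return the document snapshots worth showing, hiding characters deleted within `x_seconds`.

    Each event is (time, "ins", pos, char) or (time, "del", pos, None);
    this character-level format is an assumption made for the sketch.
    """
    # Pass 1: give every typed character an id (its event index) and find the quick deletions.
    doc, typed_at, transient = [], {}, set()
    for i, (t, kind, pos, ch) in enumerate(events):
        if kind == "ins":
            doc.insert(pos, (i, ch))
            typed_at[i] = t
        else:
            cid, _ = doc.pop(pos)
            if t - typed_at[cid] <= x_seconds:
                transient.add(cid)

    # Pass 2: replay again, emitting a frame only for edits to characters that survived the filter.
    doc, frames = [], []
    for i, (t, kind, pos, ch) in enumerate(events):
        if kind == "ins":
            doc.insert(pos, (i, ch))
            cid = i
        else:
            cid, _ = doc.pop(pos)
        if cid not in transient:
            frames.append("".join(c for j, c in doc if j not in transient))
    return frames

# Type "teh", backspace twice, then type "he".
typing = [(0, "ins", 0, "t"), (1, "ins", 1, "e"), (2, "ins", 2, "h"),
          (3, "del", 2, None), (4, "del", 1, None),
          (5, "ins", 1, "h"), (6, "ins", 2, "e")]
print(filtered_frames(typing, 5))    # ['t', 'th', 'the']
print(filtered_frames(typing, 0.5))  # every frame, typo included
```

Tracking characters by id rather than by position is what lets the filter remove the typo without disturbing the offsets of the edits that come after it.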
As for filtering out dumb decisions: you don't really know they're dumb until after you've made them. Someone new to that part of the project might have the same idea you had when you made those decisions, and seeing that you tried it, ideally along with some comments on what went wrong, could push them in a different direction or help them fill in a piece you were missing. Watching those past decisions could also help you when you come back to some code later and can't remember what you did or why.
Exploratory sessions only happen in REPLs when the language has a REPL. Considering that Storyteller's written in Java, has support only for an IDE that was initially built for Java, and has been written by a bunch of college students and one professor at a college where most CS courses use C++ or Java, REPLs aren't really things most of us use (I want to change that, but there's only so much you can do through an extracurricular organization).
Plus, we're developers. Why store only some data when you can store ALL THE DATA!
Imagine if the author of the novel had to write those irrelevant scenes anyway in order to get to the good stuff. There might be someone, a literature academic for example, who would want to study that material. It wouldn't be for everyone but there are some who might want access to it. My point is that we are generating this history anyway, so why throw it out? There may be someone who wants to see how the code has evolved. Plus, nobody says you have to watch stuff that is not interesting. There are ways to filter out things you aren't interested in.
Most interesting. I recently did some work on a new, version-control-inspired layout for programs, but from a different angle than you suggest: http://akkartik.name/post/wart-layers. I'd love to chat more about the details of what you're trying (email in profile).
1. How is that different than functionizing things? And, is it better? Especially because in wart it looks like your snippets can only be used once.
2. Does this bring you any advantages that well-commented code doesn't? From my admittedly limited point of view (I haven't run it, just looked at your two examples), following the flow of control seems a little more difficult, because your snippets appear to be placed, when compiled, wherever their comments are. And because variables are accessible and manipulable inside your snippets, there isn't any of the containment you get with functions.
Again, this is from me only looking at your two examples.
The biggest limitation of functions is precisely what you point out: they create scopes. So you end up complecting (http://www.infoq.com/presentations/Simple-Made-Easy) what variables you need access to at a time with what variables you want to describe and explain at a time.
I don't think it's controversial that functions have limitations. For example, OO in many ways was an attempt to work around the limitations of functions. But what OO discovered, I think, was that any sort of modularity mechanism when baked into the language brings in its own constraints, which limit the situations where it can be used. The classic example is all the constraints on C prototypes that make any sort of refactoring of include files an NP-hard problem, dooming lots of codebases to never get the reorganization they need to free them from historical baggage. So I've gradually, grudgingly started to focus on more language-independent, tool-based approaches that can overlay an 'untyped' layer atop even the most rigid language.
"Because variables are accessible and manipulable in your snippets there isn't any containment like you get with functions."
My claim (http://akkartik.name/post/readable-bad) is that in seeking local properties like containment/encapsulation we deemphasize global understanding. Both are useful, certainly, but they're often in tension and our contemporary rhetoric ignores the tension. The pendulum has swung so much in favor of local rules for 'good style' that it's worth temporarily undoing some of that work to see what we're giving up, what the benefits of playing fast and loose with local structure might be.
"..following the flow of control is a little more difficult.."
Yeah that's a valid concern. I think literate programming failed to catch on partly because we need at times to see the entire flow of control in a function. Like when we're debugging. I have a vague vision that programmers of the future will work with the expository and 'tangled' views of a program side by side. (In addition to perhaps a view of the runtime execution of a single unit test: http://akkartik.name/post/tracing-tests.)
Your point about reusing snippets is also a good one. That's the benefit of naming fragments in literate programming, isn't it? I hadn't considered that; the examples I've seen never mention it. But emacs org-mode and http://leoeditor.com certainly seem to find reuse useful. Hmm. I haven't encountered the need for reusing snippets so far. That might change, and we can probably come up with some syntax to support it if so. I suspect, however, that our languages already have plenty of primitives for enabling reuse. We don't need any extra tool or meta-linguistic support.
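For readers unfamiliar with named fragments: in noweb-style literate programming, a chunk is referenced by writing `<<name>>` on its own line, and the tangler splices the chunk's text in at that point, so one chunk can be referenced several times. A toy tangler, assuming the chunks are already parsed into a dictionary (real tools also parse the `<<name>>=` definitions out of the document, which this sketch skips; it is not how wart layers work):

```python
import re

# A reference is a line containing only <<chunk name>>, possibly indented.
REF = re.compile(r"^(\s*)<<(.+?)>>\s*$")

def tangle(chunks, name, _active=()):
    """Expand <<name>> references recursively, preserving each reference's indentation."""
    if name in _active:
        raise ValueError("cyclic chunk reference: " + " -> ".join(_active + (name,)))
    out = []
    for line in chunks[name].splitlines():
        m = REF.match(line)
        if m is None:
            out.append(line)
            continue
        indent, ref = m.groups()
        expanded = tangle(chunks, ref, _active + (name,))
        out.extend(indent + sub for sub in expanded.splitlines())
    return "\n".join(out)

chunks = {
    "main": "total = 0\nfor x in xs:\n    total += x\n    <<report>>\n<<report>>",
    "report": "print('running total:', total)",
}
print(tangle(chunks, "main"))
```

Here `report` is spliced in twice, once inside the loop and once after it, which is exactly the kind of reuse that a snippet inserted at a single labeled point cannot express.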
I think the emphasis on containment and local understanding is good, especially considering that programs are getting huge (which is a separate problem, and what I think really needs to get fixed). With huge programs it's infeasible to fully comprehend the whole program, which means the only thing you can really do is hope that other programmers' functions work as advertised, and focus on perfecting your local domain.
The easiest way to alleviate this, in my opinion, is to build smaller programs that each do one thing well and combine them into larger applications, preferably with a minimum of glue code. In my mind this leads to even more containment, as each domain is then accessible only through its specified API.
This could lead to problems similar to the ones you describe with deemphasizing global understanding, because it still compartmentalizes things: at each higher level the programmer is just trusting that the lower levels do what they say they do, just like in a huge, single program.
The idea of being T-shaped specifically when it comes to the overall knowledge of the projects you work on seems to be the best way to work on those applications: have a general understanding of the whole project, and a really good understanding of your specific domain (and perhaps an intermediate understanding of those around yours).
If I have to later see every dumb thing I let within ten feet of my IDE window while working, I'm going to set the computer on fire and code pen on paper for later transcription.
Ha ha... yes, it sounds painful but imagine you come across a function that you don't fully understand. You can highlight it and watch how that code evolved. The code will be animated very easily (compared to trying to reconstruct the history from some VCS). Most people will never watch (or care about) you writing code but sometimes it may be very useful.
http://www.storytellersoftware.com