My understanding of the Asiana crash was that the autopilot would have landed the plane fine, and that it was the humans turning it off that caused the problem.
Your point is still valid, but perhaps we're approaching a time when relying on the automation beats all but the best human pilots (Sully, perhaps).
The Asiana pilots were not able to fly a coupled (automatic) landing due to the ILS glideslope being out of service.
The pilots were under the misguided impression that the aircraft would automatically spool up the engines if it became too slow. That safety feature didn't engage, for an obscure technical reason. Even on a manual visual approach, the pilot can still use the autothrust for landing.
A more rigorously trained pilot (e.g., Capt. Sully) would have aborted the approach and performed an immediate go-around if he got below the glidepath (or too slow) below a certain altitude (e.g., 400 ft above ground level).
The rules requiring a go-around (or missed approach) apply to a fully automated approach and landing just as much as to a manually flown one.
The Air France 447 accident is a better-fitting example of the pitfalls that can arise in complex "humans-with-automation" systems.
There, automation lowered both the standard for situational awareness and fundamental stick-and-rudder skills. Then, when a quirky corner case happened, the pilots did all manner of wrong on the problem: so much so that they amplified a condition from "mostly harmless" to fatal for all aboard.
Vanity Fair has a nice piece on this accident that's easy to dig up. Good read.
I heard the Airbus's unusual control setup noticeably added to the problems (separate, non-interconnected sidesticks).
One pilot pulled up as hard as he could while the other thought he was pushing down, making the confusion that much worse.
That's true, but it was well known (and trained on), so I'd categorize that domain as "how the machine responds when your hands are on the controls," which is nearly a synonym for the "stick and rudder skills" category I cited.
Sure, to nearly every pilot that behavior is wacky, but it shouldn't have been a surprise for more than an instant to pilots who were "operating as designed."
It seems there's no free lunch: when skills atrophy as a natural response to helpful automation, some other skills must advance if the goal of an ever-improving error (accident) rate is to be achieved.
What's interesting is the number of people posting here who found the questions ambiguous. One assumes that your average HN poster may be a little overly detail-oriented (and privileged), but even so... what this seems to be testing, as much as algebra skill, is correctly parsing the question (assuming there is a single 'correct' way to do that). Shouldn't that skill be part of the reading comprehension testing? Ask any customer-facing developer and they will tell you there is no single interpretation for most customer requirements.
My other concern was the answer to #1, namely "4m + 5b" vs. "5m + 4b". If you already know the answer takes that form (a sum rather than a product), the two choices seem intended only to trip up people moving through the test at speed.
That's very true. I wonder if the goal of the test writers was simply to make the problems wordier, or if they deliberately created ambiguity to test parsing skills? I don't think the article was clear on that distinction.
Problems that are intentionally difficult to parse fail as tests of "real world" problem solving skills because in the real world it is almost always possible to seek clarification.
Felt this was glossed over. It's all well and good to not be too harsh for the reasons they mentioned, but ultimately the point is to 'peer review' the science.
In fact I would say an important "mistake reviewers make" is ... not actually doing much work. I've seen some appalling 2-3 line comments like "seems fine", even from senior academics. That's not even to mention problems with misunderstood p-values, not reading the algorithm closely, or not walking through the proof manually, and so on.
This is interesting work. I'm curious how confident we can be that the TLA+ proof from Diego Ongaro was correctly represented in Verdi/Coq. This still seems like a manual, hard-to-verify process.
Hey, I'm James Wilcox, another member of the Verdi team.
This is a good question. More broadly, what do you have to trust in order to believe our claims?
In addition to all the usual things (like the soundness of Coq's logic and the correctness of its proof checker), the most important thing you need to trust is the top-level specification. In our case, this is linearizability (have a look at https://github.com/uwplse/verdi/blob/master/raft/Linearizabi... for the key definition; the key theorem statement is at https://github.com/uwplse/verdi/blob/master/raft-proofs/EndT... ). If you can convince yourself that these correspond to your intuitive understanding of linearizability, then you don't need to trust any other theorem statement or definition in the development.
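To give a feel for what "convince yourself this matches your intuitive understanding of linearizability" means, here is a toy brute-force checker for a single read/write register. This is not the Verdi definition and not their encoding; the tuple format, timestamps, and function name are all made up for illustration. It checks the two standard conditions: the linearization must respect real-time order (an op that finished before another started comes first), and every read must return the most recent write.

```python
from itertools import permutations

def linearizable(history):
    """Brute-force linearizability check for one register.

    history: list of (invoke_time, respond_time, kind, value) tuples,
    kind in {"write", "read"}. Returns True if some total order of the
    ops (1) respects real-time order and (2) makes every read return
    the latest preceding write (None if no write has happened yet).
    """
    n = len(history)
    for order in permutations(range(n)):
        pos = [0] * n
        for i, idx in enumerate(order):
            pos[idx] = i
        # Real-time constraint: if op a responded before op b was
        # invoked, a must be linearized before b.
        if any(history[a][1] < history[b][0] and pos[a] > pos[b]
               for a in range(n) for b in range(n)):
            continue
        # Register semantics along the candidate order.
        current, ok = None, True
        for idx in order:
            _, _, kind, value = history[idx]
            if kind == "write":
                current = value
            elif value != current:
                ok = False
                break
        if ok:
            return True
    return False

# A write(1) overlapping a read that returns 1: linearizable.
h_ok = [(0, 3, "write", 1), (2, 5, "read", 1)]
# A read that returns a stale value strictly after write(2) completed: not.
h_bad = [(0, 1, "write", 1), (2, 3, "write", 2), (4, 5, "read", 1)]
```

The real development proves this kind of property once and for all over every reachable trace of the implementation, rather than enumerating orders per history, but the correctness condition being stated is the same.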
If you actually want to run our code, then you need to make several other assumptions. Our system runs via extraction to OCaml, so you must trust the extractor and the OCaml compiler and runtime. In addition, we have a few hundred lines of OCaml to hook up the Coq code to the real world (e.g., writing to disk and putting bits on the network).
To respond more directly to your question about Diego's proof, I can tell you we referred to it often to get the high-level idea. But the TLA model differs in several respects from our implementation of Raft in Verdi. Most importantly, our code is an implementation in the sense that you can run it. This means that it resolves all nondeterminism in the specification. Furthermore, there is no need to manually check that what we implemented matches the TLA model, unless that is your preferred means for convincing yourself that we really did implement Raft.
I like the tool. As for the extensions: in my experience SonarQube offers similar extensibility and does most of its work in the open, so it might be worth considering turning your project into a plugin (or plugins) for SonarQube?
(author here)
We cross-tabulated system age against perceived amount of technical debt. There was a moderate association between older systems (> 6 yrs) and more perceived debt.
I did not explicitly look at size, as this was not one of the original research questions, but it's a good point. I suspect older systems will tend to be larger (in the domains we studied, anyway). And your point about arch choices being great "early on" can, I think, be captured in the "system age" variable. I guess I'm trying to think of a system that might be young and yet quite large in LOC; that would be an interesting outlier to look at.
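For readers curious what "cross-tabbed with a moderate association" looks like mechanically, here's a minimal sketch using Cramér's V (a standard 0-to-1 measure of association between two categoricals, derived from chi-squared). The data below is entirely made up, not from the survey, and the age buckets are just illustrative.

```python
import math

# Hypothetical responses, NOT the survey data: (system age bucket,
# perceived technical-debt level) per respondent.
responses = [
    ("<= 6 yrs", "low"), ("<= 6 yrs", "low"), ("<= 6 yrs", "high"),
    ("> 6 yrs", "high"), ("> 6 yrs", "high"), ("> 6 yrs", "low"),
]

def cramers_v(pairs):
    """Cross-tabulate two categorical variables and return Cramér's V
    (0 = no association, 1 = perfect association)."""
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    n = len(pairs)
    count = {(r, c): 0 for r in rows for c in cols}
    for p in pairs:
        count[p] += 1
    row_tot = {r: sum(count[(r, c)] for c in cols) for r in rows}
    col_tot = {c: sum(count[(r, c)] for r in rows) for c in cols}
    # Chi-squared statistic: sum of (observed - expected)^2 / expected.
    chi2 = sum(
        (count[(r, c)] - row_tot[r] * col_tot[c] / n) ** 2
        / (row_tot[r] * col_tot[c] / n)
        for r in rows for c in cols
    )
    return math.sqrt(chi2 / (n * (min(len(rows), len(cols)) - 1)))
```

Rules of thumb vary, but values around 0.2-0.4 are often read as a "moderate" association for a 2x2 table like this one.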