Let me roughly summarize Conal’s “simple” reverse AD: he drops the tape into the stack. This has two consequences: the stack is “about twice as long”, and you “may (will?) lose tail recursion”.
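To make that reading concrete, here is a minimal sketch of the two shapes for a chain of scalar functions. It is not Conal's formulation. The stage list and the names `STAGES`, `grad_with_tape`, and `grad_with_stack` are invented for illustration. The first version keeps an explicit tape and runs two loops. The second keeps each local derivative in a stack frame and does the backward multiplication after the recursive call returns, which is why that call cannot be a tail call.

```python
import math

# Each stage is a pair (f, df): a scalar function and its derivative.
# These stages are hypothetical examples, not taken from the paper.
STAGES = [
    (math.sin, math.cos),
    (lambda x: x * x, lambda x: 2.0 * x),
    (math.exp, math.exp),
]

def grad_with_tape(stages, x):
    """Reverse AD with an explicit tape: two loops, constant stack depth."""
    tape = []
    for f, df in stages:          # forward sweep
        tape.append(df(x))        # record the local derivative
        x = f(x)
    adjoint = 1.0
    for d in reversed(tape):      # backward sweep over the tape
        adjoint *= d
    return x, adjoint

def grad_with_stack(stages, x, i=0):
    """Same computation, with the tape held in stack frames instead."""
    if i == len(stages):
        return x, 1.0
    f, df = stages[i]
    local = df(x)                 # stays in this frame until the call returns
    y, adjoint = grad_with_stack(stages, f(x), i + 1)
    return y, adjoint * local     # work after the call, so not a tail call
```

Both functions return the value and the derivative of the composed chain at `x`. The recursive one needs one live frame per stage, so a long chain grows the stack where the tape version would grow a list.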
It is still a really good paper, and no doubt a lot of the subtlety was lost on me.