Well, dang. I remember you from the very late 1980s at Stanford. Never met you but saw you on the timesharing systems of the time. A few years later I enjoyed a talk you gave at IBM Yorktown Heights about how you sneaked into Apple to build the graphing calculator.
Well howdy, old-timer! That brings back some memories. (The '80s: after the dinosaurs, but before the giant armored sloths - back when neutrinos were massless and Λ was zero.)
Did you work on Axiom? That was the finest crafted computer algebra system.
What are the chances that type of story could be repeated today? Seems pretty unlikely, but corporations are in some ways 'dumb' and we do base a lot of our society on trust.
At Apple? None whatsoever. It was a surprising fluke even in the Apple of 1993. It only succeeded because so many people helped. That Apple was beleaguered at the time may have given employees a certain devil-may-care attitude towards their own job security and a willingness to cross certain lines to assist us or look the other way.
Applications / Utilities / Grapher is a different application (formerly Curvus Pro X from Arizona Software) which replaced the original Mac OS 7 Apple menu Graphing Calculator because I was very slow porting to Mac OS X.
I'm wondering what you needed to write unsafe code for? Could there be another way to get rid of reference counting for your use case?
Another question, have you tried using Accelerate framework to solve performance bottlenecks (or save yourself from having to write your own calc code)?
Most of the performance critical sections are numeric computation loops. I use the Unsafe APIs where profiling shows that the bounds-checking overhead is significant.
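For context, here is a minimal sketch (not code from the app; the function name and computation are illustrative) of the kind of numeric inner loop where switching from Array subscripting to the Unsafe buffer APIs removes per-element bounds checking in release builds:

```swift
// Hypothetical hot loop: sum of squared differences between two arrays.
// Plain Array subscripting carries bounds checks the optimizer may not
// always eliminate; UnsafeBufferPointer access skips them in release builds.
func sumSquaredDifferences(_ a: [Double], _ b: [Double]) -> Double {
    precondition(a.count == b.count)
    return a.withUnsafeBufferPointer { pa in
        b.withUnsafeBufferPointer { pb in
            var total = 0.0
            for i in 0..<pa.count {
                let d = pa[i] - pb[i]  // no per-element bounds check here
                total += d * d
            }
            return total
        }
    }
}
```

The tradeoff is the usual one: correctness of the index arithmetic is now on the programmer, which is why it's worth reaching for this only where profiling shows the overhead matters.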
In principle, I could get rid of reference counting overhead by using value types or immutable data. I couldn't see a simple path to doing that without re-architecting everything (with no guarantee that the end result would not just have different performance issues). For the moment, I'm awaiting compiler improvements before re-evaluating the tradeoffs. There is certainly room for the compiler to reason better about eliding retain/release. https://github.com/apple/swift/issues/58549
Yes, the code does use Accelerate where applicable. That is one component of the numeric evaluation. It addresses the lowest level: things like evaluating the sin function on every array element, or multiplying two arrays element-wise. Performance tuning is a game of whack-a-mole. There's always another bottleneck somewhere.
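As a rough illustration of that lowest level (a hedged sketch, not the app's code; the `canImport` fallback is my addition for portability), evaluating sin over a whole array is a single vForce call on Apple platforms:

```swift
#if canImport(Accelerate)
import Accelerate
#endif
import Foundation

// Hypothetical element-wise step: y[i] = sin(x[i]).
// On Apple platforms, Accelerate's vForce evaluates the entire array in one
// vectorized call; elsewhere this falls back to a scalar loop.
func vectorizedSin(_ x: [Double]) -> [Double] {
    #if canImport(Accelerate)
    return vForce.sin(x)
    #else
    return x.map(sin)
    #endif
}
```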
The math is internally represented as a tree for display and editing. Most of the performance critical code is the numeric evaluation when graphing. For that, the math is compiled to a linear byte code which vastly improves locality of reference and is an opportunity to apply optimizations such as common subexpression elimination and loop invariant code motion.
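A toy sketch of that idea, assuming a simple stack-machine bytecode (the app's actual instruction format and optimizations are not shown here; all names are illustrative):

```swift
// Expression tree, as used for display and editing.
enum Expr {
    indirect case add(Expr, Expr)
    indirect case mul(Expr, Expr)
    case constant(Double)
    case variable          // the single free variable, e.g. x
}

// Flat bytecode: all instructions contiguous in one array, so evaluation
// has far better locality of reference than a pointer-chasing tree walk.
enum Op {
    case pushConstant(Double)
    case pushVariable
    case add
    case mul
}

func compile(_ e: Expr, into code: inout [Op]) {
    switch e {
    case .constant(let c): code.append(.pushConstant(c))
    case .variable:        code.append(.pushVariable)
    case .add(let l, let r):
        compile(l, into: &code); compile(r, into: &code); code.append(.add)
    case .mul(let l, let r):
        compile(l, into: &code); compile(r, into: &code); code.append(.mul)
    }
}

func evaluate(_ code: [Op], x: Double) -> Double {
    var stack: [Double] = []
    for op in code {
        switch op {
        case .pushConstant(let c): stack.append(c)
        case .pushVariable:        stack.append(x)
        case .add:
            let b = stack.removeLast(), a = stack.removeLast()
            stack.append(a + b)
        case .mul:
            let b = stack.removeLast(), a = stack.removeLast()
            stack.append(a * b)
        }
    }
    return stack[0]
}
```

Compiling once and evaluating the flat code at many x values is also the natural place to apply passes like common subexpression elimination before the loop runs.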
I hadn't followed the link in the article originally to get to https://mobile.twitter.com/RonAvitzur/status/146102321572409.... So literally the process of parsing the expression and producing the byte code is the performance challenge now? Or is it also walking the bytecode to do anything?
My basic question would be: why not go back to flex/bison/yacc/whatever via C-FFI? (But I think it would still be bad, since you'll want to get to a Swift data structure for your ops and those will still have the Arc issues)
That thread describes me working through performance issues in the initial port eight months ago. Those cases perform adequately now. Parsing is not a bottleneck. Walking the bytecode remains a performance hotspot, as that is where all the numeric calculations occur, but no more or less so comparing the C++ and Swift implementations.
I did investigate maintaining the flex/bison parser, since its generated state machine C code is more robust than my handwritten recursive descent parser when presented with pathological input. However, as you say, since I need a Swift data structure in the end, there is little to be gained and a lot of complication bridging via a C-FFI.
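One common way to harden a handwritten recursive descent parser against pathological input (e.g. thousands of nested parentheses blowing the call stack) is an explicit depth cap. A hedged sketch, with all names and the limit illustrative rather than taken from the app:

```swift
struct ParseError: Error {}

// Minimal recursive descent parser for single digits in nested parentheses.
struct Parser {
    let tokens: [Character]
    var pos = 0
    var depth = 0
    let maxDepth = 256  // arbitrary cap; a table-driven (bison) parser needs no such guard

    mutating func parsePrimary() throws -> Double {
        depth += 1
        defer { depth -= 1 }
        guard depth <= maxDepth else { throw ParseError() }  // reject pathological nesting
        if pos < tokens.count, tokens[pos] == "(" {
            pos += 1
            let value = try parsePrimary()
            guard pos < tokens.count, tokens[pos] == ")" else { throw ParseError() }
            pos += 1
            return value
        }
        guard pos < tokens.count, let d = tokens[pos].wholeNumberValue else { throw ParseError() }
        pos += 1
        return Double(d)
    }
}
```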
Yes, the numeric evaluation is vectorized via the Accelerate Framework's vForce and vDSP APIs. That is a significant performance improvement. The numeric evaluation remains a hotspot with vectorization, as that is where the app does most of its work.
Was it in C++ originally, not C? (It seems like a C FFI bridge would have been straightforward.) Or did it already go from C => Objective-C++ a while back?
(I read the article hours ago and didn't notice you'd posted. Btw, I think we would have gotten better conversation if you had said "Hey, this is me! AMA" as a top-level comment)
It has been in continuous development since 1985. Before the port, parts were in C, C++, Objective-C, and Objective-C++, as well as Lex, Yacc, GLSL, and Python. Will do the top-level comment.