Continuations Under the Covers [video]

scanr · on Sept 29, 2023

This is pretty neat.

Somewhat related: I’ve been intrigued by serialisable continuations forever. The idea of machine and time independent control flow in a regular programming language could be quite useful eg:

    response = send_email_and_wait_for_response(“hi!”, timeout=“4d”)

samsquire · on Sept 29, 2023

Me too!

Imagine an extremely robust job queue submission but also let you move computation through a network. Like Capnproto promise pipelines, you can resume computation on another computer, so rather than paying for round trips, you just move the computation through the network. You ask another computer to do something, and it handles the processing of the result on that machine.

I find Temporal [0] interesting for this reason and recently there was a HN post about Telescript.

I am designing a syntax for async pipelines that resembles a state machine that ideally would handle events occurring on different machines too but I'm specifically targeting multithreaded events with the goal of microservices. [1]

0: https://temporal.io/

1: https://github.com/samsquire/dream-programming-language#intr...

BenoitP · on Sept 29, 2023

Time independence begs the question of failure recovery (and the associated at least once and at most once recovery basic options).

This kind of problem is more suited to solutions like 'closure-persisted lambda functions'. Flink's Statefun comes to mind [1], and also Restate [2]. These come with their host of problems, though. For example I hope you have a strategy for distributedly tracing these, as debugging an isolated continuation will have to be done.. in isolation.

Also comes the general question of lifetimes. What's GC when your functions are spread over a few kB in 100k different places in RAM, SSD, tape? Or do you go à la Rust, trying to come up with guarantees of them (and recover at least once these?)

I believe this is the future, though. Code is often way lighter than data, so why not move it around closer to data? Say a webserver routes a query, then makes a call to the DB, filters some data, and sends it back to the client. Why does the code not live in the NIC for routing, the filtering live in the SSD controller on another server, and the response is transferred from device buffers with direct copying?

Of course Loom's continuation can have a play in this, maybe be the underlying efficient serde mechanism?

There are initiatives to freeze whole JVMs, and bring them back in RAM when there's an HTTP call to be answered for example [3]. Maybe the path of performance for these is in only thawing the continuations and their dependencies, and blocking on >1h delays means getting frozen to the SSD.

Maybe in the future AWS is going to bill webserver lambda usage with SSD IOPS, RAM-L3-L2-L1 occupancy, NIC buffer usage granularity. Like Feynman said, there's plenty of (design) space at the bottom.

[1] https://nightlies.apache.org/flink/flink-statefun-docs-stabl...

[2] https://www.restate.dev/

[3] https://assets.ctfassets.net/oxjq45e8ilak/3KKci3H0yZWIlzHBjC...

touisteur · on Sept 29, 2023

Low-level, DMA on PCIe stuff like GPUdirect and StorageDirect is a (limited, low-level, frustrating) form of disaggregated computing, if you think of DMA engines as tiny specific cores (and FPGA people have been using the PCIe bus and crazy 'DMAs' for this kind of stuff for a long time now).

On the high end you got your GPUs and accelerators, and recently all kinds of combinations of NIC, CPUs and GPUs, and on the extreme of the spectrum some of them also have ultra high bandwidth interconnect (mostly nvlink) that make the whole concept of disaggregated computing (in one host) more than viable and exciting.

It's just a big pain right now to program, synchronize, schedule all these async units and their async transfers, all that in a somewhat portable way.

kitd · on Sept 29, 2023

> There are initiatives to freeze whole JVMs, and bring them back in RAM when there's an HTTP call to be answered for example [3].

There used to be a research JVM from a group called Velare that did exactly this. It was very cool. It was almost like watching 2 programs have a conversation with each other, with the back and forth of information passing.

pron · on Sept 29, 2023

Also, imagine a thread does the following:

    var result = stmt.executeQuery("select from ...");

When the thread blocks on the DB, rather than bring data from a remote machine, the runtime could serialize the thread, send its state over the wire, and unblock it on the machine containing the data.

BenoitP · on Sept 29, 2023

Very interesting that this comment comes from you (pron is the guy giving the talk).

Is this something you know someone is working on? Maybe at Oracle? Or just a general thought? I follow the loom-dev mailing-list and haven't seen this topic come up.

SSD manufacturers have controllers that checksum the data and manage the blocks. One could even imagine going one step further by having the continuation being sent to the controller and filters and projections being done right there; saving also the SATA bandwidth. Some manufacturers are starting to get there [1]

What about GC in this context, though? Getting over the wire means execution is not guaranteed, the DB server can crash. Continuations stacks' content is scanned for references. Does that mean that the old copy of the continuation is kept there for the GC? And there's a timeout to collect the continuation if it goes silent? What if the remote continuation want to access something in the previous JVM? Should the continuation be scanned before being sent? Can that be a compile-time guarantee? What about exceptions and stacktraces?

[1] https://www.arm.com/blogs/blueprint/computational-storage

scanr · on Sept 29, 2023

Yeah! Back in the day, the Rhino implementation of JavaScript had serialisable continuations on top of the JDK. I wrote a workflow engine with them.

There are some interesting challenges:

- deserialising them into the same environment

- hooking in dependencies

- for time independence, how do you fix a bug on a process that has been running for 2 weeks and has got 2 more to go

Azure stateful functions solved this by instead of serialising the stack, they just start the process from scratch and then replay every external request / response back into them until they get the process back to its last instruction.