WebAssembly and Replayable Functions (bartoszsypytkowski.com)
70 points by samwillis on July 7, 2023 | 4 comments


At Temporal we've done a lot of research on WASM-based deterministic workflows. While they make sense in a general way, they suffer from a problem similar to deterministic OS environments (e.g. Hermit): safe updates. People often need to apply code changes and know those changes will remain deterministic with respect to past executions.

When determinism is clearly defined at the language level, you can easily see whether an alteration is compatible with existing long-running executions when replaying old history. When it's enforced at a low level, however, it's much less clear.

For example, Go's WASM output includes its goroutine scheduler inside the WASM blob. A slight change in an unrelated part of the code can change the order in which goroutines run, making it an unsafe, non-deterministic change when deployed against running workflows. Similarly, Go may make a call to a seeded random source for map iteration (yes, Go explicitly randomizes map iteration order). One extra call to that source can make the map iteration order differ, breaking replay.
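The map point is easy to demonstrate. The sketch below (my own illustration, not from the article) ranges over the same map repeatedly and counts how many distinct key orders show up; because Go deliberately randomizes each range iteration, you typically see several:

```go
package main

import (
	"fmt"
	"strings"
)

// distinctOrders ranges over m n times and returns how many
// distinct key orders were observed. Go randomizes the starting
// point of every map range loop, so this is usually > 1.
func distinctOrders(m map[string]int, n int) int {
	seen := map[string]bool{}
	for i := 0; i < n; i++ {
		var keys []string
		for k := range m {
			keys = append(keys, k)
		}
		seen[strings.Join(keys, ",")] = true
	}
	return len(seen)
}

func main() {
	m := map[string]int{"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}
	fmt.Println("distinct iteration orders over 50 loops:", distinctOrders(m, 50))
}
```

If replay correctness depends on that order, any perturbation of the runtime's random stream silently changes behavior.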

So WASM is great unless you need to alter code for existing workflows that run for years. Then it becomes less clear, whereas language-level determinism can offer clear semantics to code authors and static analyzers, with explicit versioning/patching if needed. Of course, using lower-level languages can help here too, since they compile more predictably and carry a smaller runtime, but they still require authors to be careful with code changes.


Interesting! I'm not very knowledgeable about this topic, so I'm wondering: why would workflows rely on code that doesn't remove the sources of non-determinism (runtime errors, lack of properly serialized synchronization logic, etc.)?

Or is the issue that the evolution of a language might introduce additional sources of non-determinism?

So merely upgrading without checking for these wouldn't be safe if code execution depends on the behavior of a specific runtime?

(That would probably only work for code that never fails at runtime, I assume, which also seems like a pretty stringent requirement?)

The advantage of a virtual machine would then be that the full execution trace of the instructions can be recorded and replayed, regardless of changing higher-level semantics?


> why workflows would rely on code that doesn't remove the sources of non-determinism?

We can't always sandbox every environment to remove these sources at a language level. Sure, in JS/TypeScript we can use V8, but in Go there aren't many good ways to do this (all the options, e.g. interpreters, have tradeoffs). So we offer deterministic alternatives for non-deterministic language features/stdlib and rely on authors and static analysis to catch misuse.
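A minimal sketch of the record-and-replay idea behind such deterministic alternatives (the `Recorder` type here is hypothetical, not Temporal's actual API): the first execution runs the non-deterministic function and records its result; replays return the recorded value instead of re-running it.

```go
package main

import "fmt"

// Recorder captures side-effect results on first execution and
// serves them back verbatim during replay.
type Recorder struct {
	history []int
	pos     int
	replay  bool
}

// SideEffect runs f and records its result, or, in replay mode,
// returns the previously recorded result without calling f.
func (r *Recorder) SideEffect(f func() int) int {
	if r.replay {
		v := r.history[r.pos]
		r.pos++
		return v
	}
	v := f()
	r.history = append(r.history, v)
	return v
}

func main() {
	// First execution: record two "random" values.
	rec := &Recorder{}
	a := rec.SideEffect(func() int { return 42 })
	b := rec.SideEffect(func() int { return 7 })

	// Replay: the functions passed in are ignored; history wins.
	rep := &Recorder{history: rec.history, replay: true}
	fmt.Println(rep.SideEffect(func() int { return 99 }) == a) // true
	fmt.Println(rep.SideEffect(func() int { return 99 }) == b) // true
}
```

The workflow code itself stays deterministic as long as every non-deterministic call goes through an interface like this.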

> Or is the issue that the evolution of a language might introduce additional sources of non-determinism?

This is less of an issue in practice, but it can still be a concern.

> So merely upgrading without checking for these wouldn't be safe if code execution depends on the behavior of a specific runtime?

Technically true, though the bigger concern is updating code that you think won't change the order of upstream operations (what we call "commands") but does.
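The failure mode can be sketched like this (my own simplification, not Temporal's internals): during replay, the commands the updated code issues are compared positionally against the recorded history, and any mismatch is a non-determinism error.

```go
package main

import "fmt"

// replayCheck compares the commands issued by new code against the
// recorded history, position by position. A mismatch means the code
// change was not replay-safe.
func replayCheck(recorded, replayed []string) error {
	for i, want := range recorded {
		if i >= len(replayed) || replayed[i] != want {
			return fmt.Errorf("non-determinism at command %d: recorded %q", i, want)
		}
	}
	return nil
}

func main() {
	history := []string{"ScheduleActivity:Charge", "StartTimer:24h"}

	// Unchanged code issues the same commands: replay succeeds.
	fmt.Println(replayCheck(history, []string{"ScheduleActivity:Charge", "StartTimer:24h"}))

	// A refactor that reorders the commands breaks replay.
	fmt.Println(replayCheck(history, []string{"StartTimer:24h", "ScheduleActivity:Charge"}))
}
```

This is why a seemingly unrelated refactor that perturbs scheduling or iteration order can invalidate years-old running workflows.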


worth mentioning: https://github.com/stealthrocket/timecraft

full disclosure: I don't work on it, but the devs are committers/contributors to https://wazero.io (I am a wazero committer) :)




