Which, in turn, reminds me of this paper [1] (from someone who previously worked at Facebook).
TL;DR: Metastable failures occur in open systems with an uncontrolled source of load, where a trigger causes the system to enter a bad state that persists even after the trigger is removed.
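To make that definition concrete, here is a toy sketch of my own (not taken from the paper). It assumes client retries are the sustaining effect and that an overloaded server times out every request in that tick while still wasting the work. The function name `simulate` and all the numbers (capacity 100, base load 80, a one-tick spike to 150) are made up for illustration.

```python
def simulate(max_retries, ticks=12, capacity=100, base_load=80,
             spike_tick=3, spike_load=150):
    """Toy model: return a list of (tick, offered, succeeded) tuples."""
    # retries_due[k] = requests about to make retry attempt number k+1
    retries_due = [0] * max_retries
    history = []
    for tick in range(ticks):
        # The trigger: a load spike that lasts exactly one tick.
        fresh = spike_load if tick == spike_tick else base_load
        offered = fresh + sum(retries_due)
        overloaded = offered > capacity
        # Assumption: when overloaded, every request times out (goodput is 0).
        succeeded = 0 if overloaded else offered
        if overloaded:
            # Each timed-out attempt with retries left is sent again next tick;
            # attempts that were already on their last retry are dropped.
            retries_due = [fresh] + retries_due[:-1] if max_retries else []
        else:
            retries_due = [0] * max_retries
        history.append((tick, offered, succeeded))
    return history


with_retries = simulate(max_retries=2)
no_retries = simulate(max_retries=0)
print(with_retries[-1])  # (11, 240, 0): still failing long after the spike
print(no_retries[-1])    # (11, 80, 80): back to normal
```

In this model the spike is gone after one tick, but with retries enabled the offered load settles at 3x the base load, which stays above capacity, so nothing ever succeeds again. With retries disabled the same spike costs one bad tick and the system recovers on its own.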
[1] Metastable Failures in Distributed Systems - https://sigops.org/s/conferences/hotos/2021/papers/hotos21-s...