I didn't explain this well but the transformation is lossless. No data is lost f...

jg23 · on July 31, 2021

I have a related question: if you’re storing [“hello”] as one chunk, what happens when an edit adds, say, an extra [“e”] after the existing [“e”]? In the unoptimised structure I know you can just add the new [“e”] as a child of the original [“e”]. So here would you delete the chunk [“hello”] and split it into two halves, [“he”] and [“llo”]?

josephg · on July 31, 2021

Yes exactly. You replace the chunk with “he” and “llo” then insert the extra item in the middle. The result is [“he”, “e”, “llo”]. The code to do all this inline in a b-tree is pretty hairy but it works pretty well!
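As a rough illustration of that split, here is a minimal sketch that assumes a plain Python list of strings stands in for the b-tree. The function name `insert_into_chunks` is hypothetical, and the real inline b-tree code is considerably more involved.

```python
def insert_into_chunks(chunks, pos, text):
    """Return a new chunk list with `text` inserted at character offset `pos`.

    Hypothetical helper: a flat list stands in for the b-tree of chunks.
    """
    if pos < 0:
        raise IndexError("position out of range")
    offset = 0  # character offset where the current chunk starts
    for i, chunk in enumerate(chunks):
        end = offset + len(chunk)
        if pos <= end:
            cut = pos - offset
            left, right = chunk[:cut], chunk[cut:]
            # Replace the chunk with its two halves and put the new text
            # between them, dropping a half if the cut is at a chunk edge.
            replacement = [part for part in (left, text, right) if part]
            return chunks[:i] + replacement + chunks[i + 1:]
        offset = end
    if pos == offset:
        # Only reached for an empty document.
        return chunks + [text]
    raise IndexError("position out of range")


print(insert_into_chunks(["hello"], 2, "e"))  # ['he', 'e', 'llo']
```

Inserting at the very end of a chunk produces no empty right half, so the sketch simply appends the new item after the existing chunk.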

Majromax · on July 31, 2021

Ah, I see. I had thought that the consolidation gave a batched update with a single ID, so 'h' + 'ell' + 'o' would have IDs of 1, 2, and 3 respectively. That would have made an editing conflict in the middle of 'ell' impossible.

josephg · on July 31, 2021

Ah that makes sense! Conflicts from concurrent edits are one thing, but concurrent edits are rare. The bigger problem with doing it that way is that it would become impossible to ever insert in the middle of "ell", because we wouldn't be able to name any of those internal positions.

To get around that we assign a sequential ID to every inserted character, regardless of how the characters are typed or stored. Inserting "hello" assigns IDs 1-5 whether you type it one character at a time or paste it from the clipboard.
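A minimal sketch of that ID scheme, assuming each stored run only records the ID of its first character and derives the rest by offset. The names `Run`, `IdAllocator`, and `split_run` are made up for illustration and are not taken from any real implementation.

```python
from dataclasses import dataclass


@dataclass
class Run:
    first_id: int  # ID of the first character in this run
    text: str

    def id_at(self, k):
        # Characters in a run have consecutive IDs, so an offset is enough.
        return self.first_id + k


class IdAllocator:
    """Hands out one sequential ID per inserted character (hypothetical)."""

    def __init__(self):
        self.next_id = 1

    def insert(self, text):
        run = Run(self.next_id, text)
        self.next_id += len(text)
        return run


def split_run(run, k):
    # Splitting changes how the run is stored, not the ID of any character.
    return Run(run.first_id, run.text[:k]), Run(run.first_id + k, run.text[k:])


alloc = IdAllocator()
pasted = alloc.insert("hello")       # one stored run covering IDs 1-5
left, right = split_run(pasted, 2)   # "he" keeps IDs 1-2, "llo" keeps 3-5
extra = alloc.insert("e")            # the new character gets ID 6
print(left, extra, right)
```

Because every character in the middle of a run still has a name, an insert can target any position inside "hello" even though the whole word is stored as one item.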