In our previous session, we deconstructed Redlock, a tool for strict mutual exclusion where only one person can hold the "pen" at a time. But what happens when you want everyone to hold the pen simultaneously?
When you see a coworker's cursor dancing across a Google Doc, you are witnessing a miracle of distributed systems. Handling 50 people typing in the same paragraph without "last-write-wins" overwriting or document corruption is a complex problem of Concurrency Control.
Today, we deconstruct the two primary architectures for real-time collaboration: Operational Transformation (OT) and Conflict-free Replicated Data Types (CRDTs).
1. The Challenge: Convergence and Intention
In a collaborative editor, we have three core requirements:
Causality: If I ask a question and you answer it, the answer must appear after the question for everyone.
Convergence: Once everyone stops typing and all messages are delivered, everyone must see the exact same document.
Intention Preservation: If I bold a word and you delete the paragraph above it, my word should still be bolded in its new position.
Traditional locking (like Redlock) fails here because every keystroke would have to wait for a lock round trip while every other author is blocked, making real-time collaboration impossible. We need Optimistic Concurrency Control.
2. Operational Transformation (OT): The Google Docs Way
Google Docs, like Etherpad and Google Wave, uses Operational Transformation. The fundamental idea is that the meaning of an operation depends on the operations that happened before it.
The Classic Conflict
Imagine a document containing the string: "LIME".
User A wants to insert 'S' at index 4 (to make it "LIMES").
User B wants to delete 'L' at index 0 (to make it "IME").
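If User A's operation Op_a(Insert, 4) is applied unchanged after User B's operation Op_b(Delete, 0), the 'S' targets the wrong place because the indices have shifted. The string "IME" has only three characters, so index 4 no longer exists: depending on the implementation, the result is a misplaced character or an error.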
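The conflict above can be reproduced in a few lines. This is a minimal sketch, assuming operations are plain tuples and that the hypothetical `apply` helper rejects out-of-range indices (a more lenient editor might instead append the character silently).

```python
def apply(doc: str, op: tuple) -> str:
    """Apply one single-character operation, refusing out-of-range indices."""
    kind, index = op[0], op[1]
    if kind == "insert":
        if not 0 <= index <= len(doc):
            raise IndexError(f"cannot insert at {index} in {doc!r}")
        return doc[:index] + op[2] + doc[index:]
    if kind == "delete":
        if not 0 <= index < len(doc):
            raise IndexError(f"cannot delete at {index} in {doc!r}")
        return doc[:index] + doc[index + 1:]
    raise ValueError(f"unknown operation kind: {kind!r}")


doc = "LIME"
op_a = ("insert", 4, "S")  # User A: append 'S'
op_b = ("delete", 0)       # User B: remove 'L'

# A first, then B: the delete at index 0 is unaffected by the append.
a_then_b = apply(apply(doc, op_a), op_b)  # "IMES"

# B first, then A's untransformed operation: index 4 no longer exists.
after_b = apply(doc, op_b)  # "IME"
try:
    b_then_a = apply(after_b, op_a)
except IndexError as exc:
    b_then_a = f"error: {exc}"

print(a_then_b)
print(b_then_a)
```

The two delivery orders disagree, which is exactly the divergence a collaborative editor has to prevent.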
The Transformation Function
OT solves this by passing operations through a Transformation Function T.
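When the server receives Op_a and Op_b concurrently, it doesn't just execute them. It transforms them against each other, producing adjusted operations Op_a' and Op_b' that lead to the same document no matter which original was applied first.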
In our example, the server realizes that because index 0 was deleted, User A’s insertion at index 4 must be shifted to index 3.
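A transformation function for the two simplest operations might look like the sketch below. It assumes single-character inserts and deletes only; the `Insert` and `Delete` classes, the `op_wins_ties` flag, and the use of `None` as a no-op are illustrative choices, not the scheme any particular editor uses.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Insert:
    index: int
    char: str


@dataclass(frozen=True)
class Delete:
    index: int


Op = Optional[Union[Insert, Delete]]  # None stands for "nothing left to do"


def apply(doc: str, op: Op) -> str:
    if op is None:
        return doc
    if isinstance(op, Insert):
        return doc[:op.index] + op.char + doc[op.index:]
    return doc[:op.index] + doc[op.index + 1:]


def transform(op: Op, against: Op, op_wins_ties: bool = True) -> Op:
    """Return `op` adjusted so it can be applied after `against`."""
    if op is None or against is None:
        return op
    if isinstance(op, Insert) and isinstance(against, Insert):
        # Two inserts at the same index need a consistent tie-break.
        if against.index < op.index or (against.index == op.index and not op_wins_ties):
            return Insert(op.index + 1, op.char)
        return op
    if isinstance(op, Insert) and isinstance(against, Delete):
        return Insert(op.index - 1, op.char) if against.index < op.index else op
    if isinstance(op, Delete) and isinstance(against, Insert):
        return Delete(op.index + 1) if against.index <= op.index else op
    # Delete against Delete
    if against.index < op.index:
        return Delete(op.index - 1)
    if against.index == op.index:
        return None  # both users removed the same character
    return op


doc = "LIME"
op_a = Insert(4, "S")
op_b = Delete(0)

op_a_prime = transform(op_a, op_b)                      # Insert(index=3, char='S')
op_b_prime = transform(op_b, op_a, op_wins_ties=False)  # Delete(index=0)

via_b_first = apply(apply(doc, op_b), op_a_prime)
via_a_first = apply(apply(doc, op_a), op_b_prime)
print(via_b_first, via_a_first)  # IMES IMES
```

Both paths end at "IMES", which is the convergence property the transformation exists to guarantee. Even this toy version needs four cases for two operation types.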
Pros of OT:
Bandwidth Efficient: You only send small operation packets (insert/delete).
Mature: Powering Google Docs for over a decade.
Cons of OT:
Complex Logic: With N operation types (bolding, styling, images, comments), you need a correct transformation function for every pair, so the work grows as O(N^2) and quickly becomes an engineering nightmare.
Centralized: Practical deployments rely on a central server to decide the "canonical" order of operations.
3. The Jupiter Architecture
Google Docs uses a client-server style of OT that traces back to Jupiter, a collaboration system developed at Xerox PARC.
The Client: Performs the operation locally immediately (Optimistic UI) and sends the operation to the server with a "revision number."
The Server: Maintains a history buffer. If the client’s revision number is old, the server transforms the incoming operation against all operations that happened in the interim.
The Broadcast: The server sends the transformed operation to all other connected clients, which in turn transform it against any local operations the server has not yet acknowledged.
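The server side of this flow can be sketched as follows. This is a simplified illustration, not Google's implementation: the `Server` class, the tuple encoding of operations, and the rule that ties go to whichever operation the server saw first are all assumptions, and the client-side transform is left out.

```python
def apply(doc, op):
    if op is None:
        return doc
    if op[0] == "ins":
        return doc[:op[1]] + op[2] + doc[op[1]:]
    return doc[:op[1]] + doc[op[1] + 1:]


def transform(op, applied):
    """Shift `op` so it can run after `applied`, which the server already executed."""
    if op is None or applied is None:
        return op
    kind, i = op[0], op[1]
    j = applied[1]
    if applied[0] == "ins":
        if j <= i:
            i += 1  # text before (or at) our position grew by one
    else:
        if j < i:
            i -= 1  # text before our position shrank by one
        elif j == i and kind == "del":
            return None  # the character is already gone
    return (kind, i) + op[2:]


class Server:
    def __init__(self, doc=""):
        self.doc = doc
        self.history = []  # history[n] moved the document from revision n to n + 1

    @property
    def revision(self):
        return len(self.history)

    def receive(self, op, base_revision):
        """Transform a client op against everything it has not seen, then apply it."""
        for applied in self.history[base_revision:]:
            op = transform(op, applied)
        self.doc = apply(self.doc, op)
        self.history.append(op)
        return op, self.revision  # what gets broadcast to the other clients


server = Server("LIME")
# Both users edited revision 0; User B's packet happens to arrive first.
sent_b = server.receive(("del", 0), base_revision=0)
sent_a = server.receive(("ins", 4, "S"), base_revision=0)
print(sent_a, server.doc)  # (('ins', 3, 'S'), 2) IMES
```

Because the server serializes everything into one history, each incoming operation only needs to be transformed against the suffix of that history the client had not yet seen.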
4. The Modern Challenger: CRDTs
While Google Docs uses OT, newer tools and libraries such as Apple Notes, Automerge, and Yjs use Conflict-free Replicated Data Types (CRDTs), and Figma's multiplayer engine borrows heavily from CRDT ideas.
CRDTs change the data structure itself so that concurrent edits always merge deterministically, leaving no conflict to resolve. Instead of using "indices" (which change), every character in a CRDT document has a Unique Immutable ID.
How it works:
If I insert 'A' between two characters, that 'A' gets a unique ID (e.g., a UUID paired with a logical timestamp). No matter how many deletions happen elsewhere, that 'A' will always know it belongs between those specific two IDs.
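The idea can be sketched with a tiny sequence CRDT. Note the simplifications: each character is anchored only to the ID of its left neighbour (not both neighbours), IDs are `(counter, replica_name)` pairs rather than UUIDs, and deleted characters are kept as tombstones. The `Replica` class and its methods are hypothetical names for this illustration; production libraries use far more compact structures.

```python
from dataclasses import dataclass, field

ROOT = (0, "")  # sentinel ID standing for the start of the document


@dataclass
class Replica:
    name: str
    counter: int = 0
    elements: dict = field(default_factory=dict)  # id -> (parent id, char)
    deleted: set = field(default_factory=set)     # tombstoned ids

    def insert_after(self, parent, char):
        self.counter += 1
        op = ("insert", (self.counter, self.name), parent, char)
        self.apply(op)
        return op

    def delete(self, target):
        op = ("delete", target)
        self.apply(op)
        return op

    def apply(self, op):
        """Apply a local or remote op; applying the same op twice is harmless."""
        if op[0] == "insert":
            _, elem_id, parent, char = op
            self.elements[elem_id] = (parent, char)
            self.counter = max(self.counter, elem_id[0])  # Lamport-style clock
        else:
            self.deleted.add(op[1])

    def text(self):
        children = {}
        for elem_id, (parent, _) in self.elements.items():
            children.setdefault(parent, []).append(elem_id)
        out = []

        def visit(node):  # recursive for brevity; fine for a short demo
            # Concurrent siblings are ordered by ID, newest first.
            for child in sorted(children.get(node, []), reverse=True):
                if child not in self.deleted:
                    out.append(self.elements[child][1])
                visit(child)

        visit(ROOT)
        return "".join(out)


a, b = Replica("a"), Replica("b")
ops, parent = [], ROOT
for ch in "LIME":
    op = a.insert_after(parent, ch)
    ops.append(op)
    parent = op[1]
for op in ops:
    b.apply(op)

id_l, id_e = ops[0][1], ops[3][1]
op_a = a.insert_after(id_e, "S")  # User A: 'S' after the 'E', whatever its index is
op_b = b.delete(id_l)             # User B: remove the 'L' by ID

a.apply(op_b)
b.apply(op_a)
b.apply(op_a)  # a duplicate delivery changes nothing
print(a.text(), b.text())  # IMES IMES
```

No index ever appears in an operation, so there is nothing to shift when a coworker deletes text elsewhere.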
The "Merge" Property:
CRDT merges are mathematically designed to be Commutative (A + B = B + A), Associative ((A + B) + C = A + (B + C)), and Idempotent (A + A = A). This means it doesn't matter what order the edits arrive in, or whether some arrive twice; once all nodes have received all edits, they will automatically converge to the same state without a central server.
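These properties are easy to check on the simplest CRDT there is, a grow-only set whose merge is set union. The sketch below assumes each replica's state is just the set of edits it has seen; the `merge` function and the sample edits are illustrative.

```python
from functools import reduce
from itertools import permutations


def merge(a: frozenset, b: frozenset) -> frozenset:
    """Merge two replica states; for a grow-only set this is plain union."""
    return a | b


# Each edit is (unique id, character); ids are (counter, replica name).
state_a = frozenset({((1, "a"), "H"), ((2, "a"), "i")})
state_b = frozenset({((1, "b"), "!")})
state_c = frozenset({((1, "c"), "?")})

commutative = merge(state_a, state_b) == merge(state_b, state_a)
associative = (merge(merge(state_a, state_b), state_c)
               == merge(state_a, merge(state_b, state_c)))
idempotent = merge(state_a, state_a) == state_a

# Deliver the states in every possible order, with state_a arriving twice.
deliveries = [state_a, state_b, state_c, state_a]
outcomes = {reduce(merge, order, frozenset()) for order in permutations(deliveries)}

print(commutative, associative, idempotent, len(outcomes))  # True True True 1
```

Every delivery order, duplicates included, ends in the same state, which is why no central server is needed to pick a canonical order.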
Pros of CRDTs:
Decentralized: Perfect for Peer-to-Peer (P2P) editing or local-first software.
Offline Support: You can go offline for a week, make 1,000 edits, and merge them seamlessly when you reconnect.
Cons of CRDTs:
Memory Overhead: Storing a unique ID and metadata for every single character can make document files huge if not optimized.
5. Performance and Latency Compensation
To make the UI feel snappy, both OT and CRDT systems use Local Echoing.
When you type, your screen updates instantly. The "System Design" challenge is handling the Undo Buffer. If you type "Hello," then "Undo," but a remote "Delete" operation arrived from a coworker in the meantime, the "Undo" must be transformed to ensure it doesn't undo the wrong thing.
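For an OT-style editor, the undo case can be sketched like this. It assumes single-character operations encoded as tuples, an undo implemented as the inverse of the original operation, and hypothetical `apply`, `invert`, and `transform` helpers; real editors track whole undo stacks and richer operations.

```python
def apply(doc, op):
    if op is None:
        return doc
    if op[0] == "ins":
        return doc[:op[1]] + op[2] + doc[op[1]:]
    return doc[:op[1]] + doc[op[1] + 1:]


def invert(op, doc_before):
    """Build the operation that undoes `op`, given the document it was applied to."""
    if op[0] == "ins":
        return ("del", op[1])
    return ("ins", op[1], doc_before[op[1]])


def transform(op, applied):
    """Shift `op` so it still targets the same character after `applied` has run."""
    if op is None or applied is None:
        return op
    kind, i = op[0], op[1]
    j = applied[1]
    if applied[0] == "ins":
        if j <= i:
            i += 1
    else:
        if j < i:
            i -= 1
        elif j == i and kind == "del":
            return None
    return (kind, i) + op[2:]


doc = "ABCD"
local = ("ins", 1, "X")
undo = invert(local, doc)        # ('del', 1)
doc = apply(doc, local)          # "AXBCD", echoed locally at once

remote = ("del", 0)              # a coworker removed 'A' in the meantime
doc = apply(doc, remote)         # "XBCD"

naive = apply(doc, undo)                       # removes the coworker's 'B'
correct = apply(doc, transform(undo, remote))  # removes our own 'X'
print(naive, correct)  # XCD BCD
```

The untransformed undo silently deletes someone else's character, while the transformed one removes only what the local user typed.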
Summary: Choosing your Architecture
| Feature | Operational Transformation (OT) | CRDTs |
| --- | --- | --- |
| Logic Location | Mostly Server-side | Client-side (Data-centric) |
| Architecture | Centralized (Client-Server) | Decentralized (P2P / Local-first) |
| Complexity | High (Hard to write T functions) | High (Hard to optimize memory) |
| Best For | Web-based collaborative suites | Design tools, Offline-first apps |
References & Further Reading
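The Jupiter Paper (Xerox PARC, 1995) - "High-Latency, Low-Bandwidth Windowing in the Jupiter Collaboration System," the paper describing the client-server synchronization protocol that Google's OT design builds on.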
Figma: How Figma’s Multiplayer Technology Works - A brilliant look at how Figma used CRDT concepts to build their design tool.
Joseph Gentle: Why OT is Hard - A deep dive into the mathematical edge cases of transformation functions.
Automerge & Yjs - The two leading open-source libraries for implementing CRDTs in modern web apps.
Local-first Software: You Own Your Data - An influential essay from Ink & Switch on why CRDTs are the future of software.