System Design

Collaborative Editing: How Google Docs Handles Concurrency

5 min read

Feb 23

In our previous session, we deconstructed Redlock, a tool for strict mutual exclusion where only one person can hold the "pen" at a time. But what happens when you want everyone to hold the pen simultaneously?

When you see a coworker's cursor dancing across a Google Doc, you are witnessing a miracle of distributed systems. Letting 50 people type in the same paragraph without one person's edits silently overwriting another's ("last-write-wins") or corrupting the document is a hard Concurrency Control problem.

Today, we deconstruct the two primary architectures for real-time collaboration: Operational Transformation (OT) and Conflict-free Replicated Data Types (CRDTs).

1. The Challenge: Convergence and Intention

In a collaborative editor, we have three core requirements:

Causality: If I ask a question and you answer it, the answer must appear after the question for everyone.
Convergence: Once everyone stops typing and all messages are delivered, everyone must see the exact same document.
Intention Preservation: If I bold a word and you delete the paragraph above it, my word should still be bolded in its new position.

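Traditional locking (like Redlock) fails here. If every keystroke had to acquire a lock, each character would wait for a network round trip, and everyone else would be blocked from typing while the lock was held. That makes real-time collaboration impossible. We need Optimistic Concurrency Control: apply edits immediately and reconcile concurrent changes afterwards.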

2. Operational Transformation (OT): The Google Docs Way

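Google Docs (like Etherpad before it) uses Operational Transformation. The fundamental idea is that an operation's positions only make sense relative to the document state it was created against. If other operations were applied first, the operation must be adjusted to account for them.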

The Classic Conflict

Imagine a document containing the string: "LIME".

User A wants to insert 'S' at index 4 (to make it "LIMES").
User B wants to delete 'L' at index 0 (to make it "IME").

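Suppose User A's operation Op_A = Insert('S', 4) is applied after User B's operation Op_B = Delete(0). By then the indices have shifted. "IME" has only three characters, so index 4 points past the end, and the insert either fails with an out-of-bounds error or lands wherever the editor happens to clamp it. In a longer document, the same shift would silently put the 'S' one character too far to the right.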
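The failure can be reproduced with a minimal Python sketch that applies both operations by raw index, with no transformation. The tuple encoding ("ins", pos, char) / ("del", pos) and the `apply` helper are hypothetical and exist only for illustration.

```python
def apply(doc: str, op: tuple) -> str:
    """Apply an index-based op naively; raise if the index no longer exists."""
    kind, pos = op[0], op[1]
    if kind == "ins":
        if not 0 <= pos <= len(doc):
            raise IndexError(f"insert position {pos} out of range for {doc!r}")
        return doc[:pos] + op[2] + doc[pos:]
    if kind == "del":
        if not 0 <= pos < len(doc):
            raise IndexError(f"delete position {pos} out of range for {doc!r}")
        return doc[:pos] + doc[pos + 1:]
    raise ValueError(f"unknown op kind {kind!r}")


op_a = ("ins", 4, "S")  # User A: "LIME" -> "LIMES"
op_b = ("del", 0)       # User B: "LIME" -> "IME"

# A first, then B: works only because deleting index 0 is unaffected by A's insert.
print(apply(apply("LIME", op_a), op_b))  # IMES

# B first, then A: index 4 no longer exists in "IME".
try:
    apply(apply("LIME", op_b), op_a)
except IndexError as err:
    print("error:", err)
```

The two replicas end up in different states (one has "IMES", the other an error), which is exactly the divergence OT exists to prevent.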

The Transformation Function

OT solves this by passing operations through a Transformation Function T.

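When the server receives Op_A and Op_B concurrently, it doesn't just execute them. It transforms each against the other, producing Op_A' = T(Op_A, Op_B) and Op_B' = T(Op_B, Op_A). T is designed so that both execution orders converge: applying Op_A and then Op_B' yields the same document as applying Op_B and then Op_A'.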

In our example, the server realizes that because index 0 was deleted, User A’s insertion at index 4 must be shifted to index 3.
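A transformation function for single-character inserts and deletes on a plain string can be sketched in Python. This is a minimal sketch under that assumption. The `Ins`/`Del` classes and the `site` tie-breaker (used when two inserts target the same index) are hypothetical, and real OT libraries for rich text handle many more operation types. The sketch checks that User A's insert and User B's delete converge whichever order they are applied in.

```python
from dataclasses import dataclass, replace
from typing import Optional, Union


@dataclass(frozen=True)
class Ins:
    pos: int
    char: str
    site: int  # hypothetical replica ID, used only to break ties between inserts


@dataclass(frozen=True)
class Del:
    pos: int


Op = Union[Ins, Del]


def apply(doc: str, op: Optional[Op]) -> str:
    """Apply a single-character operation; None means 'nothing left to do'."""
    if op is None:
        return doc
    if isinstance(op, Ins):
        return doc[:op.pos] + op.char + doc[op.pos:]
    return doc[:op.pos] + doc[op.pos + 1:]


def transform(op: Op, against: Op) -> Optional[Op]:
    """Rewrite `op` so it keeps its intended effect after `against` was applied.

    Both ops must have been generated against the same document state.
    Returns None when both sides deleted the same character.
    """
    if isinstance(op, Ins):
        if isinstance(against, Ins):
            # Same index: the insert with the lower site ID is placed first.
            if against.pos < op.pos or (against.pos == op.pos and against.site < op.site):
                return replace(op, pos=op.pos + 1)
            return op
        # `against` is a delete: shift left if it removed a character before us.
        return replace(op, pos=op.pos - 1) if against.pos < op.pos else op
    if isinstance(against, Ins):
        # An insert at or before our target pushes the target one place right.
        return replace(op, pos=op.pos + 1) if against.pos <= op.pos else op
    if against.pos == op.pos:
        return None  # the same character was already deleted
    return replace(op, pos=op.pos - 1) if against.pos < op.pos else op


# The "LIME" example: A inserts 'S' at 4, B deletes 'L' at 0.
a, b = Ins(4, "S", site=1), Del(0)
print(transform(a, b))  # Ins(pos=3, char='S', site=1)

# Both application orders converge on the same document.
left = apply(apply("LIME", a), transform(b, a))
right = apply(apply("LIME", b), transform(a, b))
print(left, right)  # IMES IMES
```

Even with only two operation types, four cases are needed. Adding bold, styles, or comments multiplies the number of pairs, which is the cost discussed under the cons below.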

Pros of OT:

Bandwidth Efficient: You only send small operation packets (insert/delete).
Mature: Powering Google Docs for over a decade.

Cons of OT:

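Complex Logic: Every pair of operation types (inserting, deleting, bolding, styling, images, comments) needs its own transformation function, so N operation types require on the order of N^2 functions, and each one must preserve convergence. Writing and verifying them all is an engineering nightmare.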
Centralized: Practical OT systems such as Google Docs rely on a central server to decide the "canonical" order of operations.

3. The Jupiter Architecture

Google Docs uses an OT architecture based on Jupiter, a client-server collaboration system originally developed at Xerox PARC.

The Client: Performs the operation locally immediately (Optimistic UI) and sends the operation to the server with a "revision number."
The Server: Maintains a history buffer. If the client’s revision number is old, the server transforms the incoming operation against all operations that happened in the interim.
The Broadcast: The server sends the transformed operation to all other connected clients.
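The server side of these steps can be sketched in Python. Google Docs' production protocol is not reproduced here. This is a simplified sketch assuming insert-only, single-character operations, and the `JupiterStyleServer` class, `Ins` type, and `site` tie-breaker are hypothetical. The revision number is simply how many server operations the client had seen when it created its edit.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Ins:
    pos: int
    char: str
    site: int  # hypothetical client ID; breaks ties between same-index inserts


def transform(op: Ins, against: Ins) -> Ins:
    """Shift `op` right if `against` inserted before it (same index: lower site first)."""
    if against.pos < op.pos or (against.pos == op.pos and against.site < op.site):
        return replace(op, pos=op.pos + 1)
    return op


class JupiterStyleServer:
    """Hypothetical central server: one canonical history; revision == len(history)."""

    def __init__(self, doc: str) -> None:
        self.doc = doc
        self.history: list[Ins] = []

    def receive(self, op: Ins, revision: int) -> tuple[Ins, int]:
        """`revision` = number of server ops the client had seen when it made `op`."""
        # Transform against every op the client had not yet seen, oldest first.
        for missed in self.history[revision:]:
            op = transform(op, missed)
        self.doc = self.doc[:op.pos] + op.char + self.doc[op.pos:]
        self.history.append(op)
        # A full system would now broadcast `op` to the other clients and
        # acknowledge it to the sender together with the new revision.
        return op, len(self.history)


server = JupiterStyleServer("LIME")
# Two clients, both at revision 0, type concurrently.
print(server.receive(Ins(0, "S", site=1), revision=0))  # (Ins(pos=0, char='S', site=1), 1)
print(server.receive(Ins(4, "S", site=2), revision=0))  # (Ins(pos=5, char='S', site=2), 2)
print(server.doc)                                       # SLIMES
```

The second client's insert at index 4 was created before it saw the first client's 'S', so the server shifts it to index 5 before applying and broadcasting it.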

4. The Modern Challenger: CRDTs

While Google Docs uses OT, newer tools such as Figma and Apple Notes, and open-source libraries such as Automerge, use Conflict-free Replicated Data Types (CRDTs) or ideas borrowed from them.

CRDTs change the data structure itself so that concurrent edits can always be merged deterministically, with no conflict-resolution step. Instead of using indices (which shift), every character in a CRDT document has a unique, immutable ID.

How it works:

If I insert 'A' between two characters, that 'A' gets a unique ID (e.g., a UUID paired with a logical timestamp). No matter how many deletions happen elsewhere, that 'A' will always know it belongs between those specific two IDs.
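The ID-based approach can be sketched in Python as a sequence CRDT in the style of RGA (Replicated Growable Array). This is a minimal sketch, not any library's implementation. It assumes causal delivery (an operation only arrives after the operations it depends on), uses small integer site IDs in place of UUIDs, and anchors each character to the ID of its left neighbour only. The `RGA` class and the op tuples are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

ID = tuple[int, int]  # (Lamport counter, site ID), compared lexicographically


@dataclass
class Elem:
    id: ID
    char: str
    deleted: bool = False  # tombstone: a deleted char keeps its slot and ID


class RGA:
    """Minimal RGA-style sequence CRDT: insert after an ID, delete by ID.

    Assumes causal delivery: an op only arrives after the ops it depends on.
    """

    def __init__(self, site: int) -> None:
        self.site = site
        self.clock = 0
        self.elems: list[Elem] = []

    def _index_of(self, target: Optional[ID]) -> int:
        if target is None:
            return -1  # None means "at the very start of the document"
        return next(i for i, e in enumerate(self.elems) if e.id == target)

    def local_insert(self, after: Optional[ID], char: str) -> tuple:
        self.clock += 1
        op = ("ins", after, (self.clock, self.site), char)
        self.apply(op)
        return op

    def local_delete(self, target: ID) -> tuple:
        op = ("del", target)
        self.apply(op)
        return op

    def apply(self, op: tuple) -> None:
        if op[0] == "del":
            self.elems[self._index_of(op[1])].deleted = True  # repeating is harmless
            return
        _, after, new_id, char = op
        if any(e.id == new_id for e in self.elems):
            return  # duplicate delivery: already applied
        self.clock = max(self.clock, new_id[0])  # Lamport clock update
        i = self._index_of(after) + 1
        # Concurrent inserts after the same origin: larger IDs stay closer to it.
        while i < len(self.elems) and self.elems[i].id > new_id:
            i += 1
        self.elems.insert(i, Elem(new_id, char))

    def text(self) -> str:
        return "".join(e.char for e in self.elems if not e.deleted)


# Build "LIME" on User A's replica and copy each op to User B's replica.
a, b = RGA(site=1), RGA(site=2)
prev = None
for ch in "LIME":
    op = a.local_insert(prev, ch)
    b.apply(op)
    prev = op[2]  # the new character's ID
l_id, e_id = a.elems[0].id, a.elems[3].id

op_a = a.local_insert(e_id, "S")  # User A: 'S' goes after the 'E', by ID not index
op_b = b.local_delete(l_id)       # User B: concurrently delete the 'L'
a.apply(op_b)
b.apply(op_a)
b.apply(op_a)                     # a duplicate delivery changes nothing
print(a.text(), b.text())         # IMES IMES
```

No index arithmetic is involved. The 'S' is placed relative to the 'E' by ID, so the concurrent deletion of 'L' cannot move it to the wrong place.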

The "Merge" Property:

CRDT merges are designed to be Commutative (A + B = B + A), Associative ((A + B) + C = A + (B + C)), and Idempotent (A + A = A). This means it doesn't matter in what order the edits arrive, or whether some arrive twice; once all nodes have received all edits, they converge to the same state without a central server.
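These three properties can be checked on a deliberately simple stand-in. The stand-in is a state-based CRDT whose state is a set of inserted characters (keyed by ID) plus a set of tombstoned IDs, merged by set union. This Python sketch rests on that assumption and is not how any particular library stores text; in particular, it omits the ordering rule a real text CRDT also needs. The `merge` function and the sample IDs are hypothetical.

```python
from functools import reduce
from itertools import permutations

# Hypothetical replica state: ({(char_id, char), ...}, {deleted_char_id, ...})
State = tuple[frozenset, frozenset]


def merge(x: State, y: State) -> State:
    """Join two replica states: union of inserted characters, union of tombstones."""
    return (x[0] | y[0], x[1] | y[1])


# Three replicas that have each seen a different subset of the edits.
a: State = (frozenset({("a1", "L"), ("a2", "I")}), frozenset())
b: State = (frozenset({("a1", "L"), ("a2", "I"), ("b1", "S")}), frozenset({"a2"}))
c: State = (frozenset({("a1", "L"), ("c1", "!")}), frozenset({"a1"}))

print(merge(a, b) == merge(b, a))                      # True: commutative
print(merge(merge(a, b), c) == merge(a, merge(b, c)))  # True: associative
print(merge(a, a) == a)                                # True: idempotent

# Every delivery order ends in the same state.
results = {reduce(merge, order) for order in permutations([a, b, c])}
print(len(results))  # 1
```

Set union has all three properties automatically. That is why a state-based design like this converges regardless of message order or duplication.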

Pros of CRDTs:

Decentralized: Perfect for Peer-to-Peer (P2P) editing or local-first software.
Offline Support: You can go offline for a week, make 1,000 edits, and merge them automatically when you reconnect.

Cons of CRDTs:

Memory Overhead: Storing a unique ID and metadata for every character (and, in many designs, keeping tombstones for deleted characters) can make documents huge if not optimized.
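A back-of-the-envelope estimate shows how quickly the overhead grows. This Python sketch uses assumed sizes, not measurements of any real library. It assumes each character carries a 16-byte replica UUID plus an 8-byte counter as its ID, stores its left neighbour's ID (as RGA-style designs do), and keeps a 1-byte tombstone flag. Every character ever typed is retained.

```python
# Hypothetical per-character sizes for a naive, unoptimized sequence CRDT.
ID_BYTES = 16 + 8                             # assumed: 16-byte UUID + 8-byte counter
PER_CHAR_BYTES = 1 + ID_BYTES + ID_BYTES + 1  # char + own ID + left-neighbour ID + flag


def naive_crdt_bytes(chars_ever_typed: int) -> int:
    """Tombstones mean every character ever typed stays in the structure."""
    return chars_ever_typed * PER_CHAR_BYTES


typed, deleted = 100_000, 20_000
plain = typed - deleted             # bytes of visible ASCII text
crdt = naive_crdt_bytes(typed)
print(plain, crdt, crdt / plain)    # 80000 5000000 62.5
```

Under these assumptions, the structure is about 60 times larger than the visible text. Production libraries avoid much of this, for example by encoding runs of consecutive IDs from the same replica compactly.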

5. Performance and Latency Compensation

To make the UI feel snappy, both OT and CRDT systems use Local Echoing.

When you type, your screen updates instantly. The harder design problem is the undo stack. If you type "Hello" and then press Undo after a coworker's remote delete has arrived, the Undo (the inverse of your insert) must itself be transformed against that remote operation. Otherwise it will not remove your "Hello" from its new position and may delete someone else's text instead.
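A single-character version of the undo problem can be sketched in Python. This is a minimal sketch assuming OT-style index operations. The `invert` and `transform_delete` helpers are hypothetical, and a real undo manager must transform the inverse against every operation applied after yours, not just one. You insert 'X' into "abc", a coworker's delete of 'a' arrives, and then you press Undo.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Ins:
    pos: int
    char: str


@dataclass(frozen=True)
class Del:
    pos: int


Op = Union[Ins, Del]


def apply(doc: str, op: Optional[Op]) -> str:
    if op is None:  # nothing left to do
        return doc
    if isinstance(op, Ins):
        return doc[:op.pos] + op.char + doc[op.pos:]
    return doc[:op.pos] + doc[op.pos + 1:]


def invert(op: Ins) -> Del:
    """Undoing an insert means deleting the character it added."""
    return Del(op.pos)


def transform_delete(op: Del, against: Op) -> Optional[Del]:
    """Adjust a delete so it targets the same character after `against` ran."""
    if isinstance(against, Ins):
        return Del(op.pos + 1) if against.pos <= op.pos else op
    if against.pos == op.pos:
        return None  # the character is already gone
    return Del(op.pos - 1) if against.pos < op.pos else op


doc = "abc"
mine = Ins(1, "X")
doc = apply(doc, mine)    # "aXbc": local echo of our own insert
remote = Del(0)           # coworker's delete of 'a', already in our coordinates
doc = apply(doc, remote)  # "Xbc"

naive_undo = invert(mine)              # Del(1): would delete 'b', not our 'X'
print(apply(doc, naive_undo))          # Xc
undo = transform_delete(naive_undo, remote)
print(undo, apply(doc, undo))          # Del(pos=0) bc
```

The naive inverse removes the coworker-visible 'b'; the transformed inverse removes only the 'X' you typed and leaves the coworker's deletion in place.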

Summary: Choosing your Architecture

Feature	Operational Transformation (OT)	CRDTs
Logic Location	Mostly Server-side	Client-side (Data-centric)
Architecture	Centralized (Client-Server)	Decentralized (P2P / Local-first)
Complexity	High (Hard to write T functions)	High (Hard to optimize memory)
Best For	Web-based collaborative suites	Design tools, Offline-first apps

References & Further Reading

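The Jupiter Paper (Nichols et al., Xerox PARC, UIST 1995) - "High-Latency, Low-Bandwidth Windowing in the Jupiter Collaboration System", the original description of the client-server synchronization protocol that Google Docs' approach builds on.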
Figma: How Figma’s Multiplayer Technology Works - A brilliant look at how Figma used CRDT concepts to build their design tool.
Joseph Gentle: Why OT is Hard - A deep dive into the mathematical edge cases of transformation functions.
Automerge & Yjs - The two leading open-source libraries for implementing CRDTs in modern web apps.
Local-first Software: You Own Your Data - An influential essay from Ink & Switch arguing for local-first software, with CRDTs as a key enabling technology.

Written by Tanyaradzwa