How a Text Diff Tool Works: The Longest Common Subsequence Algorithm
- Text diff tools find the longest sequence of lines that appear in both texts
- That sequence = unchanged lines; everything else = added or removed
- The core algorithm is called Longest Common Subsequence (LCS)
- Plain-English walkthrough with a small worked example
Table of Contents
Every text diff tool — including the free tool — works by finding the Longest Common Subsequence (LCS) between two texts. The algorithm identifies every line that appears in both the original and the revised version in the same order, marks those as unchanged, and marks everything else as added or removed. Here is a plain-English explanation of how it works and why diff output sometimes looks unexpected.
The Core Idea: Longest Common Subsequence
Given two sequences — for our purposes, two lists of lines — LCS finds the longest ordered list of items that appears in both sequences.
Example with simple sequences:
- Sequence A: [a, b, c, d, e]
- Sequence B: [a, x, c, y, e]
- LCS: [a, c, e] (appears in both, in order)
For text diff, the "items" are lines. LCS identifies the lines that appear in both texts in the same relative order. Those are the "unchanged" lines. Lines only in the original become "removed." Lines only in the revised become "added."
Why LCS Is the Right Algorithm for Diff
Other algorithms could answer "what changed?" differently. LCS is the standard because it minimizes the total number of changes shown:
- It does not arbitrarily mark lines as changed when they could be matched.
- It respects line order — a line that moved from position 3 to position 10 shows as removed from position 3 and added at position 10, not as "unchanged."
- It produces the smallest diff that explains the transformation from original to revised.
This minimizes output size, which is why LCS diff is the algorithm behind GNU diff, git diff, and most text comparison tools.
Sell Custom Apparel — We Handle Printing & Free ShippingA Worked Example with Actual Text
Original text:
1. Dear team, 2. The meeting is at 3pm. 3. Please bring your laptops. 4. Regards, 5. Alex
Revised text:
1. Dear team, 2. The meeting is at 4pm. 3. Please bring your laptops. 4. And your notebooks. 5. Regards, 6. Alex
LCS of the two: [Dear team, Please bring your laptops, Regards, Alex] — 4 lines that appear in both in order.
Diff output:
- Line 1: "Dear team," (unchanged)
- Line 2 removed: "The meeting is at 3pm."
- Line 2 added: "The meeting is at 4pm."
- Line 3: "Please bring your laptops." (unchanged)
- Line 4 added: "And your notebooks."
- Line 5/4: "Regards," (unchanged)
- Line 6/5: "Alex" (unchanged)
When Diff Output Surprises You
Occasionally diff output looks unexpected — lines marked as changed when they seem identical. Common causes:
- Trailing whitespace. Line A ends with a space; line B does not. They are different strings to LCS.
- Line ending differences. Windows CRLF vs Unix LF. LCS sees these as different characters.
- Invisible characters. Zero-width spaces, non-breaking spaces, BOM characters. Look identical but are different bytes.
- Mixed tabs and spaces. Indentation with tab vs indentation with 4 spaces — different character sequences even though visually identical.
When diff output looks wrong, the most common culprit is invisible character differences in otherwise-identical-looking lines.
How Fast Is LCS on Large Texts?
LCS has quadratic complexity — O(n*m) where n and m are the lengths of the two sequences. For line-level diff with a few hundred lines, this is instant. For very long texts with many thousands of lines, you may notice a brief pause.
Practical limits for browser-based diff:
- Under 1,000 lines: instant (milliseconds).
- 1,000-10,000 lines: fast (under a second).
- 10,000-50,000 lines: noticeable but acceptable (1-5 seconds).
- 50,000+ lines: consider splitting the comparison or using a desktop tool.
Modern browsers handle typical document sizes without problems. Only extreme cases (entire codebases, massive log files) run into performance limits.
See LCS in Action
Paste two texts, compare. Watch the LCS algorithm find what stayed the same and what changed.
Open Free Text Diff ToolFrequently Asked Questions
What algorithm does diff use?
Most diff tools, including the browser-based text diff, use the Longest Common Subsequence (LCS) algorithm. GNU diff, git diff, and professional comparison tools all rely on LCS or variants.
Why does LCS minimize the diff?
By finding the longest common subsequence, LCS identifies the maximum amount of unchanged content — which minimizes what has to be marked as added or removed. This produces the smallest accurate diff.
Can LCS handle very large texts?
LCS has quadratic complexity, so extremely large texts (tens of thousands of lines) can slow down. Modern browsers handle up to roughly 50,000 lines without problems.
Is there a faster diff algorithm than LCS?
Specialized algorithms like Myers diff (used by git) optimize LCS for typical real-world cases, producing results faster. They still rely on the LCS concept at their core.

