How a Text Diff Tool Works: The Longest Common Subsequence Algorithm

Last updated: April 2026 6 min read By Ashley Connors

Quick Answer

Text diff tools find the longest sequence of lines that appear in both texts
That sequence = unchanged lines; everything else = added or removed
The core algorithm is called Longest Common Subsequence (LCS)
Plain-English walkthrough with a small worked example

What LCS actually does
Why LCS specifically
Walkthrough with a small example
Why diff output sometimes looks weird
Performance and limits
Frequently Asked Questions

Every text diff tool — including the free tool — works by finding the Longest Common Subsequence (LCS) between two texts. The algorithm identifies every line that appears in both the original and the revised version in the same order, marks those as unchanged, and marks everything else as added or removed. Here is a plain-English explanation of how it works and why diff output sometimes looks unexpected.

The Core Idea: Longest Common Subsequence

Given two sequences — for our purposes, two lists of lines — LCS finds the longest ordered list of items that appears in both sequences.

Example with simple sequences:

Sequence A: [a, b, c, d, e]
Sequence B: [a, x, c, y, e]
LCS: [a, c, e] (appears in both, in order)

For text diff, the "items" are lines. LCS identifies the lines that appear in both texts in the same relative order. Those are the "unchanged" lines. Lines only in the original become "removed." Lines only in the revised become "added."

Why LCS Is the Right Algorithm for Diff

Other algorithms could answer "what changed?" differently. LCS is the standard because it minimizes the total number of changes shown:

It does not arbitrarily mark lines as changed when they could be matched.
It respects line order — a line that moved from position 3 to position 10 shows as removed from position 3 and added at position 10, not as "unchanged."
It produces the smallest diff that explains the transformation from original to revised.

This minimizes output size, which is why LCS diff is the algorithm behind GNU diff, git diff, and most text comparison tools.

A Worked Example with Actual Text

Original text:

1. Dear team,
2. The meeting is at 3pm.
3. Please bring your laptops.
4. Regards,
5. Alex

Revised text:

1. Dear team,
2. The meeting is at 4pm.
3. Please bring your laptops.
4. And your notebooks.
5. Regards,
6. Alex

LCS of the two: [Dear team, Please bring your laptops, Regards, Alex] — 4 lines that appear in both in order.

Diff output:

Line 1: "Dear team," (unchanged)
Line 2 removed: "The meeting is at 3pm."
Line 2 added: "The meeting is at 4pm."
Line 3: "Please bring your laptops." (unchanged)
Line 4 added: "And your notebooks."
Line 5/4: "Regards," (unchanged)
Line 6/5: "Alex" (unchanged)

When Diff Output Surprises You

Occasionally diff output looks unexpected — lines marked as changed when they seem identical. Common causes:

Trailing whitespace. Line A ends with a space; line B does not. They are different strings to LCS.
Line ending differences. Windows CRLF vs Unix LF. LCS sees these as different characters.
Invisible characters. Zero-width spaces, non-breaking spaces, BOM characters. Look identical but are different bytes.
Mixed tabs and spaces. Indentation with tab vs indentation with 4 spaces — different character sequences even though visually identical.

When diff output looks wrong, the most common culprit is invisible character differences in otherwise-identical-looking lines.

How Fast Is LCS on Large Texts?

LCS has quadratic complexity — O(n*m) where n and m are the lengths of the two sequences. For line-level diff with a few hundred lines, this is instant. For very long texts with many thousands of lines, you may notice a brief pause.

Practical limits for browser-based diff:

Under 1,000 lines: instant (milliseconds).
1,000-10,000 lines: fast (under a second).
10,000-50,000 lines: noticeable but acceptable (1-5 seconds).
50,000+ lines: consider splitting the comparison or using a desktop tool.

Modern browsers handle typical document sizes without problems. Only extreme cases (entire codebases, massive log files) run into performance limits.

See LCS in Action

Paste two texts, compare. Watch the LCS algorithm find what stayed the same and what changed.

Open Free Text Diff Tool

Frequently Asked Questions

What algorithm does diff use?

Most diff tools, including the browser-based text diff, use the Longest Common Subsequence (LCS) algorithm. GNU diff, git diff, and professional comparison tools all rely on LCS or variants.

Why does LCS minimize the diff?

By finding the longest common subsequence, LCS identifies the maximum amount of unchanged content — which minimizes what has to be marked as added or removed. This produces the smallest accurate diff.

Can LCS handle very large texts?

LCS has quadratic complexity, so extremely large texts (tens of thousands of lines) can slow down. Modern browsers handle up to roughly 50,000 lines without problems.

Is there a faster diff algorithm than LCS?

Specialized algorithms like Myers diff (used by git) optimize LCS for typical real-world cases, producing results faster. They still rely on the LCS concept at their core.

Ashley Connors Content Strategy & Writing Writer

Ashley has been a freelance copywriter and content strategist for eight years across e-commerce, SaaS, and media.

How a Text Diff Tool Works: The Longest Common Subsequence Algorithm

Table of Contents

The Core Idea: Longest Common Subsequence

Why LCS Is the Right Algorithm for Diff

A Worked Example with Actual Text

When Diff Output Surprises You

How Fast Is LCS on Large Texts?

See LCS in Action

Frequently Asked Questions

What algorithm does diff use?

Why does LCS minimize the diff?

Can LCS handle very large texts?

Is there a faster diff algorithm than LCS?

Related Posts

Free Text Diff Tool

Diff Ignoring Whitespace

Text Similarity Percentage

How Text Diff Works