How to Compare Two Texts and See Their Similarity Percentage
- Estimate text similarity by the ratio of unchanged lines to total lines
- For precise percentage scoring, specialized similarity tools exist
- Useful for spotting "mostly the same" vs "substantially rewritten"
- Worked example: calculating similarity from a diff tool output
Comparing two texts and getting a similarity percentage is common in plagiarism checking, A/B copy testing, and translation review. While dedicated similarity tools compute a numeric percentage, a free text diff tool lets you estimate similarity visually — the more lines that remain unhighlighted (unchanged), the higher the similarity. For most practical purposes, this visual estimate is enough to answer "how similar are these two texts?"
How Similarity Percentages Are Calculated
Academic and professional similarity percentages use various algorithms:
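- Jaccard similarity: Number of distinct words the two texts share, divided by the number of distinct words appearing in either text.
- Cosine similarity: Vectorize both texts (for example, as word-count vectors) and take the cosine of the angle between the vectors (common in NLP).
- LCS-based: Length of the longest common subsequence relative to the combined length of both texts. Diff tools use LCS-style matching to decide which lines are unchanged.
- Edit distance: Count the insertions, deletions, and substitutions (usually character-level) needed to transform one text into the other, then normalize by text length.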
Each produces a different numeric result for the same two texts. A "95% similar" score from one tool may be "87% similar" from another — which is why similarity percentages are best used for relative comparison within one tool, not absolute judgments across tools.
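To make that concrete, here is a minimal Python sketch that scores one pair of sentences with all four approaches. It rests on simplifying assumptions that real tools handle differently: words are lowercased and split on whitespace, cosine uses raw word counts, the LCS score is normalized as 2 × LCS / (total words), and edit similarity as 1 - distance / (longer length). The function names are illustrative, not from any particular library.

```python
from collections import Counter
import math


def words(text: str) -> list[str]:
    # Simplifying assumption: lowercase and split on whitespace only
    return text.lower().split()


def jaccard(a: str, b: str) -> float:
    # Distinct shared words / distinct words in either text
    sa, sb = set(words(a)), set(words(b))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def cosine(a: str, b: str) -> float:
    # Cosine of the angle between raw word-count vectors
    ca, cb = Counter(words(a)), Counter(words(b))
    if not ca and not cb:
        return 1.0
    dot = sum(count * cb[w] for w, count in ca.items())
    norm_a = math.sqrt(sum(c * c for c in ca.values()))
    norm_b = math.sqrt(sum(c * c for c in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def lcs_length(a: list[str], b: list[str]) -> int:
    # Classic dynamic-programming LCS, keeping only the previous row
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def lcs_ratio(a: str, b: str) -> float:
    # Assumed normalization: 2 * LCS / (words in a + words in b)
    wa, wb = words(a), words(b)
    total = len(wa) + len(wb)
    return 2 * lcs_length(wa, wb) / total if total else 1.0


def levenshtein(a: str, b: str) -> int:
    # Character-level insertions, deletions, and substitutions
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete x
                           cur[j - 1] + 1,            # insert y
                           prev[j - 1] + (x != y)))   # substitute (free if equal)
        prev = cur
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    # Assumed normalization: 1 - distance / length of the longer text
    longest = max(len(a), len(b))
    return 1 - levenshtein(a, b) / longest if longest else 1.0


if __name__ == "__main__":
    a = "the quick brown fox jumps over the lazy dog"
    b = "the quick red fox jumped over the lazy dog"
    for name, metric in [("Jaccard", jaccard), ("Cosine", cosine),
                         ("LCS", lcs_ratio), ("Edit", edit_similarity)]:
        print(f"{name:<8}{metric(a, b):.0%}")
```

For this pair, Jaccard gives 60%, cosine about 82%, and the LCS ratio about 78%: three different answers for the same two sentences.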
Estimating Similarity from Diff Output
A diff tool shows three types of lines: unchanged (neutral), added (green), removed (red). A rough similarity estimate:
Similarity ≈ (unchanged lines) / (unchanged lines + added lines + removed lines)
Example: Original text has 100 lines. Revised text has 105 lines. Diff shows 90 unchanged, 15 added, 10 removed. Total unique lines = 90 + 15 + 10 = 115. Similarity ≈ 90 / 115 ≈ 78%.
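The arithmetic is simple enough to check in a few lines of Python. This sketch (the function name is hypothetical) counts each unchanged line once, which is why the original's 100 lines (90 unchanged + 10 removed) and the revision's 105 lines (90 unchanged + 15 added) combine to 115 rather than 205.

```python
def estimate_similarity(unchanged: int, added: int, removed: int) -> float:
    """Line-based estimate: unchanged / (unchanged + added + removed)."""
    total = unchanged + added + removed
    # Two empty texts are treated as identical
    return unchanged / total if total else 1.0


# Worked example from the text: 90 unchanged, 15 added, 10 removed
print(f"{estimate_similarity(90, 15, 10):.0%}")  # → 78%
```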
This is approximate: a line with even one changed word counts as one removed line plus one added line, so line-level counting tends to understate similarity for lightly edited text. But for a rough gauge of "mostly the same" (>80%) vs "substantially rewritten" (<50%), it is good enough.
Walkthrough: Estimate Text Similarity
Step 1: Paste both texts into the text diff tool.
Step 2: Click Compare. Note the output line counts.
Step 3: Count unchanged lines (or use the tool's summary statistics if shown).
Step 4: Add the unchanged, added, and removed line counts to get the total. Count each unchanged line once, not once per text.
Step 5: Divide unchanged by total. That ratio, multiplied by 100, approximates the similarity percentage.
For small texts you can eyeball this; for larger documents the counting gets tedious and a dedicated similarity tool is better.
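If the texts are available as plain strings, the counting in Steps 3 to 5 can be scripted. The sketch below uses Python's standard difflib module; the function names are hypothetical. Note that difflib's SequenceMatcher uses its own matching heuristic rather than a strict LCS, so its counts can differ slightly from a given diff tool's.

```python
import difflib


def diff_line_counts(original: str, revised: str) -> tuple[int, int, int]:
    """Return (unchanged, added, removed) line counts between two texts."""
    a, b = original.splitlines(), revised.splitlines()
    unchanged = added = removed = 0
    # autojunk=False stops difflib from ignoring frequent lines in long texts
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            unchanged += i2 - i1
        else:
            removed += i2 - i1  # lines only in the original ("delete"/"replace")
            added += j2 - j1    # lines only in the revision ("insert"/"replace")
    return unchanged, added, removed


def line_similarity(original: str, revised: str) -> float:
    unchanged, added, removed = diff_line_counts(original, revised)
    total = unchanged + added + removed
    return unchanged / total if total else 1.0


original = "line one\nline two\nline three"
revised = "line one\nline 2\nline three\nline four"
print(diff_line_counts(original, revised))           # → (2, 2, 1)
print(f"{line_similarity(original, revised):.0%}")   # → 40%
```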
When Similarity Percentage Actually Matters
- Plagiarism review. A 90%+ match between a student paper and a source document is a strong plagiarism indicator.
- A/B copy testing. Comparing two marketing copy variants to ensure they are genuinely different, not superficially tweaked versions of the same thing.
- Translation QA. Ensuring two translations of the same source are substantially similar in meaning (and spotting where they diverge).
- Content deduplication. For content teams with thousands of documents, identifying near-duplicates at various similarity thresholds.
- Legal contract review. Similarity between template contracts and specific instances — highlighting where custom clauses were added.
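For the deduplication case, eyeballing every pair of documents is impractical. One common approach, shown here as a minimal Python sketch rather than any specific product's method, turns each document into a set of overlapping three-word sequences ("shingles") and flags pairs whose Jaccard similarity meets a threshold. The 0.8 threshold, k = 3, and the function names are illustrative assumptions.

```python
from itertools import combinations


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    # Overlapping k-word sequences; texts shorter than k become one shingle
    w = text.lower().split()
    if len(w) < k:
        return {tuple(w)} if w else set()
    return {tuple(w[i:i + k]) for i in range(len(w) - k + 1)}


def near_duplicates(docs: dict[str, str], threshold: float = 0.8,
                    k: int = 3) -> list[tuple[str, str, float]]:
    # Compare every pair of documents (quadratic in the number of documents)
    sets = {name: shingles(text, k) for name, text in docs.items()}
    hits = []
    for a, b in combinations(sets, 2):
        union = sets[a] | sets[b]
        score = len(sets[a] & sets[b]) / len(union) if union else 1.0
        if score >= threshold:
            hits.append((a, b, score))
    # Most similar pairs first
    return sorted(hits, key=lambda hit: hit[2], reverse=True)
```

Running it at several thresholds (say 0.9, 0.8, 0.7) shows how the near-duplicate list grows as the bar drops. Because pairwise comparison grows quadratically, much larger collections are often handled with MinHash and locality-sensitive hashing, which approximate the same Jaccard score.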
When to Use a Dedicated Similarity Calculator
For quantitative research, legal discovery, or academic publication, a dedicated similarity tool is the right choice. These tools give you:
- Numeric percentage with defined algorithm.
- Word-level or character-level precision.
- Multiple similarity metrics (Jaccard, cosine, etc.).
- Reports you can cite or include in documentation.
For everyday "how similar are these two texts?" spot-checks, a free diff tool with visual estimation is faster and sufficient.
Frequently Asked Questions
Can I get a similarity percentage from a diff tool?
Not directly — you estimate by counting unchanged vs changed lines. For a precise numeric percentage, use a dedicated similarity tool.
What counts as "similar enough" for plagiarism concerns?
Academic institutions typically flag at 15-25% similarity for papers, though context matters (quoting sources inflates similarity legitimately). For direct copy plagiarism, any sustained passage of identical text is a concern.
Does line-level similarity capture word changes?
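No. A line with one changed word shows up in a line-level diff as an entire removed line plus an entire added line, so the diff cannot distinguish a one-word edit from a full rewrite of that line. For word-level comparison, specialized tools or editing software (Microsoft Word's Compare, Google Docs version history) track individual word changes.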
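A quick way to see the gap between line-level and word-level matching is Python's standard difflib.SequenceMatcher, whose ratio() returns 2 × (matched elements) / (total elements in both sequences). This is a sketch of the general idea only, not how Word or Google Docs compare documents; the helper name is hypothetical.

```python
import difflib


def match_ratio(a: list[str], b: list[str]) -> float:
    # ratio() = 2 * matched elements / (len(a) + len(b))
    return difflib.SequenceMatcher(None, a, b).ratio()


old = "The quick brown fox jumps over the lazy dog."
new = "The quick brown fox leaps over the lazy dog."

# Line level: the only line changed, so nothing matches
print(match_ratio(old.splitlines(), new.splitlines()))  # → 0.0
# Word level: 8 of the 9 words on each side still match
print(f"{match_ratio(old.split(), new.split()):.2f}")   # → 0.89
```

The same one-word edit scores 0 when lines are the unit of comparison and about 0.89 when words are.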
How accurate is visual estimation?
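It is good enough to separate "mostly the same" from "substantially rewritten," but it is not precise. Because a one-word edit marks a whole line as changed, a line-based estimate can land well below a word- or character-level score for the same two texts. For research-grade precision, use a quantitative similarity tool.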

