How to Find and Export Duplicate Records in a CSV — Review Before Removing
Sometimes you do not just want to remove duplicates — you want to see them first. Maybe you need to audit what got flagged before deleting anything. Maybe the duplicate records have different data and you need to decide manually which version to keep. Maybe your boss wants a report of how many duplicates were in the data.
The CSV Deduplicator shows you every duplicate group before you commit to removing anything, and lets you download just the duplicate rows as a separate CSV file. Here is how to use that workflow.
How Duplicate Groups Are Displayed
After you click Find Duplicates, the tool shows a panel with every duplicate group. Each group contains:
- The row that will be kept (marked with a green "Keep" badge) — this is the first occurrence
- The rows that will be removed (marked with a red "Duplicate" badge) — these are subsequent occurrences
- The actual values from each row so you can see what the data looks like
Above the groups panel, a stats bar shows total rows, duplicate count, and unique rows remaining. This gives you the high-level picture before you look at individual groups.
There is a "Show All" toggle to expand all groups at once (useful for small files) or collapse them for large files where you only want to spot-check.
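The grouping and the stats bar described above can be approximated in a few lines of Python. This is only a sketch of the idea, not the tool's actual code: the function name `group_duplicates`, the sample rows, and the choice of `email` as the key column are all assumptions made for illustration.

```python
from collections import defaultdict

def group_duplicates(rows, key_fields):
    """Group row indices by exact match on key_fields (hypothetical helper)."""
    groups = defaultdict(list)
    for index, row in enumerate(rows):
        key = tuple(row[field] for field in key_fields)
        groups[key].append(index)
    # Only keys that appear more than once form a duplicate group.
    return {key: idxs for key, idxs in groups.items() if len(idxs) > 1}

# Illustrative data; the column names are assumptions.
rows = [
    {"email": "a@example.com", "name": "Ann"},
    {"email": "b@example.com", "name": "Bob"},
    {"email": "a@example.com", "name": "Ann L."},
    {"email": "a@example.com", "name": "Ann"},
]

dup_groups = group_duplicates(rows, ["email"])

# In each group the first index is the "Keep" row; the rest are "Duplicate".
duplicate_count = sum(len(idxs) - 1 for idxs in dup_groups.values())
stats = {
    "total_rows": len(rows),
    "duplicates": duplicate_count,
    "unique_remaining": len(rows) - duplicate_count,
}
print(dup_groups)  # {('a@example.com',): [0, 2, 3]}
print(stats)       # {'total_rows': 4, 'duplicates': 2, 'unique_remaining': 2}
```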
How to Download the Duplicate Rows Separately
After Find Duplicates runs, you have two download options:
Download Deduplicated CSV — the clean file with duplicates removed. This is the file you use going forward.
Download Duplicates Only — a CSV containing only the rows that were flagged as duplicates (the rows that would be removed). This is your audit file.
The duplicates-only download is useful for:
- Audit trails — keep a record of what was removed in case you need to restore a record
- Manual review — open the duplicates file, review the rows, and decide if any should be kept
- Reporting — show stakeholders how many duplicates were in the data and what they looked like
- Data completeness check — sometimes the "duplicate" row has data that the "kept" row is missing, and you want to manually merge them
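If you ever need to reproduce the two downloads outside the browser, the split can be sketched with Python's standard csv module. The helper names `split_duplicates` and `to_csv`, the sample data, and the `email` key are hypothetical; the keep-first rule follows the behavior described in this article.

```python
import csv
import io

def split_duplicates(csv_text, key_fields):
    """Return (fieldnames, kept_rows, duplicate_rows), keeping first occurrences."""
    reader = csv.DictReader(io.StringIO(csv_text))
    seen = set()
    kept, dupes = [], []
    for row in reader:
        key = tuple(row[field] for field in key_fields)
        if key in seen:
            dupes.append(row)   # later occurrence: goes to the audit file
        else:
            seen.add(key)
            kept.append(row)    # first occurrence: stays in the clean file
    return reader.fieldnames, kept, dupes

def to_csv(fieldnames, rows):
    """Serialize rows back to CSV text with a header line."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()

# Illustrative input; the second a@example.com row is the duplicate.
SAMPLE = "email,phone\na@example.com,555-0100\nb@example.com,\na@example.com,\n"

fields, kept, dupes = split_duplicates(SAMPLE, ["email"])
clean_csv = to_csv(fields, kept)
duplicates_csv = to_csv(fields, dupes)
```

Writing both outputs to disk side by side gives you the same pair of files: one to use going forward and one to keep as the audit record.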
When to Audit Before Removing
For most deduplication jobs — lead lists, contact imports, product catalogs — automated removal is fine. The first occurrence is kept, the rest are removed, done.
Audit before removing when:
The stakes are high. Customer records, financial data, medical records — if an incorrect removal causes a real problem, spend 10 minutes reviewing the groups.
The "duplicate" rows have different data. If you have two rows for "John Smith at Acme" but one has a phone number and the other has an email address, you might want to manually merge them rather than drop one. The duplicates-only download lets you identify these cases.
You are deduplicating on a fuzzy key. The tool matches values exactly, so "Acme Corp" and "Acme Corporation" are NOT flagged as duplicates (that would require fuzzy matching). Company names in particular tend to be normalized inconsistently, so if you are deduplicating on a normalized company name column, spot-check a few groups to make sure the matches are genuine.
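To make the exact-match point concrete, here is one way a normalized company name column might be produced before deduplicating. The function `normalize_company` and its rules (lowercase, strip punctuation, collapse whitespace) are assumptions for illustration, not something the tool does for you.

```python
import re

def normalize_company(name):
    """Hypothetical normalizer: lowercase, drop punctuation, collapse spaces."""
    cleaned = re.sub(r"[^\w\s]", "", name.lower())
    return " ".join(cleaned.split())

# Formatting differences collapse to the same key...
print(normalize_company("Acme Corp."))     # acme corp
print(normalize_company("  ACME  corp "))  # acme corp

# ...but a different spelling still produces a different key,
# so an exact-match deduplicator will not group these two.
print(normalize_company("Acme Corporation"))  # acme corporation
```

A more aggressive normalizer (for example, one that strips suffixes such as "corp" or "inc") would catch more variants, but it also raises the chance of false matches, which is exactly when spot-checking groups pays off.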
For routine deduplication of fresh exports with no manual history, skip the audit and just download the clean file.
Using the Duplicates File to Improve Data Quality
Here is a workflow for data teams that need maximum data completeness rather than just duplicate removal:
- Run the deduplicator and download both the clean CSV and the duplicates CSV.
- Open the duplicates CSV. Add a column called "has_data_to_merge" and mark rows where the duplicate has data that the kept row is missing (a phone number, a job title, an address).
- For rows marked as needing a merge, manually update the corresponding row in the clean CSV with the missing data from the duplicate.
- Delete the "has_data_to_merge" column and import the final clean CSV.
This is more work than automated removal, but it produces a more complete dataset. Each unique contact has all the data from all their duplicate appearances combined into one record.
For very large datasets where manual merging is not practical, a Python script can automate the merge. In pandas, groupby followed by first takes the first non-empty value in each column for every group, and combine_first fills the gaps in one record with values from another.
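A minimal sketch of the groupby and first approach is below. The column names and sample records are made up, and it assumes the duplicate key is a single `email` column; adapt both to your data.

```python
import pandas as pd

# Illustrative data: the two john@acme.com rows each hold a different field.
df = pd.DataFrame({
    "email": ["john@acme.com", "john@acme.com", "jane@acme.com"],
    "phone": ["555-0100", None, None],
    "title": [None, "CTO", "CEO"],
})

# GroupBy.first() returns the first non-null value in each column per group,
# so the surviving row combines data from every duplicate appearance.
# sort=False keeps groups in order of first appearance.
merged = df.groupby("email", as_index=False, sort=False).first()

print(merged.loc[0, "phone"], merged.loc[0, "title"])  # 555-0100 CTO
```

Note that this only treats true missing values as gaps. Empty cells read with pandas.read_csv become NaN by default, but empty strings created some other way would count as data and block the merge.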
Try It Free — No Signup Required
Runs 100% in your browser. No data is collected, stored, or sent anywhere.
Frequently Asked Questions
Can I choose which row to keep from each duplicate group?
Not directly — the tool always keeps the first occurrence. To control which version is kept, sort your CSV before running deduplication so the preferred record appears first in each group. Sort by date descending to keep the most recent, or by completeness to keep the most complete.
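The sort-first trick can be sketched as follows. The helper `keep_most_recent` and the `updated` column are hypothetical, and the example assumes dates are stored as ISO 8601 strings (YYYY-MM-DD), which sort correctly as plain text.

```python
def keep_most_recent(rows, key_field, date_field):
    """Sort newest first, then keep the first occurrence of each key."""
    ordered = sorted(rows, key=lambda row: row[date_field], reverse=True)
    seen = set()
    kept = []
    for row in ordered:
        if row[key_field] not in seen:
            seen.add(row[key_field])
            kept.append(row)
    return kept

# Illustrative rows; the older a@example.com record is missing a phone number.
rows = [
    {"email": "a@example.com", "updated": "2023-01-05", "phone": ""},
    {"email": "b@example.com", "updated": "2023-02-01", "phone": "555-0101"},
    {"email": "a@example.com", "updated": "2024-03-10", "phone": "555-0100"},
]

kept = keep_most_recent(rows, "email", "updated")
print([row["updated"] for row in kept])  # ['2024-03-10', '2023-02-01']
```

In practice you would do the sort in a spreadsheet or script, save the sorted CSV, and then run it through the deduplicator so that its keep-first rule selects the record you prefer.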
How many duplicate groups will the panel show?
The panel shows all duplicate groups. For large files with many duplicates, there is a "Show All" / collapse toggle. The stats bar always shows the total count regardless of how many groups are expanded.
Is the duplicates-only download always available?
Yes, as long as at least one duplicate was found. Both the clean CSV and the duplicates-only CSV download buttons appear after Find Duplicates runs.
If I download the duplicates CSV and add some rows back to my clean CSV, will the combined file have duplicates again?
If you add rows back from the duplicates file without removing the matching row from the clean file, yes — you will reintroduce duplicates. Make sure to delete the corresponding "kept" row from the clean file for any duplicates you are reinstating.

