Remove Duplicate Studies From a Systematic Review — Free CSV Tool
Systematic reviews and meta-analyses require searching multiple databases: PubMed, Embase, Cochrane, Scopus, Web of Science. The same study often appears in several of them, so before you can screen titles and abstracts you need to remove those cross-database duplicates.
Reference managers such as Zotero, Mendeley, and EndNote handle deduplication to some degree, but they are imperfect: they miss duplicates where the title is formatted slightly differently across databases, or where the DOI is absent from one export. If you export your references to CSV and want a clean, controllable deduplication, the CSV Deduplicator gives you that.
Exporting Search Results to CSV From Each Database
Most major research databases support CSV export. The process varies by platform:
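- PubMed: Search results → Save → set Format to CSV → Create file (the legacy interface used Send to → File). The E-utilities API returns XML or text formats rather than CSV, so API output needs converting first.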
- Embase: Results page → Export → CSV. Includes title, authors, DOI, abstract, year.
- Scopus: Select all results → Export → CSV. Choose which fields to include — at minimum: Title, DOI, Authors, Year, Source.
- Web of Science: Export → Other file formats → Tab-delimited (convert to CSV) or direct CSV export.
Export the same fields from each database so the CSV columns align. At minimum: title, DOI, year, authors. DOI is the most reliable deduplication key when available. Title is the fallback for records without a DOI.
After exporting from each database, merge the CSVs into one file using the CSV Merger. Make sure column headers match before merging — rename any inconsistent headers using the CSV Column Mapper.
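If you prefer to script the merge, the idea can be sketched in a few lines of Python. This is a minimal illustration, not how the CSV Merger or CSV Column Mapper work internally. The alias table, the `source_db` column, and the function name `merge_exports` are all hypothetical; adjust the aliases to the headers your exports actually contain.

```python
import csv
import io

# Hypothetical header aliases (assumption): map each database's column
# names onto one shared set of field names before merging.
HEADER_ALIASES = {
    "title": "title", "article title": "title",
    "doi": "doi",
    "year": "year", "publication year": "year",
    "authors": "authors", "author names": "authors",
}
FIELDS = ["title", "doi", "year", "authors", "source_db"]


def merge_exports(exports):
    """Merge {database_name: csv_text} into one list of rows with shared headers."""
    merged = []
    for db_name, csv_text in exports.items():
        reader = csv.DictReader(io.StringIO(csv_text))
        for raw in reader:
            row = {field: "" for field in FIELDS}
            for header, value in raw.items():
                key = HEADER_ALIASES.get((header or "").strip().lower())
                if key:
                    row[key] = (value or "").strip()
            # Remember where each record came from (useful for PRISMA counts).
            row["source_db"] = db_name
            merged.append(row)
    return merged
```

Keeping a `source_db` column is a design choice, not a requirement: it lets you report how many records each database contributed before deduplication.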
Deduplicating on DOI — The Most Reliable Method
DOI (Digital Object Identifier) is the most reliable deduplication key for research papers because it is unique per paper and consistent across databases. Two records with the same DOI are the same paper, period.
In the CSV Deduplicator, select your DOI column and run deduplication. Make sure "Ignore case" is on: DOIs are case-insensitive, but databases often export them with different capitalization.
The limitation: not all records have a DOI. Pre-2000 papers, conference abstracts, grey literature, and some thesis records often lack DOIs. DOI deduplication will catch the duplicates with DOIs and leave non-DOI records untouched — those still need title-based deduplication.
Run DOI deduplication first. Then take the output and run a second pass on title for records where DOI is blank.
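The DOI pass described above can be sketched as follows. This is an illustrative Python version under stated assumptions, not the tool's actual implementation: rows are dictionaries with a `doi` key, the first occurrence of each DOI is kept, and whitespace is trimmed in addition to ignoring case.

```python
def dedupe_on_doi(rows):
    """Return (kept, duplicates); rows with a blank DOI are always kept."""
    seen = set()
    kept, duplicates = [], []
    for row in rows:
        # Assumption: compare DOIs case-insensitively, ignoring outer spaces.
        key = (row.get("doi") or "").strip().lower()
        if not key:
            kept.append(row)        # no DOI: leave it for the title pass
        elif key in seen:
            duplicates.append(row)  # same DOI already seen: flag as duplicate
        else:
            seen.add(key)
            kept.append(row)
    return kept, duplicates
```

The `duplicates` list plays the role of the duplicates-only file: its length is the count removed by the DOI pass.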
Title-Based Deduplication — Handling Variations
Titles are less reliable than DOIs because they are formatted differently across databases. The same paper might appear as:
- "Effect of exercise on depression: a systematic review"
- "Effect of Exercise on Depression: A Systematic Review"
- "The effect of exercise on depression: a systematic review and meta-analysis"
The first two are the same paper with different capitalization — the CSV Deduplicator's case normalization catches this. The third is a different paper with a similar title — it correctly stays as a separate record.
For title deduplication, turn on "Ignore case" and "Trim whitespace". This catches capitalization and leading/trailing space variations. It will NOT catch partial title matches, subtitle differences, or typographical variations between databases.
Title deduplication is a second pass, not a replacement for DOI deduplication. Run both: first DOI, then title for the remaining records.
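To see why the first two example titles match and the third does not, here is a small Python sketch of the normalization that "Ignore case" and "Trim whitespace" imply. The function names are hypothetical, and the trailing space on the second title is added only to show the trimming.

```python
from collections import defaultdict


def title_key(title):
    """Normalize a title: trim outer whitespace and ignore case."""
    return title.strip().lower()


def duplicate_title_groups(titles):
    """Group titles by normalized key; return only groups with 2+ members."""
    groups = defaultdict(list)
    for title in titles:
        groups[title_key(title)].append(title)
    return [members for members in groups.values() if len(members) > 1]


titles = [
    "Effect of exercise on depression: a systematic review",
    "Effect of Exercise on Depression: A Systematic Review ",
    "The effect of exercise on depression: a systematic review and meta-analysis",
]
groups = duplicate_title_groups(titles)
```

Here `groups` holds a single group containing the first two titles. The third title has a different normalized key, so it stays out of every group, which is the exact-match behavior described above.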
Why You Must Review Before Removing in Systematic Reviews
Systematic review methodology requires that deduplication decisions be documented and defensible. Automated removal without review is not acceptable for a published review — you need to show your deduplication process in the PRISMA flow diagram.
The CSV Deduplicator's duplicate groups panel shows you every flagged pair before you remove anything. Review each group:
- Confirm the records are genuinely the same study (same title, same year, same authors)
- Flag any that look like different studies matching on title alone
- Download the duplicates-only CSV as your audit trail
For your PRISMA diagram, the duplicates-only CSV gives you the exact count of records removed at the deduplication stage. Keep this file as part of your review documentation.
Also note: Zotero and other reference managers use more sophisticated deduplication that considers title, year, journal, and author similarity together. For the most complete deduplication, use a reference manager for the primary pass and the CSV Deduplicator for a secondary check on the exported records.
Getting the Duplicate Count for Your PRISMA Flow Diagram
The PRISMA 2020 checklist requires you to report the number of records removed as duplicates before screening. After running deduplication, the CSV Deduplicator's stats panel shows exactly this number.
Record:
- Total records identified across all databases
- Records removed as duplicates (from the stats panel after deduplication)
- Records remaining after deduplication (unique rows count)
These numbers feed directly into your PRISMA flow diagram at the "Identification" and "Screening" stages. The duplicate count from the tool is the exact number to report in your methods section.
Some reviewers also report deduplication separately by method (DOI-based vs title-based). Keep the duplicates-only CSV from each pass to count them separately if needed.
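The arithmetic behind these figures is simple enough to check by script. The sketch below is illustrative only: the function name and dictionary keys are hypothetical, and the numbers in the usage note are made up for the example, not drawn from any real review.

```python
def prisma_counts(per_database_counts, doi_duplicates, title_duplicates):
    """Combine per-database totals and per-pass duplicate counts."""
    identified = sum(per_database_counts.values())
    removed = doi_duplicates + title_duplicates
    return {
        "records_identified": identified,
        "duplicates_removed_doi": doi_duplicates,
        "duplicates_removed_title": title_duplicates,
        "duplicates_removed_total": removed,
        "records_after_deduplication": identified - removed,
    }
```

For example, calling `prisma_counts({"pubmed": 100, "embase": 80}, 30, 5)` with these invented figures gives 180 records identified, 35 duplicates removed, and 145 records remaining. The remaining count should equal the unique rows count the tool reports; if it does not, one of the inputs was miscounted.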
Try It Free — No Signup Required
Runs 100% in your browser. No data is collected, stored, or sent anywhere.
Frequently Asked Questions
Is this tool suitable for PRISMA systematic reviews?
It handles the CSV deduplication step well and provides an audit trail via the duplicates-only download. However, for full systematic review deduplication, it is best used as a secondary tool after a reference manager (Zotero, EndNote) has done a primary pass using multi-field similarity matching.
Can the tool match on DOI AND title together?
Yes. Select both columns and use "Match on ANY selected column" mode. A record is flagged as a duplicate if either the DOI matches OR the title matches (after normalization). This is broader than DOI-only and catches records without DOIs that share a title.
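As a rough illustration of "match on ANY" logic, the Python sketch below flags a row when any of its non-blank keys has been seen before. It is an assumption about the general technique, not a description of the tool's internals. In particular, how the tool treats blank values and chains of partial matches is not stated here. In this sketch blank values never match, and every row's keys are remembered whether the row is kept or flagged.

```python
def dedupe_match_any(rows, columns=("doi", "title")):
    """Flag a row as a duplicate if ANY selected column matches an earlier row."""
    seen = {col: set() for col in columns}
    kept, duplicates = [], []
    for row in rows:
        # Normalize each selected column: trim whitespace, ignore case.
        keys = {col: (row.get(col) or "").strip().lower() for col in columns}
        is_duplicate = any(key and key in seen[col] for col, key in keys.items())
        # Remember this row's non-blank keys (assumption: blanks never match).
        for col, key in keys.items():
            if key:
                seen[col].add(key)
        (duplicates if is_duplicate else kept).append(row)
    return kept, duplicates
```

Because either column can trigger a match, this mode is more likely than DOI-only matching to group distinct papers that happen to share a title, so the flagged groups still need manual review.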
Does the tool normalize DOI format (e.g., with or without the "https://doi.org/" prefix)?
The case normalization handles DOIs in different capitalizations. It does not strip or normalize the URL prefix. If some records have "10.1000/xyz" and others have "https://doi.org/10.1000/xyz", these will NOT be matched. Standardize the DOI format in your CSV first using find-replace.
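If find-replace is awkward for a large file, the same standardization can be scripted before upload. The following Python sketch is one possible approach, not a feature of the tool. The prefix list (`https://doi.org/`, `http://dx.doi.org/`, and `doi:`) and the decision to lowercase the result are assumptions; extend the pattern if your exports use other forms.

```python
import re

# Assumed prefix variants: resolver URLs (http or https, with or without
# "dx.") and a leading "doi:" label. Matching is case-insensitive.
_DOI_PREFIX = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:\s*)", re.IGNORECASE)


def normalize_doi(value):
    """Strip a known prefix and outer whitespace, then lowercase the DOI."""
    return _DOI_PREFIX.sub("", value.strip()).lower()
```

Applying `normalize_doi` to every value in the DOI column before deduplication means "10.1000/xyz" and "https://doi.org/10.1000/xyz" end up as the same string and can be matched.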
My CSV has 15,000 records from six databases. Will this work?
Yes. The tool handles tens of thousands of rows in the browser. 15,000 records will process in a few seconds.

