Remove CSV Duplicates Based on Multiple Columns — ALL or ANY Mode
Single-column deduplication is straightforward: find rows where the email matches, keep one, remove the rest. Multi-column deduplication is more nuanced — and more powerful.
Sometimes you need two rows to match on BOTH email AND company name before calling them a duplicate. Sometimes you need them to match on email OR phone — either one is enough to flag it. The CSV Deduplicator handles both cases with its ALL/ANY matching mode toggle.
Understanding ALL Mode vs ANY Mode
When you select multiple columns to compare, the matching mode determines what counts as a duplicate:
ALL mode (Match on ALL selected columns): Two rows are duplicates only if EVERY selected column matches. Select email AND company — two rows are only flagged as duplicates if both the email AND the company name match. A row with the same email but a different company stays unique.
ANY mode (Match on ANY selected column): Two rows are duplicates if ANY one of the selected columns matches. Select email AND phone: two rows are flagged as duplicates if either the email matches OR the phone matches. This catches a contact that appears once with both an email and a phone number and once with only the phone number.
Use ALL mode when you need multiple criteria to confirm a duplicate. Use ANY mode when any single matching field is enough to call it a duplicate.
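The difference between the two modes can be sketched in a few lines of Python. This is an illustration of the matching rules described above, not the CSV Deduplicator's actual code: the function name `find_duplicates` and the sample rows are made up, values are compared as exact strings, and in ANY mode the sketch assumes a flagged row's values still count for later comparisons.

```python
def find_duplicates(rows, columns, mode="all"):
    """Return the indexes of rows flagged as duplicates of an earlier row.

    rows:    list of dicts, e.g. as produced by csv.DictReader
    columns: the column names selected for matching
    mode:    "all" -> every selected column must match
             "any" -> a match on one selected column is enough
    """
    flagged = []
    if mode == "all":
        seen_keys = set()
        for i, row in enumerate(rows):
            # ALL mode: the selected columns form one composite key.
            key = tuple(row[col] for col in columns)
            if key in seen_keys:
                flagged.append(i)
            else:
                seen_keys.add(key)
    else:
        # ANY mode: track the values seen so far in each column separately.
        seen = {col: set() for col in columns}
        for i, row in enumerate(rows):
            if any(row[col] in seen[col] for col in columns):
                flagged.append(i)
            # Assumption: values of flagged rows are recorded too.
            for col in columns:
                seen[col].add(row[col])
    return flagged


rows = [
    {"email": "ann@acme.com", "company": "Acme", "phone": "555-0100"},
    {"email": "ann@acme.com", "company": "Globex", "phone": "555-0199"},
    {"email": "ann@acme.com", "company": "Acme", "phone": "555-0111"},
    {"email": "bob@initech.com", "company": "Initech", "phone": "555-0100"},
]

all_mode = find_duplicates(rows, ["email", "company"], mode="all")
any_mode = find_duplicates(rows, ["email", "phone"], mode="any")
```

With this sample, ALL mode on email and company flags only the third row, while ANY mode on email and phone flags every row after the first, because each one shares either an email or a phone number with an earlier row.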
When Multi-Column Matching Is the Right Approach
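B2B contact lists, one contact per role per company: Select company AND job_title in ALL mode. Two rows are duplicates only if the same job title appears at the same company. This keeps contacts with different roles at a company and removes repeat entries of the same person. Two different people who hold the same title at the same company are also treated as duplicates, so add a name or email column to the selection if you need to keep them.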
Lead lists with both email and phone: Select email AND phone in ANY mode. A contact whose rows share a phone number gets flagged as a duplicate on the phone match, even if the email fields differ or one of them is empty.
Transaction log — no duplicate transactions on same date and amount: Select transaction_date AND amount AND account_id in ALL mode. Only removes rows where all three match — a person paying the same amount on the same day from the same account. Legitimate repeat transactions (same amount, different dates) are not flagged.
Product catalog — unique by SKU and supplier: Select sku AND supplier_id in ALL mode. The same SKU from different suppliers stays (they are different products). The same SKU from the same supplier is a duplicate.
Step-by-Step: Setting Up Multi-Column Matching
- Open the CSV Deduplicator and load your CSV. After the file parses, you see checkboxes for every column.
- Check the columns you want to match on. Click each column checkbox that should be part of the uniqueness definition.
- Select the matching mode. Below the column checkboxes, choose "Match on ALL selected columns" or "Match on ANY selected column".
- Verify the normalization options. Case, spacing, and phone normalization are on by default. Adjust as needed.
- Click Find Duplicates.
The duplicate groups panel shows each set of matching rows. For multi-column matching, the group header shows the matching values for each selected column — so you can see exactly why the tool flagged those rows as duplicates.
Review a few groups to confirm the logic is working as intended. Then download the deduplicated CSV.
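To picture what the normalization options in the steps above do before values are compared, here is a minimal sketch. The helper names `normalize_text` and `normalize_phone` are hypothetical, and the exact rules the tool applies are not documented here. The sketch assumes case normalization means lowercasing, spacing normalization means trimming and collapsing whitespace, and phone normalization means keeping digits only.

```python
import re


def normalize_text(value):
    """Lowercase, trim, and collapse runs of whitespace to one space."""
    return " ".join(value.split()).lower()


def normalize_phone(value):
    """Keep digits only, so formatting differences do not block a match.

    Assumption: country-code handling is out of scope for this sketch.
    """
    return re.sub(r"\D", "", value)


# Values that differ only in case, spacing, or phone formatting
# compare equal after normalization.
same_email = normalize_text("  Ann@Acme.COM ") == normalize_text("ann@acme.com")
same_phone = normalize_phone("(555) 010-0100") == normalize_phone("555.010.0100")
```

Normalizing first and matching second keeps the matching step a plain equality check, whichever mode is selected.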
Edge Cases to Watch For
Empty values in one of the selected columns: In ALL mode, if one column is empty for a row, it will only match another row that also has an empty value in that column. An empty email does not match a populated email. In ANY mode, a row with no email but a matching phone will still be flagged as a duplicate on the phone match.
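The empty-value behavior can be expressed as two small comparison functions. This is a sketch under stated assumptions, not the tool's implementation: `matches_all` and `matches_any` are hypothetical names, and the rule that two empty values never count as a match in ANY mode is an assumption made for this example.

```python
def matches_all(a, b, columns):
    """ALL mode: every selected column must be equal.

    An empty value equals only another empty value.
    """
    return all(a[col] == b[col] for col in columns)


def matches_any(a, b, columns):
    """ANY mode: one shared, non-empty value is enough.

    Assumption: empty values are ignored rather than matched to each other.
    """
    return any(a[col] != "" and a[col] == b[col] for col in columns)


cols = ["email", "phone"]
phone_only = {"email": "", "phone": "5550100"}
full_record = {"email": "ann@acme.com", "phone": "5550100"}
other_phone_only = {"email": "", "phone": "5550199"}

all_result = matches_all(phone_only, full_record, cols)        # empty vs populated email
any_result = matches_any(phone_only, full_record, cols)        # phones match
any_empty_only = matches_any(phone_only, other_phone_only, cols)
```

Here the phone-only row and the full record fail in ALL mode because the emails differ, but match in ANY mode on the shared phone number. The two phone-only rows with different numbers do not match, because their empty emails are ignored.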
Too many false positives in ANY mode: If you select five columns in ANY mode, a row is flagged as a duplicate if it shares ANY one value with another row. With a column like "City" selected in ANY mode alongside email, every row in Austin is flagged as a duplicate of every other Austin row, whether or not the emails match. ANY mode works best when the selected columns are all genuinely unique identifiers (email, phone, SSN, transaction ID), not categorical values like city or status.
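A small experiment makes the false-positive problem concrete. The function `count_flagged_any` and the sample data are invented for illustration. The function counts rows that share at least one selected value with an earlier row.

```python
def count_flagged_any(rows, columns):
    """Count rows that share a value in any selected column with an earlier row."""
    seen = {col: set() for col in columns}
    flagged = 0
    for row in rows:
        if any(row[col] in seen[col] for col in columns):
            flagged += 1
        for col in columns:
            seen[col].add(row[col])
    return flagged


people = [
    {"email": "ann@acme.com", "city": "Austin"},
    {"email": "bob@initech.com", "city": "Austin"},
    {"email": "cy@globex.com", "city": "Austin"},
    {"email": "dee@hooli.com", "city": "Denver"},
]

email_only = count_flagged_any(people, ["email"])
email_and_city = count_flagged_any(people, ["email", "city"])
```

All four emails are unique, so matching on email alone flags nothing. Adding city to the selection flags two of the three Austin rows even though they are different people.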
The same entity with slightly different company names: "Acme Corp" and "Acme Corporation" do not normalize to the same string — normalization only handles case and whitespace, not abbreviation differences. Multi-column matching does not solve fuzzy name matching.
Real Example: Cleaning a Multi-Source B2B Lead List
You collected leads from three sources: a LinkedIn scrape (has email + company), a trade show scan (has phone + name, no email), and a webinar registration (has email + job title). You merged them into one 2,000-row CSV.
Goal: remove duplicates while keeping the most complete record for each unique person.
Step 1: Sort the CSV so webinar registrations are first (most complete data — email + job title), then LinkedIn, then trade show.
Step 2: Open the deduplicator. Select "email" and "phone" in ANY mode. This flags rows that share an email with another row, OR share a phone number. It catches cross-source duplicates whenever two rows have at least one populated identifier in common. A row with only an email cannot be linked to a row with only a phone unless a third row carries both.
Step 3: Review the duplicate groups. Check that "match on phone" groups are legitimate duplicates, not two different people who share a company main line.
Step 4: Download the clean CSV. You now have one record per person, with the most complete version kept (because you sorted by completeness first).
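The four steps can be approximated in a script. Everything in this sketch is an assumption made for illustration: the `source` column and its labels, the sample rows, and the function name `dedupe_any_keep_first`. One sample row is given both an email and a phone number, since ANY mode can only link rows that share a populated value. The sketch keeps the first row of each group and does not merge fields from the removed rows.

```python
# Hypothetical source labels, ordered from most to least complete.
SOURCE_PRIORITY = {"webinar": 0, "linkedin": 1, "tradeshow": 2}


def dedupe_any_keep_first(rows, columns):
    """Keep the first row of each ANY-mode duplicate group."""
    seen = {col: set() for col in columns}
    kept = []
    for row in rows:
        values = {col: row.get(col, "") for col in columns}
        # Empty values are ignored (assumption), so blanks never match.
        is_dup = any(v and v in seen[col] for col, v in values.items())
        # Record values even for dropped rows so they can link later rows.
        for col, v in values.items():
            if v:
                seen[col].add(v)
        if not is_dup:
            kept.append(row)
    return kept


leads = [
    {"source": "tradeshow", "name": "Ann Lee", "email": "", "phone": "5550100"},
    {"source": "linkedin", "name": "Ann Lee", "email": "ann@acme.com", "phone": "5550100"},
    {"source": "webinar", "name": "Ann Lee", "email": "ann@acme.com", "phone": ""},
    {"source": "linkedin", "name": "Bob Ray", "email": "bob@initech.com", "phone": ""},
]

# Step 1: most complete source first (list.sort is stable).
leads.sort(key=lambda r: SOURCE_PRIORITY[r["source"]])
# Steps 2 to 4: match on email OR phone, keep the first row per person.
clean = dedupe_any_keep_first(leads, ["email", "phone"])
```

In this sample the webinar row for Ann Lee survives, and her LinkedIn and trade show rows are dropped: the LinkedIn row matches on email, and the trade show row matches on the phone number the LinkedIn row carried.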
Try It Free — No Signup Required
Runs 100% in your browser. No data is collected, stored, or sent anywhere.
Frequently Asked Questions
Can I select more than two columns to match on?
Yes. Select as many columns as you need. In ALL mode, all selected columns must match. In ANY mode, one match on any selected column is enough to flag the row.
What if I want to deduplicate on email but also check phone as a secondary identifier?
Run two separate passes. First deduplicate on email alone (removes email duplicates). Then take the output and run it through again on phone alone (removes phone duplicates where email was not present). The combined result removes both types of duplicates.
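As a rough sketch of the two-pass approach, the same single-column routine can be applied twice, feeding the output of the first pass into the second. The function name `dedupe_on` and the sample contacts are hypothetical. The sketch assumes rows with an empty value in the chosen column are kept rather than treated as duplicates of each other.

```python
def dedupe_on(rows, column):
    """Keep the first row for each non-empty value in one column."""
    seen = set()
    kept = []
    for row in rows:
        value = row[column]
        if value and value in seen:
            continue  # duplicate of an earlier row on this column
        if value:
            seen.add(value)
        kept.append(row)  # rows with an empty value are always kept
    return kept


contacts = [
    {"email": "ann@acme.com", "phone": "5550100"},
    {"email": "ann@acme.com", "phone": ""},
    {"email": "", "phone": "5550100"},
    {"email": "bob@initech.com", "phone": "5550199"},
]

after_email = dedupe_on(contacts, "email")     # pass 1: email duplicates
after_phone = dedupe_on(after_email, "phone")  # pass 2: phone duplicates
```

The first pass removes the second row (same email). The second pass removes the row with no email, which only becomes identifiable as a duplicate through its phone number.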
Does multi-column matching slow down the deduplication?
Not noticeably for typical CSV file sizes. The matching algorithm is O(n) in the number of rows for single-column matching, and still O(n) for multi-column matching, with slightly more memory. You will not see a meaningful speed difference for files under 100,000 rows.
I selected three columns in ALL mode but the duplicate count seems too low — why?
ALL mode requires every selected column to match. If your data has any variation in one column between otherwise identical rows (slightly different company name, missing phone for some rows), those rows will not be flagged. Try reducing the number of selected columns, or switch to ANY mode to cast a wider net.

