

Designodin Systems

How to Clean CRM Data Before Migration: A Pre-Migration Cleansing Guide

Pre-migration data cleanup is three to ten times cheaper than fixing the same records after import. Most teams skip it anyway.

The reasoning is understandable: the migration deadline is pressing, the new CRM is selected, the team is impatient to get started. “We’ll clean it up as we go” becomes the working plan. Six months later, the team is dealing with duplicate records, stale contacts, and mismatched fields in a system that’s already live and actively used — which makes every correction more complex and more disruptive.

The right time to clean your data is before it moves. This guide provides the framework to do it correctly.

Key Takeaways

  • A typical B2B CRM carries 10–30% duplicate records. Cleaning them before migration keeps the new CRM from inheriting the same data problems on day one.
  • Best practice target: under 2% duplicate rate before any CRM import. This is an achievable goal, but it requires systematic deduplication, not just spot-checking.
  • Smaller clean databases outperform larger dirty ones — the decision to archive or delete low-quality records before migration improves the new CRM’s usability from day one.
  • Pre-migration cleanup for a database of 10,000–100,000 records typically takes two to four weeks — plan for this in your migration timeline, not as a parallel activity.

The Cost Argument for Cleaning Before You Migrate

Why Post-Migration Cleanup Costs Three to Ten Times More

When you clean data in a spreadsheet before migration, you work with simple tools (Excel, Google Sheets) on a single static file. The operations are straightforward: find duplicates, merge records, standardize values, delete bad records.

After migration, the same cleanup happens inside an active CRM with users working in it simultaneously. A merge changes which record the related activities, deals, and notes are attached to, which can affect deal history, pipeline data, and reporting. A delete removes a record that a rep may already have updated. The cleanup is more complex, riskier, and more disruptive.

Beyond the technical complexity, there’s the adoption cost. Reps who find duplicate records or missing data in their new CRM on day one form a lasting negative impression of the system. Pre-migration cleanup prevents this.

What Dirty Data Does to the New CRM From Day One

  • Duplicate records mean reps work with incomplete relationship history
  • Stale contacts degrade email deliverability from the first campaign
  • Inconsistent company names prevent accurate account-level reporting
  • Missing required fields generate validation errors that slow data entry
  • Orphan records (contacts without companies, deals without contacts) create reporting gaps

None of these problems are insurmountable — but they all reduce the speed at which the new CRM becomes trusted and useful.

Step 1: Pre-Migration Data Audit

Before any cleanup work, understand what you’re working with.

Assess Total Records and Sources

Generate a count of every record type in every source:

  • Total contact records (including duplicates)
  • Total account/company records
  • Total deal/opportunity records
  • Any additional record types (leads, tasks, notes)

Also document all sources: which systems or spreadsheets these records come from. Multiple sources often contain overlapping records.

Identify Field Quality Issues

For each record type, assess field-by-field quality:

Field     | Records with data | Records blank | Format issues
Email     | 8,400             | 1,600         | 240 invalid format
Phone     | 7,200             | 2,800         | 890 non-standard format
Company   | 9,800             | 200           | 340 variations of same companies
Job title | 6,100             | 3,900         | No standardization

This table identifies where cleanup effort is most needed.
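A table like this can be produced with a few lines of scripting once the records are exported. The sketch below is one possible approach, assuming the export has been loaded as a list of dictionaries; the `audit_field` helper, the field names, and the deliberately loose email pattern are illustrative assumptions, not part of any CRM's API.

```python
import re

# Loose first-pass pattern (an assumption): one "@", no spaces, a dot in the domain.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def audit_field(records, field, is_valid=None):
    """Count populated, blank, and badly formatted values for one field."""
    populated = blank = invalid = 0
    for record in records:
        value = (record.get(field) or "").strip()
        if not value:
            blank += 1
            continue
        populated += 1
        if is_valid is not None and not is_valid(value):
            invalid += 1
    return {"populated": populated, "blank": blank, "invalid": invalid}

# Hypothetical sample rows standing in for a CRM export.
contacts = [
    {"email": "ana@example.com", "phone": "+1 555 555 0100"},
    {"email": "bob[at]example.com", "phone": ""},
    {"email": "", "phone": "555-0101"},
]

email_report = audit_field(contacts, "email", lambda v: EMAIL_RE.match(v) is not None)
phone_report = audit_field(contacts, "phone")
print(email_report)
print(phone_report)
```

Running the same function over every field of every record type gives one row of the audit table per field.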

Calculate Key Quality Metrics

  • Duplicate rate — how many records appear more than once?
  • Staleness rate — what percentage of records have had no activity in 24+ months?
  • Completeness score — what percentage of critical fields are populated?

These metrics establish your pre-migration baseline and your cleanup goals.
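The three metrics above could be computed roughly as follows. This is a minimal sketch under stated assumptions: duplicates are counted by exact email match only, a record with no recorded activity date is treated as stale, and the field names (`email`, `phone`, `last_activity`) are hypothetical.

```python
from datetime import date

def duplicate_rate(records, key="email"):
    """Fraction of keyed records that repeat a key already seen."""
    keys = [r[key].strip().lower() for r in records if r.get(key)]
    if not keys:
        return 0.0
    return (len(keys) - len(set(keys))) / len(keys)

def whole_months_between(earlier, later):
    """Completed calendar months from `earlier` to `later`."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    return months - 1 if later.day < earlier.day else months

def staleness_rate(records, today, months=24):
    """Fraction of records with no activity in `months` or more."""
    if not records:
        return 0.0
    stale = 0
    for r in records:
        last = r.get("last_activity")
        # Assumption: a missing activity date counts as stale.
        if last is None or whole_months_between(last, today) >= months:
            stale += 1
    return stale / len(records)

def completeness_score(records, critical_fields):
    """Fraction of critical field slots that hold a non-blank value."""
    slots = len(records) * len(critical_fields)
    if slots == 0:
        return 0.0
    filled = sum(
        1 for r in records for f in critical_fields
        if str(r.get(f) or "").strip()
    )
    return filled / slots

# Hypothetical sample data.
sample = [
    {"email": "a@x.com", "phone": "1", "last_activity": date(2024, 5, 1)},
    {"email": "A@x.com", "phone": "", "last_activity": date(2021, 1, 15)},
    {"email": "b@x.com", "phone": "2", "last_activity": date(2023, 12, 1)},
    {"email": "", "phone": "3", "last_activity": None},
]
today = date(2025, 1, 1)
print(duplicate_rate(sample))
print(staleness_rate(sample, today))
print(completeness_score(sample, ["email", "phone"]))
```

Exact-email matching understates the true duplicate rate, since it misses the same person stored under two addresses, so treat the first number as a floor.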

Decide What to Migrate vs. Archive vs. Delete

Before any cleanup work, make the migration scope decision:

Migrate: Records that are active, relevant, and likely to be engaged in the next 12 months.

Archive: Records that are inactive but may be relevant historically — keep accessible but not in the active CRM.

Delete: Records that are clearly obsolete — companies that no longer exist, contacts who have fully disengaged, duplicate records with no unique value.

The decision to archive and delete is uncomfortable for many teams (“what if we need that someday?”). The practical impact: a smaller database with higher quality records outperforms a large database with mixed quality across every CRM function — search, filtering, reporting, and emailing.

Step 2: Decide What Not to Migrate

Age-Based Archiving Rules

Records that haven’t been updated or engaged in three or more years should be archived rather than migrated, unless they belong to accounts that are still active. The contacts may still be valid, but the relationship context is too stale to be useful in the active CRM.

Apply this rule practically: any contact whose last interaction was more than 36 months ago and who isn’t associated with an active account goes to the archive.
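Expressed as code, the rule might look like the sketch below. The field names (`account_id`, `last_interaction`) and the choice to leave contacts with no interaction date out of this rule are assumptions made for illustration.

```python
import calendar
from datetime import date

def months_ago(today, months):
    """Return the date `months` calendar months before `today`."""
    index = today.year * 12 + (today.month - 1) - months
    year, month_index = divmod(index, 12)
    month = month_index + 1
    # Clamp the day so that, for example, the 31st maps into a 30-day month.
    day = min(today.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)

def should_archive(contact, active_account_ids, today, threshold_months=36):
    """Archive when the last interaction is older than the threshold
    and the contact does not belong to an active account."""
    if contact.get("account_id") in active_account_ids:
        return False
    last = contact.get("last_interaction")
    if last is None:
        # Assumption: undated contacts are handled by a separate orphan check.
        return False
    return last < months_ago(today, threshold_months)

# Hypothetical sample data.
today = date(2025, 6, 15)
active = {"A1"}
old_but_active = {"account_id": "A1", "last_interaction": date(2020, 1, 1)}
old_inactive = {"account_id": "A2", "last_interaction": date(2021, 3, 1)}
recent = {"account_id": None, "last_interaction": date(2024, 1, 1)}

print(should_archive(old_but_active, active, today))
print(should_archive(old_inactive, active, today))
print(should_archive(recent, active, today))
```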

Invalid or Bounced Email Addresses

Before migration, run your email list through a bulk email validation tool (NeverBounce, ZeroBounce, BriteVerify). Remove or flag records with hard-bounced or invalid email addresses. Migrating these contacts consumes CRM database capacity without providing any usable outreach value.

Typical finding: 8–15% of records in a moderately maintained database have invalid email addresses.

Orphan Records With No Activity

Contact records with no associated company, no activity history, and no pipeline history are candidates for deletion. They entered the CRM for some reason that’s no longer traceable and have generated no business value. Starting with these records in the new CRM creates noise without signal.
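A filter for these records can be very small. The sketch below assumes each exported contact carries a `company` value plus lists of `activities` and `deals`; those field names are hypothetical and will differ by CRM.

```python
def is_orphan(contact):
    """True when a contact has no company, no activity, and no pipeline history."""
    return (
        not (contact.get("company") or "").strip()
        and not contact.get("activities")
        and not contact.get("deals")
    )

# Hypothetical sample data.
contacts = [
    {"id": 1, "company": "Acme", "activities": [], "deals": []},
    {"id": 2, "company": "", "activities": [], "deals": []},
    {"id": 3, "company": "", "activities": ["email 2023-04-02"], "deals": []},
]
orphan_ids = [c["id"] for c in contacts if is_orphan(c)]
print(orphan_ids)
```

Exporting the resulting list for a quick human scan before deleting is a cheap safeguard.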

Step 3: Standardize Fields and Formats

Phone Number Formats

Choose a standard format for your region (e.g., +1 555 555 5555 for North America) and apply it to all records. This requires a find-and-replace or formula-based transformation in a spreadsheet.

Phone numbers in non-standard formats don’t prevent basic CRM function, but they create problems for click-to-dial integrations and phone-based data matching.
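For teams that prefer a script to spreadsheet formulas, a transformation to the North American format above could look like this sketch. It assumes every number is a ten-digit North American number, optionally prefixed with 1; anything else (extensions, international numbers, short numbers) returns `None` so it can be reviewed by hand. The function name is illustrative.

```python
import re

def normalize_nanp_phone(raw):
    """Return '+1 555 555 5555' style output, or None if the input
    is not a ten-digit North American number."""
    digits = re.sub(r"\D", "", raw or "")
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the country code
    if len(digits) != 10:
        return None
    return f"+1 {digits[:3]} {digits[3:6]} {digits[6:]}"

print(normalize_nanp_phone("(555) 555-5555"))
print(normalize_nanp_phone("1-555-555-5555"))
print(normalize_nanp_phone("555-5555"))
```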

Company Name Normalization

The most impactful standardization: normalize company names to create consistent account matching. Run a frequency analysis — which company names appear more than once with slight variations? Create a normalization mapping (IBM Corp → IBM, International Business Machines → IBM) and apply it to all records.

This step is tedious but critical for accurate account-level reporting in the new CRM.
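The frequency analysis and the mapping can both be sketched in a short script. The suffix list, the `match_key` heuristic, and the sample names below are assumptions for illustration; a real mapping is built by reviewing the variant groups the analysis surfaces.

```python
import re
from collections import Counter, defaultdict

# Assumption: legal suffixes that can be ignored when comparing names.
LEGAL_SUFFIXES = {"inc", "corp", "corporation", "llc", "ltd", "co"}

def match_key(name):
    """Lowercase, strip punctuation and trailing legal suffixes."""
    words = re.sub(r"[^\w\s]", " ", name.lower()).split()
    while words and words[-1] in LEGAL_SUFFIXES:
        words.pop()
    return " ".join(words)

def variant_groups(names):
    """Group spellings that share a match key; keep groups with 2+ variants."""
    groups = defaultdict(Counter)
    for name in names:
        groups[match_key(name)][name.strip()] += 1
    return {key: dict(c) for key, c in groups.items() if len(c) > 1}

# Reviewed mapping from match key to canonical name.
NORMALIZATION_MAP = {
    "ibm": "IBM",
    "international business machines": "IBM",
}

def normalize_company(name):
    """Apply the mapping; leave unmapped names as entered."""
    return NORMALIZATION_MAP.get(match_key(name), name.strip())

names = ["IBM Corp", "IBM", "International Business Machines", "Acme, Inc.", "Acme Inc"]
print(variant_groups(names))
print([normalize_company(n) for n in names])
```

Note that the automatic key only catches punctuation and suffix differences. Pairs like "IBM" and "International Business Machines" still need a human to add them to the mapping.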

Job Title Standardization

Create a title mapping for common variations:

  • VP Sales, Vice President of Sales, Sales VP → VP of Sales
  • Director, Dir → Director
  • Sr., Senior → Senior

For large datasets, use a lookup table rather than individual find-and-replace operations.
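A lookup-table approach might be sketched as follows. The table entries are illustrative examples only; the real table comes from the titles that actually occur in the data.

```python
# Whole-title lookup, keyed on the lowercased, punctuation-free title.
TITLE_LOOKUP = {
    "vp of sales": "VP of Sales",
    "vp sales": "VP of Sales",
    "vice president of sales": "VP of Sales",
    "dir": "Director",
    "director": "Director",
}

# Word-level abbreviations expanded before the whole-title lookup.
ABBREVIATIONS = {"sr": "Senior"}

def standardize_title(raw):
    """Expand abbreviations, then map known variants to one canonical title."""
    cleaned = raw.replace(".", " ").replace(",", " ").split()
    expanded = " ".join(ABBREVIATIONS.get(w.lower(), w) for w in cleaned)
    return TITLE_LOOKUP.get(expanded.lower(), expanded)

for title in ["Vice President of Sales", "Dir.", "Sr. Account Executive", "CFO"]:
    print(title, "->", standardize_title(title))
```

Titles that are not in the table pass through unchanged, so the table can grow incrementally as new variants turn up.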

Country, State, and Region Codes

For multi-geography businesses, standardize location fields to consistent values — ISO country codes, standard state/province codes. This enables geographic filtering and territory-based reporting in the new CRM.
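The same lookup idea applies to countries. Below is a small sketch mapping free-text values to ISO 3166-1 alpha-2 codes; the table covers only three countries for illustration, and unknown values return `None` so they can be reviewed instead of guessed.

```python
# Illustrative subset; extend with the spellings found in your own data.
COUNTRY_TO_ISO = {
    "united states": "US",
    "usa": "US",
    "us": "US",
    "united kingdom": "GB",
    "uk": "GB",
    "germany": "DE",
    "deutschland": "DE",
}

def to_iso_country(raw):
    """Map a free-text country value to an ISO alpha-2 code, or None."""
    key = " ".join((raw or "").replace(".", "").lower().split())
    return COUNTRY_TO_ISO.get(key)

print(to_iso_country("U.S.A."))
print(to_iso_country("United  Kingdom"))
print(to_iso_country("Atlantis"))
```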

Step 4: Deduplication

Fuzzy Matching on Name, Email, Phone, and Company

Email address is the gold-standard deduplication key: if two records share the same email, they are almost certainly the same person (shared inboxes such as info@ or sales@ are the usual exception). Beyond email, use fuzzy matching on name + company combinations to identify likely duplicates with slightly different email addresses.

Tools for bulk deduplication:

  • Excel Power Query or Google Sheets VLOOKUP for small databases (under 5,000 records)
  • DemandTools, Dedupely, OpenRefine for larger datasets
  • Native CRM deduplication tools if the CRM has them (often sufficient for common duplicate patterns)
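To make the matching logic concrete, here is a minimal sketch using Python's standard-library `difflib`. It is not how any of the tools above work internally; the 0.85 threshold, the field names, and the sample records are assumptions. It compares every pair of records, so it only suits small datasets; larger ones need a blocking step (for example, comparing only records that share an email domain).

```python
from difflib import SequenceMatcher
from itertools import combinations

def similarity(a, b):
    """Similarity ratio between 0 and 1, ignoring case and outer spaces."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

def find_duplicate_candidates(records, threshold=0.85):
    """Return (id, id, reason) for exact-email and fuzzy name+company matches."""
    pairs = []
    for left, right in combinations(records, 2):
        if left["email"] and left["email"].lower() == right["email"].lower():
            pairs.append((left["id"], right["id"], "exact email"))
            continue
        name_score = similarity(left["name"], right["name"])
        company_score = similarity(left["company"], right["company"])
        # Both name and company must clear the threshold.
        if min(name_score, company_score) >= threshold:
            pairs.append((left["id"], right["id"], "fuzzy name + company"))
    return pairs

# Hypothetical sample data.
records = [
    {"id": 1, "name": "Jon Smith", "company": "Acme", "email": "jon@acme.com"},
    {"id": 2, "name": "John Smith", "company": "Acme", "email": "john.smith@acme.com"},
    {"id": 3, "name": "J. Smith", "company": "Acme", "email": "JON@ACME.COM"},
    {"id": 4, "name": "Maria Lopez", "company": "Globex", "email": "maria@globex.com"},
]
candidates = find_duplicate_candidates(records)
for pair in candidates:
    print(pair)
```

Fuzzy matches are candidates, not confirmed duplicates, which is why they feed a review step before any merge.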

Survivorship Rules: Which Record Wins During Merge

When merging duplicate records, establish rules for which values survive:

  • Most recently updated record wins for most fields
  • Record with more complete fields wins for critical fields (email, phone)
  • Record with activity history wins if one has interactions logged and the other doesn’t
  • Manual review for records where the rules produce an uncertain outcome

Document your survivorship rules before running deduplication. This creates an audit trail and ensures consistency.
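Writing the rules as code is one way to document them. The sketch below is one possible reading of the rules above, not a definitive implementation: the record with activity history (or, failing that, the newer record) supplies most fields, blanks are filled from the other record, critical fields come from the more complete record, and conflicting critical values are flagged for manual review. Field names and sample records are hypothetical.

```python
from datetime import date

CRITICAL_FIELDS = ("email", "phone")

def filled_count(record, fields):
    """Number of the given fields that hold a non-blank value."""
    return sum(1 for f in fields if record.get(f))

def merge_pair(a, b):
    """Merge two duplicate records; return (merged, needs_review)."""
    newer, older = (a, b) if a["updated"] >= b["updated"] else (b, a)
    # Activity history outranks recency when only one record has any.
    if older.get("activities") and not newer.get("activities"):
        base, other = older, newer
    else:
        base, other = newer, older
    # Base values win; blanks in the base are filled from the other record.
    merged = {**other, **{k: v for k, v in base.items() if v}}

    needs_review = False
    a_filled = filled_count(a, CRITICAL_FIELDS)
    b_filled = filled_count(b, CRITICAL_FIELDS)
    if a_filled != b_filled:
        richer = a if a_filled > b_filled else b
        for field in CRITICAL_FIELDS:
            if richer.get(field):
                merged[field] = richer[field]
    else:
        # Equally complete: flag any critical field where the values disagree.
        needs_review = any(
            a.get(f) and b.get(f) and a[f] != b[f] for f in CRITICAL_FIELDS
        )
    return merged, needs_review

a = {"id": 1, "name": "Jon Smith", "email": "jon@acme.com", "phone": "",
     "title": "Director", "updated": date(2023, 1, 10), "activities": ["call"]}
b = {"id": 2, "name": "John Smith", "email": "jon@acme.com",
     "phone": "+1 555 555 0100", "title": "", "updated": date(2024, 6, 1),
     "activities": []}

merged, needs_review = merge_pair(a, b)
print(merged)
print(needs_review)
```

In this example the older record survives because it holds the activity history, and it picks up the phone number from the newer one. The sketch keeps only the base record's activity list; combining both lists would be a reasonable extension.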

Manual Review Queue for Ambiguous Cases

Not all duplicates are obvious. Create a manual review queue for cases where:

  • Same name but different companies (may be a job change, not a duplicate)
  • Same company but different contacts with similar roles (may be two different people)
  • Email addresses differ but name + phone match (may be the same person with a new email)

Assign someone to review this queue — typically two to four hours of work for a mid-sized database.

Target Threshold: Under 2% Duplicate Rate Before Import

Before migrating, verify your duplicate rate has been reduced below 2%. Above 2%, the duplicates will immediately undermine the accuracy of account-level views and pipeline reports.

Step 5: Data Enrichment Where Justified

When to Enrich vs. When to Delete Incomplete Records

For contacts missing critical fields (primarily email address), the decision:

  • Can you reasonably find this information through enrichment tools?
  • Is this contact valuable enough to justify the enrichment cost?

For a high-value contact at a key account with no email address, enrichment is worth it. For a contact at a company you haven’t engaged with in three years, deletion may be more efficient.

Tools for Email Validation and Company Data Append

  • Clearbit, ZoomInfo, Apollo — append company data (industry, size, website), job titles, and phone numbers
  • LinkedIn Sales Navigator — manual enrichment for priority contacts
  • NeverBounce, ZeroBounce — email validation and deliverability scoring

Run enrichment after deduplication (so you’re enriching the surviving record, not both duplicates).

Step 6: Validate Against Target CRM Field Requirements

Before the final import, validate that your cleaned data meets the target CRM’s requirements:

  • Required fields populated — does every record have the fields the new CRM requires?
  • Format validation — do email, phone, and URL fields meet the CRM’s format requirements?
  • Picklist values — if the CRM uses picklists (dropdowns) for fields like Industry or Lead Source, do your values match the picklist options? Mismatches will prevent import or create unmapped values.
  • Record relationships — do company-contact and deal-contact links resolve correctly?

Run a final test import of 100–200 records and verify the output before the full migration.
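The four checks above can be run as a script before the test import. This is a minimal sketch: the required fields, picklist values, and field names are placeholders, and the real values must come from the target CRM's own import documentation.

```python
import re

# Placeholder requirements (assumptions); replace with the target CRM's rules.
REQUIRED_FIELDS = ("email", "last_name")
PICKLISTS = {"lead_source": {"Web", "Referral", "Event"}}
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def validate_record(record, known_account_ids):
    """Return a list of human-readable problems; empty means import-ready."""
    errors = []
    for field in REQUIRED_FIELDS:
        if not str(record.get(field) or "").strip():
            errors.append(f"missing required field: {field}")
    email = record.get("email") or ""
    if email and not EMAIL_RE.match(email):
        errors.append("invalid email format")
    for field, allowed in PICKLISTS.items():
        value = record.get(field)
        if value and value not in allowed:
            errors.append(f"{field} value not in picklist: {value}")
    if record.get("account_id") not in known_account_ids:
        errors.append("account link does not resolve")
    return errors

# Hypothetical sample data.
accounts = {"A1"}
good = {"email": "ana@example.com", "last_name": "Diaz",
        "account_id": "A1", "lead_source": "Web"}
bad = {"email": "ana@example", "last_name": "",
       "account_id": "A9", "lead_source": "Cold call"}

print(validate_record(good, accounts))
print(validate_record(bad, accounts))
```

Records that return an empty list are good candidates for the 100–200 record test batch; the rest go back for cleanup.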

Timeline and Resources by Database Size

Small database (under 10,000 records): One to two weeks of cleanup work. One person working full time on data cleanup for 10 days can produce a clean, migration-ready dataset.

Mid-size database (10,000–100,000 records): Two to four weeks. Automated deduplication tools become necessary. Manual review queue for ambiguous duplicates may require two people for a week.

Large database (100,000+ records): Four to eight weeks. Requires dedicated resources and specialized data quality tools. Consider engaging a data quality consultant for this scope.

These timelines assume the cleanup is the primary focus, not a part-time background activity. Treating data cleanup as a parallel activity while the main migration project proceeds almost always results in the cleanup being compressed and incomplete.

Frequently Asked Questions

What if we don’t have time to clean before we migrate?

Prioritize the highest-impact work: deduplication and email validation. These two steps prevent the most damaging data quality problems in the new CRM. If you genuinely can’t complete the full cleanup, migrate a subset of the data (most active accounts and contacts only) and add historical records after the full cleanup is done.

How do we handle records that are almost-duplicates but not exact matches?

Apply your survivorship rules and merge them if the matching criteria (name + company, or phone + company) are strong enough. For cases where you genuinely aren’t sure, add both records to a “review” list in the new CRM and address them post-migration. Don’t let ambiguous duplicates block the migration entirely.

Should we migrate all historical notes and activity history?

For notes and activities on active accounts and pipeline deals: yes. For historical activities on inactive contacts (notes from a trade show conversation five years ago): generally no. The volume and age of historical activity rarely justify the migration complexity.

Clean Data Is a Foundation, Not an Afterthought

The new CRM is only as reliable as the data it starts with. Teams that invest in pre-migration cleanup spend their first 90 days using the CRM. Teams that skip cleanup spend their first 90 days managing data problems.

The choice is between two to four weeks of cleanup now or months of data remediation in an active production system. The math is clear.
