Simplifying the Account Matching Problem

Nearly every enterprise today struggles with a basic, common business problem: “How can I integrate customer account information from multiple internal and external sources for my Sales and Marketing departments? Basically, I just want to enrich my existing CRM database with newly acquired customer data. How hard can that be?” Well, it turns out that it’s a lot harder than people anticipate. Let’s examine why and discuss how to solve this challenge.

The ubiquity of SaaS applications makes capturing lots of data about customers very simple. Easy customer sign-ups and information available from data brokers can deliver vast quantities of customer-related data to an existing enterprise. Potential customer names, emails, phone numbers, IP addresses, business and home addresses, and even geolocation data can all be quickly acquired.

The challenge is how to take all of these raw, unfiltered, unqualified data points and map them to your corporate sales repository in Salesforce or other CRM tools. Who does this contact work for? Is phone number 425-555-1212 an existing customer? Which office or division of Acme Incorporated downloaded this whitepaper? Are they already in my CRM database? Although newly acquired data may have lots of data points, often there is no common key (an account number, for example) between it and your enterprise CRM. There may or may not be partial matches based on the customer’s name, email, IP, office address, company name, etc. Attempting to match the newly acquired data with your CRM system manually is resource-intensive, time-consuming, and error-prone. Automating this process is not straightforward either, because no out-of-the-box fuzzy matching algorithm is adequate to the task on its own.

Most companies try to solve this challenge using a series of steps involving multiple tools, databases, and scripts. They start by buying an ETL tool to extract the relevant CRM data into a cloud data warehouse. Next, they write a series of customized fuzzy-logic scripts to attempt to match the CRM and potential customer data (often using multiple passes through the data) and then use a reverse ETL tool to update this information into the CRM system. The data ingestion and transformation processes require data engineers with specialized skill sets, making the Sales and Marketing groups dependent on available IT bandwidth and resources. This dependency typically adds months to the overall process, by which time the business requirements have changed, making the combined data less than useful. Even after the “final” data transformation, companies often quickly encounter issues with inaccurate matches and the inability to map records to children of parent enterprise account records automatically.
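
The multi-pass scripting approach described above can be sketched roughly as follows. This is a minimal illustration, not any particular company's script: the sample records, the field names (account_id, name, domain, email, company), and the 0.6 threshold are all assumptions made up for the example. The first pass tries an exact match on email domain, and the second pass falls back to a fuzzy comparison of company names using Python's standard difflib module.

```python
from difflib import SequenceMatcher

# Hypothetical sample data; field names are illustrative assumptions.
crm_accounts = [
    {"account_id": "A-1", "name": "Acme Incorporated", "domain": "acme.com"},
    {"account_id": "A-2", "name": "Globex Corporation", "domain": "globex.com"},
]
leads = [
    {"email": "pat@acme.com", "company": "ACME Inc."},
    {"email": "sam@gmail.com", "company": "Globex Corp"},
    {"email": "lee@gmail.com", "company": "Initech"},
]


def similarity(a, b):
    """Case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_leads(leads, accounts, threshold=0.6):
    """Return (email, account_id or None, match method or None) per lead."""
    by_domain = {a["domain"]: a for a in accounts}
    results = []
    for lead in leads:
        # Pass 1: exact match on the email domain.
        domain = lead["email"].split("@")[-1].lower()
        account = by_domain.get(domain)
        method = "domain" if account else None

        # Pass 2: fuzzy match on company name when pass 1 finds nothing.
        if account is None:
            best = max(accounts, key=lambda a: similarity(lead["company"], a["name"]))
            if similarity(lead["company"], best["name"]) >= threshold:
                account, method = best, "fuzzy_name"

        account_id = account["account_id"] if account else None
        results.append((lead["email"], account_id, method))
    return results


for row in match_leads(leads, crm_accounts):
    print(row)
```

Even in this toy form, the second pass compares every unmatched lead against every account, and the threshold has to be tuned by hand, which is where the inaccurate matches mentioned above tend to creep in.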

Dataworkz directly addresses the challenges of matching external data with CRM enterprise accounts when there is no common identifier between them. The common requirements, and how Dataworkz meets them, are:

  • Cross joins of even modestly sized datasets can produce a very large number of records. For example, two datasets with 300,000 and 20,000 records yield a cross join of 6 billion rows.
    • Dataworkz addresses this using a combination of elastic compute and cloud object stores for storing intermediate data.
  • The ability to clean the data and get better matches for the specific use case being solved.
    • Dataworkz provides more than 80 built-in transformations, including stop word removal.
  • Flexibility to use different algorithms to match one or many text fields.
    • Dataworkz provides multiple string-matching algorithms, which can be used together to determine matches:
      • Levenshtein
      • Cosine
      • Soundex
  • Ability to define multiple stages and mix and match additional steps to create the final matches.
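
To make the techniques named in the list above concrete, here is a minimal, self-contained sketch of stop word removal followed by Levenshtein, cosine, and Soundex comparisons. It does not show Dataworkz's own implementation: the function names, the stop word list, and the choice of character bigrams for the cosine measure are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Assumed stop words for company names; a real list would be use-case specific.
STOP_WORDS = {"inc", "incorporated", "corp", "corporation", "llc", "ltd", "co", "the"}


def normalize(name):
    """Lowercase, drop punctuation, and remove stop words."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(t for t in tokens if t not in STOP_WORDS)


def levenshtein(a, b):
    """Edit distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def cosine_bigrams(a, b):
    """Cosine similarity over character-bigram counts (0.0 to 1.0)."""
    va = Counter(a[i:i + 2] for i in range(len(a) - 1))
    vb = Counter(b[i:i + 2] for i in range(len(b) - 1))
    dot = sum(va[g] * vb[g] for g in va)
    norm = math.sqrt(sum(v * v for v in va.values()) * sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0


def soundex(word):
    """American Soundex code: first letter plus three digits."""
    groups = (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
              ("l", "4"), ("mn", "5"), ("r", "6"))
    codes = {ch: digit for letters, digit in groups for ch in letters}
    word = "".join(ch for ch in word.lower() if ch.isalpha())
    if not word:
        return ""
    result = word[0].upper()
    prev = codes.get(word[0], "")
    for ch in word[1:]:
        digit = codes.get(ch, "")
        if digit and digit != prev:
            result += digit
        if ch not in "hw":  # h and w do not separate repeated codes
            prev = digit
    return (result + "000")[:4]


def compare(a, b):
    """Run all three measures on the cleaned names."""
    na, nb = normalize(a), normalize(b)
    longest = max(len(na), len(nb))
    return {
        "levenshtein_similarity": 1.0 - levenshtein(na, nb) / longest if longest else 1.0,
        "cosine": cosine_bigrams(na, nb),
        "soundex_equal": soundex(na) == soundex(nb),
    }


print(compare("Acme Incorporated", "ACME, Inc."))
print(compare("Akme Corp", "Acme Corporation"))
```

Each measure catches a different kind of variation: Levenshtein handles typos, bigram cosine tolerates reordered or partially overlapping text, and Soundex groups names that sound alike. Combining them in stages, with cleaning first, is one way to read the "mix and match" idea in the last bullet.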

The combined data set can be fed into your corporate CRM or stored in a scalable cloud-based data warehouse. Furthermore, Dataworkz data pipeline monitoring and lineage quickly identify any exceptions within the transformation process, thus ensuring high-quality results on an ongoing basis.

By enabling business users to define their own data pipelines and no-code transformations with built-in monitoring, Dataworkz removes the dependency on IT resources and bandwidth while ensuring an auditable, secure, reliable data transformation framework. The Dataworkz platform both simplifies and accelerates the process of account matching, enabling the quick turnaround and timely business insights needed by the Sales and Marketing departments.

Once you’ve successfully mapped the newly acquired customer information against your CRM database, you’re ready to tackle the next ask from your Sales and Marketing folks. Namely, which of these potential and/or trial customers are most likely to convert into paying accounts? In other words, can Dataworkz help me to integrate ML algorithms into the data transformation process so that I get both the combined results and some form of predictive modeling? The answer is, “Yes!” but that’s a topic for another blog post. Stay tuned for more.
