An Interview with Dun & Bradstreet Chief Content Officer Monica Richter
"If you look for perfection, you'll never be content." – Leo Tolstoy, Anna Karenina

In Anna Karenina, Leo Tolstoy penned this famous line to articulate the value of acceptance over expending undue energy chasing idealistic goals. It's a powerful reminder to appreciate what you have, and it's the first thing I thought of when I sat down with Dun & Bradstreet Chief Content Officer Monica Richter to discuss data quality.
Companies invest considerable time and money in the pursuit of flawless data and in the resources needed to manage data quality. The truth is that large quantities of data will rarely, if ever, be 100% perfect. But less-than-perfect data can still be immensely valuable. Business data quality should be assessed as the information's fitness to serve its purpose in a given context. The simplicity of the 100% barometer often distracts from the more careful question of how best to use the data for a particular purpose. In other words, as Tolstoy advises, stop aiming for perfection.
It’s something you hear every day – companies not being satisfied with their data until they believe it is 100% perfect. Is that a realistic goal?
Monica Richter: For a rare use case, yes, but the proverbial search for the unicorn of 100% is a purely mathematical way to approach data quality, and it's not a very satisfying journey for most businesses. Understanding what type of data quality will lead to a specific business outcome is key. Ninety-eight percent accuracy or coverage might be necessary for one goal, while for another it might be overkill if realizing that percentage takes an extra three months of critical time. Measurement is a calibration against each use case and the business decision or goal a company is trying to attain. In China, for example, we have been meticulous in understanding the landscape and focusing on our customers' greatest interests by bringing forward those entities located in the eight highest-growth regions. Furthermore, we have expanded our capabilities, using new infrastructure and resources to focus on millions of the most necessary records and make them available to our customers. This both enables better segmentation across the data and allows for financial analysis within a single record.
So, what does data quality mean to you?
MR: Data quality is interesting because it often means something different to each person. Data quality can be perception or reality, and it can also be anecdotal. The interesting part for us at Dun & Bradstreet is understanding the specific use case: what problem the data is meant to solve, or what story the insight should tell. From there, a specific level of quality is required to meet a particular business need.
It is also often the case that data quality in the sense of timeliness, accuracy, and completeness is not the real issue at all; instead, the quality of the information that describes the data (i.e., the metadata) is the culprit. In an environment where information use is shifting toward artificial intelligence and augmented intelligence (machine plus human), data quality practitioners must focus on the quality of metadata as much as on the accuracy, timeliness, and completeness of the data itself. The lack of a solid metadata foundation is one of the primary causes of "bad data," with symptoms such as widespread inconsistency and inaccuracy. Companies that train algorithms on poor-quality data will also see quality problems magnified in their decision-making as the analytics mislead them.
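To make that concrete, here is a minimal sketch of a metadata-driven quality check. The field names, schema, and staleness rule are hypothetical, invented for illustration rather than drawn from Dun & Bradstreet's actual practice:

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical metadata describing one field's quality expectations.
@dataclass
class FieldMetadata:
    name: str
    dtype: type          # expected Python type (accuracy)
    required: bool       # completeness expectation
    last_verified: date  # when this definition was last verified (timeliness)

# Illustrative schema; the fields and dates are invented for this example.
SCHEMA = [
    FieldMetadata("duns_number", str, required=True, last_verified=date(2024, 1, 15)),
    FieldMetadata("annual_revenue", float, required=False, last_verified=date(2023, 6, 1)),
]

def check_record(record: dict, schema: list[FieldMetadata], stale_after_days: int = 365) -> list[str]:
    """Return a list of quality issues found in one record."""
    issues = []
    today = date.today()
    for field in schema:
        value = record.get(field.name)
        if value is None:
            if field.required:
                issues.append(f"{field.name}: missing required value (completeness)")
            continue
        if not isinstance(value, field.dtype):
            issues.append(f"{field.name}: expected {field.dtype.__name__} (accuracy)")
        if (today - field.last_verified).days > stale_after_days:
            issues.append(f"{field.name}: metadata not verified recently (timeliness)")
    return issues

print(check_record({"duns_number": "804735132"}, SCHEMA))
```

The point of the sketch is that the checks are only as good as the metadata driving them: a wrong type or stale verification date in the schema silently corrupts every downstream judgment about the data.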
Lastly, data faces multiple and growing expectations of consistency at critical touchpoints in enterprise workflows. There needs to be a pre-determined common denominator for expected data quality; otherwise, poor-quality data will surface wherever an organization's systems interoperate. The data will degrade with a multiplicative effect as it moves through numerous data and analytical processes. This is where the notion of Master Data is so critical.
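The multiplicative effect is easy to see with a back-of-the-envelope calculation; the 98% per-system rate below is illustrative:

```python
# If each of five downstream systems independently preserves 98% of record
# quality, end-to-end quality is not 98% but roughly 0.98 ** 5.
per_system_quality = 0.98
systems = 5
end_to_end = per_system_quality ** systems
print(f"{end_to_end:.1%}")  # ~90.4%: five "good" hops compound into a real loss
```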
What are some important use cases in marketing, finance, or the supply chain? What changes the data quality conversation?
MR: In the marketing use case, if you're trying to create a prospecting list for a campaign, the data quality of the segmentation is the foundational component of the process. For example, if an insurance company is trying to increase sales of a new product in the northern part of a country, the segmentation analysis should ignore companies located in the south. Focusing on the data quality of business records in the south would be an inefficient use of resources. And if a business that had recently moved to the south were contacted anyway, the business risk resulting from the outdated address data would be low.
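A minimal sketch of that segmentation step, assuming a pandas DataFrame with a hypothetical region column (the company names and layout are invented):

```python
import pandas as pd

# Hypothetical prospect universe; in practice this would come from a CRM
# or a data provider, and the column names would differ.
prospects = pd.DataFrame({
    "company": ["Acme Insurance Ltd", "Borealis Mutual", "Southern Cover Co"],
    "region": ["north", "north", "south"],
})

# Scope the campaign to the north: records in the south are simply out of
# scope, so investing in their data quality would be wasted effort here.
campaign_list = prospects[prospects["region"] == "north"]
print(campaign_list)
```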
When you look at data quality in another use case, say compliance, measuring accuracy to a defined level of specificity is key, because organizations have well-defined regulatory and legal commitments. In contrast to the marketing example, a business in your supply chain that appears on a sanctions list and goes undetected carries significant risk. The data quality metrics and monitoring would be very different.
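A compliance screen, by contrast, has to flag near-misses rather than merely filter records. Here is a toy sanctions check using simple string similarity; the names, list, and threshold are all invented, and real screening relies on curated lists and far more robust entity resolution:

```python
from difflib import SequenceMatcher

# Toy sanctions list and supplier roster, invented for illustration.
sanctions_list = ["Global Trade Partners LLC", "Northwind Logistics SA"]
suppliers = ["GLOBAL TRADE PARTNERS, LLC", "Harborview Freight Inc"]

def normalize(name: str) -> str:
    # Lowercase and strip punctuation so formatting differences don't hide a match.
    return "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace()).strip()

for supplier in suppliers:
    for sanctioned in sanctions_list:
        score = SequenceMatcher(None, normalize(supplier), normalize(sanctioned)).ratio()
        if score > 0.9:  # threshold is illustrative; compliance teams tune this
            print(f"REVIEW: {supplier!r} resembles sanctioned entity {sanctioned!r} ({score:.2f})")
```

Note how the quality bar inverts: in marketing a stale record costs a wasted mailer, while here a single missed match is a regulatory failure, so recall is monitored far more aggressively.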
Is it fair to say that data quality is in the eye of the beholder based on who’s using it and what their end goals are?
MR: I'd say "yes and no." Good data quality metrics allow you to measure the things that are critical to your business decisions. That should not be in the eye of the beholder; it should be aligned to the business goal you are trying to achieve. Companies should avoid scenarios in which one team member looks at a set of data quality metrics and comes to business decision A while another looks at the same set of metrics and comes to business decision B. Business metrics should reflect a shared, communal understanding aligned to a specific business goal.
In terms of specific use cases, yes, you will have stakeholders who define data quality according to a particular use case, and that definition will vary from one stakeholder to the next. Again, it's a bit of an "it depends" answer, but it always comes back to asking, "What are you trying to get from the data?" Put more plainly: what insights are you trying to pull from the data? Do you have the key data elements and the required quality of data to reach that business decision?
Are there any important questions that need to be asked to determine data quality?
MR: We've discussed how data quality should align with fit-for-purpose expectations, but one must always ask, "Is the market changing, and are these expectations up to date?" The bar for what is considered fit for purpose is continually rising. For example, while Dun & Bradstreet's industry code accuracy results have traditionally trended around 98% within the US, our customers now need greater industry code depth. Dun & Bradstreet has embarked on an initiative to leverage deep learning and artificial intelligence to improve this data asset. Deep learning allows us to ingest and analyze more than 350 TB of website data with greater accuracy, depth, and speed of change than ever before. The application of this technology has allowed us to confidently improve more than 1.2 million industry codes to date.
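As a rough stand-in for the idea, not Dun & Bradstreet's actual deep-learning pipeline, here is a tiny text classifier built with TF-IDF and logistic regression; the website snippets and the SIC-style codes attached to them are chosen purely for illustration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented website snippets labeled with illustrative industry codes.
texts = [
    "We underwrite commercial property and casualty insurance policies",
    "Fresh produce wholesale distribution to restaurants and grocers",
    "Custom software development and cloud consulting services",
    "Life and health insurance plans for families and businesses",
]
codes = ["6411", "5148", "7371", "6411"]

# Fit a simple pipeline: vectorize the text, then classify into a code.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(texts, codes)
print(model.predict(["Enterprise software consulting and systems integration"]))
```

The production version swaps this toy model for deep learning over hundreds of terabytes of web content, but the shape of the task, mapping unstructured business text to a structured industry code, is the same.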
Do you get a sense that companies are prioritizing quantity over quality when it comes to data?
MR: Well, there are use cases where bringing in more data is integral to uncovering patterns, linkages, and signals. I don’t want to dismiss a volume-based discussion, because you can aggregate, analyze, and get insight from large quantities of varied data. However, you can drown in pure quantity – especially if it’s insufficient in quality.
I agree, but isn't it a challenge that companies sometimes don't understand what data they need or what goals they are solving for? If their data appears less than 100% perfect, they begin to feel it won't help them achieve their goals. Is that fair to say?
MR: Exactly. The conversation is about knowing what the company is trying to achieve, not about how much data it has. In my opinion, it's a relevance conversation: keeping sight of the company's actual needs while focusing its resources on the most important areas. Until recently, our knowledge workers were required to comb through various sources to identify key business details. Now, through the use of Robotic Process Automation (RPA), we use technology to conduct some of the more routine research faster and more accurately.
Typically, companies will look to enrich their current data with third-party information to improve accuracy. What are some questions that companies need to ask as they incorporate third-party data into their existing datasets?
MR: I think companies must understand what's core and non-core to their business. There will be proprietary data they aggregate because it is unique or strategic to their business: information about how their customers are reacting to new products, for example, as well as competitive information that needs to stay confidential. On the other hand, there is such an abundance of data that companies don't want to spend time sourcing, ingesting, and curating thousands of sources to get the needed insight. Instead, they look to third parties to alleviate that tremendous work effort so they can focus on what they do best. Third parties will typically master, cleanse, and maintain the data on an ongoing basis, since data decays. To balance data management requirements against data quality processes, companies are looking for well-maintained, well-governed, trusted data assets to drive their business decisions.
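Mechanically, enrichment is often a keyed merge: resolve your records to the third party's mastered identifier, then pull in the attributes the provider maintains. A minimal pandas sketch, with hypothetical columns and made-up DUNS-style keys:

```python
import pandas as pd

# Internal customer records with a resolved DUNS-style identifier
# (identifiers and attributes here are invented for illustration).
internal = pd.DataFrame({
    "duns": ["804735132", "069032677"],
    "customer_since": ["2019", "2021"],
})

# Third-party reference data, mastered and refreshed by the provider.
reference = pd.DataFrame({
    "duns": ["804735132", "069032677"],
    "employees": [5200, 310],
    "industry_code": ["7371", "5148"],
})

# A left merge keeps every internal record and enriches where a match exists.
enriched = internal.merge(reference, on="duns", how="left")
print(enriched)
```

The hard part in practice is the resolution step that produces the shared key in the first place, which is exactly the mastering work companies delegate to third parties.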
Identity management for blockchain is a great example. This transformative technology is disrupting trust-based models. However, identity details are not consistent across blockchain implementations and never will be. Cross-referencing financial transactions across blockchains will be critical for companies, and Dun & Bradstreet is working hard to bring this mastering capability to our customers.