So, How Bad is my Product Data?

According to a recent analysis in the Harvard Business Review, only 3% of companies’ data meets basic quality standards – and that’s for structured data. In aggregate, bad data costs the U.S. $3 trillion per year.

Bad data costs us money, time, and customers. Recovering from and fixing mistakes is a huge problem in itself, and the errors reflect poorly on your business as bad customer service along the way.

Specific to online retail, our conservative estimate is that inaccurate product data costs retailers more than $100 billion every year.

Can these costs be monitored? Can data quality be measured? And most importantly, can these costs be reduced by automatically fixing and enriching catalogs using image and text extraction algorithms?

(1) Inaccurate product data costs over $100 Billion per year in online retail

  1. Product Returns – Online retail is expected to be a $4 trillion market by 2020. With 40% or more of returns due to inaccurate descriptions, mismatched color or size, and products appearing different from their images on the online store, the cost of returns caused by inaccurate information is staggering.
  2. Mismatched Search Results – Most search engines fail to understand the context behind shopper search terms. With about 30% of shoppers using site search to navigate, a search engine that understands consumer intent is key to increasing conversions by 200% or more. One of our previous blogs provides more context on this topic.
  3. Long product onboarding cycle – Imagine Zara taking a month to onboard new product lines. In a $2 trillion online retail market, every month of delay represents $100-200 billion in lost sales across the industry. The problem is aggravated even further during the holiday season, when large retailers like Walmart may need to onboard 5-10 million new products onto their marketplace all at once!
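The lost-sales figure above is a simple run-rate calculation. A minimal sketch of the arithmetic, assuming only the $2 trillion annual market size cited in the text:

```python
# Back-of-envelope estimate of sales at stake during a month-long
# onboarding delay. Assumes a $2 trillion/year online retail market
# (figure from the text); the monthly run rate bounds the exposure.

ANNUAL_MARKET_USD = 2_000_000_000_000  # $2 trillion per year

monthly_run_rate = ANNUAL_MARKET_USD / 12
print(f"Monthly market run rate: ${monthly_run_rate / 1e9:.0f}B")
# ~$167B per month, consistent with the $100-200B range above
```

No single retailer loses the whole run rate, of course; the point is that even a small share of a month’s market activity is a very large number.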

(2) Bad product data suffers from 3 main problems in online retail

(3) How much does bad product data really cost me as a company?

The 1-10-100 Rule, popularized by SiriusDecisions, provides a benchmark for the cost of bad data.

Let’s say it costs about $1 to verify a record as it is entered (a typical outsourced cost), about $10 to fix it later, and $100 if nothing is done, as the ramifications of the mistake are felt repeatedly.

As an example, product data is estimated to be 25-40% inaccurate (35% being typical). Let’s assume a company has 1 million SKUs and is not actively attempting to clean and manage its data. Even in the best case (25% inaccurate), 250,000 records contain errors: fixing them later costs $2.5 million, and ignoring them costs the company about $25 million. Note that this ignores the initial cost, in time and money, of adding new data accurately at scale.
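The arithmetic above can be sketched as a small calculation. The SKU count, error rate, and per-record costs are the assumptions stated in the text, not industry constants:

```python
# Sketch of the 1-10-100 rule applied to a product catalog.
# Assumptions (from the text): $10 to fix a bad record later,
# $100 if nothing is done and the error propagates.

def bad_data_costs(num_skus, error_rate, fix_cost=10.0, ignore_cost=100.0):
    """Return (cost to fix later, cost of doing nothing) for a catalog."""
    bad_records = int(num_skus * error_rate)
    return bad_records * fix_cost, bad_records * ignore_cost

# Best case from the text: 1M SKUs, 25% of records inaccurate.
fix, ignore = bad_data_costs(1_000_000, 0.25)
print(f"Fix later: ${fix:,.0f} | Do nothing: ${ignore:,.0f}")
# Fix later: $2,500,000 | Do nothing: $25,000,000
```

At the typical 35% error rate the same catalog carries 350,000 bad records, pushing the do-nothing cost to $35 million.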

(4) Pretty much all bad product data is due to missing structure & standardization

  • Thousands of products are added to the catalog every week with missing or wrong attributes
  • New product trends like “Jorts” or “Muddy jeans” complicate things further

Building a living, breathing product catalog that uses AI to automatically label products, remove inaccuracies, and enrich product content to account for shifts in consumer intent helps – improving accuracy, reducing time to market, and mitigating the cost of bad data.
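As a toy illustration of the kind of automated enrichment described above, here is a minimal sketch that fills in missing attributes from a product title using hand-written keyword rules. The vocabularies and attribute names are hypothetical; a production system would rely on trained image and text models rather than lookups like these:

```python
# Toy attribute extraction from product titles via keyword rules.
# COLORS and CATEGORIES are illustrative stand-ins for what a real
# pipeline would learn from images and text at catalog scale.

COLORS = {"red", "blue", "black", "white", "indigo"}
CATEGORIES = {"jeans", "jorts", "dress", "sneakers"}

def enrich(title):
    """Infer missing 'color' and 'category' attributes from a title."""
    tokens = {t.lower() for t in title.split()}
    return {
        "color": next((t for t in tokens if t in COLORS), None),
        "category": next((t for t in tokens if t in CATEGORIES), None),
    }

print(enrich("Indigo Muddy Jorts"))
# {'color': 'indigo', 'category': 'jorts'}
```

Even this crude version shows why structure matters: once attributes are explicit fields rather than free text, search, filtering, and returns analysis all get easier.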
