How do you accurately measure data quality?

Search engine algorithms need to understand the context behind each search term in order to return better results and improve bag conversion rates. Adding structured attributes derived from the product content helps search engines understand both the context and the consumer intent behind the search terms.

Many retail organizations add these structured attributes manually, using BPO firms or crowdsourcing services such as Mechanical Turk. Some of the largest retail corporations also assign their own associates to manually validate the quality of the enriched data provided by suppliers before products are on-boarded and made available to customers.

So how do we assess the data quality of the product content made available to customers? There are three important measures that can help us assess the quality of the data within the product content.


Completeness:

Completeness covers both whether an attribute is defined across all products and whether its values are populated across all available fields in the product content. In other words, completeness is the measure of whether the data exists or not. For example, suppose the attribute “Pattern” applies to all products under the product type Dresses, which contains around 75K products, but is filled in for only 50K of them. Pattern is then about 66.7% filled in (50K out of 75K). Now, if we search for “Polka dresses” on Macy’s, what we would probably be looking for is this:

But what you end up getting is:

Putting effort into capturing key data sets and attribute values from supplier product content while products are being onboarded is a rewarding exercise. Completeness of key attributes and their values is very important: missing data is more than just a cost issue, it is a massive lost-opportunity issue.
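The fill-rate arithmetic from the Pattern example can be sketched in a few lines of Python. This is a minimal illustration, not a description of any particular retailer's tooling: the function name `fill_rate`, the `pattern` field and the in-memory list of product dictionaries are all assumptions made for the example.

```python
from typing import Iterable, Mapping, Optional


def fill_rate(products: Iterable[Mapping[str, Optional[str]]], attribute: str) -> float:
    """Return the percentage of products with a non-empty value for `attribute`."""
    products = list(products)
    if not products:
        return 0.0
    # A value counts as filled only if it is present and not just whitespace.
    filled = sum(1 for p in products if (p.get(attribute) or "").strip())
    return 100.0 * filled / len(products)


# Hypothetical catalog mirroring the example above:
# 75,000 dresses, of which 50,000 carry a Pattern value.
dresses = [{"pattern": "Polka Dot"}] * 50_000 + [{"pattern": None}] * 25_000

pattern_fill = fill_rate(dresses, "pattern")
print(f"Pattern fill rate: {pattern_fill:.1f}%")  # prints "Pattern fill rate: 66.7%"
```

Running the same function per attribute and per product type gives a simple completeness report that shows where missing values are most likely to hurt search.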


Accuracy:

Accuracy states whether a piece of data is what it should be. Put differently, accuracy is the percentage of the data present that is correct. We searched for “Denim Tops” on Nordstrom and got these results, where some of the tops had the attribute Material incorrectly classified as Denim, which led to incorrect search results.

Improving accuracy benefits everyone: more accurate attributes and values allow search engines to return better results, which leads to a better customer experience as well as higher bag conversion.
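One way to put a number on accuracy is to compare catalog values against a manually verified sample. The sketch below assumes such a sample exists. The function name `accuracy`, the product IDs and the Material values are all hypothetical and are not taken from any real catalog.

```python
from typing import Mapping


def accuracy(catalog_values: Mapping[str, str], verified_values: Mapping[str, str]) -> float:
    """Percentage of verified products whose catalog value matches the verified value."""
    # Only products that appear in both the catalog and the verified sample are scored.
    checked = [pid for pid in verified_values if pid in catalog_values]
    if not checked:
        return 0.0
    correct = sum(
        1
        for pid in checked
        if catalog_values[pid].strip().lower() == verified_values[pid].strip().lower()
    )
    return 100.0 * correct / len(checked)


# Hypothetical Material values for four tops, as listed in the catalog
# and as confirmed by a human reviewer.
catalog_material = {"top-1": "Denim", "top-2": "Denim", "top-3": "Cotton", "top-4": "Denim"}
verified_material = {"top-1": "Denim", "top-2": "Chambray", "top-3": "Cotton", "top-4": "Polyester"}

material_accuracy = accuracy(catalog_material, verified_material)
print(f"Material accuracy: {material_accuracy:.1f}%")  # prints "Material accuracy: 50.0%"
```

Note that accuracy is measured only over values that are present. Missing values are already captured by the completeness measure.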

Consistency & Conformity:

Consistency means that data is in the same format for all attributes in the product content, and conformity means that data matches an internal or external standard. Keeping data formats standard across product types, and keeping attribute names and value labels in line with those used by external standards, helps customers surface their day-to-day products easily.

For example, a search for “Floral perfumes for women” on Target returns only 4 products, and even “Floral perfumes” as a search term returns only 10 products:

The values for “Scent” shown on the product page are inconsistent and do not conform to standards. In addition, Scent may not be indexed by the search engine.

Certain attributes and values strictly require consistency. For others, consistency may simply be measured for reliability reasons. Conformity to internal standards can be improved by ensuring that only valid labels are added. For external conformity, we can refer to the same product content as it appears across retailers to define the list of standard attributes and their labels.
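Both checks can be sketched with simple set and grouping operations. In the example below, the allowed label list `ALLOWED_SCENTS`, the sample Scent values and the function names are illustrative assumptions. A real standard would come from an internal taxonomy or from labels observed across retailers.

```python
from typing import Dict, Iterable, List, Optional, Set

# Hypothetical standard list of valid labels for the Scent attribute.
ALLOWED_SCENTS = {"Floral", "Citrus", "Woody", "Fresh"}


def conformity_rate(values: Iterable[Optional[str]], allowed: Set[str]) -> float:
    """Percentage of present values that exactly match a label in the standard."""
    present = [v for v in values if v]
    if not present:
        return 0.0
    return 100.0 * sum(1 for v in present if v in allowed) / len(present)


def inconsistent_variants(values: Iterable[Optional[str]]) -> Dict[str, List[str]]:
    """Group values that differ only in letter case or whitespace."""
    groups: Dict[str, Set[str]] = {}
    for v in values:
        if v:
            key = " ".join(v.split()).lower()  # normalize whitespace and case
            groups.setdefault(key, set()).add(v)
    # Keep only the groups that have more than one spelling.
    return {key: sorted(variants) for key, variants in groups.items() if len(variants) > 1}


# Hypothetical Scent values as they might appear across product pages.
scents = ["Floral", "floral", "FLORAL ", "Flowery", "Citrus", "Woody"]

scent_conformity = conformity_rate(scents, ALLOWED_SCENTS)
scent_variants = inconsistent_variants(scents)
print(f"Scent conformity: {scent_conformity:.1f}%")  # prints "Scent conformity: 50.0%"
print(scent_variants)
```

Here three of the six values match the standard exactly, and the variant report shows that “Floral” is spelled three different ways, which is the kind of inconsistency that can keep a product out of a “Floral perfumes” search.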

To wind up: as more and more products and categories are on-boarded onto e-commerce platforms, attributes and their respective values are critical to understanding consumer intent. It is therefore essential to assess catalog data quality regularly against the Completeness, Accuracy, and Consistency & Conformity measures, which help customers get better search results and improve bag conversion rates.

This entire process of creating structured attributes and making them available automatically as soon as new products are onboarded can be achieved using our extensive library of AI-driven attribute extraction algorithms. Once our platform is implemented, the product catalog is kept optimized and aligned with shifting consumer intent. Customers that have implemented this key piece of technology have succeeded in increasing search conversions several times over and in reversing the exodus of customers.

Photo by Igor Ovsyannykov on Unsplash
