A case study: Automated Product Catalog Classification to optimize search & discovery

Background: E-tailers spend a significant share of their time, money, and effort organizing the products they sell, understanding consumer behavior to market those products better, and deciding which products to sell and how to keep their catalog relevant and searchable so they can provide better recommendations.

Millions of dollars are spent developing software that maintains information about products, users' purchase histories for particular products, and so on. But as the catalog and the number of suppliers keep growing, maintaining the catalog accurately becomes dramatically harder. Misclassified products lead to a poor online customer experience, such as erroneous or irrelevant search results and recommendations.

As a partial remedy, retailers rely on suppliers to provide accurate product classifications. This data is then refined by a crowd of human classifiers who make sure products have the right attributes and values. Although this approach can yield accurate classifications when done well, it has some major disadvantages:

  • Firstly, manual effort is time-consuming, expensive, and prone to human error.
  • Secondly, product refreshes in retail stores happen too frequently for a manual classification process to keep up.
  • Finally, suppliers may not provide complete information for thousands of products, and as the product list grows into the millions over time, scalability becomes a major challenge.

Given: The retail client had product catalog information such as title, description, and dimensions, which could vary from one product category to another. Our goal was to predict attribute values for different product types. For example, a book's “Genre” attribute can take values such as Fiction, Non-Fiction, Travel, Business, Romance, and Thriller. There could be thousands of such attributes across product categories, which makes manual classification extremely hard and time-consuming and can incur huge costs in maintaining such systems.
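To make the task concrete, the structure described above (categories, each with attributes, each attribute restricted to a defined set of values) might be modeled as below. This is a minimal sketch, not the client's actual schema: `ATTRIBUTE_SCHEMA`, `Product`, `allowed_values`, and `is_valid_prediction` are hypothetical names introduced for illustration, and only the Books / Genre values come from the text.

```python
from dataclasses import dataclass, field

# Hypothetical schema: category -> attribute -> closed set of allowed values.
# Only the Books/Genre values below come from the case study; the layout is assumed.
ATTRIBUTE_SCHEMA: dict[str, dict[str, set[str]]] = {
    "Books": {
        "Genre": {"Fiction", "Non-Fiction", "Travel", "Business", "Romance", "Thriller"},
    },
}


@dataclass
class Product:
    product_id: str
    category: str
    title: str
    description: str = ""
    # Category-specific fields (e.g. dimensions) vary between categories.
    extra: dict[str, str] = field(default_factory=dict)


def allowed_values(category: str, attribute: str) -> set[str]:
    """Return the closed label set a classifier must choose from."""
    return ATTRIBUTE_SCHEMA[category][attribute]


def is_valid_prediction(product: Product, attribute: str, value: str) -> bool:
    """Accept a predicted value only if it is in the attribute's defined set."""
    return value in allowed_values(product.category, attribute)
```

For example, `is_valid_prediction(Product("B001", "Books", "Lonely Planet Italy"), "Genre", "Travel")` returns `True`, while the same call with `"Cooking"` returns `False`. Treating each attribute as a closed label set is what turns attribute prediction into a multi-class classification problem.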

CrowdANALYTIX Approach: To address this challenge, we proposed an automated product attribute classification mechanism that applies advanced text-based machine learning techniques to the given product features (title, description, etc.) to predict each attribute's value from its defined set of values.

We leveraged a global community of more than 12,000 data scientists to build and maintain these classifiers in parallel for each product category. The data scientists competed against each other in private competitions held on our platform, and the algorithms with the highest predictive power were selected. The process involved the following key steps:

  • Pre-processing, cleaning, and normalizing the internal product catalog data
  • Using public, verified data as the “ground truth” to enrich the internal data's features and samples, and combining the two into a comprehensive dataset for training machine learning algorithms
  • Training the algorithms using Natural Language Processing (NLP) techniques and multi-class classification algorithms
  • Once the models were trained, validating them on additional test sets to ensure consistent, automated, and precise predictions that maintain the quality of online product catalogs
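The steps above (normalize text, train a multi-class text classifier, check precision on a held-out test set) can be sketched end to end as follows. The case study does not disclose the winning algorithm, so this uses multinomial Naive Bayes as a deliberately simple, swapped-in stand-in; the function names and the toy Books / Genre records are invented for illustration and are not client data.

```python
import math
import re
from collections import Counter, defaultdict


def normalize(text: str) -> list[str]:
    """Pre-processing: lowercase and keep only alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes over word counts, with add-one (Laplace) smoothing."""

    def fit(self, docs: list[str], labels: list[str]) -> "NaiveBayesTextClassifier":
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = normalize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.class_counts}
        self.n_docs = len(labels)
        return self

    def predict(self, doc: str) -> str:
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / self.n_docs)  # log prior
            for tok in normalize(doc):
                if tok in self.vocab:  # skip words never seen in training
                    num = self.word_counts[label][tok] + 1
                    score += math.log(num / (self.total_words[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def precision_per_class(y_true: list[str], y_pred: list[str]) -> dict[str, float]:
    """For each predicted class, the fraction of its predictions that were correct."""
    predicted = Counter(y_pred)
    correct = Counter(p for t, p in zip(y_true, y_pred) if t == p)
    return {c: correct[c] / n for c, n in predicted.items()}


# Invented "title + description" text for the Books / Genre attribute.
train_docs = [
    "Lonely Planet Italy travel guide maps and itineraries",
    "Rick Steves Paris city travel guide",
    "A murder mystery thriller with a detective hunting a killer",
    "Gripping spy thriller suspense novel",
    "The lean startup business strategy for entrepreneurs",
    "Business management and leadership handbook",
]
train_labels = ["Travel", "Travel", "Thriller", "Thriller", "Business", "Business"]

# Held-out test set used only for evaluation.
test_docs = [
    "Pocket travel guide to Rome with maps",
    "Detective thriller about a serial killer",
    "Leadership lessons for business managers",
]
test_labels = ["Travel", "Thriller", "Business"]

clf = NaiveBayesTextClassifier().fit(train_docs, train_labels)
predictions = [clf.predict(d) for d in test_docs]
precision = precision_per_class(test_labels, predictions)
print(predictions)
print(precision)
```

In practice, the competing data scientists would swap in richer NLP features (n-grams, embeddings) and stronger classifiers; the shape of the pipeline (normalize, fit, predict, measure precision on unseen data) stays the same.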

Multiple approaches were evaluated for predicting attribute values across product categories. The winning approach showed that combining internal and external data into a comprehensive feature set improved prediction accuracy. Further, combining NLP feature extraction with multi-class classification considerably improved prediction accuracy across attribute values.
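One way to combine internal and external data, as described above, is to join records on a shared identifier, let non-empty external ("ground truth") values override or fill internal ones, and keep external-only records as extra training samples. This is a minimal sketch under stated assumptions: the shared key, the field names, the `prefer_external` precedence rule, and the functions `merge_sources` and `to_training_text` are all hypothetical and not taken from the case study.

```python
def merge_sources(
    internal: dict[str, dict[str, str]],
    external: dict[str, dict[str, str]],
    prefer_external: bool = True,
) -> dict[str, dict[str, str]]:
    """Join records by key; non-empty values from the preferred source win.

    Keys present in only one source are kept, so external-only records
    become additional training samples (an assumed policy).
    """
    merged = {}
    for key in sorted(internal.keys() | external.keys()):
        base, overlay = internal.get(key, {}), external.get(key, {})
        if not prefer_external:
            base, overlay = overlay, base
        record = dict(base)
        for name, value in overlay.items():
            if value:  # empty values never overwrite existing data
                record[name] = value
        merged[key] = record
    return merged


def to_training_text(record: dict[str, str],
                     fields: tuple[str, ...] = ("title", "description", "subjects")) -> str:
    """Concatenate the available text fields into one classifier input."""
    return " ".join(record[f] for f in fields if record.get(f))


# Hypothetical example records keyed by product ID.
internal = {
    "P1": {"title": "Italy Travel Guide", "description": "", "category": "Books"},
    "P2": {"title": "Spy Novel", "description": "A suspense story", "category": "Books"},
}
external = {
    "P1": {"description": "Maps and itineraries for Rome and Florence", "subjects": "Travel; Europe"},
    "P3": {"title": "Startup Handbook", "description": "Business strategy for founders",
           "subjects": "Business"},
}

merged = merge_sources(internal, external)
print(to_training_text(merged["P1"]))
```

Here the empty internal description for `P1` is filled from the external source, `P2` is unchanged, and `P3` is added as a new sample. Whether external values should override non-empty internal ones is a judgment call, so the sketch exposes it as a parameter.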

This process both reduces human labor and improves the consistency of product categorization on e-commerce websites through:

  • Optimized algorithms.
  • Greater speed at lower cost.
  • Maintainability at scale.

For more details, please visit: https://www.crowdanalytix.com/product-catalog-optimization
