dataX: crowdsourcing & technology platform for Retail

Following our last post related to large scale deployment, we received several questions about dataX™. Mainly these involved:

  • How does it work for millions of products?
  • How does it handle multiple product categories?
  • How could it realistically scale to many attributes across hundreds, if not thousands, of product categories?

So here are a few more details:

(A) DATA: At the base level it’s about extracting and collating millions of data points.

This involves:

  • Extracting product data at an item level
  • Using multiple data sources
  • Normalizing the extracted data
  • Incorporating CPG, retail, and consumer-forum data, as well as trend data
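
As a rough illustration of the normalization step, the sketch below merges records for one item from two sources into a single normalized record. The field names (`title`, `length`), the two example sources, and the unit table are assumptions made for this example; they are not dataX internals.

```python
import re

# Hypothetical unit table; the platform's real rules are not described in this post.
UNIT_TO_METERS = {"m": 1.0, "meter": 1.0, "meters": 1.0, "ft": 0.3048, "feet": 0.3048}


def normalize_length(raw):
    """Convert a free-text length such as '18 meters' or '50 ft' to meters."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([a-zA-Z]+)\s*", raw)
    if not match:
        return None
    value, unit = float(match.group(1)), match.group(2).lower()
    factor = UNIT_TO_METERS.get(unit)
    return round(value * factor, 2) if factor is not None else None


def merge_sources(records):
    """Merge item-level records; earlier sources win when fields conflict."""
    merged = {}
    for record in records:
        for key, value in record.items():
            if value in (None, ""):
                continue  # skip fields the source left empty
            merged.setdefault(key, value.strip() if isinstance(value, str) else value)
    if "length" in merged:
        merged["length_m"] = normalize_length(merged.pop("length"))
    return merged


# Two hypothetical sources describing the same item
retailer = {"title": " Brand X, Garden Hose, 18 meters ", "length": ""}
manufacturer = {"title": "Brand X Hose", "length": "18 meters"}
item = merge_sources([retailer, manufacturer])
```

Here the retailer's title is kept because it comes first, while the length is filled in from the manufacturer record and converted to meters.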

(B) TRAIN: Use our community of thousands of solvers to organize each category and its data through private and public contests, using thousands of algorithms. Crowdsourcing is the optimal method to solve

This involves:

  • Thousands of models: product category, attributes
  • Hundreds of community solvers: for scale
  • Multiple approaches for optimal results

(C) dataX: Each product category has multiple attributes, so thousands of algorithms, each customized for a category and an attribute such as color or weight, are deployed for scale.

This involves:

  • Secure REST APIs to integrate easily
  • Thousands of top models deployed for scale
  • Models maintained and fine-tuned to ensure accuracy over time
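
One way to picture "a model per category and attribute" is a lookup table keyed by (category, attribute). The sketch below is only an illustration of that idea: the `REGISTRY` dictionary, the `register` decorator, and the two rule-based functions are hypothetical stand-ins for trained models, and the REST layer is left out entirely.

```python
import re
from typing import Callable, Dict, Tuple

# Each "model" is a plain function here, standing in for a trained model.
Model = Callable[[str], str]
REGISTRY: Dict[Tuple[str, str], Model] = {}


def register(category: str, attribute: str):
    """Store a model under its (category, attribute) key."""
    def decorator(fn: Model) -> Model:
        REGISTRY[(category, attribute)] = fn
        return fn
    return decorator


@register("Garden Hose", "Color")
def hose_color(text: str) -> str:
    # Placeholder keyword rule, not a real color model
    for color in ("green", "black", "blue"):
        if color in text.lower():
            return color
    return "unknown"


@register("Garden Hose", "Weight")
def hose_weight(text: str) -> str:
    # Placeholder pattern match, not a real weight model
    match = re.search(r"(\d+(?:\.\d+)?)\s*kg", text, flags=re.IGNORECASE)
    return f"{match.group(1)} kg" if match else "unknown"


def predict_attributes(category: str, text: str) -> Dict[str, str]:
    """Run every model registered for the given category."""
    return {
        attribute: model(text)
        for (cat, attribute), model in REGISTRY.items()
        if cat == category
    }


predictions = predict_attributes("Garden Hose", "Green hose, 2 kg")
```

Adding a new category or attribute then means registering another model rather than changing the routing code, which is one plausible reason such a layout scales to thousands of models.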

For example:


  • Images:

  • Title: e.g. Brand X, Garden Hose, 18 meters
  • Description: The Brand X garden hose is ultra-lightweight and compact, making it ideal for small spaces like patios and decks and easy to store in the garage during the cooler months. This hose is made with a patented new material that is not only incredibly durable, but fights kinking to help you maintain a lush flowerbed, lawn and garden without constantly backtracking to straighten it. The proprietary compound ensures that it lays straight, preventing pigtails and making uncoiling a breeze from the very first time you bring it home. The 18 meters length is perfect for a standard yard, giving you enough reach for corner plants without sacrificing performance or adding excess weight.


  • Product category: Garden Hose (from 1,000+ product categories)
  • Attributes: Hose length, Part Number, etc. (from 10,000+ attributes)
  • Attribute values: 18M, CMGUL12050CC (from 100,0000+ attribute values)
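
To make the example concrete, here is a minimal sketch of how the hose length could be pulled from the title and packaged as structured output. It uses a regular expression as a simple stand-in for a trained model; the function name and the shape of the `result` dictionary are assumptions for illustration. The part number is omitted because it does not appear in the title or description shown above.

```python
import json
import re


def extract_hose_length(title: str):
    """Rule-based stand-in for a trained attribute model."""
    match = re.search(r"(\d+)\s*(?:m|meters?|metres?)\b", title, flags=re.IGNORECASE)
    return f"{match.group(1)}M" if match else None


title = "Brand X, Garden Hose, 18 meters"

# Hypothetical output shape: category first, then attribute/value pairs
result = {
    "product_category": "Garden Hose",
    "attributes": {"Hose length": extract_hose_length(title)},
}
print(json.dumps(result, indent=2))
```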


To summarize:

(i) Data for millions of products is collated using automated scripts at regular intervals.
(ii) Crowdsourcing is used to build thousands of models that classify product categories, applicable attributes, and their respective values.
(iii) dataX deploys the models as REST APIs, maintained and tuned for precision.

With the dataX™ platform, we are able to automate the process of predicting product data for over 5 million products an hour through parallel processing using machine learning / deep learning models.
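
For a sense of scale, 5 million products an hour works out to roughly 1,389 products per second. The sketch below shows that arithmetic along with one generic way to fan batches out to parallel workers. The `predict_batch` function is a placeholder, and the batch size and worker count are arbitrary; none of this describes how dataX itself is implemented.

```python
from concurrent.futures import ThreadPoolExecutor

PRODUCTS_PER_HOUR = 5_000_000  # figure quoted in this post
per_second = PRODUCTS_PER_HOUR / 3600  # about 1,389 products per second


def predict_batch(batch):
    # Placeholder for a call to a deployed model
    return [title.lower() for title in batch]


def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def predict_all(items, batch_size=2, workers=4):
    """Send batches to a pool of workers and flatten the results in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(predict_batch, chunked(items, batch_size))
        return [prediction for batch in results for prediction in batch]


predictions = predict_all(["A", "B", "C", "D", "E"])
```

Threads suit work that mostly waits on network calls, such as REST requests; CPU-bound inference would more likely use separate processes or machines.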

Retail is an area that’s tailor-made to leverage crowdsourcing and technology in data science.

Whether it’s using images or unstructured product data to extract attributes such as color, shape patterns, and text, the CrowdANALYTIX community of over 13,000 solvers and the dataX™ platform provide a distinct advantage to retailers in the digital age.

In the next post, we are going to show how a retailer can embrace predictive analytics in its operations to improve returns on digital investments.
