The recent credit risk crisis has made banks all over the world overly cautious. They are not only creating simpler and less intertwined financial instruments, but also building more sophisticated models to price the risk associated with their loans. Unfortunately, even with these precautions, defaults, and thus losses, remain high. Through this contest, we would like to take a slightly different approach. We wish to build an algorithm that helps banks predict the amount they should expect to recover from a loan. At an aggregate level, this will enable banks to predict their top-line and bottom-line gains from each portfolio or product, which is what matters most at the end of the day.

To experiment with this approach, we have used a public data set of loans and payments.


The objective of this competition is to develop an algorithm to predict the amount paid back on a loan based on the following information:

  • Amount funded by the lending body
  • Interest Rate
  • Loan Length
  • Loan Purpose
  • Monthly Payment
  • Debt to Income ratio of the loan seeker
  • Home Ownership status of the loan seeker
  • Monthly Income of the loan seeker
  • FICO Score of the loan seeker
  • Open Credit lines of the loan seeker
  • Total Credit lines of the loan seeker
  • Revolving line utilization percentage of the loan seeker
  • Delinquencies in the last 2 years of the loan seeker
  • Employment Status of the loan seeker

Target Variable

Amount made from the loan by the lender.


Any tool that supports machine learning and predictive modeling is welcome, such as PMML, Java, Python, R, RapidMiner, WEKA, Octave, or SVMlight. A data dictionary is provided along with the training and testing datasets.

Solver Expectations

Participants may submit one (1) entry every 24 hours of the competition period.  CrowdANALYTIX reserves the right to request that a participant submit the prediction algorithm associated with an entry to CrowdANALYTIX through the “Responses” tab on the contest page.  Once an entry is selected as eligible for a prize, the conditional winner must deliver the prediction algorithm’s code and documentation to CrowdANALYTIX for verification within 5 days. Documentation must be written in English and must be written so that individuals trained in computer science can replicate the winning results. Source code must contain a description of resources required to build and run the method.  

Winner Selection Criteria

“Amt_Made” is the target variable. You must submit an Excel file with two columns: CRID and the predicted amount returned from the loan (“Amt_Made”). We will validate each submission against our dataset by calculating the root mean squared error (RMSE): we take the difference between the predicted amount and the true amount for each loan, square that difference, take the mean of the squared differences, and then take the square root of that mean to arrive at the prediction error.
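
The RMSE calculation described above can be sketched in Python as follows. This is only an illustration of the formula, not the organizers' actual scoring code; the function name `rmse` and the sample amounts are hypothetical.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between predicted and true Amt_Made values.

    Hypothetical helper for illustration; both arguments are equal-length
    sequences of numbers, one entry per loan (per CRID).
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        raise ValueError("at least one prediction is required")

    # Difference between predicted and true amount, squared, for each loan.
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]

    # Mean of the squared differences, then the square root of that mean.
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up amounts for three loans (not taken from the contest data).
predicted_amounts = [100.0, 250.0, 0.0]
true_amounts = [110.0, 240.0, 20.0]
score = rmse(predicted_amounts, true_amounts)
print(round(score, 2))
```

A lower RMSE means the predictions are closer to the true amounts. Because the differences are squared before averaging, a few large misses raise the score more than many small ones.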


Submission Deadline: 21st May, 2012

Results announced by: 31st May, 2012


Prize money

One first prize of US$750

One second prize of US$500

One third prize of US$250
