What makes one phenomenal motorcycle racer so much better than everyone else?
Big data brings the biggest change to motorsports since… the motor. Sensors streaming data reveal new ways to understand the performance of both machines and athletes. With data visualization and predictive analytics we can learn why we perform as we do, and how to perform better.
So why are great athletes so much better than the rest of us? Working with John McGuinness, one of the greatest motorcycle racers of all time, and Adam Child, one of the foremost motorcycle journalists in the world, EMC is hosting a competition to answer that question using big data, with $7,500 in prizes on the line.
Speed and Safety
John and Adam’s interest in working with EMC goes beyond promoting motorcycle racing and their individual accomplishments. In a sport where riders routinely top 300 km/h and fractions of a second can determine the winner, safety is as important as speed. John is a 21-time winner at the Isle of Man TT, a punishing 38-mile track that has claimed 245 lives in its 107-year history. John knows that he is good at what he does, but he has always asked why. What makes him not only faster but also safer? What does he do that the average racer doesn’t? And how can he use those answers to improve his sport?
Analyze and Win
We want to learn why John is so much better than the average motorcycle racer. What techniques and movements on the bike make him perform at such a high level? How can we use the data he and his motorcycle generate to make racing both faster and safer? We’re asking you to unlock this mystery. EMC gathered data on both John and Adam during a practice session in Spain, and stored it in an EMC Federation Business Data Lake. You’ll have access to this data. Your objective is to analyze and model it for both riders and use your data-driven insights to explain what makes one so fast and consistent. The most cogent and convincing analysis wins.
Learn more about the event, the racers, and EMC’s involvement with motorcycle racing here and please check back for future updates.
EMC is also sponsoring a separate data visualization contest as part of this event; check that out here.
Data
The contents of the input data files are described below.
- Bike Sensor data: engine, transmission, throttle, accelerometer, and gyroscope
- Biometric data: heart rate
- GPS data: three different sets of GPS readings to maximize accuracy
A data dictionary has been provided for all input data files. Data files and data dictionary can be found under the ‘Data’ tab (collectively “Data”). All Data is Confidential.
Expectations for Solvers
- Analyze the data. The data provided offers the opportunity to perform both cross-sectional and longitudinal analyses. You may need to transform the data in order to make sense of it and derive useful insights.
- Search for insights. You are free to apply any modelling or analysis to the given data to come up with evidence-based insights. Many approaches are possible, using supervised or unsupervised methods.
- Provide statistical evidence. We expect you to provide statistical evidence for your insights. Any modelling you perform should be supported by testing, and test results should be provided for model validation.
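As one illustration of what statistical evidence for an insight could look like, the sketch below runs a two-sided permutation test on the difference in mean sector times between two riders. It is only an example under stated assumptions: the sector times are made-up placeholder values, not contest Data, and the function and variable names are hypothetical. A permutation test is one of many valid choices and is not a method prescribed by EMC.

```python
import random
from statistics import mean


def permutation_test(a, b, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns the observed difference mean(a) - mean(b) and an
    estimated p-value under the null hypothesis that both samples
    come from the same distribution.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = mean(a) - mean(b)
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_permutations):
        # Randomly reassign the pooled observations to the two groups.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Placeholder sector times in seconds (invented for illustration only).
rider_a_sector_times = [31.2, 31.0, 31.4, 31.1, 31.3, 31.2]
rider_b_sector_times = [32.5, 33.1, 32.0, 33.4, 32.8, 32.2]

observed_diff, p_value = permutation_test(
    rider_a_sector_times, rider_b_sector_times
)
print(f"mean difference: {observed_diff:.2f} s, p-value: {p_value:.4f}")
```

A small p-value here would only show that the two riders differ on that sector. Explaining why they differ, using the throttle, gyroscope, or heart-rate signals, remains the actual task.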
What’s not expected?
EMC does not expect you to simply come up with information based on the data. To win, it’s critical that you find the underlying drivers of performance. For example, let’s assume that the racer performed very well during a particular stretch on the race track. In this case, your challenge would be to identify what made him cover that stretch faster – based on the physical, mechanical and biometric data given – rather than just concluding that this faster performance resulted in his success.
Next Steps: What do you need to do to enter?
Prepare your Proposed Solution by following these steps:
Step 1: Prepare an approach note in the form of a PDF document which includes answers to the following questions. This approach note is a pre-requisite for anyone who wishes to submit a Proposed Solution. A submission format for Step 1 has been provided in file “Step1_Submission_format.doc” under the ‘Data’ tab.
- How do you plan to approach the problem?
- What methodology do you plan to utilize to identify evidence-based insights?
- What kind of pre-processing do you plan to do on the data to derive features, if any?
- How would you ensure that the results are statistically relevant and valid?
- What references are behind your approach? Please provide them.
Approach notes will be included in the evaluation used to select winners, so submitting one is mandatory.
Step 2: Analyze the data and come up with insights on what makes the racer so fast and consistent. To capture your work, you must submit the following deliverables:
- A detailed PDF report explaining your analysis, findings and conclusions. A submission format for this report has been provided in file “Step2_Submission_format.doc”
- Model or analysis code
- Results of model or analysis organized in an Excel or Word document
- Processed and Structured input data set used for analysis, including any transformations made on the given raw data (if different from the given input data)
Evaluation Criteria: How Will Proposed Solutions be judged?
- Your approach. How intuitive and innovative is it? Why is your approach better than the alternatives? 10%
- Your insights. To what extent did you present actionable insights (defined below)? 35%
- Your analysis. What is the statistical relevance and validity of modelling or analysis? 35%
- Your findings. How original are your findings? 20%
Based on EMC’s evaluation, a final aggregated score based on the categories above will be shown on the Leaderboard.
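The exact aggregation formula is not stated. As a rough sketch, assuming each category is scored on a 0 to 100 scale and the final score is a simple weighted sum using the percentages listed above, the arithmetic would look like this. The scale, the function name, and the example scores are all hypothetical.

```python
# Category weights in percent, taken from the evaluation criteria above.
WEIGHTS_PERCENT = {
    "approach": 10,
    "insights": 35,
    "analysis": 35,
    "findings": 20,
}


def aggregate_score(category_scores):
    """Weighted sum of per-category scores (assumed 0-100 scale)."""
    if set(category_scores) != set(WEIGHTS_PERCENT):
        raise ValueError("scores must cover exactly the four categories")
    weighted = sum(
        WEIGHTS_PERCENT[name] * category_scores[name]
        for name in WEIGHTS_PERCENT
    )
    return weighted / 100


# Invented example scores, for illustration only.
example_scores = {"approach": 80, "insights": 70, "analysis": 90, "findings": 60}
total = aggregate_score(example_scores)
print(total)  # 76.0
```

Under this assumption, insights and analysis together account for 70% of the result, so they dominate the final score.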
Prizes
Total Prize pool: USD 7,500
The top 3 submissions chosen by EMC will receive the following cash prizes.
- 1st Prize: USD 5,000
- 2nd Prize: USD 2,000
- 3rd Prize: USD 500
Judges
Mike Foley is the Director of the Marketing Science Lab at EMC and leads the team in charge of predictive modelling and advanced analytics for the marketing organization there. Mike has over 20 years of experience working in data science, market intelligence and database marketing. Mike holds a Bachelors and MBA from University of California, Berkeley and is currently pursuing a Masters in Predictive Analytics from Northwestern University.
Parag Chitalis brings over 25 years of leadership experience in Analytics and Value Chain Consulting. He is currently Senior Director, Advanced Analytics at VMWare in Palo Alto, CA. Most recently, he was Director at Dell Global Analytics (captive analytics center of excellence) in Bangalore, India for over 8 years, where he built an internal analytics team across multiple functions. Parag holds a Bachelor of Technology-Mechanical Engineering from IIT-Bombay and a Master of Science-Management Science from Case Western Reserve University, USA.
Joe Dery is a Sr. Data Scientist & Manager within Global Data Science Operations at EMC, as well as an Adjunct Lecturer of Customer Data Analysis at Bentley University. At EMC, Joe has utilized cross-functional “big data”, industry-leading technologies, and the newest data science practices to solve some of EMC’s most complex business challenges. Joe is currently a PhD student in Business Analytics at Bentley University, where he also holds a Masters in Marketing Analytics.
Anila Joshi is a Data Scientist in the Marketing Science Lab at EMC and works on predictive modelling, text mining, and visualizations as well as data architecture and integration. Anila has over 10 years of experience working in software engineering, product management, and data science. Anila holds a Masters degree in Computer Science.
Aeri Kim is a Data Scientist in the Marketing Science Lab at EMC and works on predictive modelling, advanced analytics, and visualizations. Aeri has over 10 years of experience in statistical modelling, market intelligence, and business analysis. Aeri holds a Masters in Statistics from the University of Illinois at Urbana-Champaign.
Tom Sheng is a Data Scientist in the Marketing Science Lab at EMC and works on predictive modelling, data analysis, and data mining. Tom has over 15 years of experience in actuarial science, data analysis, and statistical modelling. Tom holds a Masters and Ph.D. in Statistics from the University of Montana, Missoula.
Additional judges may be announced before the close of the contest. EMC may substitute judges with others of relevant backgrounds at EMC’s discretion and without notice.
Common Questions
What is an actionable insight? An insight that is derived from your analysis that can be used in practice to improve the performance of amateur riders.
Contest Rules and Disclaimer
NO PURCHASE OR PAYMENT NECESSARY TO ENTER OR WIN. VOID IN PUERTO RICO (AND USA TERRITORIES OTHER THAN D.C.), QUEBEC, ARGENTINA, AUSTRALIA, CHILE, NEW ZEALAND, PERU, IN COUNTRIES AND TERRITORIES EMBARGOED BY USA OR EU, AND WHERE PROHIBITED OR RESTRICTED BY LAW. MUST BE AGE OF MAJORITY IN YOUR STATE, PROVINCE OR COUNTRY OF PRIMARY RESIDENCE. GOVERNMENT EMPLOYEES OR OFFICIALS, AND EMPLOYEES OF EMC CORPORATION AND ITS AFFILIATES, ARE INELIGIBLE. YOUR PARTICIPATION MAY REQUIRE APPROVAL BY YOUR EMPLOYER. OTHER RESTRICTIONS, RULES AND EXCEPTIONS APPLY AS LISTED IN THE SECTION ENTITLED, “SUPPLEMENTAL EMC RULES.”
THE DATA PROVIDED, THE CONTEST STATEMENT, AND EMC RULES ARE CONFIDENTIAL, AND MAY ONLY BE USED IN CONNECTION WITH SUBMITTING A PROPOSED SOLUTION AND NO OTHER PURPOSE. YOUR PROPOSED SOLUTION SHOULD NOT CONTAIN ANY INTELLECTUAL PROPERTY, OR PROPRIETARY OR CONFIDENTIAL INFORMATION. BY PARTICIPATING IN THIS CONTEST YOU AGREE TO COMPLY WITH THIS CONTEST STATEMENT AND THE “SUPPLEMENTAL EMC RULES”. IF YOU DO NOT AGREE DO NOT PARTICIPATE OR SUBMIT AN ENTRY.