Applications in Data Mining
Translating data into business insights is generally the most important, and most valuable, step in the data ecosystem. This is where data meet their use and their user. Great analysis can squeeze insights even from mediocre data, while, conversely, bad analysis can destroy the potential value of high-quality data.

The business case for Advanced Analytics/Data Mining (DM) must always be business-driven, not technology-driven. The DM initiative as a whole, and the proposed Business Intelligence technology application specifically, should support the business strategy and business objectives. Therefore, a proposed DM application must reduce measurable business pain (problems affecting the profitability or efficiency of an organization) to justify building it. If DM applications are built without a good business justification, management will most likely not support the effort.

Objectives

Our capabilities for both generating and collecting data have been increasing rapidly. Contributing factors include the computerization of business, scientific, and government transactions; the widespread use of digital cameras, publication tools, and bar codes for most commercial products; and advances in data collection tools ranging from scanned text and image platforms to satellite remote sensing systems.

Where in the data mining process do humans like you and me add the most value? Is it in exploring the data to uncover anomalies and to fathom the relationships between elements? Is it in selecting transformations for the elements to improve their representations for modeling and analysis? Or is it in building very sophisticated, nonlinear models that predict a future outcome from currently available data?

Some data scientists insist that you and I contribute the greatest value through the way we frame the problem. You can do a stellar job of exploring the data, transforming it, and building a model, but if the problem is framed poorly, it's all a pointless exercise: you're solving the wrong problem. Conversely, even a mediocre model, applied to the right, well-framed problem, will provide immediate benefit to your organization.

Students will accomplish these objectives after completing the assignment:
What attributes of the contestants were most closely associated with making it to the Finals?
How about some other fun facts?
Who does the model predict for Dancing with the Stars?
Does that mean that, before the dancers ever take to the floor, we can pretty well predict two of the three finalists?
How well were we able to predict Final 3 contestants?
What attributes have a negative correlation with making the Finals?
What attributes didn't seem to matter?

Instructions

Applications in Data Mining: Predict the outcome of Dancing with the Stars (The All-Star Championship Edition)

Is it possible to predict the finalists on Dancing with the Stars before the contestants ever step on the floor? Is being young and beautiful a sure ticket into the Top 3? Do Derek Hough and Cheryl Stephanie Burke have the best track records for guiding their partners to the top?

This month we will pay tribute to some past WINNERS! Yes, that's correct. I will give you a list of 10 past winners and their partners (all randomly chosen), and you will have to build a model and predict a WINNER. All the information you need can be found on the DWTS Wikipedia page. Each season is covered separately, so you should have detailed information to build a case. I am looking for your creativity and for how much time you dedicated to this assignment; both will be obvious from the deliverable.

A great way to start your analysis is by pulling together data from the 21 previous seasons of DWTS on each season's winner. The show has been around for some time now, and knowing the show and what has happened in the past is the best way to build your predictive model.
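One way to organize the data you pull together is to keep one record per contestant and then flatten those records into a contestant-by-category matrix. The Python sketch below is only an illustration under stated assumptions: the `Contestant` fields, the `build_matrix` helper, and the sample rows are hypothetical placeholders (not real DWTS data), categorical attributes are one-hot encoded, and "made the Finals" is assumed to mean a placement of 3 or better.

```python
from dataclasses import dataclass


@dataclass
class Contestant:
    """One contestant's profile; the fields are hypothetical choices, not a required schema."""
    name: str
    season: int
    age: int
    gender: str
    occupation: str
    pro_partner: str
    placement: int  # 1 = winner, 2 = runner-up, and so on


def build_matrix(contestants, categorical=("gender", "occupation", "pro_partner")):
    """Return (header, rows): name, numeric age, one-hot categorical columns, made_finals flag."""
    # Collect every level seen for each categorical attribute, sorted for a stable column order.
    levels = {field: sorted({getattr(c, field) for c in contestants}) for field in categorical}
    header = ["name", "age"]
    header += [f"{field}={level}" for field in categorical for level in levels[field]]
    header.append("made_finals")
    rows = []
    for c in contestants:
        row = [c.name, c.age]
        for field in categorical:
            value = getattr(c, field)
            # One 0/1 column per level: 1 only in the column matching this contestant's value.
            row.extend(1 if value == level else 0 for level in levels[field])
        row.append(1 if c.placement <= 3 else 0)  # assumed definition of "made the Finals"
        rows.append(row)
    return header, rows


# Placeholder rows for illustration only; they are NOT real DWTS contestants or results.
sample = [
    Contestant("Contestant A", 1, 29, "F", "singer", "Pro X", 1),
    Contestant("Contestant B", 1, 54, "M", "athlete", "Pro Y", 4),
]
header, rows = build_matrix(sample)
for line in [header] + rows:
    print(line)
```

One-hot columns keep attributes such as professional partner or occupation usable by simple correlation or regression techniques, and the same matrix can double as the per-contestant profile of raw data used for scoring.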
Start collecting the characteristics of the contestants that your intuition (or your gut) says might be important: age, professional partner, how well-known the contestants were, and so on. But you should also check out aspects of the contestants that might not seem relevant at first, like gender, occupation, and marital status. As you begin to think about the attributes to put into your model, you may want to train your model using a fraction of the results of the first seventeen seasons, looking only at the contestants' characteristics (no dancing scores) and their finishing place on the show. Summarize your findings, and keep in mind that statistics and predictions can also be used for fun; trying to predict the outcome of Dancing with the Stars was too good to pass up!

Assignment Goal

What attributes of the contestants were most closely associated with making it to the Finals?
What attributes didn't seem to matter?
What attributes have a negative correlation with making the Finals?
How well were we able to predict Final 3 contestants?
Does that mean that, before the dancers ever take to the floor, we can pretty well predict two of the three finalists?
Who does the model predict for The All-Star Championship Edition?
How about some other fun facts?
Provide a matrix listing each contestant, the categories used, and the raw data (profile) used for scoring.

Conclusion

With the analytic and predictive tools of data mining, can you make a reasonable prediction with the model you constructed? This model will not be judged from the perspective of right or wrong. This assignment will hopefully give you a flavor of how data can be a great predictor of the future outcome of events, not only in business but also in a reality TV show. You can use any data, approach, or analysis technique to arrive at your conclusion. I'm looking for your application of the material you learned this month.
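As one possible starting point for the Assignment Goal questions, the Python sketch below trains on a subset of seasons, measures how strongly each attribute correlates with making the Final 3, and then checks those correlations against a held-out season. It assumes every attribute has already been coded as 0/1 (for example, a hypothetical `young` flag for contestants under some age cutoff). The association measure is the Pearson correlation, which for a 0/1 target is the point-biserial correlation. Ranking a cast by a correlation-weighted score is a deliberately crude stand-in for a real classifier such as logistic regression. All function names, column names, and rows are hypothetical placeholders, not real DWTS data.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation; with a 0/1 target this is the point-biserial correlation."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column cannot be correlated with anything
    return cov / sqrt(vx * vy)


def made_finals(c):
    """1 if the contestant finished in the Final 3, else 0."""
    return 1 if c["placement"] <= 3 else 0


def feature_correlations(train, features):
    """Correlation of each 0/1 attribute with making the Finals, on training seasons only."""
    ys = [made_finals(c) for c in train]
    return {f: pearson([c[f] for c in train], ys) for f in features}


def predict_final3(cast, weights):
    """Rank one season's cast by a correlation-weighted sum of attributes; keep the top three."""
    ranked = sorted(cast, key=lambda c: sum(w * c[f] for f, w in weights.items()), reverse=True)
    return [c["name"] for c in ranked[:3]]


def final3_hits(test, weights):
    """For each held-out season, count how many predicted finalists actually made the Final 3."""
    hits = {}
    for season in sorted({c["season"] for c in test}):
        cast = [c for c in test if c["season"] == season]
        predicted = set(predict_final3(cast, weights))
        actual = {c["name"] for c in cast if made_finals(c)}
        hits[season] = len(predicted & actual)
    return hits


# Placeholder rows for illustration only; they are NOT real DWTS contestants or results.
train = [
    {"name": "A", "season": 1, "young": 1, "female": 1, "placement": 1},
    {"name": "B", "season": 1, "young": 1, "female": 0, "placement": 2},
    {"name": "C", "season": 1, "young": 0, "female": 1, "placement": 4},
    {"name": "D", "season": 1, "young": 0, "female": 0, "placement": 5},
]
test = [
    {"name": "E", "season": 18, "young": 1, "female": 0, "placement": 1},
    {"name": "F", "season": 18, "young": 1, "female": 1, "placement": 4},
    {"name": "G", "season": 18, "young": 1, "female": 1, "placement": 2},
    {"name": "H", "season": 18, "young": 0, "female": 0, "placement": 3},
    {"name": "I", "season": 18, "young": 0, "female": 1, "placement": 5},
]

weights = feature_correlations(train, ["young", "female"])
print(weights)                     # {'young': 1.0, 'female': 0.0}
print(final3_hits(test, weights))  # {18: 2}
```

Reading the weights maps directly onto the goal questions: positive values flag attributes associated with making the Finals, negative values flag attributes with a negative correlation, and values near zero flag attributes that didn't seem to matter. A hit count of 2 for a held-out season corresponds to predicting two of the three finalists.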