E-book, English, 446 pages
Kotu / Deshpande Predictive Analytics and Data Mining
1st edition, 2014
ISBN: 978-0-12-801650-3
Publisher: Elsevier Science & Techn.
Format: EPUB
Copy protection: Adobe DRM
Concepts and Practice with RapidMiner
Vijay Kotu is Vice President of Analytics at ServiceNow. He leads the implementation of large-scale data platforms and services to support the company's enterprise business. He has led analytics organizations for over a decade, with a focus on data strategy, business intelligence, machine learning, experimentation, engineering, enterprise adoption, and building analytics talent. Prior to joining ServiceNow, he was Vice President of Analytics at Yahoo. He also worked at Life Technologies and Adteractive, where he led marketing analytics, created algorithms to optimize online purchasing behavior, and developed data platforms to manage marketing campaigns. He is a member of the Association for Computing Machinery and a member of the Advisory Board at RapidMiner.
Authors/Editors
Further information & material
Front Cover
Predictive Analytics and Data Mining
Copyright
Dedication
Contents
Foreword
Preface
  WHY THIS BOOK?
  WHO CAN USE THIS BOOK?
Acknowledgments
Chapter 1 - Introduction
  1.1 WHAT DATA MINING IS
  1.2 WHAT DATA MINING IS NOT
  1.3 THE CASE FOR DATA MINING
  1.4 TYPES OF DATA MINING
  1.5 DATA MINING ALGORITHMS
  1.6 ROADMAP FOR UPCOMING CHAPTERS
  REFERENCES
Chapter 2 - Data Mining Process
  2.1 PRIOR KNOWLEDGE
  2.2 DATA PREPARATION
  2.3 MODELING
  2.4 APPLICATION
  2.5 KNOWLEDGE
  WHAT’S NEXT?
  REFERENCES
Chapter 3 - Data Exploration
  3.1 OBJECTIVES OF DATA EXPLORATION
  3.2 DATA SETS
  3.3 DESCRIPTIVE STATISTICS
  3.4 DATA VISUALIZATION
  3.5 ROADMAP FOR DATA EXPLORATION
  REFERENCES
Chapter 4 - Classification
  4.1 DECISION TREES
  4.2 RULE INDUCTION
  4.3 K-NEAREST NEIGHBORS
  4.4 NAÏVE BAYESIAN
  4.5 ARTIFICIAL NEURAL NETWORKS
  4.6 SUPPORT VECTOR MACHINES
  4.7 ENSEMBLE LEARNERS
  REFERENCES
Chapter 5 - Regression Methods
  5.1 LINEAR REGRESSION
  5.2 LOGISTIC REGRESSION
  CONCLUSION
  REFERENCES
Chapter 6 - Association Analysis
  6.1 CONCEPTS OF MINING ASSOCIATION RULES
  6.2 APRIORI ALGORITHM
  6.3 FP-GROWTH ALGORITHM
  CONCLUSION
  REFERENCES
Chapter 7 - Clustering
  CLUSTERING TO DESCRIBE THE DATA
  CLUSTERING FOR PREPROCESSING
  7.1 TYPES OF CLUSTERING TECHNIQUES
  7.2 K-MEANS CLUSTERING
  7.3 DBSCAN CLUSTERING
  7.4 SELF-ORGANIZING MAPS
  REFERENCES
Chapter 8 - Model Evaluation
  8.1 CONFUSION MATRIX (OR TRUTH TABLE)
  8.2 RECEIVER OPERATOR CHARACTERISTIC (ROC) CURVES AND AREA UNDER THE CURVE (AUC)
  8.3 LIFT CURVES
  8.4 EVALUATING THE PREDICTIONS: IMPLEMENTATION
  CONCLUSION
  REFERENCES
Chapter 9 - Text Mining
  9.1 HOW TEXT MINING WORKS
  9.2 IMPLEMENTING TEXT MINING WITH CLUSTERING AND CLASSIFICATION
  CONCLUSION
  REFERENCES
Chapter 10 - Time Series Forecasting
  10.1 DATA-DRIVEN APPROACHES
  10.2 MODEL-DRIVEN FORECASTING METHODS
  CONCLUSION
  REFERENCES
Chapter 11 - Anomaly Detection
  11.1 ANOMALY DETECTION CONCEPTS
  11.3 DENSITY-BASED OUTLIER DETECTION
  11.4 LOCAL OUTLIER FACTOR
  CONCLUSION
  REFERENCES
Chapter 12 - Feature Selection
  12.1 CLASSIFYING FEATURE SELECTION METHODS
  12.2 PRINCIPAL COMPONENT ANALYSIS
  12.3 INFORMATION THEORY–BASED FILTERING FOR NUMERIC DATA
  CATEGORICAL DATA
  12.5 WRAPPER-TYPE FEATURE SELECTION
  CONCLUSION
  REFERENCES
Chapter 13 - Getting Started with RapidMiner
  13.1 USER INTERFACE AND TERMINOLOGY
  13.2 DATA IMPORTING AND EXPORTING TOOLS
  13.3 DATA VISUALIZATION TOOLS
  13.4 DATA TRANSFORMATION TOOLS
  13.5 SAMPLING AND MISSING VALUE TOOLS
  CONCLUSION
  REFERENCES
Comparison of Data Mining Algorithms
Index
About the Authors
Data Mining Process
Abstract
Successfully uncovering patterns using data mining is an iterative process. Chapter 2 provides a framework for solving a data mining problem. The five-step process outlined in this chapter provides guidelines on gathering subject matter expertise; exploring the data with statistics and visualization; building a model using data mining algorithms; testing the model and deploying it in a production environment; and finally reflecting on the new knowledge gained in the cycle. As data mining practices have evolved, various academic and commercial bodies have put forward different frameworks for the data mining process, such as the Cross Industry Standard Process for Data Mining (CRISP-DM) and Knowledge Discovery in Databases (KDD). These frameworks share common characteristics, so we will use a generic framework closely resembling the CRISP process.
Keywords
CRISP; KDD; data mining process; prior knowledge; modeling; data preparation; evaluation; application
Figure 2.1 CRISP data mining framework.
Figure 2.2 Data mining process.
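As a rough illustration of the five steps named in the abstract (prior knowledge, data preparation, modeling, application, knowledge), the following minimal Python sketch walks through one pass of the cycle. It is not taken from the book: the data values, the function names (`prepare`, `fit_line`, `predict`, `evaluate`), and the choice of a simple least-squares line as the model are all assumptions made for illustration.

```python
from statistics import mean

# Step 1: prior knowledge. Hypothetical, made-up records of
# (credit_score, interest_rate_percent); not data from the book.
records = [(500, 7.3), (600, 6.7), (700, 5.9), (800, 5.1), (550, 7.0), (750, 5.5)]

def prepare(rows):
    """Step 2: data preparation. Drop incomplete rows, then split train/test."""
    clean = [r for r in rows if None not in r]
    return clean[:4], clean[4:]

def fit_line(train):
    """Step 3: modeling. Least-squares fit of y = a + b * x."""
    xs = [x for x, _ in train]
    ys = [y for _, y in train]
    x_bar, y_bar = mean(xs), mean(ys)
    b = sum((x - x_bar) * (y - y_bar) for x, y in train) / sum((x - x_bar) ** 2 for x in xs)
    a = y_bar - b * x_bar
    return a, b

def predict(model, x):
    """Apply the fitted line to a new input."""
    a, b = model
    return a + b * x

def evaluate(model, test):
    """Step 4: application. Mean absolute error on held-out records."""
    return mean(abs(predict(model, x) - y) for x, y in test)

train, test = prepare(records)
model = fit_line(train)
error = evaluate(model, test)

# Step 5: knowledge. Inspect what the model learned before iterating again.
print(f"intercept={model[0]:.3f}, slope={model[1]:.4f}, test MAE={error:.3f}")
```

In practice each step would feed back into the earlier ones, for example by revisiting data preparation when the held-out error is too high, which is the iterative character the abstract emphasizes.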




