E-book, English, 664 pages
Witten / Frank / Hall
Data Mining: Practical Machine Learning Tools and Techniques
3rd edition, 2011
ISBN: 978-0-08-089036-4
Publisher: Elsevier Science & Technology
Format: EPUB
Copy protection: ePub watermark
Data Mining: Practical Machine Learning Tools and Techniques, Third Edition, offers a thorough grounding in machine learning concepts as well as practical advice on applying machine learning tools and techniques in real-world data mining situations. This highly anticipated third edition of the most acclaimed work on data mining and machine learning teaches you everything you need to know about preparing inputs, interpreting outputs, evaluating results, and the algorithmic methods at the heart of successful data mining. Thorough updates reflect the technical changes and modernizations that have taken place in the field since the last edition, including new material on data transformations, ensemble learning, massive data sets, and multi-instance learning, plus a new version of the popular Weka machine learning software developed by the authors. Witten, Frank, and Hall cover both the tried-and-true techniques of today and methods at the leading edge of contemporary research.

The book is targeted at information systems practitioners, programmers, consultants, developers, information technology managers, specification writers, data analysts, data modelers, database R&D professionals, data warehouse engineers, and data mining professionals. It will also be useful for professors and students in upper-level undergraduate and graduate data mining and machine learning courses who want to incorporate data mining into their data management knowledge base and expertise.

- Provides a thorough grounding in machine learning concepts as well as practical advice on applying the tools and techniques to your data mining projects
- Offers concrete tips and techniques for performance improvement that work by transforming the input or output in machine learning methods
- Includes the downloadable Weka software toolkit, a collection of machine learning algorithms for data mining tasks, in an updated, interactive interface

The algorithms in the toolkit cover data pre-processing, classification, regression, clustering, association rules, and visualization.
Ian H. Witten is a professor of computer science at the University of Waikato in New Zealand. He directs the New Zealand Digital Library research project. His research interests include information retrieval, machine learning, text compression, and programming by demonstration. He received an MA in Mathematics from Cambridge University, England; an MSc in Computer Science from the University of Calgary, Canada; and a PhD in Electrical Engineering from Essex University, England. He is a fellow of the ACM and of the Royal Society of New Zealand. He has published widely on digital libraries, machine learning, text compression, hypertext, speech synthesis and signal processing, and computer typography.
Authors/Editors
Further Information & Material
Front cover (1)
Data Mining: Practical Machine Learning Tools and Techniques (2)
Copyright page (5)
Table of contents (6)
List of Figures (16)
List of Tables (20)
Preface (22)
  Updated and revised content (26)
Acknowledgments (30)
About the Authors (34)
PART I: Introduction to Data Mining (36)
  Chapter 1: What's It All About? (38)
    Data mining and machine learning (38)
    Simple examples: the weather and other problems (44)
    Fielded applications (56)
    Machine learning and statistics (63)
    Generalization as search (64)
    Data mining and ethics (68)
    Further reading (71)
  Chapter 2: Input: Concepts, Instances, and Attributes (74)
    What's a concept? (75)
    What's in an example? (77)
    What's in an attribute? (84)
    Preparing the input (86)
    Further reading (95)
  Chapter 3: Output: Knowledge Representation (96)
    Tables (96)
    Linear models (97)
    Trees (99)
    Rules (102)
    Instance-based representation (113)
    Clusters (116)
    Further reading (118)
  Chapter 4: Algorithms: The Basic Methods (120)
    Inferring rudimentary rules (121)
    Statistical modeling (125)
    Divide-and-conquer: constructing decision trees (134)
    Covering algorithms: constructing rules (143)
    Mining association rules (151)
    Linear models (159)
    Instance-based learning (166)
    Clustering (173)
    Multi-instance learning (176)
    Further reading (178)
    Weka implementations (180)
  Chapter 5: Credibility: Evaluating What's Been Learned (182)
    Training and testing (183)
    Predicting performance (185)
    Cross-validation (187)
    Other estimates (189)
    Comparing data mining schemes (191)
    Predicting probabilities (194)
    Counting the cost (198)
    Evaluating numeric prediction (215)
    Minimum description length principle (218)
    Applying the MDL principle to clustering (221)
    Further reading (222)
PART II: Advanced Data Mining (224)
  Chapter 6: Implementations: Real Machine Learning Schemes (226)
    Decision trees (227)
    Classification rules (238)
    Association rules (251)
    Extending linear models (258)
    Instance-based learning (279)
    Numeric prediction with local linear models (286)
    Bayesian networks (296)
    Clustering (308)
    Semisupervised learning (329)
    Multi-instance learning (333)
    Weka implementations (338)
  Chapter 7: Data Transformations (340)
    Attribute selection (342)
    Discretizing numeric attributes (349)
    Projections (357)
    Sampling (365)
    Cleansing (366)
    Transforming multiple classes to binary ones (373)
    Calibrating class probabilities (378)
    Further reading (381)
    Weka implementations (383)
  Chapter 8: Ensemble Learning (386)
    Combining multiple models (386)
    Bagging (387)
    Randomization (391)
    Boosting (393)
    Additive regression (397)
    Interpretable ensembles (400)
    Stacking (404)
    Further reading (406)
    Weka implementations (407)
  Chapter 9: Moving On: Applications and Beyond (410)
    Applying data mining (410)
    Learning from massive datasets (413)
    Data stream learning (415)
    Incorporating domain knowledge (419)
    Text mining (421)
    Web mining (424)
    Adversarial situations (428)
    Ubiquitous data mining (430)
    Further reading (432)
PART III: The Weka Data Mining Workbench (436)
  Chapter 10: Introduction to Weka (438)
    What's in Weka? (438)
    How do you use it? (439)
    What else can you do? (440)
    How do you get it? (441)
  Chapter 11: The Explorer (442)
    Getting started (442)
    Exploring the Explorer (451)
    Filtering algorithms (467)
    Learning algorithms (480)
    Metalearning algorithms (509)
    Clustering algorithms (515)
    Association-rule learners (520)
    Attribute selection (522)
  Chapter 12: The Knowledge Flow Interface (530)
    Getting started (530)
    Components (533)
    Configuring and connecting the components (535)
    Incremental learning (537)
  Chapter 13: The Experimenter (540)
    Getting started (540)
    Simple setup (545)
    Advanced setup (546)
    The Analyze panel (547)
    Distributing processing over several machines (550)
  Chapter 14: The Command-Line Interface (554)
    Getting started (554)
    The structure of Weka (554)
    Command-line options (561)
  Chapter 15: Embedded Machine Learning (566)
    A simple data mining application (566)
  Chapter 16: Writing New Learning Schemes (574)
    An example classifier (574)
    Conventions for implementing classifiers (590)
  Chapter 17: Tutorial Exercises for the Weka Explorer (594)
    Introduction to the Explorer interface (594)
    Nearest-neighbor learning and decision trees (601)
    Classification boundaries (606)
    Preprocessing and parameter tuning (609)
    Document classification (613)
    Mining association rules (617)
References (622)
Index (642)