E-Book, Englisch, 187 Seiten
Reihe: Natural Computing Series
Pappa / Freitas Automating the Design of Data Mining Algorithms
1. Auflage 2009
ISBN: 978-3-642-02541-9
Verlag: Springer
Format: PDF
Kopierschutz: 1 - PDF Watermark
An Evolutionary Computation Approach
E-Book, Englisch, 187 Seiten
Reihe: Natural Computing Series
ISBN: 978-3-642-02541-9
Verlag: Springer
Format: PDF
Kopierschutz: 1 - PDF Watermark
Data mining is a very active research area with many successful real-world app- cations. It consists of a set of concepts and methods used to extract interesting or useful knowledge (or patterns) from real-world datasets, providing valuable support for decision making in industry, business, government, and science. Although there are already many types of data mining algorithms available in the literature, it is still dif cult for users to choose the best possible data mining algorithm for their particular data mining problem. In addition, data mining al- rithms have been manually designed; therefore they incorporate human biases and preferences. This book proposes a new approach to the design of data mining algorithms. - stead of relying on the slow and ad hoc process of manual algorithm design, this book proposes systematically automating the design of data mining algorithms with an evolutionary computation approach. More precisely, we propose a genetic p- gramming system (a type of evolutionary computation method that evolves c- puter programs) to automate the design of rule induction algorithms, a type of cl- si cation method that discovers a set of classi cation rules from data. We focus on genetic programming in this book because it is the paradigmatic type of machine learning method for automating the generation of programs and because it has the advantage of performing a global search in the space of candidate solutions (data mining algorithms in our case), but in principle other types of search methods for this task could be investigated in the future.
Autoren/Hrsg.
Weitere Infos & Material
1;Preface;6
2;Contents;8
3;Acronyms;12
4;Introduction;13
4.1;Rule Induction Algorithms;14
4.2;Evolutionary Computation;16
4.2.1;Genetic Programming;16
4.3;The Motivation for Automating the Design of Classification Algorithms;19
4.3.1;The Problem of the Selective Superiority of Classification Algorithms;19
4.3.2;Human Biases in Manually Designed Algorithms;22
4.3.3;A New Level of Automation in Data Mining;23
4.4;Overview of the Proposed Genetic Programming System;24
4.5;References;27
5;Data Mining;29
5.1;Introduction;29
5.2;The Classification Task of Data Mining;30
5.2.1;On Predictive Accuracy;31
5.2.2;On Overfitting and Underfitting;34
5.2.3;On the Comprehensibility of Discovered Knowledge;35
5.3;Decision Tree Induction;37
5.4;Rule Induction via the Sequential Covering Approach;39
5.4.1;Representation of the Candidate Rules;42
5.4.2;Search Mechanism;44
5.4.3;Rule Evaluation;46
5.4.4;Rule Pruning Methods;49
5.5;Meta-learning;51
5.5.1;Meta-learning for Classification Algorithm Selection;51
5.5.2;Stacked Generalization: Meta-learning via a Combination of Base Learners' Predictions;54
5.6;Summary;54
5.7;References;55
6;Evolutionary Algorithms;59
6.1;Introduction;59
6.2;An Overview of Evolutionary Algorithms;60
6.2.1;Individual Representation;60
6.2.2;Fitness Function;61
6.2.3;Individual Selection;62
6.2.4;Genetic Operators;62
6.3;Multiobjective Optimization;64
6.3.1;The Pareto Optimality Concept;65
6.3.2;Lexicographic Multiobjective Optimization;66
6.4;Genetic Programming Versus Genetic Algorithms: A Critical Perspective;67
6.5;Genetic Programming;71
6.5.1;Terminal and Function Sets and the Closure Property;74
6.5.2;Fitness Function: An Example Involving Regression;76
6.5.3;Selection and Genetic Operators;77
6.5.4;Approaches for Satisfying the Closure Property;80
6.5.5;Bloat;80
6.6;Grammar-Based Genetic Programming;82
6.6.1;Grammars;84
6.6.2;GGP with Solution-Encoding Individual;86
6.6.3;GGP with Production-Rule-Sequence-Encoding Individual;89
6.7;Summary;92
6.8;References;92
7;Genetic Programming for Classification and Algorithm Design;97
7.1;Introduction;97
7.2;Classification Models Versus Classification Algorithms;98
7.3;Genetic Programming for Evolving Classification Models;100
7.3.1;Evolving Classification Functions or Classification Rules;101
7.3.2;Evolving Decision Trees;103
7.4;Genetic Programming for Evolving Components of Rule Induction Algorithms;104
7.5;Genetic Programming for Evolving Classification Systems;107
7.6;Evolving the Design of Optimization Algorithms;109
7.6.1;Optimization Versus Classification;109
7.6.2;On Meta-heuristics and Hyper-heuristics;112
7.6.3;Evolving the Core Heuristic of Optimization Algorithms;113
7.6.4;Evolving an Evolutionary Algorithm for Optimization;116
7.7;Summary;117
7.8;References;118
8;Automating the Design of Rule Induction Algorithms;121
8.1;Introduction;121
8.2;The Grammar: Specifying the Building Blocks of Rule Induction Algorithms;123
8.2.1;The New Rule Induction Algorithmic Components in the Grammar;128
8.3;Individual Representation;129
8.4;Population Initialization;130
8.5;Individual Evaluation;133
8.5.1;From a Derivation Tree to Java Code;136
8.5.2;Single-Objective Fitness;138
8.5.3;Multiobjective Fitness;141
8.6;Crossover and Mutation Operations;143
8.7;Summary;145
8.8;References;145
9;Computational Results on the Automatic Design of Full Rule Induction Algorithms;148
9.1;Introduction;148
9.2;Evolving Rule Induction Algorithms Robust Across Different Application Domains;149
9.2.1;Investigating the GGP System's Sensitivity to Parameters;150
9.2.2;Comparing GGP-Designed Rule Induction Algorithms with Human-Designed Rule Induction Algorithms;153
9.2.3;To What Extent Are GGP-RIs Different from Manually Designed Rule Induction Algorithms?;155
9.2.4;Meta-training Set Variations;159
9.2.5;GGP System's Grammar Variations;162
9.2.6;GGP Versus Grammar-Based Hill-Climbing Search;164
9.2.7;MOGGP: A Multiobjective Version of the Proposed GGP;167
9.2.8;A Note on the GGP System's Execution Time;171
9.3;Evolving Rule Induction Algorithms Tailored to the Target Application Domain;172
9.3.1;Experiments with Public UCI Datasets;173
9.3.2;GGP-RIs Versus GHC-RIs;177
9.3.3;Experiments with Bioinformatics Datasets;178
9.3.4;A Note on the GGP System's Execution Time;183
9.4;Summary;184
9.5;References;185
10;Directions for Future Research on the Automatic Design of Data Mining Algorithms;187
10.1;Potential Improvements to the Current GGP System;188
10.1.1;Improving the Grammar;188
10.1.2;Modifying the GGP System's Fitness Function;189
10.2;Designing Rule Induction Algorithms Tailored to a Type of Dataset;190
10.3;Investigating Other Types of Search Methods for Automated Algorithm Design;191
10.4;Automatically Designing Other Types of Classification Algorithms;192
10.5;Automatically Designing Other Types of Data Mining Algorithms;193
10.6;References;194
11;Index;195




