Book, English, 736 pages, with 1 CD-ROM, format (W × H): 161 mm × 240 mm, weight: 1285 g
ISBN: 978-0-470-17081-6
Publisher: Wiley
Wiley Series in Bioinformatics: Computational Techniques and Engineering
Yi Pan and Albert Y. Zomaya, Series Editors
Wide coverage of traditional unsupervised and supervised methods and newer contemporary approaches, helping researchers keep pace with the rapid growth of classification methods in DNA microarray studies
The proliferation of classification methods in DNA microarray studies has resulted in a body of information scattered across the literature, conference proceedings, and elsewhere. This book unites many of these classification methods in a single volume. In addition to traditional statistical methods, it covers newer machine-learning approaches such as fuzzy methods, artificial neural networks, evolutionary-based genetic algorithms, support vector machines, swarm intelligence involving particle swarm optimization, and more.
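For orientation, a minimal sketch of the kind of supervised workflow the book addresses (here, a support vector machine classifying synthetic expression profiles) is given below; the data, dimensions, and parameters are illustrative assumptions, not material from the book or its CD-ROM.

```python
# Illustrative sketch only: a support vector machine classifying synthetic
# "expression profiles" (samples x genes). Data and parameters are invented
# for demonstration and are not taken from the book or its CD-ROM.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# 60 samples x 500 genes; class 1 has a shifted mean in the first 20 genes
X = rng.normal(size=(60, 500))
y = np.repeat([0, 1], 30)
X[y == 1, :20] += 1.5

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Standardize each gene, then fit a linear-kernel SVM
scaler = StandardScaler().fit(X_train)
clf = SVC(kernel="linear", C=1.0).fit(scaler.transform(X_train), y_train)

print("test accuracy:", clf.score(scaler.transform(X_test), y_test))
```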
Classification Analysis of DNA Microarrays provides highly detailed pseudo-code, rich graphical programming features, and ready-to-run source code. Along with primary methods covering traditional and contemporary classification, it offers supplementary tools and data-preparation routines for standardization and fuzzification; dimension reduction via crisp and fuzzy c-means, PCA, and nonlinear manifold learning; computational linguistics via text analytics and n-gram analysis; recursive feature extraction during ANN training; kernel-based methods; and ensemble classifier fusion.
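Likewise, the standardization and PCA-based dimension reduction mentioned above can be sketched in a few lines; the array dimensions and number of retained components below are arbitrary assumptions chosen only for illustration.

```python
# Illustrative sketch of feature standardization followed by PCA dimension
# reduction on a synthetic expression matrix (samples x genes); dimensions
# and the number of retained components are arbitrary assumptions.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 1000))            # 40 microarrays, 1000 genes

X_std = StandardScaler().fit_transform(X)  # z-score each gene
pca = PCA(n_components=5)
scores = pca.fit_transform(X_std)          # 40 x 5 principal-component scores

print("explained variance ratio:", pca.explained_variance_ratio_.round(3))
```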
This powerful new resource:
- Provides information on the use of classification analysis for DNA microarrays used for large-scale high-throughput transcriptional studies
- Serves as a historical repository of general-use supervised classification methods as well as newer contemporary methods
- Brings the reader quickly up to speed on the various classification methods by implementing the programming pseudo-code and source code provided in the book
- Describes implementation methods that help shorten discovery times
Classification Analysis of DNA Microarrays is useful for professionals and graduate students in computer science, bioinformatics, biostatistics, systems biology, and many related fields.
Further Information & Material
Preface xix
Abbreviations xxiii
1 Introduction 1
1.1 Class Discovery 2
1.2 Dimensional Reduction 4
1.3 Class Prediction 4
1.4 Classification Rules of Thumb 5
1.5 DNA Microarray Datasets Used 9
References 11
Part I Class Discovery 13
2 Crisp K-Means Cluster Analysis 15
2.1 Introduction 15
2.2 Algorithm 16
2.3 Implementation 18
2.4 Distance Metrics 20
2.5 Cluster Validity 24
2.5.1 Davies–Bouldin Index 25
2.5.2 Dunn’s Index 25
2.5.3 Intracluster Distance 26
2.5.4 Intercluster Distance 27
2.5.5 Silhouette Index 30
2.5.6 Hubert’s Statistic 31
2.5.7 Randomization Tests for Optimal Value of K 31
2.6 V-Fold Cross-Validation 35
2.7 Cluster Initialization 37
2.7.1 K Randomly Selected Microarrays 37
2.7.2 K Random Partitions 40
2.7.3 Prototype Splitting 41
2.8 Cluster Outliers 44
2.9 Summary 44
References 45
3 Fuzzy K-Means Cluster Analysis 47
3.1 Introduction 47
3.2 Fuzzy K-Means Algorithm 47
3.3 Implementation 49
3.4 Summary 54
References 54
4 Self-Organizing Maps 57
4.1 Introduction 57
4.2 Algorithm 57
4.2.1 Feature Transformation and Reference Vector Initialization 59
4.2.2 Learning 60
4.2.3 Conscience 61
4.3 Implementation 63
4.3.1 Feature Transformation and Reference Vector Initialization 63
4.3.2 Reference Vector Weight Learning 66
4.4 Cluster Visualization 67
4.4.1 Crisp K-Means Cluster Analysis 67
4.4.2 Adjacency Matrix Method 68
4.4.3 Cluster Connectivity Method 69
4.4.4 Hue–Saturation–Value (HSV) Color Normalization 69
4.5 Unified Distance Matrix (U Matrix) 71
4.6 Component Map 71
4.7 Map Quality 73
4.8 Nonlinear Dimension Reduction 75
References 79
5 Unsupervised Neural Gas 81
5.1 Introduction 81
5.2 Algorithm 82
5.3 Implementation 82
5.3.1 Feature Transformation and Prototype Initialization 82
5.3.2 Prototype Learning 83
5.4 Nonlinear Dimension Reduction 85
5.5 Summary 87
References 88
6 Hierarchical Cluster Analysis 91
6.1 Introduction 91
6.2 Methods 91
6.2.1 General Programming Methods 91
6.2.2 Step 1: Cluster-Analyzing Arrays as Objects with Genes as Attributes 92
6.2.3 Step 2: Cluster-Analyzing Genes as Objects with Arrays as Attributes 94
6.3 Algorithm 96
6.4 Implementation 96
6.4.1 Heatmap Color Control 96
6.4.2 User Choices for Clustering Arrays and Genes 97
6.4.3 Distance Matrices and Agglomeration Sequences 98
6.4.4 Drawing Dendrograms and Heatmaps 104
References 105
7 Model-Based Clustering 107
7.1 Introduction 107
7.2 Algorithm 110
7.3 Implementation 111
7.4 Summary 116
References 117
8 Text Mining: Document Clustering 119
8.1 Introduction 119
8.2 Duo-Mining 119
8.3 Streams and Documents 120
8.4 Lexical Analysis 120
8.4.1 Automatic Indexing 120
8.4.2 Removing Stopwords 121
8.5 Stemming 121
8.6 Term Weighting 121
8.7 Concept Vectors 124
8.8 Main Terms Representing Concept Vectors 124
8.9 Algorithm 125
8.10 Preprocessing 127
8.11 Summary 137
References 137
9 Text Mining: N-Gram Analysis 139
9.1 Introduction 139
9.2 Algorithm 140
9.3 Implementation 141
9.4 Summary 154
References 156
Part II Dimension Reduction 159
10 Principal Components Analysis 161
10.1 Introduction 161
10.2 Multivariate Statistical Theory 161
10.2.1 Matrix Definitions 162
10.2.2 Principal Component Solution of R 163
10.2.3 Extraction of Principal Components 164
10.2.4 Varimax Orthogonal Rotation of Components 166
10.2.5 Principal Component Score Coefficients 168
10.2.6 Principal Component Scores 169
10.3 Algorithm 170
10.4 When to Use Loadings and PC Scores 170
10.5 Implementation 171
10.5.1 Correlation Matrix R 171
10.5.2 Eigenanalysis of Correlation Matrix R 172
10.5.3 Determination of Loadings and Varimax Rotation 174
10.5.4 Calculating Principal Component (PC) Scores 176
10.6 Rules of Thumb for PCA 182
10.7 Summary 186
References 187
11 Nonlinear Manifold Learning 189
11.1 Introduction 189
11.2 Correlation-Based PCA 190
11.3 Kernel PCA 191
11.4 Diffusion Maps 192
11.5 Laplacian Eigenmaps 192
11.6 Local Linear Embedding 193
11.7 Locality Preserving Projections 194
11.8 Sammon Mapping 195
11.9 NLML Prior to Classification Analysis 195
11.10 Classification Results 197
11.11 Summary 200
References 203
Part III Class Prediction 205
12 Feature Selection 207
12.1 Introduction 207
12.2 Filtering versus Wrapping 208
12.3 Data 209
12.3.1 Numbers 209
12.3.2 Responses 209
12.3.3 Measurement Scales 210
12.3.4 Variables 211
12.4 Data Arrangement 211
12.5 Filtering 213
12.5.1 Continuous Features 213
12.5.2 Best Rank Filters 219
12.5.3 Randomization Tests 236
12.5.4 Multitesting Problem 237
12.5.5 Filtering Qualitative Features 242
12.5.6 Multiclass Gini Diversity Index 246
12.5.7 Class Comparison Techniques 247
12.5.8 Generation of Nonredundant Gene List 250
12.6 Selection Methods 254
12.6.1 Greedy Plus Takeaway (Greedy PTA) 254
12.6.2 Best Ranked Genes 258
12.7 Multicollinearity 259
12.8 Summary 270
References 270
13 Classifier Performance 273
13.1 Introduction