Celebi | Partitional Clustering Algorithms | E-Book | www.sack.de

E-book, English, 420 pages

Celebi Partitional Clustering Algorithms


2015
ISBN: 978-3-319-09259-1
Publisher: Springer Nature Switzerland
Format: PDF
Copy protection: PDF watermark




This book focuses on partitional clustering algorithms, which are widely used in engineering and computer science applications. The goal of this volume is to summarize the state of the art in partitional clustering. Topics covered include center-based clustering, competitive learning clustering, and density-based clustering. Each chapter is contributed by a leading expert in the field.
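As a minimal illustration of the book's central theme, center-based partitional clustering, the sketch below implements plain Lloyd's k-means (the baseline algorithm that Chapter 2 accelerates). This is a generic textbook version, not code from the book; the data set and parameter names are invented for the example.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's algorithm: alternate assignment and centroid update."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])),
            )
            clusters[nearest].append(p)
        # Update step: each center moves to the mean of its cluster.
        for j, cl in enumerate(clusters):
            if cl:
                centers[j] = tuple(sum(xs) / len(cl) for xs in zip(*cl))
    return centers, clusters

# Two well-separated blobs; with k = 2 the algorithm recovers them.
pts = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
       (5.0, 5.0), (5.1, 5.2), (5.2, 5.1)]
centers, clusters = kmeans(pts, k=2)
```

Note that every assignment step computes all n·k point-center distances; the bounds-based methods surveyed in Chapter 2 (Elkan, Hamerly, Drake) exist precisely to skip most of these calculations.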

Dr. Emre Celebi is an Associate Professor with the Department of Computer Science, at Louisiana State University in Shreveport.




Further Information & Material


Preface
Contents

1 Recent Developments in Model-Based Clustering with Applications
  1.1 Introduction
  1.2 Methodological Developments
    1.2.1 Initialization
    1.2.2 Spurious Solutions
    1.2.3 Merging Mixture Components for Clustering
    1.2.4 Nonparametric Clustering
    1.2.5 Variable Selection for Clustering
    1.2.6 Semi-Supervised Clustering
    1.2.7 Diagnostics and Assessment of Partition Variability
    1.2.8 Miscellaneous Topics
  1.3 Modern Applications of Model-Based Clustering
    1.3.1 Clustering Tree Ring Sequences
    1.3.2 Identification of Differentially Expressed Genes
    1.3.3 Analysis of Customer Navigation Patterns
    1.3.4 Data Clustering on the Unit Hypersphere
    1.3.5 Analysis of Mass Spectrometry Data
    1.3.6 Network Clustering
  References

2 Accelerating Lloyd's Algorithm for k-Means Clustering
  2.1 Introduction
    2.1.1 Popularity of the k-Means Algorithm
    2.1.2 The Standard k-Means Algorithm Does a Lot of Unnecessary Work
    2.1.3 Previous Work on k-Means Acceleration
      2.1.3.1 Algorithmic Improvements
      2.1.3.2 Parallelization
      2.1.3.3 Alternative Heuristic Methods
  2.2 Cluster Distortion and Lloyd's Algorithm
    2.2.1 Analysis of Lloyd's Algorithm
    2.2.2 MacQueen's Algorithm
  2.3 Tree Structured Approaches
    2.3.1 Blacklisting and Filtering Centers with k-d Trees
    2.3.2 Anchors Hierarchy
  2.4 Triangle Inequality Approaches
    2.4.1 Using the Triangle Inequality for Center-Center and Center-Point Distances
    2.4.2 Maintaining Distance Bounds with the Triangle Inequality
    2.4.3 Elkan's Algorithm: k Lower Bounds, k² Center-Center Distances
    2.4.4 Hamerly's Algorithm: 1 Lower Bound
    2.4.5 Drake's Algorithm: 1 < b < k Lower Bounds
    2.4.6 Annular Algorithm: Sorting the Centers by Norm
    2.4.7 Kernelized k-Means with Distance Bounds
  2.5 Heap-Ordered k-Means: Inverting the Innermost Loops
    2.5.1 Reducing the Number of Bounds Kept
    2.5.2 Cost of Combining Distance Bounds
    2.5.3 Inverting the Loops Over n and k
    2.5.4 Heap-Structured Bounds
    2.5.5 Analysis of Heap-Structured k-Means
  2.6 Parallelization
  2.7 Experiments and Discussion
    2.7.1 Testing Platforms
    2.7.2 Speedup Relative to the Naive Algorithm
    2.7.3 Parallelism
    2.7.4 Number of Distance Calculations
    2.7.5 Memory Use
  2.8 Conclusion
  References

3 Linear, Deterministic, and Order-Invariant Initialization Methods for the K-Means Clustering Algorithm
  3.1 Introduction
  3.2 Linear, Deterministic, and Order-Invariant K-Means Initialization Methods
  3.3 Experimental Setup
    3.3.1 Data Set Descriptions
    3.3.2 Attribute Normalization
    3.3.3 Performance Criteria
  3.4 Experimental Results and Discussion
  3.5 Conclusions
  References

4 Nonsmooth Optimization Based Algorithms in Cluster Analysis
  4.1 Introduction
  4.2 Optimization Formulations of Clustering Problems
    4.2.1 Combinatorial Formulation of the Clustering Problem
    4.2.2 Mixed Integer Nonlinear Programming Formulation of the Clustering Problem
    4.2.3 Nonsmooth Nonconvex Optimization Formulation of the Clustering Problem
    4.2.4 Comparison of Different Formulations of the Clustering Problem
  4.3 The Auxiliary Cluster Problem
  4.4 An Incremental Clustering Algorithm
  4.5 Computation of Starting Points for Cluster Centers
  4.6 The Modified Incremental Clustering Algorithm
  4.7 Solving Optimization Problems
    4.7.1 k-Means Type Heuristic Algorithm
    4.7.2 The Discrete Gradient Method
    4.7.3 An Algorithm Based on Smoothing Techniques
      4.7.3.1 Hyperbolic Smoothing of the Cluster Function
      4.7.3.2 Hyperbolic Smoothing of the Auxiliary Cluster Function
      4.7.3.3 Hyperbolic Smoothing of the L1-Norm
      4.7.3.4 Hyperbolic Smoothing of the L∞-Norm
      4.7.3.5 Smooth Clustering Problems
  4.8 Implementation of the Incremental Clustering Algorithm
  4.9 Computational Results: Evaluation of the Incremental Algorithm
    4.9.1 Results for the Similarity Measure Based on the Squared L2-Norm
    4.9.2 Results for the Similarity Measure Based on the L1-Norm
    4.9.3 Results for the Similarity Measure Based on the L∞-Norm
    4.9.4 Dependence of the Number of Distance Function Evaluations and CPU Time on the Number of Clusters
    4.9.5 Results for Purity in Data Sets with Class Labels
    4.9.6 Visualization of Results
  4.10 Computational Results: Comparison with Other Clustering Algorithms
    4.10.1 Comparison of Algorithms Using the Similarity Measure Based on the Squared L2-Norm
    4.10.2 Comparison of Algorithms Using the Similarity Measure Based on the L1-Norm
    4.10.3 Comparison of Algorithms Using the Similarity Measure Based on the L∞-Norm
  4.11 Conclusions
  References

5 Fuzzy Clustering Algorithms and Validity Indices for Distributed Data
  5.1 Introduction
  5.2 Fuzzy Clustering Algorithms
    5.2.1 FCM: Fuzzy c-Means
    5.2.2 GK: Gustafson-Kessel
    5.2.3 Other Fuzzy Clustering Algorithms
    5.2.4 Complexity Analysis: Summary
  5.3 Clustering Validation
    5.3.1 XB: Xie-Beni
    5.3.2 FSS: Fuzzy Simplified Silhouette
    5.3.3 K: Kwon
    5.3.4 TSS: Tang-Sun-Sun
    5.3.5 FS: Fukuyama-Sugeno
    5.3.6 FHV: Fuzzy Hypervolume
    5.3.7 APD: Average Partition Density
    5.3.8 PD: Partition Density
    5.3.9 SCG
    5.3.10 PBMF
    5.3.11 Complexity Analysis: Summary
    5.3.12 OMR: Ordered Multiple Runs
  5.4 Distributed Fuzzy Clustering Algorithms
    5.4.1 DFCM: Distributed Fuzzy c-Means
    5.4.2 Framework for Distributed Data
    5.4.3 Other Distributed Fuzzy Clustering Algorithms
    5.4.4 Complexity Analysis: Communication
  5.5 Distributed Clustering Validation
    5.5.1 DXB: Distributed Xie-Beni
    5.5.2 DFSS: Distributed Fuzzy Simplified Silhouette
    5.5.3 DK: Distributed Kwon
    5.5.4 DTSS: Distributed Tang-Sun-Sun
    5.5.5 DFS: Distributed Fukuyama-Sugeno
    5.5.6 DFHV: Distributed Fuzzy Hypervolume
    5.5.7 DAPD: Distributed Average Partition Density
    5.5.8 DPD: Distributed Partition Density
    5.5.9 DSCG: Distributed SCG
    5.5.10 DPBMF: Distributed PBMF
    5.5.11 Complexity Analysis: Communication
    5.5.12 DOMR: Distributed Ordered Multiple Runs
  5.6 Summary of Results
  5.7 Experimental Evaluation
  5.8 Final Remarks
  Appendix: Additional Fuzzy Clustering Algorithms
    GG: Gath and Geva
    FCV: Fuzzy c-Varieties
    FCE: Fuzzy c-Elliptotypes
    PCM: Possibilistic c-Means
    PGK: Possibilistic Gustafson-Kessel
    FPCM: Fuzzy-Possibilistic c-Means
    PFCM: Possibilistic-Fuzzy c-Means
  References

6 Density Based Clustering: Alternatives to DBSCAN
  6.1 Introduction
  6.2 Related Work
    6.2.1 DBSCAN
  6.3 Clustering with a Different Notion of Density
    6.3.1 Black Hole Clustering
    6.3.2 Protoclustering
  6.4 Evaluation
    6.4.1 Black Hole Clustering Evaluation
    6.4.2 Protoclustering
  6.5 Conclusion
  References

7 Nonnegative Matrix Factorization for Interactive Topic Modeling and Document Clustering
  7.1 Introduction to Nonnegative Matrix Factorization
  7.2 Nonnegative Matrix Factorization for Clustering
  7.3 Optimization Framework for Nonnegative Matrix Factorization
    7.3.1 Block Coordinate Descent Framework
      7.3.1.1 Convergence Property
      7.3.1.2 Stopping Criterion
    7.3.2 Extension 1: Sparse NMF
    7.3.3 Extension 2: Weakly-Supervised NMF
  7.4 Choosing the Number of Clusters
  7.5 Experimental Results
    7.5.1 Data Sets and Algorithms
    7.5.2 Clustering Quality
    7.5.3 Convergence Behavior
    7.5.4 Sparseness
    7.5.5 Consistency from Multiple Runs
  7.6 UTOPIAN: User-driven Topic Modeling via Interactive NMF
    7.6.1 Usage Scenarios
  7.7 Conclusions and Future Directions
  References

8 Overview of Overlapping Partitional Clustering Methods
  8.1 Introduction
  8.2 Classification of Existing Approaches for Overlapping Clustering
  8.3 Overlapping Partitional Clustering Methods
    8.3.1 Uncertain Memberships Based Methods
      8.3.1.1 Fuzzy c-Means (FCM) and Possibilistic c-Means (PCM)
      8.3.1.2 Evidential c-Means (ECM) and Belief c-Means (BCM)
    8.3.2 Hard Memberships Based Methods
      8.3.2.1 Additive Methods
      8.3.2.2 Geometrical Methods
    8.3.3 Summary of Overlapping Partitional Methods
  8.4 Evaluation of Overlapping Clustering
    8.4.1 Label Based Evaluation
    8.4.2 Pair Based Evaluation
    8.4.3 BCubed Evaluation
    8.4.4 Synthesis of Evaluation Methods for Overlapping Clustering
  8.5 Empirical Evaluation of Overlapping Partitional Clustering Methods
  8.6 Conclusion
  References

9 On Semi-Supervised Clustering
  9.1 Introduction
  9.2 Overview and Taxonomy of Algorithms
    9.2.1 Types of Supervision (Side-Knowledge)
    9.2.2 Partitional SSC Algorithms
    9.2.3 Hierarchical SSC Algorithms
  9.3 Main Issues and Key Points
    9.3.1 Evaluation of Clustering Quality: Internal/External Measures
    9.3.2 Relevant Questions on Supervision
      9.3.2.1 Equivalence Between Constraint-Based and Label-Based Supervision
      9.3.2.2 Selecting the Best Query in Active Learning
      9.3.2.3 Uncertain Supervision
  9.4 Algorithms for SSC
    9.4.1 COP-COBWEB and COP-kMeans
    9.4.2 A Probabilistic Framework for SSC: The HMRF k-Means Method
    9.4.3 Semi-Supervised Learning of the Distance Measure
    9.4.4 Seeded k-Means and Constrained k-Means
    9.4.5 The Zheng-Li Algorithm
    9.4.6 Active Fuzzy Constrained Clustering
    9.4.7 Semi-Supervised Graph Clustering: A Kernel Approach
  9.5 A Glimpse of the Future: Some Research Directions
  9.6 Conclusions
  References

10 Consensus of Clusterings Based on High-Order Dissimilarities
  10.1 Introduction
  10.2 High-Order Dissimilarity: The Dissimilarity Increments Principle
    10.2.1 Dissimilarity Increments: Definition and Properties
    10.2.2 Dissimilarity Increments Distribution (DID)
      10.2.2.1 DID for High-Dimensional Data
      10.2.2.2 DID for Two-Dimensional Data
      10.2.2.3 Characterization and Properties of 2-DID
    10.2.3 DID Models and Data Fitting
      10.2.3.1 Best Approximation to DID for High-Dimensional Data
      10.2.3.2 Fitting DID to Non-Gaussian Data
  10.3 Partitional Clustering
    10.3.1 Initial Data Partition
    10.3.2 Merge Criterion
  10.4 Consensus Clustering with DID
  10.5 Validation with DID
  10.6 Experimental Results and Discussion
    10.6.1 Partitional Clustering
      10.6.1.1 Known Number of Clusters
      10.6.1.2 Unknown Number of Clusters
    10.6.2 Consensus Clustering
  10.7 Conclusions
  Appendix
  References

11 Hubness-Based Clustering of High-Dimensional Data
  11.1 Introduction
  11.2 High-Dimensional Data Clustering
  11.3 The Hubness Phenomenon
    11.3.1 The Emergence of Hubs
    11.3.2 Relation of Hubs to Data Clusters
  11.4 Effects of Hubness on Clustering
  11.5 Interaction Between Kernels and Hubness
  11.6 Clustering Algorithms Based on Hubness
    11.6.1 Deterministic Approach
    11.6.2 Probabilistic Approach
    11.6.3 A Hybrid Approach: Extending K-Means
    11.6.4 Using Kernels in Hubness-Based Clustering
    11.6.5 Scalability of Hubness-Based Approaches
  11.7 Experimental Comparisons
  11.8 Application Domains and Other Types of Hubness-Aware Methods
  11.9 Possible Future Directions
  References

12 Clustering for Monitoring Distributed Data Streams
  12.1 Introduction
  12.2 Text Mining Application
  12.3 Monitoring Threshold Functions Through Clustering: Motivation
  12.4 Monitoring Threshold Functions Through Clustering: Implementation
  12.5 Experimental Results
    12.5.1 Data
    12.5.2 Monitoring with Incremental Clustering
  12.6 Conventional Clustering Algorithms
    12.6.1 PDDP
    12.6.2 Batch k-Means
    12.6.3 Incremental k-Means
    12.6.4 Batch k-Means Followed by Incremental k-Means
    12.6.5 Node Clustering with Classical Clustering Algorithms
  12.7 Conclusions
  Appendix 1: First and Second Moments
  Appendix 2: Broadcast Count
  References


