E-book, English, 719 pages
Series: Studies in Classification, Data Analysis, and Knowledge Organization
Preisach / Burkhardt / Han: Data Analysis, Machine Learning and Applications
1st edition, 2008
ISBN: 978-3-540-78246-9
Publisher: Springer Berlin Heidelberg
Format: PDF
Copy protection: PDF watermark
Proceedings of the 31st Annual Conference of the Gesellschaft für Klassifikation e.V., Albert-Ludwigs-Universität Freiburg, March 7-9, 2007
Data analysis and machine learning are research areas at the intersection of computer science, artificial intelligence, mathematics and statistics. They cover general methods and techniques that can be applied to a wide range of applications such as web and text mining, marketing, medical science, bioinformatics and business intelligence. This volume contains the revised versions of selected papers on data analysis, machine learning and applications presented at the 31st Annual Conference of the German Classification Society (Gesellschaft für Klassifikation, GfKl), held at the Albert-Ludwigs-Universität Freiburg, Germany, in March 2007.
Authors/Editors
Further information & material
1;Preface;6
2;Contents;10
3;Part I Classification;19
3.1;Distance-based Kernels for Real-valued Data;20
3.1.1;1 Introduction;20
3.1.2;2 Kernels and similarities defined on real numbers;21
3.1.3;3 Semantics and applicability;22
3.1.4;4 Truncated Euclidean similarity;23
3.1.5;5 Canberra distance-based similarity;24
3.1.6;6 Kernels defined on real vectors;25
3.1.7;7 Conclusions;27
3.1.8;References;27
3.2;Fast Support Vector Machine Classification of Very Large Datasets;28
3.2.1;1 Introduction;28
3.2.2;2 Linear SVM trees;30
3.2.3;3 Non-linear extension;33
3.2.4;4 Experiments;33
3.2.5;5 Conclusion;34
3.2.6;References;35
3.3;Fusion of Multiple Statistical Classifiers;36
3.3.1;1 Introduction;36
3.3.2;2 Classifier fusion;37
3.3.3;3 Diversity of ensemble members;37
3.3.4;4 Combination rules;39
3.3.5;5 Open problems;41
3.3.6;6 Results of experiments;41
3.3.7;7 Conclusions;43
3.3.8;References;43
3.4;Calibrating Margin-based Classifier Scores into Polychotomous Probabilities;46
3.4.1;1 Introduction;46
3.4.2;2 Reduction to binary problems;47
3.4.3;3 Coupling probability estimates;47
3.4.4;4 Dirichlet calibration;48
3.4.5;5 Comparison;50
3.4.6;6 Conclusion;53
3.4.7;References;53
3.5;Classification with Invariant Distance Substitution Kernels;54
3.5.1;1 Introduction;54
3.5.2;2 Background;55
3.5.3;3 Adjustable invariance;57
3.5.4;4 Positive definiteness;58
3.5.5;5 Classification experiments;60
3.5.6;6 Conclusion;61
3.5.7;References;61
3.6;Applying the Kohonen Self-organizing Map Networks to Select Variables;62
3.6.1;1 Introduction;62
3.6.2;2 A proposition to reduce the number of variables;63
3.6.3;3 Applications and results;67
3.6.4;4 Conclusions;69
3.6.5;References;71
3.7;Computer Assisted Classification of Brain Tumors;72
3.7.1;1 Introduction;72
3.7.2;2 Algorithms;73
3.7.3;3 Results;75
3.7.4;4 Conclusions;76
3.7.5;References;76
3.8;Model Selection in Mixture Regression Analysis – A Monte Carlo Simulation Study;78
3.8.1;1 Introduction;78
3.8.2;2 Model selection in mixture models;79
3.8.3;3 Simulation design;80
3.8.4;4 Results summary;81
3.8.5;5 Key contributions and future research directions;83
3.8.6;References;84
3.9;Comparison of Local Classification Methods;86
3.9.1;1 Introduction;86
3.9.2;2 Local classification methods;87
3.9.3;3 Simulation study;89
3.9.4;4 Summary;93
3.9.5;References;93
3.10;Incorporating Domain Specific Information into Gaia Source Classification;94
3.10.1;1 Introduction;94
3.10.2;2 Classification and parametrization;95
3.10.3;3 Classification results;96
3.10.4;4 Summary;99
3.10.5;References;100
3.11;Identification of Noisy Variables for Nonmetric and Symbolic Data in Cluster Analysis;102
3.11.1;1 Introduction;102
3.11.2;2 Characteristics of the HINoV method and its modifications;103
3.11.3;3 Simulation models;104
3.11.4;4 Discussion on the simulation results;105
3.11.5;5 Conclusions;107
3.11.6;References;109
4;Part II Clustering;110
4.1;Families of Dendrograms;112
4.1.1;1 Introduction;112
4.1.2;2 A brief introduction to p-adic geometry;113
4.1.3;3 p-adic dendrograms;115
4.1.4;4 The space of dendrograms;116
4.1.5;5 Distributions on dendrograms;117
4.1.6;6 Hidden vertices;118
4.1.7;7 Conclusions;118
4.1.8;Acknowledgements;119
4.1.9;References;119
4.2;Mixture Models in Forward Search Methods for Outlier Detection;120
4.2.1;1 Introduction;120
4.2.2;2 The Forward Search;121
4.2.3;3 Forward Search and Normal Mixture Models: the graphical approach;122
4.2.4;4 Forward Search and Normal Mixture Models: the inferential approach;123
4.2.5;5 Concluding remarks and open issues;126
4.2.6;References;127
4.3;On Multiple Imputation Through Finite Gaussian Mixture Models;128
4.3.1;1 Introduction;128
4.3.2;2 Multiple imputation;129
4.3.3;3 Label switching;132
4.3.4;4 Simulation study and results;133
4.3.5;References;134
4.4;Mixture Model Based Group Inference in Fused Genotype and Phenotype Data;136
4.4.1;1 Introduction;136
4.4.2;2 Methods;137
4.4.3;3 Results;140
4.4.4;4 Discussion;142
4.4.5;5 Acknowledgements;142
4.4.6;References;142
4.5;The Noise Component in Model-based Cluster Analysis;144
4.5.1;1 Introduction;144
4.5.2;2 Two variations on the noise component;149
4.5.3;3 Some theory;150
4.5.4;4 The EM-algorithm;152
4.5.5;5 Simulations;152
4.5.6;6 Conclusion;154
4.5.7;References;154
4.6;An Artificial Life Approach for Semi-supervised Learning;156
4.6.1;1 Introduction;156
4.6.2;2 Artificial life;157
4.6.3;3 Semi-supervised artificial life;159
4.6.4;4 Semi-supervised artificial life for cluster analysis;160
4.6.5;5 Experimental settings and results;160
4.6.6;6 Discussion;161
4.6.7;7 Summary;162
4.6.8;References;163
4.7;Hard and Soft Euclidean Consensus Partitions;164
4.7.1;1 Introduction;164
4.7.2;2 Theory;166
4.7.3;3 Applications;168
4.7.4;References;170
4.8;Rationale Models for Conceptual Modeling;172
4.8.1;1 Subjectivism in the modeling process;172
4.8.2;2 The design rationale approach;173
4.8.3;3 Classification of rationale fragments;175
4.8.4;4 Conclusion;178
4.8.5;References;179
4.9;Measures of Dispersion and Cluster-Trees for Categorical Data;180
4.9.1;1 Motivation;180
4.9.2;2 Measures of dispersion;181
4.9.3;3 Segmentation;185
4.9.4;References;186
4.10;Information Integration of Partially Labeled Data;188
4.10.1;1 Introduction;188
4.10.2;2 Related work;189
4.10.3;3 Four problem classes;189
4.10.4;4 Method;191
4.10.5;5 Evaluation;194
4.10.6;6 Conclusion;195
4.10.7;References;196
5;Part III Multidimensional Data Analysis;198
5.1;Data Mining of an On-line Survey - A Market Research Application;200
5.1.1;1 Introduction;200
5.1.2;2 Data and objectives;200
5.1.3;3 Methodology and results;201
5.1.4;4 Conclusions;207
5.1.5;References;208
5.2;Nonlinear Constrained Principal Component Analysis in the Quality Control Framework;210
5.2.1;1 Introduction;210
5.2.2;2 Constrained principal component analysis;211
5.2.3;3 Nonlinear Constrained Principal Component Analysis;212
5.2.4;4 Stability analysis;214
5.2.5;5 Results and interpretation;214
5.2.6;6 Concluding remarks;216
5.2.7;References;217
5.3;Non Parametric Control Chart by Multivariate Additive Partial Least Squares via Spline;218
5.3.1;1 Introduction;218
5.3.2;2 Multivariate control charts based on projection methods;219
5.3.3;3 Application: monitoring the painting process of hot-rolled aluminium foils;222
5.3.4;4 Conclusion;224
5.3.5;References;224
5.4;Simple Non Symmetrical Correspondence Analysis;226
5.4.1;1 Introduction;226
5.4.2;2 Non symmetrical correspondence analysis;227
5.4.3;3 Simple non symmetrical correspondence analysis;228
5.4.4;4 Father’s and son’s occupations data;230
5.4.5;5 Conclusions;232
5.4.6;References;234
5.5;Factorial Analysis of a Set of Contingency Tables;236
5.5.1;1 Introduction;236
5.5.2;2 Methodology;237
5.5.3;3 Application;240
5.5.4;4 Discussion;242
5.5.5;5 Software notes;243
5.5.6;References;243
6;Part IV Analysis of Complex Data;244
6.1;Graph Mining: Repository vs. Canonical Form;246
6.1.1;1 Introduction;246
6.1.2;2 Canonical form pruning;247
6.1.3;3 Repository of processed subgraphs;248
6.1.4;4 Comparison;250
6.1.5;5 Experiments;251
6.1.6;6 Summary;252
6.1.7;References;253
6.2;Classification and Retrieval of Ancient Watermarks;254
6.2.1;1 Introduction;254
6.2.2;2 Feature extraction;255
6.2.3;3 Results;257
6.2.4;4 Conclusion;261
6.2.5;References;261
6.3;Segmentation and Classification of Hyper-Spectral Skin Data;262
6.3.1;1 Introduction;262
6.3.2;2 Labelling;263
6.3.3;3 Classification;265
6.3.4;4 Results;266
6.3.5;5 Conclusion;268
6.3.6;References;269
6.4;FSMTree: An Efficient Algorithm for Mining Frequent Temporal Patterns;270
6.4.1;1 Introduction;270
6.4.2;2 Foundations and related work;271
6.4.3;3 Algorithms FSMSet and FSMTree;273
6.4.4;4 Performance evaluation and conclusions;276
6.4.5;References;277
6.5;A Matlab Toolbox for Music Information Retrieval;278
6.5.1;1 Motivation and approach;278
6.5.2;2 Feature extraction;279
6.5.3;3 Data analysis;282
6.5.4;4 Application to the study of music and emotion;283
6.5.5;References;284
6.6;A Probabilistic Relational Model for Characterizing Situations in Dynamic Multi-Agent Systems;286
6.6.1;1 Introduction;286
6.6.2;2 Framework for modeling and recognizing situations;287
6.6.3;3 Modeling situations;288
6.6.4;4 Recognizing situations;289
6.6.5;5 Evaluation;291
6.6.6;6 Conclusions and further work;292
6.6.7;References;293
6.7;Applying the Qn Estimator Online;294
6.7.1;1 Introduction;294
6.7.2;2 An update algorithm for the Qn and the HL estimator;295
6.7.3;3 Comparative study;299
6.7.4;4 Conclusions;300
6.7.5;References;301
6.8;A Comparative Study on Polyphonic Musical Time Series Using MCMC Methods;302
6.8.1;1 Introduction;302
6.8.2;2 Polyphonic model;302
6.8.3;3 Extended polyphonic model;304
6.8.4;4 Results;305
6.8.5;5 Conclusion;308
6.8.6;References;308
6.9;Collective Classification for Labeling of Places and Objects in 2D and 3D Range Data;310
6.9.1;1 Introduction;310
6.9.2;2 Related work;311
6.9.3;3 Collective classification;311
6.9.4;4 Feature extraction in 2D maps;313
6.9.5;5 Feature selection;313
6.9.6;6 Experiments;314
6.9.7;7 Conclusions;315
6.9.8;8 Acknowledgment;317
6.9.9;References;317
6.10;Lag or Error? - Detecting the Nature of Spatial Correlation;318
6.10.1;1 Introduction;318
6.10.2;2 Model and test statistics;319
6.10.3;3 Monte Carlo study;322
6.10.4;4 Results;322
6.10.5;References;324
7;Part V Exploratory Data Analysis and Tools for Data Analysis;326
7.1;Urban Data Mining Using Emergent SOM;328
7.1.1;1 Introduction;328
7.1.2;2 Inspection and transformation of data;329
7.1.3;3 Method;330
7.1.4;4 Results;332
7.1.5;5 Conclusion;333
7.1.6;References;335
7.2;The Konstanz Information Miner;336
7.2.1;1 Overview;336
7.2.2;2 Architecture;337
7.2.3;3 Repository;341
7.2.4;4 Extending;342
7.2.5;5 Conclusion;342
7.2.6;References;343
7.3;A Pattern Based Data Mining Approach;344
7.3.1;1 Current situation in data mining;344
7.3.2;2 Introduction to patterns;345
7.3.3;3 Some data mining patterns;347
7.3.4;4 Summary and outlook;349
7.3.5;References;351
7.4;A Framework for Statistical Entity Identification in;352
7.4.1;1 Introduction;352
7.4.2;2 Methodological framework;353
7.4.3;3 Implementation;354
7.4.4;4 Conclusion and future work;358
7.4.5;References;359
7.5;Combining Several SOM Approaches in Data Mining: Application to ADSL Customer Behaviours Analysis;360
7.5.1;1 Introduction;360
7.5.2;2 Network measurements and data description;361
7.5.3;3 Customer segmentation;363
7.5.4;4 Conclusion;370
7.5.5;References;371
7.6;On the Analysis of Irregular Stock Market Trading Behavior;372
7.6.1;1 Introduction;372
7.6.2;2 Irregular trading behavior in a market;373
7.6.3;3 Analysis of trading behavior with complex valued Eigensystem analysis;374
7.6.4;4 Analysis of the dataset;375
7.6.5;5 Conclusion;378
7.6.6;References;379
7.7;A Procedure to Estimate Relations in a Balanced Scorecard;380
7.7.1;1 Related work;380
7.7.2;2 Balanced scorecards;381
7.7.3;3 Model;383
7.7.4;4 Case study;384
7.7.5;5 Results;385
7.7.6;6 Conclusion and outlook;387
7.7.7;References;387
7.8;The Application of Taxonomies in the Context of Configurative Reference Modelling;390
7.8.1;1 Introduction;390
7.8.2;2 Configurative Reference Modelling and the application of taxonomies;391
7.8.3;3 Conclusion;395
7.8.4;4 Outlook;396
7.8.5;References;396
7.9;Two-Dimensional Centrality of a Social Network;398
7.9.1;1 Introduction;398
7.9.2;2 The procedure;399
7.9.3;3 The analysis and the result;399
7.9.4;4 Discussion;401
7.9.5;References;405
7.10;Benchmarking Open-Source Tree Learners in R/RWeka;406
7.10.1;1 Introduction;406
7.10.2;2 Design of the benchmark experiment;407
7.10.3;3 Results of the benchmark experiment;409
7.10.4;4 Discussion and further work;412
7.10.5;References;413
7.11;From Spelling Correction to Text Cleaning – Using Context Information;414
7.11.1;1 Introduction;414
7.11.2;2 Linguistics and context sensitivity;415
7.11.3;3 Framework for text preparation;416
7.11.4;4 Experimental results;419
7.11.5;5 Conclusion and future work;420
7.11.6;References;421
7.12;Root Cause Analysis for Quality Management;422
7.12.1;1 Introduction;422
7.12.2;2 Root Cause Analysis;424
7.12.3;3 Computational results;427
7.12.4;4 Conclusion;428
7.12.5;References;429
7.13;Finding New Technological Ideas and Inventions with Text Mining and Technique Philosophy;430
7.13.1;1 Introduction;430
7.13.2;2 A common structure for raw and context information;431
7.13.3;3 Relevant aspects for the text mining approach from technique philosophy;433
7.13.4;4 A text mining approach for new ideas and inventions;435
7.13.5;5 Evaluation and outlook;436
7.13.6;6 Acknowledgements;436
7.13.7;References;437
7.14;Investigating Classifier Learning Behavior with Experiment Databases;438
7.14.1;1 Introduction;438
7.14.2;2 A database for classification experiments;439
7.14.3;3 The experiments;440
7.14.4;4 Using the database;441
7.14.5;5 Conclusions;445
7.14.6;References;445
8;Part VI Marketing and Management Science;446
8.1;Conjoint Analysis for Complex Services Using Clusterwise Hierarchical Bayes Procedures;448
8.1.1;1 Introduction;448
8.1.2;2 Preference measurement for services;449
8.1.3;3 Hierarchical Bayes procedures for conjoint analysis;449
8.1.4;4 Empirical investigation;450
8.1.5;5 Conclusion and outlook;453
8.1.6;References;454
8.2;Building an Association Rules Framework for Target Marketing;456
8.2.1;1 Introduction;456
8.2.2;2 A segment-specific view of cross-category associations;457
8.2.3;3 Methodology;458
8.2.4;4 Empirical application;460
8.2.5;5 Conclusion and future work;463
8.2.6;References;463
8.3;AHP versus ACA – An Empirical Comparison;464
8.3.1;1 Preference measurement for complex products;464
8.3.2;2 The Analytic Hierarchy Process – AHP;465
8.3.3;3 Design of the empirical study;467
8.3.4;4 Results;468
8.3.5;5 Conclusions and outlook;470
8.3.6;References;471
8.4;On the Properties of the Rank Based Multivariate Exponentially Weighted Moving Average Control Charts;472
8.4.1;1 Introduction;472
8.4.2;2 Data depth;472
8.4.3;3 The proposed control chart;473
8.4.4;4 Effect of the reference sample size on control charts performance;475
8.4.5;5 Conclusion;478
8.4.6;Acknowledgements;479
8.4.7;References;479
8.5;Are Critical Incidents Really Critical for a Customer Relationship? A MIMIC Approach;480
8.5.1;1 Introduction;480
8.5.2;2 Hypotheses;481
8.5.3;3 Method;483
8.5.4;4 Results;483
8.5.5;5 Discussion;485
8.5.6;References;486
8.6;Heterogeneity in the Satisfaction-Retention Relationship – A Finite-mixture Approach;488
8.6.1;1 Introduction;488
8.6.2;2 The Model;490
8.6.3;3 Discussion;494
8.6.4;References;494
8.7;An Early-Warning System to Support Activities in the Management of Customer Equity and How to Obtain the Most from Spatial Customer Equity Potentials;496
8.7.1;1 Introduction;496
8.7.2;2 Strategic customer control dimensions;497
8.7.3;3 Early-warning system;500
8.7.4;4 Empirical example;502
8.7.5;5 Conclusion;503
8.7.6;References;503
8.8;Classifying Contemporary Marketing Practices;506
8.8.1;1 Introduction;506
8.8.2;2 Knowledge on interactive marketing;507
8.8.3;3 A Finite Mixture approach for classifying marketing practices;508
8.8.4;4 Empirical application;510
8.8.5;5 Conclusions;513
8.8.6;References;513
9;Part VII Banking and Finance;514
9.1;Predicting Stock Returns with Bayesian Vector Autoregressive Models;516
9.1.1;1 Introduction;516
9.1.2;2 Literature review;517
9.1.3;3 Model;518
9.1.4;4 Empirical study;519
9.1.5;5 Conclusion and outlook;522
9.1.6;References;523
9.2;The Evaluation of Venture-Backed IPOs – Certification Model versus Adverse Selection Model, Which Does Fit Better?;524
9.2.1;1 Introduction;524
9.2.2;2 The theoretical background: the certification model and the adverse selection model;525
9.2.3;3 Data set and non-parametric hypothesis tests;526
9.2.4;4 Multivariate investigation tools: Partial Least Squares regression model;527
9.2.5;5 Conclusion;530
9.2.6;Acknowledgments;530
9.2.7;References;530
9.3;Using Multiple SVM Models for Unbalanced Credit Scoring Data Sets;532
9.3.1;1 Introduction;532
9.3.2;2 SVM models for unbalanced data sets;533
9.3.3;3 Multiple SVM for unbalanced data sets in practice;534
9.3.4;4 Combination of SVM on random input subsets;536
9.3.5;5 Conclusions and outlook;538
9.3.6;References;539
10;Part VIII Business Intelligence;540
10.1;Comparison of Recommender System Algorithms Focusing on the New-item and User-bias Problem;542
10.1.1;1 Introduction;542
10.1.2;2 Related works;543
10.1.3;3 Observed approaches;544
10.1.4;4 Evaluation protocols;546
10.1.5;5 Evaluation and experimental results;546
10.1.6;6 Conclusion;548
10.1.7;References;549
10.2;Collaborative Tag Recommendations;550
10.2.1;1 Introduction;550
10.2.2;2 Related work;551
10.2.3;3 Recommender Systems;552
10.2.4;4 Tag Recommender Systems;553
10.2.5;5 Experimental setup and results;554
10.2.6;6 Conclusions;556
10.2.7;7 Acknowledgments;557
10.2.8;References;557
10.3;Applying Small Sample Test Statistics for Behavior-based Recommendations;558
10.3.1;1 Introduction;558
10.3.2;2 The ideal decision maker: The decision maker without preferences;559
10.3.3;3 Library meta catalogs: An exemplary application area;560
10.3.4;4 Mathematical notation;561
10.3.5;5 POSICI: Probability Of Single Item Co-Inspections;561
10.3.6;6 POMICI: Probability Of Multiple Items Co-Inspections;562
10.3.7;7 POSICI vs. POMICI;564
10.3.8;8 Conclusions and further research;564
10.3.9;References;565
11;Part IX Text Mining, Web Mining, and the Semantic Web;568
11.1;Classifying Number Expressions in German Corpora;570
11.1.1;1 Introduction;570
11.1.2;2 Classification of number expressions;571
11.1.3;3 Experimental evaluation;574
11.1.4;4 Conclusions and future work;576
11.1.5;References;577
11.2;Non-Profit Web Portals - Usage Based Benchmarking for Success Evaluation;578
11.2.1;1 Introduction;578
11.2.2;2 Related work;579
11.2.3;3 Method;580
11.2.4;4 Case study;583
11.2.5;5 Conclusions;584
11.2.6;References;585
11.3;Text Mining of Supreme Administrative Court Jurisdictions;586
11.3.1;1 Introduction;586
11.3.2;2 Administrative Supreme Court jurisdictions;587
11.3.3;3 Investigations;587
11.3.4;4 Conclusion;592
11.3.5;References;593
11.4;Supporting Web-based Address Extraction with Unsupervised Tagging;594
11.4.1;1 Introduction;594
11.4.2;2 Data preparation;596
11.4.3;3 Unsupervised tagging;596
11.4.4;4 Experiments and evaluation;597
11.4.5;5 Conclusion and further work;600
11.4.6;References;600
11.5;A Two-Stage Approach for Context-Dependent Hypernym Extraction;602
11.5.1;1 Introduction;602
11.5.2;2 Document clustering;603
11.5.3;3 Hypernym extraction;604
11.5.4;4 Evaluation;606
11.5.5;5 Conclusion and future work;609
11.5.6;References;609
11.6;Analysis of Dwell Times in Web Usage Mining;610
11.6.1;1 Introduction;610
11.6.2;2 Model specification and estimation;611
11.6.3;3 Real life example;614
11.6.4;4 Conclusion;615
11.6.5;References;617
11.7;New Issues in Near-duplicate Detection;618
11.7.1;1 Introduction;618
11.7.2;2 Fingerprint construction;620
11.7.3;3 Wikipedia as evaluation corpus;623
11.7.4;4 Summary;625
11.7.5;References;625
11.8;Comparing the University of South Florida Homograph Norms with Empirical Corpus Data;628
11.8.1;1 Introduction;628
11.8.2;2 Resources;629
11.8.3;3 Approach;630
11.8.4;4 Results and discussion;632
11.8.5;5 Conclusions and future work;634
11.8.6;Acknowledgments;635
11.8.7;References;635
11.9;Content-based Dimensionality Reduction for Recommender Systems;636
11.9.1;1 Introduction;636
11.9.2;2 Related work;637
11.9.3;3 The proposed approach;637
11.9.4;4 Performance study;641
11.9.5;5 Conclusions;643
11.9.6;References;643
12;Part X Linguistics;644
12.1;The Distribution of Data in Word Lists and its Impact on the Subgrouping of Languages;646
12.1.1;1 General situation;646
12.1.2;2 Special situation;647
12.1.3;3 The bias;649
12.1.4;4 Solution and operationalization;651
12.1.5;5 Discussion;651
12.1.6;6 Conclusions;652
12.1.7;References;652
12.2;Quantitative Text Analysis Using L-, F- and T-Segments;654
12.2.1;1 Introduction;654
12.2.2;2 Data;655
12.2.3;3 Distribution of segment types;656
12.2.4;4 Length distribution of L-segments;657
12.2.5;5 TTR studies;659
12.2.6;6 Conclusion;661
12.2.7;References;661
12.3;Projecting Dialect Distances to Geography: Bootstrap Clustering vs. Noisy Clustering;664
12.3.1;1 Introduction;664
12.3.2;2 Background and motivation;665
12.3.3;3 Bootstrapping clustering;667
12.3.4;4 Clustering with noise;667
12.3.5;5 Projecting to geography;668
12.3.6;6 Results;669
12.3.7;7 Discussion;669
12.3.8;Acknowledgments;670
12.3.9;References;670
12.4;Structural Differentiae of Text Types – A Quantitative Model;672
12.4.1;1 Introduction;672
12.4.2;2 Category selection;673
12.4.3;3 The evaluation procedure;674
12.4.4;4 Exploring the structural homogeneity of text types by means of the Iterative Categorisation Procedure (ICP);675
12.4.5;5 Results;676
12.4.6;6 Discussion;676
12.4.7;7 Conclusion;678
12.4.8;References;679
13;Part XI Data Analysis in Humanities;680
13.1;Scenario Evaluation Using Two-mode Clustering Approaches in Higher Education;682
13.1.1;1 Introduction: Scenario analysis;682
13.1.2;2 Two-Mode clustering (for scenario evaluation);683
13.1.3;3 Example: Scenario evaluation in higher education;685
13.1.4;4 Conclusions;688
13.1.5;References;688
13.2;Visualization and Clustering of Tagged Music Data;690
13.2.1;1 Introduction;690
13.2.2;2 Related work;691
13.2.3;3 Emergent Self Organizing Maps;691
13.2.4;4 Data;692
13.2.5;5 Experimental results;694
13.2.6;6 Conclusion and future work;696
13.2.7;References;696
13.3;Effects of Data Transformation on Cluster Analysis of Archaeometric Data;698
13.3.1;1 Introduction;698
13.3.2;2 Data transformation in archaeometry;699
13.3.3;3 Transformation into ranks;700
13.3.4;4 Distances and cluster analysis;701
13.3.5;5 Romano-British vessel glass classified;702
13.3.6;6 Roman bricks and tiles classified;703
13.3.7;7 Summary;704
13.3.8;References;704
13.4;Fuzzy PLS Path Modeling: A New Tool For Handling Sensory Data;706
13.4.1;1 Introduction;706
13.4.2;2 Fuzzy PLS path modeling;707
13.4.3;3 Application;710
13.4.4;4 Conclusion;712
13.4.5;References;713
13.5;Automatic Analysis of Dewey Decimal Classification Notations;714
13.5.1;1 Introduction;714
13.5.2;2 DDC notations;715
13.5.3;3 Automatic analysis of DDC notations;716
13.5.4;4 Results;719
13.5.5;5 Conclusion;720
13.5.6;References;721
13.6;A New Interval Data Distance Based on the Wasserstein Metric;722
13.6.1;1 Introduction;722
13.6.2;2 A brief survey of the existing distances;723
13.6.3;3 Our proposal: Wasserstein distance;724
13.6.4;4 Dynamic clustering algorithm using different criterion functions;726
13.6.5;5 Conclusion and perspectives;727
13.6.6;References;728
14;Keywords;730
15;Author Index;734