Learning from Data Streams in Evolving Environments
Methods and Applications
Editor: Moamar Sayed-Mouchaweh
E-Book, English, Volume 41, 320 pages
Series: Studies in Big Data
1st edition 2018
ISBN: 978-3-319-89803-2
Publisher: Springer International Publishing
Format: PDF
Copy protection: PDF watermark
This edited book covers recent advances in techniques, methods and tools that address the problem of learning from data streams generated by evolving, non-stationary processes. Its goal is to discuss and review the advanced techniques, methods and tools dedicated to managing, exploiting and interpreting data streams in non-stationary environments. The book provides the notions, definitions and background required to understand the problem of learning from data streams in non-stationary environments, synthesizes the state of the art in the domain, discusses advanced aspects and concepts, and presents open problems and future challenges in this field.
Provides multiple examples to facilitate the understanding of data streams in non-stationary environments;
Presents several application cases to show how the methods solve different real world problems;
Discusses the links between methods to help stimulate new research and application directions.
Moamar Sayed-Mouchaweh received his PhD from the University of Reims, France. He worked as an Associate Professor in Computer Science, Control and Signal Processing at the University of Reims, in the Research Centre in Sciences and Technologies of Information and Communication. In December 2008, he obtained the Habilitation to Direct Research (HDR) in Computer Science, Control and Signal Processing. Since September 2011, he has been a Full Professor at the High National Engineering School of Mines Telecom Lille Douai (France), in the Department of Computer Science and Automatic Control. He has edited and written several Springer books and has served as guest editor of several special issues of international journals. He has also served as IPC Chair and Conference Chair of several international workshops and conferences, and he is a member of the editorial boards of several international journals.
Preface
Contents
Introduction
  1 Learning from Data Streams
  2 General Classification of Methods to Learn from Data Streams
  3 Contents of This Book
    3.1 Chapter 2
    3.2 Chapter 3
    3.3 Chapter 4
    3.4 Chapter 5
    3.5 Chapter 6
    3.6 Chapter 7
    3.7 Chapter 8
    3.8 Chapter 9
    3.9 Chapter 10
    3.10 Chapter 11
    3.11 Chapter 12
    3.12 Chapter 13
  References
Transfer Learning in Non-stationary Environments
  1 Introduction
  2 Transfer Learning (TL)
    2.1 Transductive TL
    2.2 Inductive TL
  3 Learning in Non-stationary Environments (NSE)
    3.1 Chunk-by-Chunk Approaches
    3.2 Example-by-Example Approaches
  4 The Relationship Between TL and Learning in NSE
    4.1 Similarities
    4.2 Differences
  5 The Potential of Transfer Learning in NSE
    5.1 Dynamic Cross-company Mapped Model Learning (Dycom)
    5.2 Diversity for Dealing with Drifts (DDD)
  6 Conclusions
  References
A New Combination of Diversity Techniques in Ensemble Classifiers for Handling Complex Concept Drift
  1 Introduction
  2 Complex Concept Drift Characteristics and Challenges
    2.1 Speed
    2.2 Severity
    2.3 Complex Concept Drift
  3 Related Work
    3.1 Block-Based Technique
    3.2 Weighting-Data Technique
    3.3 Filtering-Data Technique
  4 The Proposed Approach
    4.1 Drift Monitoring Process in EnsembleEDIST2
    4.2 EnsembleEDIST2's Diversity by Variable-Sized Block Technique
    4.3 EnsembleEDIST2's Diversity by New Filtering-Data Criterion
    4.4 EnsembleEDIST2's Diversity by New Weighting-Data Process
  5 Experimental Evaluation
    5.1 Synthetic Datasets
    5.2 Real Datasets
    5.3 Evaluation Criteria
      5.3.1 Parameter Settings
  6 Comparative Study and Interpretation
    6.1 Impact of N0 on EnsembleEDIST2 Performance
    6.2 Impact of Ensemble Size on EnsembleEDIST2 Performance
    6.3 Accuracy of EnsembleEDIST2 Vs Other Ensembles
  7 Conclusion
  References
Analyzing and Clustering Pareto-Optimal Objects in Data Streams
  1 Introduction
  2 Related Work
  3 Background
    3.1 Preference Constructors
    3.2 PreferenceSQL
  4 Preference-Based Stream Processing
    4.1 The Preference-Based Stream Processing Framework
    4.2 The Preference Continuous Query Language (PCQL)
    4.3 The Stream-Based Lattice Skyline Algorithm (SLS)
      4.3.1 Finding the BMO-Set of a Data Stream
      4.3.2 The SLS Algorithm
  5 Clustering of Pareto-Optimal Objects
    5.1 Clustering Background
    5.2 The Borda Social Choice Voting Rule for Clustering
      5.2.1 The Borda Social Choice Voting Rule
      5.2.2 Cluster Allocation
      5.2.3 Complexity and Convergence
  6 Application Use Case
  7 Experiments
    7.1 Benchmarks for Stream Lattice Skyline Algorithm
      7.1.1 Experiments on Artificial Data
      7.1.2 Experiments on Real World Data
    7.2 Benchmarks for Borda Social Choice Clustering
      7.2.1 Runtime
      7.2.2 Iterations
  8 Conclusion
  References
Error-Bounded Approximation of Data Stream: Methods and Theories
  1 Introduction
  2 Preliminary
  3 OptimalPLR: An Optimal Algorithm to Generate Error-Bounded PLR
    3.1 Extreme Slopes of Maximal ε-Representative
      3.1.1 Slope Rotation and Extreme Slopes
      3.1.2 Slope Evolution and Reduction
    3.2 Optimization Strategies
      3.2.1 Computing Extreme Slopes
      3.2.2 Updating Convex Hulls
    3.3 Error-Bounded PLR Algorithm
      3.3.1 Description of OptimalPLR
      3.3.2 Complexity Analysis
      3.3.3 Discussions of OptimalPLR
  4 ParaOptimal: An Optimal Algorithm in Transformed Space
    4.1 Description of ParaOptimal
      4.1.1 Theoretical Preparation
      4.1.2 Initialization
      4.1.3 Feasible Region Update
    4.2 Generalization of ParaOptimal
  5 Theoretical Analysis of the Equivalence
    5.1 Mapping of Two Spaces
    5.2 Equivalence Discussion
  6 Summary
  References
Ensemble Dynamics in Non-stationary Data Stream Classification
  1 Introduction
  2 Ensemble Dynamics
    2.1 Addition
      2.1.1 Fixed Time of Addition
      2.1.2 Dynamic Time of Addition
    2.2 Removal
    2.3 Update
    2.4 Ensemble Dynamics Taxonomy
  3 Formalisation
  4 Experimental Study
    4.1 Data Sets
      4.1.1 Hyperplane Generator
      4.1.2 SEA Data Stream Generator
      4.1.3 Forest Cover-Type Data Set
      4.1.4 Electricity Data Set
    4.2 Results and Analysis
  5 Discussion
  6 Summary
  References
Processing Evolving Social Networks for Change Detection Based on Centrality Measures
  1 Introduction
  2 User Preference Dynamics
    2.1 User Preferences
    2.2 Preference Changes in Evolving Environments
  3 Preference Change Detection
    3.1 Processing Streaming Network
    3.2 Computing Centralities
      3.2.1 Degree Centrality
      3.2.2 Betweenness Centrality
      3.2.3 Closeness Centrality
    3.3 Moving Window Average (MWA)
    3.4 Weighted Moving Window Average (WMWA)
    3.5 Page–Hinkley Test (PH)
    3.6 Change Point Scoring Function
    3.7 Change Point Detection
    3.8 Assumptions
    3.9 Evaluation
  4 Algorithms
  5 Methodology
    5.1 Dataset and Evolving Networks
      5.1.1 Homogeneous Network
      5.1.2 Bipartite Network
    5.2 User Preference Change Events
  6 Experiments
    6.1 Experimental Environment
    6.2 Detecting u1 Change-Points
    6.3 Performance of Proposed Methods
    6.4 Impact of Parameters
  7 Related Work
  8 Conclusion
  References
Large-Scale Learning from Data Streams with Apache SAMOA
  1 Introduction
  2 Description
  3 High Level Architecture
  4 System Design
  5 Machine Learning Algorithms
  6 Vertical Hoeffding Tree
    6.1 Vertical Parallelism
    6.2 Algorithm Structure
    6.3 Evaluation
      6.3.1 Accuracy and Time of VHT Local vs. MOA
      6.3.2 Accuracy of VHT Local vs. Distributed
    6.4 Summary
  7 Distributed AMRules
    7.1 Vertical Parallelism
    7.2 Horizontal Parallelism
    7.3 Evaluation
  8 Conclusions
  References
Process Mining for Analyzing Customer Relationship Management Systems: A Case Study
  1 Introduction
  2 Related Work
  3 INE Case Study
    3.1 What Is INE?
    3.2 Data and Pre-processing
    3.3 Questions
    3.4 Process Discovery
    3.5 Conformance Checking
    3.6 Performance Analysis
    3.7 Building Social Network
    3.8 Conclusions and Future Study
  References
Detecting Smooth Cluster Changes in Evolving Graph Structures
  1 Introduction
  2 Clustering a Graph Sequence
    2.1 Problem Definition
    2.2 Preserving Cluster Membership
    2.3 Drawbacks of PCM
  3 Detecting Smooth Cluster Changes in a Graph Sequence
    3.1 Clustering a Graph Sequence Using Smoothness Between Two Successive Graphs
    3.2 Clustering Using the Forgetting Rate
    3.3 Connectivities of Graphs
  4 Experimental Evaluation
    4.1 Experimental Setup
    4.2 Results
      4.2.1 Dependence on the Initial Graph of the Graph Sequence
      4.2.2 Varying Cluster Numbers
      4.2.3 Varying Numbers of Vertices
      4.2.4 Graph Connectivities
      4.2.5 Real-World Data
  5 Conclusion
  References
Efficient Estimation of Dynamic Density Functions with Applications in Data Streams
  1 Introduction
  2 Related Work
    2.1 Dynamic Density
    2.2 Change Detection
  3 KDE-Track: Dynamic Density Estimation
    3.1 Theoretical Bases of Density Estimation
    3.2 KDE-Track Method
    3.3 KDE-Track Implementation
  4 Density Estimation Performance Evaluation
    4.1 Estimation Accuracy on Synthetic Data
      4.1.1 Datasets
    4.2 Computational Time Cost and Space Usage
  5 Applications
    5.1 Visualizing the Taxi Traffic Data
    5.2 Online Change Detection
  6 Summary and Future Work
  References
Incremental SVM Learning: Review
  1 Introduction
  2 SVM for Classification
  3 Incremental SVM Learning
    3.1 Online Incremental SVM Learning Methods
    3.2 Semi Online Incremental SVM Learning Methods
  4 Discussion and Comparison
  5 Applications of Incremental SVM Learning
  6 Conclusion
  References
On Social Network-Based Algorithms for Data Stream Clustering
  1 Introduction
  2 Data Stream Clustering
  3 Related Work
    3.1 CluStream
    3.2 ClusTree
    3.3 DenStream
    3.4 HAStream
  4 Social Network-Based Approaches
    4.1 Background on Social Networks Theory
    4.2 CNDenStream
    4.3 SNCStream
    4.4 SNCStream+
  5 Evaluation
    5.1 Evaluation Procedure
    5.2 Parametrization
    5.3 Synthetic Data
    5.4 Real-World Datasets
    5.5 Results
  6 Conclusion
  References




