E-book, English, Volume 4426, 1186 pages
Zhou / Li / Yang Advances in Knowledge Discovery and Data Mining
2007
ISBN: 978-3-540-71701-0
Publisher: Springer Berlin Heidelberg
Format: PDF
Copy protection: 1 - PDF Watermark
11th Pacific-Asia Conference, PAKDD 2007, Nanjing, China, May 22-25, 2007, Proceedings
Series: Lecture Notes in Computer Science
This book constitutes the refereed proceedings of the 11th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2007, held in Nanjing, China in May 2007.
The 34 revised full papers and 92 revised short papers presented together with four keynote talks or extended abstracts thereof were carefully reviewed and selected from 730 submissions. The papers are devoted to new ideas, original research results and practical development experiences from all KDD-related areas including data mining, machine learning, databases, statistics, data warehousing, data visualization, automatic scientific discovery, knowledge acquisition and knowledge-based systems.
Written for: Researchers and professionals
Keywords: Web data mining, algorithmic learning, ant colony optimization, association rule mining, biomedical data analysis, classification, clustering, computer security, data analysis, data mining, feature selection, image segmentation, information extraction, knowledge discovery, learning classifier systems, machine learning, privacy, qualitative reasoning, random forest, rough sets, statistical learning, support vector machines, text mining, text summarization, workflow mining
Further Information & Material
1;Preface;6
2;Organization;8
3;Table of Contents;16
4;Research Frontiers in Advanced Data Mining Technologies and Applications;27
5;Finding the Real Patterns;32
6;Class Noise vs Attribute Noise: Their Impacts, Detection and Cleansing;33
7;Multi-modal and Multi-granular Learning;35
8;Hierarchical Density-Based Clustering of Categorical Data and a Simplification;37
9;Multi-represented Classification Based on Confidence Estimation;49
10;Selecting a Reduced Set for Building Sparse Support Vector Regression in the Primal;61
11;Mining Frequent Itemsets from Uncertain Data;73
12;QC4 - A Clustering Evaluation Method;85
13;Semantic Feature Selection for Object Discovery in High-Resolution Remote Sensing Imagery;97
14;Deriving Private Information from Arbitrarily Projected Data;110
15;Consistency Based Attribute Reduction;122
16;A Hybrid Command Sequence Model for Anomaly Detection;134
17;s-Algorithm: Structured Workflow Process Mining Through Amalgamating Temporal Workcases;145
18;Multiscale BiLinear Recurrent Neural Network for Prediction of MPEG Video Traffic;157
19;An Effective Multi-level Algorithm Based on Ant Colony Optimization for Bisecting Graph;164
20;A Unifying Method for Outlier and Change Detection from Data Streams Based on Local Polynomial Fitting;176
21;Simultaneous Tuning of Hyperparameter and Parameter for Support Vector Machines;188
22;Entropy Regularization, Automatic Model Selection, and Unsupervised Image Segmentation;199
23;A Timing Analysis Model for Ontology Evolutions Based on Distributed Environments;209
24;An Optimum Random Forest Model for Prediction of Genetic Susceptibility to Complex Diseases;219
25;Feature Based Techniques for Auto-Detection of Novel Email Worms;231
26;Multiresolution-Based BiLinear Recurrent Neural Network;243
27;Query Expansion Using a Collection Dependent Probabilistic Latent Semantic Thesaurus;250
28;Scaling Up Semi-supervised Learning: An Efficient and Effective LLGC Variant;262
29;A Machine Learning Approach to Detecting Instantaneous Cognitive States from fMRI Data;274
30;Discovering Correlated Items in Data Streams;286
31;Incremental Clustering in Geography and Optimization Spaces;298
32;Estimation of Class Membership Probabilities in the Document Classification;310
33;A Hybrid Multi-group Privacy-Preserving Approach for Building Decision Trees;322
34;A Constrained Clustering Approach to Duplicate Detection Among Relational Data;334
35;Understanding Research Field Evolving and Trend with Dynamic Bayesian Networks;346
36;Embedding New Data Points for Manifold Learning Via Coordinate Propagation;358
37;Spectral Clustering Based Null Space Linear Discriminant Analysis (SNLDA);370
38;On a New Class of Framelet Kernels for Support Vector Regression and Regularization Networks;381
39;A Clustering Algorithm Based on Mechanics;393
40;DLDA/QR: A Robust Direct LDA Algorithm for Face Recognition and Its Theoretical Foundation;405
41;gPrune: A Constraint Pushing Framework for Graph Pattern Mining;414
42;Modeling Anticipatory Event Transitions;427
43;A Modified Relationship Based Clustering Framework for Density Based Clustering and Outlier Filtering on High Dimensional Datasets;435
44;A Region-Based Skin Color Detection Algorithm;443
45;Supportive Utility of Irrelevant Features in Data Preprocessing;451
46;Incremental Mining of Sequential Patterns Using Prefix Tree;459
47;A Multiple Kernel Support Vector Machine Scheme for Simultaneous Feature Selection and Rule-Based Classification;467
48;Combining Supervised and Semi-supervised Classifier for Personalized Spam Filtering;475
49;Qualitative Simulation and Reasoning with Feature Reduction Based on Boundary Conditional Entropy of Knowledge;483
50;A Hybrid Incremental Clustering Method-Combining Support Vector Machine and Enhanced Clustering by Committee Clustering Algorithm;491
51;CCRM: An Effective Algorithm for Mining Commodity Information from Threaded Chinese Customer Reviews;499
52;A Rough Set Approach to Classifying Web Page Without Negative Examples;507
53;Evolution and Maintenance of Frequent Pattern Space When Transactions Are Removed;515
54;Establishing Semantic Relationship in Inter-query Learning for Content-Based Image Retrieval Systems;524
55;Density-Sensitive Evolutionary Clustering;533
56;Reducing Overfitting in Predicting Intrinsically Unstructured Proteins;541
57;Temporal Relations Extraction in Mining Hepatitis Data;549
58;Supervised Learning Approach to Optimize Ranking Function for Chinese FAQ-Finder;557
59;Combining Convolution Kernels Defined on Heterogeneous Sub-structures;565
60;Privacy-Preserving Sequential Pattern Release;573
61;Mining Concept Associations for Knowledge Discovery Through Concept Chain Queries;581
62;Capability Enhancement of Probabilistic Neural Network for the Design of Breakwater Armor Blocks;589
63;Named Entity Recognition Using Acyclic Weighted Digraphs: A Semi-supervised Statistical Method;597
64;Contrast Set Mining Through Subgroup Discovery Applied to Brain Ischaemina Data;605
65;Intelligent Sequential Mining Via Alignment: Optimization Techniques for Very Large DB;613
66;A Hybrid Prediction Method Combining RBF Neural Network and FAR Model;624
67;An Advanced Fuzzy C-Mean Algorithm for Regional Clustering of Interconnected Systems;632
68;Centroid Neural Network with Bhattacharyya Kernel for GPDF Data Clustering;642
69;Concept Interconnection Based on Many-Valued Context Analysis;649
70;Text Classification for Thai Medicinal Web Pages;657
71;A Fast Algorithm for Finding Correlation Clusters in Noise Data;665
72;Application of Discrimination Degree for Attributes Reduction in Concept Lattice;674
73;A Language and a Visual Interface to Specify Complex Spatial Patterns;682
74;Clustering Ensembles Based on Normalized Edges;690
75;Quantum-Inspired Immune Clonal Multiobjective Optimization Algorithm;698
76;Phase Space Reconstruction Based Classification of Power Disturbances Using Support Vector Machines;706
77;Mining the Impact Factors of Threads and Participators on Usenet Using Link Analysis;714
78;Weighted Rough Set Learning: Towards a Subjective Approach;722
79;Multiple Self-Splitting and Merging Competitive Learning Algorithm;730
80;A Novel Relative Space Based Gene Feature Extraction and Cancer Recognition;738
81;Experiments on Kernel Tree Support Vector Machines for Text Categorization;746
82;A New Approach for Similarity Queries of Biological Sequences in Databases;754
83;Anomaly Intrusion Detection Based on Dynamic Cluster Updating;763
84;Efficiently Mining Closed Constrained Frequent Ordered Subtrees by Using Border Information;771
85;Approximate Trace of Grid-Based Clusters over High Dimensional Data Streams;779
86;BRIM: An Efficient Boundary Points Detecting Algorithm;787
87;Syntactic Impact on Sentence Similarity Measure in Archive-Based QA System;795
88;Semi-structure Mining Method for Text Mining with a Chunk-Based Dependency Structure;803
89;Principal Curves with Feature Continuity;811
90;Kernel-Based Linear Neighborhood Propagation for Semantic Video Annotation;819
91;Learning Bayesian Networks with Combination of MRMR Criterion and EMI Method;827
92;A Cooperative Coevolution Algorithm of RBFNN for Classification;835
93;ANGEL: A New Effective and Efficient Hybrid Clustering Technique for Large Databases;843
94;Exploring Group Moving Pattern for an Energy-Constrained Object Tracking Sensor Network;851
95;ProMail: Using Progressive Email Social Network for Spam Detection;859
96;Multidimensional Decision Support Indicator (mDSI) for Time Series Stock Trend Prediction;867
97;A Novel Support Vector Machine Ensemble Based on Subtractive Clustering Analysis;875
98;Keyword Extraction Based on PageRank;883
99;Finding the Optimal Feature Representations for Bayesian Network Learning;891
100;Feature Extraction and Classification of Tumor Based on Wavelet Package and Support Vector Machines;897
101;Resource Allocation and Scheduling Problem Based on Genetic Algorithm and Ant Colony Optimization;905
102;Image Classification and Segmentation for Densely Packed Aggregates;913
103;Mining Temporal Co-orientation Pattern from Spatio-temporal Databases;921
104;Incremental Learning of Support Vector Machines by Classifier Combining;930
105;Clustering Zebrafish Genes Based on Frequent-Itemsets and Frequency Levels;938
106;A Practical Method for Approximate Subsequence Search in DNA Databases;947
107;An Information Retrieval Model Based on Semantics;958
108;AttributeNets: An Incremental Learning Method for Interpretable Classification;966
109;Mining Personalization Interest and Navigation Patterns on Portal;974
110;Cross-Lingual Document Clustering;982
111;Grammar Guided Genetic Programming for Flexible Neural Trees Optimization;990
112;A New Initialization Method for Clustering Categorical Data;998
113;L0-Constrained Regression for Data Mining;1007
114;Application of Hybrid Pattern Recognition for Discriminating Paddy Seeds of Different Storage Periods Based on Vis/NIRS;1015
115;Density-Based Data Clustering Algorithms for Lower Dimensions Using Space-Filling Curves;1023
116;Transformation-Based GMM with Improved Cluster Algorithm for Speaker Identification;1032
117;Using Social Annotations to Smooth the Language Model for IR;1041
118;Affection Factor Optimization in Data Field Clustering;1048
119;A New Algorithm for Minimum Attribute Reduction Based on Binary Particle Swarm Optimization with Vaccination;1055
120;Graph Nodes Clustering Based on the Commute-Time Kernel;1063
121;Identifying Synchronous and Asynchronous Co-regulations from Time Series Gene Expression Data;1072
122;A Parallel Algorithm for Learning Bayesian Networks;1081
123;Incorporating Prior Domain Knowledge into a Kernel Based Feature Selection Algorithm;1090
124;Geo-spatial Clustering with Non-spatial Attributes and Geographic Non-overlapping Constraint: A Penalized Spatial Distance Measure;1098
125;GBKII: An Imputation Method for Missing Values;1106
126;An Effective Gene Selection Method Based on Relevance Analysis and Discernibility Matrix;1114
127;Towards Comprehensive Privacy Protection in Data Clustering;1122
128;A Novel Spatial Clustering with Obstacles Constraints Based on Particle Swarm Optimization and K-Medoids;1131
129;Online Rare Events Detection;1140
130;Structural Learning About Independence Graphs from Multiple Databases;1148
131;An Effective Method For Calculating Natural Adjacency Relation in Spatial Database;1157
132;K-Centers Algorithm for Clustering Mixed Type Data;1166
133;Proposion and Analysis of a TCP Feature of P2P Traffic;1174
134;Author Index;1182
Research Frontiers in Advanced Data Mining Technologies and Applications (p. 25)
Data mining, as the confluence of multiple intertwined disciplines, including statistics, machine learning, pattern recognition, database systems, information retrieval, World-Wide Web, and many application domains, has achieved great progress in the past decade [1]. Similar to many research fields, data mining has two general directions: theoretical foundations and advanced technologies and applications.
Here we focus on advanced technologies and applications in data mining and discuss some recent progress in this direction. Note that some popular research topics, such as privacy-preserving data mining, are not covered for lack of space and time. Our discussion is organized into nine themes, and we briefly outline the current status and research problems in each theme.
1 Pattern Mining, Pattern Usage, and Pattern Understanding
Frequent pattern mining has been a focal theme in data mining research for over a decade. Abundant literature has been dedicated to this research and tremendous progress has been made, ranging from efficient and scalable algorithms for frequent itemset mining in transaction databases to numerous research frontiers, such as sequential pattern mining, structural pattern mining, correlation mining, associative classification, and frequent-pattern-based clustering, as well as their broad applications.
Recently, studies have moved on to several further problems: scalable methods for mining colossal patterns, where patterns can be so large that step-by-step growth using an Apriori-like approach does not work; methods for pattern compression; extraction of high-quality top-k patterns; and understanding patterns through context analysis and the generation of semantic annotations.
Moreover, frequent patterns have been used for effective classification by top-k rule generation for long patterns and discriminative frequent pattern analysis. Frequent patterns have also been used for clustering of high-dimensional biological data. Scalable methods for mining long, approximate, compressed, and sophisticated patterns for advanced applications, such as biological sequences and networks, and the exploration of mined patterns for classification, clustering, correlation analysis, and pattern understanding will remain interesting research topics.
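To make the "step-by-step growth" of an Apriori-like approach concrete, the following is a minimal sketch of level-wise frequent itemset mining. It is not taken from the proceedings; the function name `apriori` and the parameter `min_support` (an absolute count) are illustrative assumptions, and the code favors clarity over efficiency.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset (frozenset): support count} for all frequent itemsets.

    Illustrative sketch only: `min_support` is an absolute count, and
    candidates are counted with a full pass over the data at every level.
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset must itself be frequent.
            if all(frozenset(sub) in current
                   for sub in combinations(union, k - 1)):
                candidates.add(union)

        # Count step: one pass over the data per level.
        counts = {c: sum(1 for t in transactions if c <= t)
                  for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1

    return frequent
```

Because every subset of a frequent itemset is also frequent, a single frequent pattern of n items forces this level-wise search to enumerate all 2^n - 1 of its non-empty subsets first, which is why such growth breaks down for colossal patterns.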
2 Information Network Analysis
Google’s PageRank algorithm has started a revolution in Internet search. However, since information network analysis covers many additional aspects and needs scalable and effective methods, the systematic study of this domain has just started, with many interesting issues to be explored. Information network analysis has broad applications, covering social and biological network analysis, computer network intrusion detection, software program analysis, terrorist network discovery, and Web analysis.
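As a small illustration of the link analysis that PageRank performs, the sketch below implements the textbook power-iteration formulation on a graph stored as a dictionary. It is an assumption-laden toy, not Google's production system: the function name `pagerank`, the damping factor of 0.85, the fixed iteration count, and the uniform redistribution of rank from nodes without out-links are all conventional choices made here for illustration.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Approximate PageRank scores by power iteration.

    `graph` maps each node to a list of nodes it links to. The damping
    factor and iteration count are illustrative defaults, not values
    taken from the source text.
    """
    # Collect every node, including ones that only appear as link targets.
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    n = len(nodes)
    if n == 0:
        return {}

    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by dangling nodes (no out-links) is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not graph.get(v))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {v: base for v in nodes}

        # Each node passes an equal share of its rank along its out-links.
        for v, targets in graph.items():
            if targets:
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new_rank[t] += share
        rank = new_rank
    return rank
```

For example, `pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})` ranks `c` highest, since it is linked from both other nodes, and `b` lowest, since nothing links to it.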
One interesting direction is to treat information networks as graphs and further develop graph mining methods. Recent progress on graph mining and its associated structural pattern-based classification and clustering, graph indexing, and similarity search will play an important role in information network analysis.
Moreover, since information networks often form huge, multidimensional heterogeneous graphs, mining noisy, approximate, and heterogeneous subgraphs based on different applications for the construction of application-specific networks with sophisticated structures will help information network analysis substantially. The discovery of the power law distribution of information networks and the rules on density evolution of information networks will help develop effective algorithms for network analysis.




