Book, English, 376 pages, format (W × H): 178 mm × 251 mm, weight: 839 g
ISBN: 978-1-119-37697-2
Publisher: Wiley
A timely overview of cutting-edge technologies for multimedia retrieval, with a special emphasis on scalability
The amount of multimedia data available every day is enormous and growing at an exponential rate, creating a great need for new, more efficient approaches to large-scale multimedia search. This book addresses that need, covering the area of multimedia retrieval with a special emphasis on scalability. It reports recent work in large-scale multimedia search, including research methods and applications, and is structured so that readers with basic knowledge can grasp the core message while experts and specialists can drill further down into the analytical sections.
Big Data Analytics for Large-Scale Multimedia Search covers: representation learning; concept- and event-based video search in large collections; big data multimedia mining; large-scale video understanding; big multimedia data fusion; large-scale social multimedia analysis; privacy and audiovisual content; data storage and management for big multimedia; large-scale multimedia search; multimedia tagging using deep learning; interactive interfaces for big multimedia; and medical decision support applications using large multimodal data.
- Addresses the area of multimedia retrieval and pays close attention to the issue of scalability
- Presents problem-driven techniques with solutions that are demonstrated through realistic case studies and user scenarios
- Includes tables, illustrations, and figures
- Offers a Wiley-hosted companion website featuring links to open source algorithms, datasets, and tools
Big Data Analytics for Large-Scale Multimedia Search is an excellent book for academics, industrial researchers, and developers interested in big multimedia data search and retrieval. It will also appeal to consultants on computer science problems and to professionals in the multimedia industry.
Authors/Editors
Further Information & Material
Introduction xv
List of Contributors xix
About the Companion Website xxiii
Part I Feature Extraction from Big Multimedia Data 1
1 Representation Learning on Large and Small Data 3
Chun-Nan Chou, Chuen-Kai Shie, Fu-Chieh Chang, Jocelyn Chang and Edward Y. Chang
1.1 Introduction 3
1.2 Representative Deep CNNs 5
1.2.1 AlexNet 6
1.2.1.1 ReLU Nonlinearity 6
1.2.1.2 Data Augmentation 7
1.2.1.3 Dropout 8
1.2.2 Network in Network 8
1.2.2.1 MLP Convolutional Layer 9
1.2.2.2 Global Average Pooling 9
1.2.3 VGG 10
1.2.3.1 Very Small Convolutional Filters 10
1.2.3.2 Multi-scale Training 11
1.2.4 GoogLeNet 11
1.2.4.1 Inception Modules 11
1.2.4.2 Dimension Reduction 12
1.2.5 ResNet 13
1.2.5.1 Residual Learning 13
1.2.5.2 Identity Mapping by Shortcuts 14
1.2.6 Observations and Remarks 15
1.3 Transfer Representation Learning 15
1.3.1 Method Specifications 17
1.3.2 Experimental Results and Discussion 18
1.3.2.1 Results of Transfer Representation Learning for OM 19
1.3.2.2 Results of Transfer Representation Learning for Melanoma 20
1.3.2.3 Qualitative Evaluation: Visualization 21
1.3.3 Observations and Remarks 23
1.4 Conclusions 24
References 25
2 Concept-Based and Event-Based Video Search in Large Video Collections 31
Foteini Markatopoulou, Damianos Galanopoulos, Christos Tzelepis, Vasileios Mezaris and Ioannis Patras
2.1 Introduction 32
2.2 Video Preprocessing and Machine Learning Essentials 33
2.2.1 Video Representation 33
2.2.2 Dimensionality Reduction 34
2.3 Methodology for Concept Detection and Concept-Based Video Search 35
2.3.1 Related Work 35
2.3.2 Cascades for Combining Different Video Representations 37
2.3.2.1 Problem Definition and Search Space 37
2.3.2.2 Problem Solution 38
2.3.3 Multi-Task Learning for Concept Detection and Concept-Based Video Search 40
2.3.4 Exploiting Label Relations 41
2.3.5 Experimental Study 42
2.3.5.1 Dataset and Experimental Setup 42
2.3.5.2 Experimental Results 43
2.3.5.3 Computational Complexity 47
2.4 Methods for Event Detection and Event-Based Video Search 48
2.4.1 Related Work 48
2.4.2 Learning from Positive Examples 49
2.4.3 Learning Solely from Textual Descriptors: Zero-Example Learning 50
2.4.4 Experimental Study 52
2.4.4.1 Dataset and Experimental Setup 52
2.4.4.2 Experimental Results: Learning from Positive Examples 53
2.4.4.3 Experimental Results: Zero-Example Learning 53
2.5 Conclusions 54
2.6 Acknowledgments 55
References 55
3 Big Data Multimedia Mining: Feature Extraction Facing Volume, Velocity, and Variety 61
Vedhas Pandit, Shahin Amiriparian, Maximilian Schmitt, Amr Mousa and Björn Schuller
3.1 Introduction 61
3.2 Scalability through Parallelization 64
3.2.1 Process Parallelization 64
3.2.2 Data Parallelization 64
3.3 Scalability through Feature Engineering 65
3.3.1 Feature Reduction through Spatial Transformations 66
3.3.2 Laplacian Matrix Representation 66
3.3.3 Parallel Latent Dirichlet Allocation and Bag of Words 68
3.4 Deep Learning-Based Feature Learning 68
3.4.1 Adaptability that Conquers both Volume and Velocity 70
3.4.2 Convolutional Neural Networks 72
3.4.3 Recurrent Neural Networks 73
3.4.4 Modular Approach to Scalability 74
3.5 Benchmark Studies 76
3.5.1 Dataset 76
3.5.2 Spectrogram Creation 77
3.5.3 CNN-Based Feature Extraction 77
3.5.4 Structure of the CNNs 78
3.5.5 Process Parallelization 79
3.5.6 Results 80
3.6 Closing Remarks 81
3.7 Acknowledgements 82
References 82
Part II Learning Algorithms for Large-Scale Multimedia 89
4 Large-Scale Video Understanding with Limited Training Labels 91
Jingkuan Song, Xu Zhao, Lianli Gao and Liangliang Cao
4.1 Introduction 91
4.2 Video Retrieval with Hashing 91
4.2.1 Overview 91
4.2.2 Unsupervised Multiple Feature Hashing 93
4.2.2.1 Framework 93
4.2.2.2 The Objective Function of MFH 93
4.2.2.3 Solution of MFH 95
4.2.2.3.1 Complexity Analysis 96
4.2.3 Submodular Video Hashing 97
4.2.3.1 Framework 97
4.2.3.2 Video Pooling 97
4.2.3.3 Submodular Video Hashing 98
4.2.4 Experiments 99
4.2.4.1 Experiment Settings 99
4.2.4.1.1 Video Datasets 99
4.2.4.1.2 Visual Features 99
4.2.4.1.3 Algorithms for Comparison 100
4.2.4.2 Results 100
4.2.4.2.1 CC_WEB_VIDEO 100
4.2.4.2.2 Combined Dataset 100
4.2.4.3 Evaluation of SVH 101
4.2.4.3.1 Results 102
4.3 Graph-Based Model for Video Understanding 103
4.3.1 Overview 103
4.3.2 Optimized Graph Learning for Video Annotation 104
4.3.2.1 Framework 104
4.3.2.2 OGL 104
4.3.2.2.1 Terms and Notations 104
4.3.2.2.2 Optimal Graph-Based SSL 105
4.3.2.2.3 Iterative Optimization 106
4.3.3 Context Association Model for Action Recognition 107
4.3.3.1 Context Memory 108
4.3.4 Graph-Based Event Video Summarization 109
4.3.4.1 Framework 109
4.3.4.2 Temporal Alignment 110
4.3.5 TGIF: A New Dataset and Benchmark on Animated GIF Description 111
4.3.5.1 Data Collection 111
4.3.5.2 Data Annotation 112
4.3.6 Experiments 114
4.3.6.1 Experimental Settings 114
4.3.6.1.1 Datasets 114
4.3.6.1.2 Features 114
4.3.6.1.3 Baseline Methods and Evaluation Metrics 114
4.3.6.2 Results 115
4.4 Conclusions and Future Work 116
References 116
5 Multimodal Fusion of Big Multimedia Data 121
Ilias Gialampoukidi