Makino Audio Source Separation

1st edition, 2018
ISBN: 978-3-319-73031-8
Publisher: Springer International Publishing
Format: PDF
Copy protection: 1 - PDF Watermark

E-book, English, 389 pages

Series: Signals and Communication Technology

Audio Source Separation
1st edition, 2018, 978-3-319-73030-1, Book

160.49 € (incl. VAT)

Free shipping
Available immediately

This book provides the first comprehensive overview of the fascinating topic of audio source separation based on non-negative matrix factorization, deep neural networks, and sparse component analysis. The first section of the book covers single-channel source separation based on non-negative matrix factorization (NMF). After an introduction to the technique, two further chapters describe separation of known sources using non-negative spectrogram factorization, and temporal NMF models. In section two, NMF methods are extended to multichannel source separation. Section three introduces deep neural network (DNN) techniques, with chapters on multichannel and single-channel separation, and a further chapter on DNN-based mask estimation for monaural speech separation. In section four, sparse component analysis (SCA) is discussed, with chapters on source separation using audio directional statistics modelling, multi-microphone MMSE-based techniques and diffusion map methods. The book brings together leading researchers to provide tutorial-like and in-depth treatments on major audio source separation topics, with the objective of becoming the definitive source for a comprehensive, authoritative, and accessible treatment. This book is written for graduate students and researchers who are interested in audio source separation techniques based on NMF, DNN and SCA.

SHOJI MAKINO (F) received the B.E., M.E., and Ph.D. degrees from Tohoku University, Japan, in 1979, 1981, and 1993, respectively. He joined NTT in 1981. He is now a Professor at the University of Tsukuba. His research interests include adaptive filtering technologies, realization of acoustic echo cancellation, blind source separation of convolutive mixtures of speech, and acoustic signal processing for speech and audio applications.

Dr. Makino received the IEEE SPS Best Paper Award in 2014, the IEEE MLSP Competition Award in 2007, the ICA Unsupervised Learning Pioneer Award in 2006, the Commendation for Science and Technology of the Japanese Government in 2015, the TELECOM System Technology Award in 2015 and 2004, the Achievement Award of the Institute of Electronics, Information, and Communication Engineers (IEICE) in 1997, the Outstanding Technological Development Award of the Acoustical Society of Japan (ASJ) in 1995, the Paper Award of the IEICE in 2005 and 2002, and the Paper Award of the ASJ in 2005 and 2002. He is the author or co-author of more than 200 articles in journals and conference proceedings and is responsible for more than 150 patents. He was a Keynote Speaker at ICA2007 and a Tutorial Speaker at EMBC 2013, Interspeech 2011 and ICASSP 2007.

Dr. Makino's IEEE activities include: Member, SPS Technical Directions Board (2013-14), SPS Awards Board (2006-08), SPS Conference Board (2002-04), IEEE Jack S. Kilby Signal Processing Medal Committee (2015-), IEEE James L. Flanagan Speech & Audio Processing Award Committee (2008-11) and Member and Chair, SPS Audio and Electroacoustics Technical Committee (1993-2009 and 2013-14, respectively); SPS Distinguished Lecturer (2009-10); Chair, Circuits and Systems Society Blind Signal Processing Technical Committee (2009-2010); Associate Editor, IEEE Transactions on Speech and Audio Processing (2002-05) and EURASIP Journal on Advances in Signal Processing (2005-2012). He was the Vice President, Engineering Sciences Society of the IEICE (2007-08) and Chair, Engineering Acoustics Technical Committee of the IEICE (2006-08). He is a Member, International IWAENC Standing Committee and International ICA Steering Committee; General Chair, WASPAA2007 and IWAENC2003; Organizing Chair, ICA2003; and Plenary Chair, ICASSP2012.

Dr. Makino is an IEEE Fellow, an IEICE Fellow, a Board member of the ASJ, and a member of EURASIP and ISCA.

Authors/Editors

Makino, Shoji

Further information & material

Table of contents

Preface
Contents
1 Single-Channel Audio Source Separation with NMF: Divergences, Constraints and Algorithms
  1.1 Introduction
  1.2 Signal Decomposition by NMF
    1.2.1 NMF by Optimisation
    1.2.2 Composite Models
    1.2.3 Majorisation-Minimisation
  1.3 Advanced Decompositions for Source Separation
    1.3.1 Pre-specified Dictionaries
    1.3.2 Penalised NMF
    1.3.3 User-guided NMF
  1.4 Conclusions
  References
2 Separation of Known Sources Using Non-negative Spectrogram Factorisation
  2.1 Introduction
  2.2 NMF Model for Separation of Known Sounds
    2.2.1 Estimation Criteria and Algorithms
  2.3 Sound Dictionary Learning and Adaptation
    2.3.1 Generative Dictionaries
    2.3.2 Discriminative Dictionaries
    2.3.3 Dictionary Adaptation
  2.4 Semi-supervised Separation
  2.5 Low-Latency Separation
    2.5.1 Algorithmic and Processing Latency
    2.5.2 Use of Coupled Dictionaries for Very Low Latency Separation
    2.5.3 Factorisation
  2.6 Conclusions and Discussion
  References
3 Dynamic Non-negative Models for Audio Source Separation
  3.1 Introduction
  3.2 The PLCA Models
  3.3 Convolutional Models
  3.4 Non-negative Hidden Markov Models
    3.4.1 Single Source Models
    3.4.2 Source Separation
    3.4.3 Illustrative Examples
  3.5 Dynamic PLCA Using Continuous State-Space Representation
    3.5.1 Model Definitions
    3.5.2 Estimation Methods
    3.5.3 Illustrative Examples
  3.6 Conclusions
  References
4 An Introduction to Multichannel NMF for Audio Source Separation
  4.1 Introduction
  4.2 Local Gaussian Model
  4.3 Spectral Models
    4.3.1 NMF Modeling of Each Source
    4.3.2 Joint NTF Modeling of All Sources
  4.4 Spatial Models and Constraints
  4.5 Main Steps and Sources Estimation
  4.6 Model Estimation Criteria
    4.6.1 Maximum Likelihood
    4.6.2 Maximum a Posteriori
    4.6.3 Other Criteria
  4.7 Model Estimation Algorithms
    4.7.1 Variants of EM Algorithm
    4.7.2 Detailed Presentation of SSEM/MU Algorithm
    4.7.3 Other Algorithms
  4.8 Conclusion
  References
5 General Formulation of Multichannel Extensions of NMF Variants
  5.1 Introduction
  5.2 Problem Formulation
    5.2.1 Mixing Systems
    5.2.2 Likelihood Function
  5.3 Spectral and Spatial Models
    5.3.1 Spectral Models
    5.3.2 Spatial Models
  5.4 Parameter Estimation and Signal Separation
    5.4.1 Parameter Estimation
    5.4.2 Signal Separation
  5.5 Categorization of State-of-the-art Approaches
  5.6 Derivations of MNMF and MFHMM Algorithms
    5.6.1 MNMF Algorithm
    5.6.2 MFHMM Algorithm
    5.6.3 Demixing Filter Estimation Algorithm
  5.7 Conclusion
  References
6 Determined Blind Source Separation with Independent Low-Rank Matrix Analysis
  6.1 Introduction
  6.2 Generative Source Models in IVA and NMF Based on Itakura–Saito Divergence
    6.2.1 Formulation
    6.2.2 IVA
    6.2.3 Time-Varying Gaussian IVA
    6.2.4 Itakura–Saito NMF
  6.3 Independent Low-Rank Matrix Analysis: A Unification of IVA and Itakura–Saito NMF
    6.3.1 Motivation and Strategy
    6.3.2 Derivation of Cost Function
    6.3.3 Update Rules
    6.3.4 Summary of Algorithm
  6.4 Relationship Between Time-Varying Gaussian IVA, ILRMA, and Multichannel NMF
    6.4.1 Generative Model in MNMF and Spatial Covariance
    6.4.2 Existing MNMF Models
    6.4.3 Equivalence Between ILRMA and MNMF with Rank-1 Spatial Model
  6.5 Experiments on Speech and Music Separation
    6.5.1 Datasets
    6.5.2 Experimental Analysis of Optimal Number of Bases for ILRMA
    6.5.3 Comparison of Separation Performance
  6.6 Conclusions
  References
7 Deep Neural Network Based Multichannel Audio Source Separation
  7.1 Introduction
  7.2 Background
    7.2.1 Problem Formulation
    7.2.2 Multichannel Gaussian Model
    7.2.3 General Iterative EM Framework
  7.3 DNN-Based Multichannel Source Separation
    7.3.1 Algorithm
    7.3.2 Cost Functions
    7.3.3 Weighted Spatial Parameter Updates
  7.4 Experimental Evaluation
    7.4.1 General System Design
    7.4.2 Application: Speech Enhancement
    7.4.3 Application: Music Separation
  7.5 Closing Remarks
  References
8 Efficient Source Separation Using Bitwise Neural Networks
  8.1 Introduction
  8.2 A Basic Neural Network for Source Separation
  8.3 Binary Features for Audio Signals
    8.3.1 Winner-Take-All Hashing
    8.3.2 Semantic Hashing
    8.3.3 Quantization and Dispersion
  8.4 BNN Feedforward
    8.4.1 The Feedforward Procedure
    8.4.2 Linear Separability
    8.4.3 Efficiency
  8.5 BNN Training
    8.5.1 The First Round: Weight Compressed DNN
    8.5.2 The Second Round: Noisy Feedforward and Sparsity
  8.6 Experimental Results
    8.6.1 The Data Set
    8.6.2 Pre-processing
    8.6.3 The Setup for the First Round
    8.6.4 The Setup for the Second Round
    8.6.5 Discussion
  8.7 Conclusion
  References
9 DNN Based Mask Estimation for Supervised Speech Separation
  9.1 Speech Separation Problem
  9.2 Classifiers and Learning Machines
    9.2.1 Multilayer Perceptrons
    9.2.2 Recurrent Neural Networks
  9.3 Training Targets
  9.4 Features
  9.5 Speech Separation Algorithms
    9.5.1 Speech-Nonspeech Separation
    9.5.2 Other Separation/Enhancement Tasks
  9.6 Conclusion
  References
10 Informed Spatial Filtering Based on Constrained Independent Component Analysis
  10.1 Introduction
  10.2 Signal Model
  10.3 Multichannel Linear Filtering for Signal Extraction
    10.3.1 Linearly Constrained Minimum Variance Filter
    10.3.2 The Generalized Sidelobe Canceler
  10.4 Linearly Constrained Minimum Mutual Information-Based Signal Extraction
    10.4.1 Generic Optimization Criterion
    10.4.2 Constrained Natural Gradient-Descent for Iterative Optimization Update Rule
    10.4.3 Definition of the Set of Constraints
    10.4.4 Geometrical Interpretation of the Constrained Update Rule
    10.4.5 Realization as Minimum Mutual Information-Based Generalized Sidelobe Canceler
    10.4.6 Realization of the Blocking Matrix
    10.4.7 Estimation of the Set of Constraints
    10.4.8 Special Source Models
    10.4.9 Links to Some Generic Linear Signal Extraction Methods Based on Second-Order Statistics
  10.5 Experiments
    10.5.1 Experimental Setup
    10.5.2 Estimation of Relative Impulse Responses
    10.5.3 Signal Enhancement
  10.6 Conclusion
  References
11 Recent Advances in Multichannel Source Separation and Denoising Based on Source Sparseness
  11.1 Introduction
  11.2 Source Separation and Denoising Based on Observation Vector Clustering
    11.2.1 Mask Estimation
    11.2.2 Source Signal Estimation
  11.3 Mask Estimation Based on Modeling Directional Statistics
    11.3.1 Mask Estimation Based on Complex Watson Mixture Model (cWMM)
    11.3.2 Mask Estimation Based on Complex Bingham Mixture Model (cBMM)
    11.3.3 Mask Estimation Based on Complex Gaussian Mixture Model (cGMM)
  11.4 Experimental Evaluation
    11.4.1 Source Separation
    11.4.2 Denoising
  11.5 Conclusions
  References
12 Multimicrophone MMSE-Based Speech Source Separation
  12.1 Introduction
  12.2 Background
    12.2.1 Generic Propagation Model
    12.2.2 Spatial Filtering
    12.2.3 Second-Order Moments and Criteria
  12.3 Matched Filter
    12.3.1 Design
    12.3.2 Performance
  12.4 Multichannel Wiener Filter
    12.4.1 Design
    12.4.2 Performance
  12.5 Multichannel LCMV
    12.5.1 Design
    12.5.2 Performance
  12.6 Parameters Estimation
    12.6.1 Multichannel SPP Estimators
    12.6.2 Covariance Matrix Estimators
    12.6.3 Procedures for Semi-blind RTF Estimation
  12.7 Examples
    12.7.1 Narrowband Signals at an Anechoic Environment
    12.7.2 Speech Signals at a Reverberant Environment
  12.8 Summary
  References
13 Musical-Noise-Free Blind Speech Extraction Based on Higher-Order Statistics Analysis
  13.1 Introduction
  13.2 Single-Channel Speech Enhancement with Musical-Noise-Free Properties
    13.2.1 Conventional Non-iterative Spectral Subtraction
    13.2.2 Iterative Spectral Subtraction
    13.2.3 Modeling of Input Signal
    13.2.4 Metric of Musical Noise Generation: Kurtosis Ratio
    13.2.5 Musical Noise Generation in Non-iterative Spectral Subtraction
    13.2.6 Musical-Noise-Free Speech Enhancement
  13.3 Extension to Multichannel Blind Signal Processing
    13.3.1 Blind Spatial Subtraction Array
    13.3.2 Iterative Blind Spatial Subtraction Array
    13.3.3 Accuracy of Wavefront Estimated by Independent Component Analysis After Spectral Subtraction
  13.4 Improvement Scheme for Poor Noise Estimation
    13.4.1 Channel Selection in Independent Component Analysis
    13.4.2 Time-Variant Noise Power Spectral Density Estimator
  13.5 Experiments in Real World
    13.5.1 Experimental Conditions
    13.5.2 Objective Evaluation
    13.5.3 Subjective Evaluation
  13.6 Conclusions and Remarks
  References
14 Audio-Visual Source Separation with Alternating Diffusion Maps
  14.1 Introduction
  14.2 Problem Formulation
  14.3 Separation of the Common Source via Alternating Diffusion Maps
    14.3.1 Alternating Diffusion Maps
    14.3.2 Separation of the Common Source
    14.3.3 Online Extension
    14.3.4 Source Activity Detection
  14.4 Experimental Results
    14.4.1 Experimental Setup
    14.4.2 Activity Detection of the Common Source
    14.4.3 Discussion: Sound Source Separation
  14.5 Conclusions
  References
Index

Webcode: sack.de/spe54