Naylor / Gaubitch | Speech Dereverberation | E-Book | www.sack.de

E-book, English, 388 pages

Series: Signals and Communication Technology

Naylor / Gaubitch Speech Dereverberation


1st edition, 2010
ISBN: 978-1-84996-056-4
Publisher: Springer
Format: PDF
Copy protection: 1 - PDF watermark


Speech Dereverberation gathers together an overview, a mathematical formulation of the problem, and the state-of-the-art solutions for dereverberation. It reviews topics in room acoustics and describes performance measures for dereverberation. The algorithms are then explained with mathematical analysis and examples that let the reader see the strengths and weaknesses of the various techniques and understand the questions still to be addressed. Techniques rooted in speech enhancement are included, in addition to a treatment of multichannel blind acoustic system identification and inversion. The TRINICON framework is shown in the context of dereverberation to be a generalization of the signal processing for a range of analysis and enhancement techniques. Speech Dereverberation is suitable for students at master's and doctoral level, as well as established researchers.

Patrick A. Naylor has a PhD in Speech Signal Processing from Imperial College London, where he is currently Reader and Director of Postgraduate Studies for the Department of Electrical and Electronic Engineering. His research interests include speech and audio signal processing; adaptive signal processing; speech enhancement in telecommunications; hands-free functionality; blind SIMO/MIMO channel estimation and dereverberation; speaker identification and verification; and speech production modelling. He is on the IEEE Technical Committee on Audio and Electroacoustics and is Associate Editor of the IEEE Transactions on Audio, Speech, and Language Processing.

Nikolay D. Gaubitch has a PhD in Acoustic Signal Processing from Imperial College London, where he is now Research Associate. In 2001 and 2002 he was awarded the Drapers' Company Undergraduate Prize for outstanding academic achievement. His research interests span various topics in single and multichannel speech and audio processing including dereverberation, blind system identification, acoustic system equalization and speech enhancement. He is a member of the IEEE.


Further Information & Material


1;Preface;6
2;Contents;8
3;List of Contributors;15
4;1 Introduction;17
4.1;1.1 Background;17
4.2;1.2 Effects of Reverberation;18
4.3;1.3 Speech Acquisition;19
4.4;1.4 System Description;20
4.5;1.5 Acoustic Impulse Responses;22
4.6;1.6 Literature Overview;24
4.6.1;1.6.1 Beamforming Using Microphone Arrays;24
4.6.2;1.6.2 Speech Enhancement Approaches to Dereverberation;26
4.6.3;1.6.3 Blind System Identification and Inversion;27
4.6.3.1;1.6.3.1 Blind System Identification;28
4.6.3.2;1.6.3.2 Inverse Filtering;29
4.7;1.7 Outline of the Book;30
4.8;References;31
5;2 Models, Measurement and Evaluation;37
5.1;2.1 An Overview of Room Acoustics;37
5.1.1;2.1.1 The Wave Equation;38
5.1.2;2.1.2 Sound Field in a Reverberant Room;39
5.1.3;2.1.3 Reverberation Time;40
5.1.4;2.1.4 The Critical Distance;42
5.1.5;2.1.5 Analysis of Room Acoustics Dependent on Frequency Range;43
5.2;2.2 Models of Room Reverberation;45
5.2.1;2.2.1 Intuitive Model;46
5.2.2;2.2.2 Finite Element Models;46
5.2.3;2.2.3 Digital Waveguide Mesh;46
5.2.4;2.2.4 Ray-tracing;47
5.2.5;2.2.5 Source-image Model;47
5.2.6;2.2.6 Statistical Room Acoustics;49
5.3;2.3 Subjective Evaluation;51
5.4;2.4 Channel-based Objective Measures;52
5.4.1;2.4.1 Normalized Projection Misalignment;53
5.4.2;2.4.2 Direct-to-reverberant Ratio;54
5.4.3;2.4.3 Early-to-total Sound Energy Ratio;54
5.4.4;2.4.4 Early-to-late Reverberation Ratio;55
5.5;2.5 Signal-based Objective Measures;55
5.5.1;2.5.1 Log Spectral Distortion;56
5.5.2;2.5.2 Bark Spectral Distortion;56
5.5.3;2.5.3 Reverberation Decay Tail;57
5.5.4;2.5.4 Signal-to-reverberant Ratio;59
5.5.4.1;2.5.4.1 Relationship Between DRR and SRR;59
5.5.4.2;2.5.4.2 Level Normalization in SRR;60
5.5.4.3;2.5.4.3 SRR Computation Example;62
5.5.4.4;2.5.4.4 SRR Summary;63
5.5.5;2.5.5 Experimental Comparisons;63
5.6;2.6 Dereverberation Performance of the Delay-and-sum Beamformer;66
5.6.1;2.6.1 Simulation Results: DSB Performance;67
5.6.1.1;Experiment 1: Effect of Source-microphone Distance;68
5.6.1.2;Experiment 2: Effect of Number of Microphones;68
5.7;2.7 Summary and Discussion;68
5.8;References;70
6;3 Speech Dereverberation Using Statistical Reverberation Models;73
6.1;3.1 Introduction;74
6.2;3.2 Review of Dereverberation Methods;76
6.2.1;3.2.1 Reverberation Cancellation;76
6.2.2;3.2.2 Reverberation Suppression;77
6.3;3.3 Statistical Reverberation Models;78
6.3.1;3.3.1 Polack’s Statistical Model;78
6.3.2;3.3.2 Generalized Statistical Model;79
6.4;3.4 Single-microphone Spectral Enhancement;80
6.4.1;3.4.1 Problem Formulation;81
6.4.2;3.4.2 MMSE Log-spectral Amplitude Estimator;84
6.4.3;3.4.3 a priori SIR Estimator;86
6.5;3.5 Multi-microphone Spectral Enhancement;87
6.5.1;3.5.1 Problem Formulation;87
6.5.2;3.5.2 Two Multi-microphone Systems;88
6.5.2.1;3.5.2.1 MVDR Beamformer and Single-channel MMSE Estimator;88
6.5.2.2;3.5.2.2 Non-linear Spatial Processor;91
6.5.3;3.5.3 Speech Presence Probability Estimator;91
6.6;3.6 Late Reverberant Spectral Variance Estimator;93
6.7;3.7 Estimating Model Parameters;97
6.7.1;3.7.1 Reverberation Time;97
6.7.2;3.7.2 Direct-to-reverberant Ratio;98
6.8;3.8 Experimental Results;98
6.8.1;3.8.1 Using One Microphone;99
6.8.2;3.8.2 Using Multiple Microphones;102
6.9;3.9 Summary and Outlook;104
6.10;Acknowledgment;106
6.11;References;106
7;4 Dereverberation Using LPC-based Approaches;111
7.1;4.1 Introduction;111
7.2;4.2 Linear Predictive Coding of Speech;113
7.3;4.3 LPC on Reverberant Speech;115
7.3.1;4.3.1 Effects of Reverberation on the LPC Coefficients;116
7.3.1.1;4.3.1.1 Single Microphone;116
7.3.1.2;4.3.1.2 Joint Multichannel Optimization;118
7.3.1.3;4.3.1.3 LPC at the Output of a Delay-and-sum Beamformer;119
7.3.2;4.3.2 Effects of Reverberation on the Prediction Residual;120
7.3.3;4.3.3 Simulation Examples for LPC on Reverberant Speech;121
7.4;4.4 Dereverberation Employing LPC;128
7.4.1;4.4.1 Regional Weighting Function;129
7.4.2;4.4.2 Weighting Function Based on Hilbert Envelopes;129
7.4.3;4.4.3 Wavelet Extrema Clustering;129
7.4.4;4.4.4 Weight Function from Coarse Channel Estimates;129
7.4.5;4.4.5 Kurtosis Maximizing Adaptive Filter;130
7.5;4.5 Spatiotemporal Averaging Method for Enhancement of Reverberant Speech;131
7.5.1;4.5.1 Larynx Cycle Segmentation with Multichannel DYPSA;132
7.5.2;4.5.2 Time Delay of Arrival Estimation for Spatial Averaging;133
7.5.3;4.5.3 Voiced/Unvoiced/Silence Detection;134
7.5.4;4.5.4 Weighted Inter-cycle Averaging;135
7.5.5;4.5.5 Dereverberation Results;137
7.6;4.6 Summary;140
7.6.1;Appendix A;140
7.7;References;142
8;5 Multi-microphone Speech Dereverberation Using Eigen-decomposition;145
8.1;5.1 Introduction;145
8.2;5.2 Problem Formulation;149
8.3;5.3 Preliminaries;151
8.4;5.4 AIR Estimation – Algorithm Derivation;154
8.5;5.5 Extensions of the Basic Algorithm;156
8.5.1;5.5.1 Two-microphone Noisy Case;156
8.5.1.1;5.5.1.1 White Noise Case;157
8.5.1.2;5.5.1.2 Colored Noise Case;157
8.5.2;5.5.2 Multi-microphone Case (M > 2);157
8.5.3;5.5.3 Partial Knowledge of the Null Subspace;158
8.6;5.6 AIR Estimation in Subbands;159
8.7;5.7 Signal Reconstruction;160
8.8;5.8 Experimental Study;162
8.8.1;5.8.1 Full-band Version – Results;163
8.8.2;5.8.2 Subband Version – Results;166
8.9;5.9 Limitations of the Proposed Algorithms and Possible Remedies;167
8.9.1;5.9.1 Noise Robustness;168
8.9.2;5.9.2 Computational Complexity and Memory Requirements;168
8.9.3;5.9.3 Common Zeros;168
8.9.4;5.9.4 The Demand for the Entire AIR Compensation;169
8.9.5;5.9.5 Filter-bank Design;169
8.9.6;5.9.6 Gain Ambiguity;169
8.10;5.10 Summary and Conclusions;170
8.11;References;170
9;6 Adaptive Blind Multichannel System Identification;173
9.1;6.1 Introduction;173
9.2;6.2 Problem Formulation;176
9.2.1;6.2.1 Channel Identifiability Conditions;177
9.3;6.3 Review of Adaptive Algorithms for Acoustic BSI Employing Cross-relations;178
9.3.1;6.3.1 The Multichannel Least Mean Squares Algorithm;178
9.3.2;6.3.2 The Normalized Multichannel Frequency Domain LMS Algorithm;179
9.3.3;6.3.3 The Improved Proportionate NMCFLMS Algorithm;181
9.4;6.4 Effect of Noise on the NMCFLMS Algorithm – The Misconvergence Problem;183
9.5;6.5 The Constraint Based ext-NMCFLMS Algorithm;185
9.5.1;6.5.1 Effect of Noise on the Cost Function;186
9.5.2;6.5.2 Penalty Term Using the Direct-path Constraint;188
9.5.3;6.5.3 Delay Estimation;190
9.5.4;6.5.4 Flattening Point Estimation;191
9.6;6.6 Simulation Results;194
9.6.1;6.6.1 Experimental Setup;195
9.6.2;6.6.2 Variation of Convergence Rate on β;195
9.6.3;6.6.3 Degradation Due to Direct-path Estimation;196
9.6.4;6.6.4 Comparison of Algorithm Performance Using a WGN Input Signal;198
9.6.5;6.6.5 Comparison of Algorithm Performance Using Speech Input Signals;199
9.7;6.7 Conclusions;200
9.8;References;201
10;7 Subband Inversion of Multichannel Acoustic Systems;205
10.1;7.1 Introduction;205
10.2;7.2 Multichannel Equalization;209
10.3;7.3 Equalization with Inexact Impulse Responses;210
10.3.1;7.3.1 Effects of System Mismatch;212
10.3.2;7.3.2 Effects of System Length;213
10.4;7.4 Subband Multichannel Equalization;214
10.4.1;7.4.1 Oversampled Filter-banks;215
10.4.2;7.4.2 Subband Decomposition;217
10.4.3;7.4.3 Subband Multichannel Equalization;219
10.5;7.5 Computational Complexity;220
10.6;7.6 Application to Speech Dereverberation;221
10.7;7.7 Simulations and Results;223
10.7.1;7.7.1 Experiment 1: Complex Subband Decomposition;223
10.7.2;7.7.2 Experiment 2: Random Channels;225
10.7.3;7.7.3 Experiment 3: Simulated Room Impulse Responses;227
10.7.4;7.7.4 Experiment 4: Speech Dereverberation;229
10.8;7.8 Summary;231
10.9;References;231
11;8 Bayesian Single Channel Blind Dereverberation of Speech from a Moving Talker;235
11.1;8.1 Introduction and Overview;235
11.1.1;8.1.1 Model-based Framework;236
11.1.1.1;8.1.1.1 Online vs. Offline Numerical Methods;237
11.1.1.2;8.1.1.2 Parametric Estimation and Optimal Filtering Methods;237
11.1.2;8.1.2 Practical Blind Dereverberation Scenarios;238
11.1.2.1;8.1.2.1 Single-sensor Applications;238
11.1.2.2;8.1.2.2 Time-varying Acoustic Channels;238
11.1.3;8.1.3 Chapter Organisation;239
11.2;8.2 Mathematical Problem Formulation;239
11.2.1;8.2.1 Bayesian Framework for Blind Dereverberation;241
11.2.2;8.2.2 Classification of Blind Dereverberation Formulations;243
11.2.3;8.2.3 Numerical Bayesian Methods;244
11.2.3.1;8.2.3.1 Markov Chain Monte Carlo;244
11.2.3.2;8.2.3.2 Sequential Monte Carlo;246
11.2.3.3;8.2.3.3 General Comments;246
11.2.4;8.2.4 Identifiability;247
11.3;8.3 Nature of Room Acoustics;249
11.3.1;8.3.1 Regions of the Audible Spectrum;250
11.3.2;8.3.2 The Room Transfer Function;251
11.3.3;8.3.3 Issues with Modelling Room Transfer Functions;252
11.3.3.1;Long and Non-minimum Phase AIRs;252
11.3.3.2;Robustness to Estimation Error and Variation of Inverse of the AIR;252
11.3.3.3;Subband and Frequency-zooming Solu;252
11.4;8.4 Parametric Channel Models;253
11.4.1;8.4.1 Pole-zero and All-zero Models;253
11.4.2;8.4.2 The Common-acoustical Pole and Zero Model;254
11.4.3;8.4.3 The All-pole Model;254
11.4.4;8.4.4 Subband All-pole Modelling;255
11.4.5;8.4.5 The Nature of Time-varying All-pole Models;258
11.4.6;8.4.6 Static Modelling of TVAP Parameters;260
11.4.7;8.4.7 Stochastic Modelling of Acoustic Channels;261
11.5;8.5 Noise and System Model;262
11.6;8.6 Source Model;264
11.6.1;8.6.1 Speech Production;264
11.6.2;8.6.2 Time-varying AR Modelling of Unvoiced Speech;265
11.6.2.1;8.6.2.1 Statistical Nature of Speech Parameter Variation;266
11.6.3;8.6.3 Static Block-based Modelling of TVAR Parameters;267
11.6.3.1;8.6.3.1 Basis Function Representation;268
11.6.3.2;8.6.3.2 Choice of Basis Functions;269
11.6.3.3;8.6.3.3 Block-based Time-varying Approach;269
11.6.4;8.6.4 Stochastic Modelling of TVAR Parameters;270
11.7;8.7 Bayesian Blind Dereverberation Algorithms;272
11.7.1;8.7.1 Offline Processing Using MCMC;272
11.7.1.1;8.7.1.1 Likelihood for Source Signal;272
11.7.1.2;8.7.1.2 Complete Likelihood for Observations;273
11.7.1.3;8.7.1.3 Prior Distributions of Source, Channel and Error Residual;273
11.7.1.4;8.7.1.4 Posterior Distribution of the Channel Parameters;274
11.7.1.5;8.7.1.5 Experimental Results;275
11.7.2;8.7.2 Online Processing Using Sequential Monte Carlo;277
11.7.2.1;8.7.2.1 Source and Channel Model;277
11.7.2.2;8.7.2.2 Conditionally Gaussian State Space;278
11.7.2.3;8.7.2.3 Methodology;279
11.7.2.4;8.7.2.4 Channel Estimation Using Bayesian Channel Updates;280
11.7.2.5;8.7.2.5 Experimental Results;281
11.7.3;8.7.3 Comparison of Offline and Online Approaches;283
11.8;8.8 Conclusions;284
11.9;References;284
12;9 Inverse Filtering for Speech Dereverberation Without the Use of Room Acoustics Information;287
12.1;9.1 Introduction;287
12.2;9.2 Inverse Filtering for Speech Dereverberation;288
12.2.1;9.2.1 Speech Capture Model with Multiple Microphones;289
12.2.2;9.2.2 Optimal Inverse Filtering;290
12.2.3;9.2.3 Unsupervised Algorithm to Approximate Optimal Processing;293
12.3;9.3 Approaches to Solving the Over-whitening of the Recovered Speech;296
12.3.1;9.3.1 Precise Compensation for Over-whitening of Target Speech;296
12.3.1.1;9.3.1.1 Principle;296
12.3.1.2;9.3.1.2 Close to Perfect Dereverberation;298
12.3.1.3;9.3.1.3 Dereverberation and Coherent Noise Reduction;299
12.3.1.4;9.3.1.4 Sensitivity to Incoherent N;303
12.3.2;9.3.2 Late Reflection Removal with Multichannel Multistep LP;304
12.3.2.1;9.3.2.1 Principle;305
12.3.2.2;9.3.2.2 Speech Dereverberation Performance in Terms of ASR Score;307
12.3.2.3;9.3.2.3 Speech Dereverberation in a Noisy Environment;309
12.3.2.4;9.3.2.4 Dereverberation of Multiple Sound Source Signals;311
12.3.3;9.3.3 Joint Estimation of Linear Predictors and Short-time Speech Characteristics;312
12.3.3.1;9.3.3.1 Background;312
12.3.3.2;9.3.3.2 Principle;313
12.3.3.3;9.3.3.3 Algorithms;316
12.3.4;9.3.4 Probabilistic Model Based Speech Dereverberation;318
12.3.4.1;9.3.4.1 Probabilistic Speech Model;319
12.3.4.2;9.3.4.2 Likelihood Function for Multichannel LP;320
12.3.4.3;9.3.4.3 Autocorrelation Codebook-based Speech Dereverberation;322
12.4;9.4 Concluding Remarks;324
12.4.1;Appendix A;324
12.5;References;325
13;10 TRINICON for Dereverberation of Speech and Audio Signals;327
13.1;10.1 Introduction;327
13.1.1;10.1.1 Generic Tasks for Blind Adaptive MIMO Filtering;328
13.1.2;10.1.2 A Compact Matrix Formulation for MIMO Filtering Problems;331
13.1.3;10.1.3 Overview of this Chapter;333
13.2;10.2 Ideal Inversion Solution and the Direct-inverse Approach to Blind Deconvolution;334
13.3;10.3 Ideal Solution of Direct Adaptive Filtering Problems and the Identification-and-inversion Approach to Blind Deconvolution;336
13.3.1;10.3.1 Ideal Separation Solution for Two Sources and Two Sensors;338
13.3.2;10.3.2 Relation to MIMO and SIMO System Identification;340
13.3.3;10.3.3 Ideal Separation Solution and Optimum Separation Filter Length for an Arbitrary Number of Sources and Sensors;341
13.3.4;10.3.4 General Scheme for Blind System Identification;343
13.3.5;10.3.5 Application of Blind System Identification to Blind Deconvolution;344
13.4;10.4 TRINICON – A General Framework for Adaptive MIMO Signal Processing and Application to Blind Adaptation Problems;346
13.4.1;10.4.1 Matrix Notation for Convolutive Mixtures;347
13.4.2;10.4.2 Optimization Criterion;348
13.4.3;10.4.3 Gradient-based Coefficient Update;350
13.4.3.1;10.4.3.1 Alternative Formulation of the Gradient-based Coefficient Update;353
13.4.4;10.4.4 Natural Gradient-based Coefficient Update;354
13.4.5;10.4.5 Incorporation of Stochastic Source Models;354
13.4.5.1;10.4.5.1 Spherically Invariant Random Processes as Signal Model;356
13.4.5.2;10.4.5.2 Multivariate Gaussians as Signal Model: Second-order Statistics;357
13.4.5.3;10.4.5.3 Nearly Gaussian Densities as Signal Model;357
13.5;10.5 Application of TRINICON to Blind System Identification and the Identification-and-inversion Approach to Blind Deconvolution;361
13.5.1;10.5.1 Generic Gradient-based Algorithm for Direct Adaptive Filtering Problems;361
13.5.1.1;10.5.1.1 Illustration for Second-order Statistics;362
13.5.2;10.5.2 Realizations for the SIMO Case;363
13.5.2.1;10.5.2.1 Coefficient Initialization;366
13.5.2.2;10.5.2.2 Efficient Implementation of the Sylvester Constraint for the Special Case of SIMO Models;367
13.5.3;10.5.3 Efficient Frequency-domain Realizations for the MIMO Case;369
13.6;10.6 Application of TRINICON to the Direct-inverse Approach to Blind Deconvolution;372
13.6.1;10.6.1 Multichannel Blind Deconvolution;373
13.6.2;10.6.2 Multichannel Blind Partial Deconvolution;375
13.6.3;10.6.3 Special Cases and Links to Known Algorithms;378
13.6.3.1;10.6.3.1 SIMO vs. MIMO Mixing Systems;379
13.6.3.2;10.6.3.2 Efficient Implementation Using the Correlation Method;379
13.6.3.3;10.6.3.3 Relations to Some Known HOS Approaches;380
13.6.3.4;10.6.3.4 Relations to Some Known SOS Approaches;381
13.7;10.7 Experiments;383
13.7.1;10.7.1 The SIMO Case;384
13.7.2;10.7.2 The MIMO Case;389
13.8;10.8 Conclusions;390
13.8.1;Appendix A: Compact Derivation of the Gradient-based Coefficient Update;390
13.8.2;Appendix B: Transformation of the Multivariate Output Signal PDF in (10.39) by Blockwise Sylvester Matrix;392
13.8.3;Appendix C: Polynomial Expansions for Nearly Gaussian Probability Densities;394
13.8.4;Appendix D: Expansion of the Sylvester Constraints in (10.83);396
13.9;References;397
14;Index;403


