E-book, English, Volume 169, 446 pages
Series: NATO ASI Series
Ponting Computational Models of Speech Pattern Processing
1999
ISBN: 978-3-642-60087-6
Publisher: Springer
Format: PDF
Copy protection: 1 - PDF Watermark
Proceedings of the NATO Advanced Study Institute on Computational Models of Speech Pattern Processing, held in St. Helier, Jersey, UK, July 7-18, 1997
Target audience
Research
Authors/Editors
Further information & material
Speech Pattern Processing.- 1. The State-of-the-Art in Speech.- 2. Speech Patterning.- 3. Speech Pattern Processing.- 4. Whither a Unified Theory?.- 4.1 Towards a Theory.- 4.2 Practical Issues.- 5. What We Know.- 6. Some Things We Don’t Know.- 7. The Way Forward.- References.
Psycho-acoustics and Speech Perception.- 1. Introduction.- 2. Psycho-acoustics.- 3. Speech Perception.- 3.1 Vowel Reduction and Schwa.- 3.2 Spectro-temporal Dynamics of Formant Transitions.- 3.3 Consonant Reduction.- 4. Discussion.- References.
Acoustic Modelling for Large Vocabulary Continuous Speech Recognition.- 1. Introduction.- 2. Overview of LVCSR Architecture.- 3. Front End Processing.- 4. Basic Phone Modelling.- 4.1 HMM Phone Models.- 4.2 HMM Parameter Estimation.- 4.3 Context-Dependent Phone Models.- 5. Adaptation for LVCSR.- 5.1 Maximum Likelihood Linear Regression.- 5.2 Estimating the MLLR Transforms.- 6. Progress in LVCSR.- 7. Discriminative Training for LVCSR.- 8. Conclusions.- References.
Tree-based Dependence Models for Speech Recognition.- 1. Introduction.- 2. Hidden Tree Framework.- 3. Hidden Dependence Trees.- 3.1 The Mathematical Framework.- 3.2 Application to Speech.- 3.3 Topology Design and Parameter Estimation.- 3.4 Experiments.- 4. Multiscale Tree Processes.- 4.1 The Mathematical Framework.- 4.2 Application to Speech.- 4.3 Topology Design and Parameter Estimation.- 4.4 Experiments.- 5. Discussion.- References.
Connectionist and Hybrid Models for Automatic Speech Recognition.- 1. Introduction.- 2. A Brief Overview of Neural Networks.- 2.1 Basic Principles.- 2.2 Main Models for ASR.- 3. Signal Processing and Feature Extraction using ANNs.- 4. Neural Networks as Static Pattern Classifiers.- 4.1 Speech Pattern Classification with Perceptrons.- 4.2 Feature Maps.- 5. Dynamic Aspects.- 5.1 Position of the Problem.- 5.2 Time Delays.- 5.3 Dynamic Classifiers.- 5.4 Recurrent NNs.- 6. Hybrid Models.- 6.1 Position of the Problem.- 6.2 Proposed Solutions.- 7. Conclusion.- References.
Computational Models for Auditory Speech Processing.- 1. Introduction.- 2. A nonlinear computational model for basilar membrane wave motions.- 3. Frequency-domain and time-domain computational solutions to the BM model.- 4. Interval analysis of auditory model’s outputs for temporal information extraction.- 5. IPIH representation of clean and noisy speech sounds.- 6. Speech recognition experiments.- 7. Summary and discussions.- References.
Speaker Adaptation of CDHMMs Using Bayesian Learning.- 1. Introduction.- 2. Bayesian Estimation of CDHMMs.- 2.1 Prior Density Definition.- 2.2 Forgetting Mechanism.- 2.3 Prior Parameter Estimation and MAP Solution.- 3. Acoustic Normalization.- 4. Tasks, Corpus and System.- 5. Speaker Adaptation Experiments.- 6. Conclusions.- References.
Discriminative Improvement of the Representation Space for Continuous Speech Recognition.- 1. Introduction.- 2. Discriminative Feature Extraction.- 3. SGDFE Algorithm for CSR.- 4. Experimental Results.- 5. Conclusions.- References.
Dealing with Loss of Synchronism in Multi-Band Continuous Speech Recognition Systems.- 1. Introduction.- 2. Forcing Synchronism Between the Bands.- 2.1 First Approach.- 2.2 Experiments.- 3. Modeling Loss of Synchronism.- 3.1 Theoretical Approach.- 3.2 Experimental Approach.- 4. Conclusion.- References.
K-Nearest Neighbours Estimator in a HMM-Based Recognition System.- 1. Introduction.- 2. K-NN Assessment.- 3. K-NN estimator in HMM.- 3.1 Adaptation Principle.- 3.2 HMM Estimation Improvement.- 4. Evaluations.- 4.1 Recognition rates.- 4.2 SNALC Evaluation.- 5. Perspectives.- References.
Robust Speech Recognition.- 1. Mismatches between Training and Testing.- 1.1 Speech Variation.- 1.2 Inter-Speaker Variation.- 2. Reducing Mismatches to Improve Speech Recognition.- 2.1 Principles of Adaptive Speech Recognition.- 2.2 Three Principal Adaptation Methods for Reducing Mismatches.- 2.3 Important Practical Issues.- 2.4 N-Best-Based Unsupervised Adaptation.- 3. Conclusion.- References.
Channel Adaptation.- 1. Introduction.- 1.1 Matched condition training.- 1.2 Robust features.- 1.3 Model adaptation.- 1.4 Channel adaptation.- 1.5 Speech enhancement.- 2. Models of distortion.- 2.1 Minimum mean square error.- 2.2 Additive noise estimation.- 3. Methods for channel adaptation.- 3.1 Global transformations.- 3.2 Class-specific corrections.- 3.3 Empirical methods based on stereo data.- 3.4 Model-based compensation.- 4. Conclusion.- References.
Speaker Characterization, Speaker Adaptation and Voice Conversion.- 1. Introduction.- 2. Speaker-Characterization.- 3. Speaker Recognition.- 4. Speaker-Adaptation Techniques for Speech Recognition.- 4.1 Classification of Speaker-Adaptation/Normalization Methods.- 4.2 Speaker Cluster Selection Methods.- 4.3 Interpolated Re-Estimation Algorithm.- 4.4 Spectral Mapping Algorithm.- 5. Individuality Problems in Speech Synthesis and Coding.- 6. Conclusion.- References.
Speaker Recognition.- 1. Principles of Speaker Recognition.- 2. Text-Independent Speaker Recognition Methods.- 2.1 Long-Term-Statistics-Based Methods.- 2.2 VQ-Based Methods.- 2.3 Ergodic-HMM-Based Methods.- 2.4 Speech-Recognition-Based Methods.- 3. Text-prompted Speaker Recognition.- 4. Normalization and Adaptation Techniques.- 4.1 Parameter-Domain Normalization.- 4.2 Likelihood Normalization.- 4.3 HMM Adaptation for Noisy Conditions.- 4.4 Updating Models and A Priori Threshold for Speaker Verification….- 5. Open Questions and Concluding Remarks.- References.
Application of Acoustic Discriminative Training in an Ergodic HMM for Speaker Identification.- 1. Introduction.- 2. Experimental Conditions.- 3. System Architecture.- 3.1 Acoustic Segmentation.- 3.2 The PTE-HMM Model.- 4. Experimental Results.- 5. Conclusions.- References.
Comparison of Several Compensation Techniques for Robust Speaker Verification.- 1. Introduction.- 2. The HMM recognition system.- 3. Mismatch Compensation Techniques.- 3.1 CMS.- 3.2 SM1.- 3.3 SM2.- 4. Experiments and Results.- 5. Discussion and Conclusion.- References.
Segmental Acoustic Modeling for Speech Recognition.- 1. Introduction.- 2. Segmental and Hidden Markov Models.- 2.1 General Modeling Framework.- 2.2 Models of Feature Dynamics.- 3. Recognition and Training.- 3.1 Recognition Algorithms.- 3.2 Parameter Estimation Algorithms.- 4. Segmental Features.- 5. Summary.- References.
Trajectory Representations and Acoustic Descriptions for a Segment-Modelling Approach to Automatic Speech Recognition.- 1. Introduction.- 2. Modelling Trajectories in Speech.- 3. Representing an Unobserved Trajectory with Segmental HMMs.- 3.1 Calculating segment probabilities.- 3.2 Recognition experiment.- 4. HMM Recognition with Formant Features.- 5. Modelling trajectories of cepstrum and formant features.- 6. Conclusions.- References.
Suprasegmental Modelling.- 1. Introduction.- 2. The Verbmobil System.- 3. Computation of Prosodic Information.- 3.1 Extraction of Prosodic Features.- 3.2 Prosodic Classes.- 3.3 New Boundary Labels: The Syntactic-prosodic M-labels.- 3.4 Classification of Prosodic Events.- 3.5 Improving the Classification Results with Stochastic Language Models.- 3.6 Prosodic scoring of WHGs.- 4. The Use of Prosodic Information.- 4.1 Prosody and Syntax - Interaction with the TUG-Grammar.- 4.2 Prosody and the Other Linguistic Modules.- 5. Concluding Remarks.- 6. References.
Computational Models for Speech Production.- 1. Introduction.- 2. Speech production models in science/technology literatures.- 3. Derivation of discrete-time version of statistical task-dynamic model.- 4. Algorithms for learning task-dynamic model parameters and for likelihood computation.- 4.1 Model with deterministic, time-invariant parameters.- 4.2 Model with random, time-invariant parameters.- 4.3 Model with random, smoothly time-varying parameters.- 4.4 Discriminative learning of production models’ parameters.- 5. Other types of computational models of speech production.- 6. Summary and discussions.- References.
Articulatory Features and Associated Production Models in Statistical Speech Recognition.- 1. Introduction.- 2. Functional description of human speech communication as an encoding-decoding process.- 3. Overview of theories of speech perception.- 4. A general framework of statistical speech recognition.- 5. Brief analysis of weaknesses of current speech recognition technology.- 6. Phonological model: Overlapping articulatory features and related HMMs.- 7. Task-dynamic model of speech production.- 8. Interfacing overlapping features to task-dynamic model and a general architecture for speech recognition.- 9. Discussions: Machine speech recognition.- References.
Talker Normalization with Articulatory Analysis-by-Synthesis.- 1. Introduction.- 2. Normalization Procedure.- 3. Experiments.- 4. Conclusion.- References.
The Psycholinguistics of Spoken Word Recognition.- 1. Introduction.- 2. Overview: Models of spoken word recognition.- 3. Currency of mapping: units and the nature of lexical representations.- 4. Temporal nature of speech: early vs delayed commitment.- 4.1 Delayed commitment.- 5. Multiple lexical hypotheses, lexical competition and graded activation.- 6. Language architecture: Lexical and segmental levels.- 7. Language architecture: Lexical and sentential.- 8. Contribution of attention.- References.
Issues in Using Models for Self Evaluation and Correction of Speech.- 1. Introduction.- 2. Using models.- 3. Norm building.- 4. Matching between the subject’s world and the technical world.- 5. Settlement of the speech education program.- 6. Management of the education program.- 7. Conclusion.- References.
The Use of the Maximum Likelihood Criterion in Language Modelling.- 1. Introduction.- 2. Perplexity and Maximum Likelihood.- 3. Smoothing and Discounting for Sparse Data.- 3.1 Modelfree Discounting and Turing-Good Estimates.- 3.2 Absolute Discounting.- 4. Partitioning-Based Models.- 4.1 Equivalence Classes of Histories and Decision Trees.- 4.2 Two-Sided Partitionings and Word Classes.- 5. Word Trigger Pairs.- 6. Maximum Entropy Approach.- 7. Conclusions.- References.
Language Model Adaptation.- 1. Introduction.- 2. Background on Language Models.- 3. Adaptation paradigms.- 3.1 LM adaptation in dialogue systems.- 4. Basic statistical methods.- 4.1 Maximum a-posteriori estimation.- 4.2 Linear interpolation.- 4.3 Sublanguages mixture adaptation.- 4.4 Backing-off.- 4.5 Maximum Entropy.- 4.6 Minimum Discrimination Information.- 4.7 Generalized iterative scaling.- 4.8 Cache model and word triggers.- 5. Practical applications of adaptation paradigms.- 5.1 The 1993 ARPA evaluation method.- 5.2 Mixture based adaptation.- 5.3 Adaptation with a cache model.- 5.4 ME and MDI adaptation.- 5.5 LM adaptation in interactive systems.- 6. Conclusion.- References.
Using Natural-Language Knowledge Sources in Speech Recognition.- 1. Introduction.- 2. Issues in Language Modeling for Speech Recognition.- 3. Formal Models for Natural Language.- 3.1 Finite-State Grammars.- 3.2 Context-Free Grammars.- 3.3 Augmented Context-Free Grammars.- 3.4 Expressive Power of Grammar Formalisms and the Requirements of Natural Language.- 4. Search Architectures for Natural-Language-Based Language Models.- 4.1 Word Lattice Parsing.- 4.2 N-best Filtering or Rescoring.- 4.3 Dynamic Generation of Partial Grammar Networks.- 5. Compiling Unification Grammars into Context-Free Grammars.- 5.1 Instantiating Unification Grammars.- 5.2 Removing Left Recursion from Context-Free Grammars.- 6. Robust Natural-Language-Based Language Models.- 6.1 Combining Linguistics and Statistics in a Language Model.- 6.2 Fully Statistical Natural-Language Grammars.- 7. Summary.- References.
How May I Help You?.- 1. Introduction.- 2. A Spoken Dialog System.- 3. Database.- 4. Algorithms.- 4.1 Salient Fragment Acquisition.- 4.2 Recognizing Fragments in Speech.- 4.3 Call Classification.- 5. Experiment Results.- 6. Conclusions.- References.
Introduction of Rules into a Stochastic Approach for Language Modelling.- 1. Introduction.- 2. Stack Decoding Strategy.- 2.1 The Algorithm.- 2.2 The Evaluation Function.- 2.3 Peculiar Advantages of the Algorithm.- 3. Rules.- 3.1 Correction of Biases.- 3.2 Under-represented Structures and Long Span Dependencies.- 4. Multi Level Interactions.- 4.1 Linguistic and Syntactic.- 4.2 Phonology.- 5. Conclusion.- References.
History Integration into Semantic Classification.- 1. Introduction.- 2. Classifier.- 3. Data.- 4. Dialogue History Integration.- 5. Discussion.- References.
Multilingual Speech Recognition.- 1. Introduction.- 2. Architecture of the National SQEL Demonstrators.- 3. Language Identification with Different Amounts of Knowledge about the Training Data.- 3.1 A System with Explicit Language Identification.- 3.2 A System with Implicit Language Identification.- 3.3 Language Identification Based on Cepstral Feature Vectors.- 4. Results.- 5. Conclusions and Future Work.- References.
Toward ALISP: A proposal for Automatic Language Independent Speech Processing.- 1. Introduction.- 2. Practical benefit of ALISP.- 3. Issues specific to ALISP.- 3.1 Selecting features.- 3.2 Modeling speech units.- 3.3 Defining a derivation criterion.- 3.4 Building a lexicon.- 4. Some tools for ALISP.- 4.1 Temporal Decomposition.- 4.2 The multigram model.- 5. Experiments.- 5.1 Cross-Language Recognition.- 5.2 Very low bit rate speech coding.- 5.3 Mono-Speaker Continuous Speech Recognition.- 6. Conclusions.- References.
Interactive Translation of Conversational Speech.- 1. Introduction.- 2. Background.- 2.1 The Problem of Spoken Language Translation.- 2.2 Research Efforts on Speech Translation.- 3. JANUS-II - A Conversational Speech Translator.- 3.1 Task Domains and Data Collection.- 3.2 System Description.- 3.3 Performance Evaluation.- 4. Applications and Forms of Deployment.- 4.1 Interactive Dialog Translation.- 4.2 Portable Speech Translation Device.- 4.3 Passive Simultaneous Dialog Translation.- References.
Multimodal Speech Systems.- 1. Introduction.- 2. System Architecture: Knowledge Sources and Controllers.- 2.1 Environment Model.- 2.2 System Model.- 2.3 User Model.- 2.4 Task Model.- 2.5 Dialogue Model.- 2.6 Models Interdependency.- 2.7 Role of Speech in Multimodal Applications.- 3. Information Speech Systems.- 3.1 Spontaneous Language Characteristics.- 3.2 Case Grammar Formalism used for Task Modelling.- 3.3 Different Parsing Methods.- 3.4 Task and Dialogue Model Integration.- 4. Conclusion.- References.
Multimodal Interfaces for Multimedia Information Agents.- 1. Introduction.- 2. Interpretation of Multimodal Input.- 2.1 Multimodal Components.- 2.2 Joint Interpretation.- 3. Multimodal Error Correction.- 3.1 Multimodal Interactive Error Repair.- 3.2 Error Repair for Multimedia Information Agents.- 3.3 Evaluating Interactive Error Repair.- 4. Multimodal Information Agents.- 4.1 Information Access.- 4.2 Information Creation.- 4.3 Information Manipulation.- 4.4 Information Dissemination.- 4.5 Controlling the Interface.- 5. The QuickDoc Application.- 6. Conclusions.- References.




