E-book, English, 277 pages
Minker / Lee / Nakamura Spoken Dialogue Systems Technology and Design
1st edition 2010
ISBN: 978-1-4419-7934-6
Publisher: Springer
Format: PDF
Copy protection: 1 - PDF Watermark
Spoken Dialogue Systems Technology and Design covers key topics in the field of spoken language dialogue interaction, with contributions from a variety of leading researchers. It brings together several perspectives in the areas of corpus annotation and analysis and dialogue system construction, as well as theoretical perspectives on communicative intention, context-based generation, and modelling of discourse structure. These topics are all part of the general research and development within the area of discourse and dialogue, with an emphasis on dialogue systems, corpora and corpus tools, and semantic and pragmatic modelling of discourse and dialogue.
Further Information & Material
1;Preface;5
2;Contents;10
3;Contributing Authors;14
4;Chapter 1 MULTILINGUAL SPEECH INTERFACES FOR RESOURCE-CONSTRAINED DIALOGUE SYSTEMS;23
4.1;1. Introduction;24
4.2;2. Literature Review;25
4.2.1;2.1 Review of Multilingual Speech Recognition;25
4.2.2;2.2 Review of Non-Native Speech Recognition;26
4.3;3. Approach;27
4.4;4. Experimental Setup;27
4.4.1;4.1 Training and Test Data;27
4.4.2;4.2 Benchmark System;29
4.5;5. Accent Adaptation;29
4.5.1;5.1 Monophones vs. Triphones;29
4.5.2;5.2 Multilingual MWC System;30
4.5.3;5.3 Model Merging;33
4.5.4;5.4 Adaptation with Non-Native Speech;35
4.6;6. Scalable Architecture;37
4.6.1;6.1 Projections between GMMs;37
4.6.2;6.2 Scalable Architecture;42
4.6.3;6.3 Footprint;45
4.7;7. Summary;46
4.8;Notes;47
5;Chapter 2 ONLINE LEARNING OF BAYES RISK-BASED OPTIMIZATION OF DIALOGUE MANAGEMENT FOR DOCUMENT RETRIEVAL SYSTEMS WITH SPEECH INTERFACE;51
5.1;1. Introduction;52
5.2;2. Dialogue Management and Response Generation in Document Retrieval System;54
5.2.1;2.1 System Overview;54
5.2.2;2.2 Knowledge Base (KB);55
5.2.3;2.3 Backend Retrieval System;56
5.2.4;2.4 Backend Question-Answering System;57
5.2.5;2.5 Use of N-Best Hypotheses of ASR and Contextual Information for Generating Responses;58
5.2.6;2.6 Field Test of Trial System;58
5.3;3. Optimization of Dialogue Management in Document Retrieval System;59
5.3.1;3.1 Choices in Generating Responses;59
5.3.2;3.2 Optimization of Responses based on Bayes Risk;59
5.3.3;3.3 Generation of Response Candidates;60
5.3.4;3.4 Definition of Bayes Risk for Candidate Response;61
5.3.5;3.5 Confidence Measure of Information Retrieval and Question-Answering;64
5.4;4. Online Learning of Bayes Risk-based Dialogue Management;64
5.4.1;4.1 Parameter Optimization by Online Learning;64
5.4.2;4.2 Optimization using Maximum Likelihood Estimation;65
5.4.3;4.3 Optimization using Steepest Descent;66
5.4.4;4.4 Online Learning Method using Reinforcement Learning;66
5.5;5. Evaluation of Online Learning Methods;68
5.6;6. Conclusions;70
5.7;Notes;70
5.8;References;71
6;Chapter 3 TOWARDS FINE-GRAIN USER-SIMULATION FOR SPOKEN DIALOGUE SYSTEMS;75
6.1;1. Introduction;76
6.2;2. Related Work;78
6.2.1;2.1 Rule-based User Simulators;79
6.2.2;2.2 Corpus-based User Simulators;81
6.2.3;2.3 Hybrid User Simulators;85
6.2.4;2.4 Evaluation of User Simulators;85
6.2.4.1;2.4.1 Direct Methods.;86
6.2.4.2;2.4.2 Indirect Methods.;87
6.3;3. Our User Simulators;88
6.3.1;3.1 The Initial User Simulator;88
6.3.2;3.2 The Enhanced User Simulator;90
6.4;4. Experiments;91
6.4.1;4.1 Speech Database and Scenario Corpus;92
6.4.2;4.2 Language Models for Speech Recognition;93
6.5;5. Results;93
6.5.1;5.1 Detection of Problems in the Performance of the Dialogue System;94
6.5.1.1;5.1.2 Findings for the Medium Cooperativeness Level.;95
6.5.1.2;5.1.3 Findings for the Low Cooperativeness Level.;97
6.5.2;5.2 Future Work;98
6.6;6. Conclusions;98
6.7;Acknowledgments;99
7;Chapter 4 SALIENT FEATURES FOR ANGER RECOGNITION IN GERMAN AND ENGLISH IVR PORTALS;104
7.1;1. Introduction;105
7.2;2. Related Work;106
7.3;3. Overview of Database Conditions;106
7.4;4. Selected Corpora;108
7.5;5. Prosodic and Acoustic Modeling;109
7.5.1;5.1 Audio Descriptor Extraction;110
7.5.1.1;5.1.1 Pitch.;110
7.5.1.2;5.1.2 Loudness.;110
7.5.1.3;5.1.3 MFCC.;110
7.5.1.4;5.1.4 Spectrals.;110
7.5.1.5;5.1.5 Formants.;111
7.5.1.6;5.1.6 Intensity.;111
7.5.1.7;5.1.7 Others.;111
7.5.2;5.2 Statistic Feature Definition;111
7.6;6. Feature Ranking;113
7.7;7. Normalization;115
7.8;8. Classification;116
7.8.1;8.1 Cross Validation;116
7.8.2;8.2 Evaluation Measurement;117
7.8.3;8.3 Classification Algorithm;118
7.9;9. Experiments and Results;118
7.9.1;9.1 Analyzing Feature Distributions;118
7.9.2;9.2 Optimal Feature Sets;120
7.9.3;9.3 Optimal Classification;121
7.10;10. Discussion;122
7.10.1;10.0.1 Signal Quality.;122
7.10.2;10.0.2 Speech Length.;122
7.10.3;10.0.3 Speech Transcription.;123
7.11;11. Conclusions;123
7.12;Acknowledgments;124
7.13;References;124
8;Chapter 5 PSYCHOMIME CLASSIFICATION AND VISUALIZATION USING A SELF-ORGANIZING MAP FOR IMPLEMENTING EMOTIONAL SPOKEN DIALOGUE SYSTEM;127
8.1;1. Introduction;128
8.2;2. Psychomimes and Emotional Spoken Dialogue Systems;129
8.2.1;2.1 Onomatopoeias and Psychomimes;129
8.2.2;2.2 Emotional Spoken Dialogue Systems;130
8.3;3. Self-Organizing Map;131
8.3.1;3.1 What is SOM?;131
8.3.2;3.2 Natural Language Processing Studies using SOM;133
8.4;4. Experiment;135
8.4.1;4.1 Psychomimes and their Groupings;135
8.4.2;4.2 Corpus;136
8.4.3;4.3 Vector Space;136
8.4.4;4.4 SOM Parameters;138
8.4.5;4.5 Results;139
8.4.5.1;4.5.1 Determination of Group Areas.;139
8.4.5.2;4.5.2 Recall and Precision.;141
8.4.5.3;4.5.3 Effects of Selecting Frames and Combinations of Frames.;145
8.4.5.4;4.5.4 Effects of Narrowing Area of Groups.;148
8.4.5.5;4.5.5 How to Take Advantage of Knowledge.;151
8.4.5.6;4.5.6 Toward Implementing Emotional Spoken Dialogue System.;151
8.5;5. Conclusions and Future Work;152
8.6;References;152
9;Chapter 6 TRENDS, CHALLENGES AND OPPORTUNITIES IN SPOKEN DIALOGUE RESEARCH;155
9.1;1. Introduction;155
9.2;2. Research in Spoken Dialogue Technology;156
9.2.1;2.1 The Nature of Dialogue Research;156
9.2.2;2.2 Academic and Commercial Research;157
9.2.3;2.3 Three Decades of Research in Spoken Dialogue Systems;158
9.2.4;2.4 Application Areas for Dialogue Research;163
9.3;3. Challenges for Researchers in Spoken Dialogue Systems;163
9.3.1;3.1 Conducting Research in Spoken Dialogue Systems;165
9.3.2;3.2 The Availability of Resources for the Design and Development of Spoken Dialogue Systems;166
9.4;4. Opportunities for Future Research in Dialogue;168
9.4.1;4.1 Incorporating Dialogue into Voice Search;168
9.4.2;4.2 Using Dialogue Systems in Ambient Intelligence Environments;170
9.4.3;4.3 CHAT;171
9.4.4;4.4 SmartKom and SmartWeb;172
9.4.5;4.5 TALK;173
9.4.6;4.6 COMPANIONS;174
9.4.7;4.7 Atraco;175
9.4.8;4.8 Summary;175
9.5;5. Concluding Remarks;176
9.6;Web Pages;177
9.7;Notes;178
10;Chapter 7 DIALOGUE CONTROL BY POMDP USING DIALOGUE DATA STATISTICS;182
10.1;1. Introduction;183
10.2;2. Partially Observable Markov Decision Process;185
10.2.1;2.1 POMDP Structure;185
10.2.2;2.2 Running Cycle and Value Iteration;187
10.3;3. Dialogue Control using POMDP from Large Amounts of Data;188
10.3.1;3.1 Purpose of Dialogue Control;188
10.3.2;3.2 Automatically Acquiring POMDP Parameters and Obtaining a Policy for Target Dialogues;189
10.3.3;3.3 Reflecting Action Predictive Probabilities in Action Control;192
10.4;4. Evaluation and Results;196
10.5;5. Discussion;198
10.6;6. Future Work;200
10.7;7. Conclusions;202
10.8;Acknowledgments;203
10.9;References;203
11;Chapter 8 PROPOSAL FOR A PRACTICAL SPOKEN DIALOGUE SYSTEM DEVELOPMENT METHOD;206
11.1;1. Introduction;206
11.2;2. Overview of the Data-Management Centered Prototyping Method;208
11.3;3. Prototyping of a Slot-Filling Dialogue System;210
11.3.1;3.1 Data Model Definition;210
11.3.2;3.2 Controller Script;211
11.3.3;3.3 View Files;212
11.3.4;3.4 Adding a Multi-Modal Interface to a GUI Web Application;212
11.3.5;3.5 Generating Speech Interaction;213
11.3.6;3.6 Enabling Multi-Modal Interaction;215
11.3.7;3.7 Generation of Dialogue Flow;215
11.3.8;3.8 The Result of the Prototyping;216
11.4;4. Prototyping of a DB-Search Dialogue System;217
11.5;5. Prototyping of a Multi-Modal Interactive Presentation System;219
11.5.1;5.1 Dialogue Pattern Generation from Metadata;220
11.5.2;5.2 Generation of QA Database;222
11.5.3;5.3 Adaptation of the Language Model;223
11.5.4;5.4 Implementation and Evaluation;224
11.6;6. Incorporation of the User Model;226
11.6.1;6.1 User Model in Multi-Modal Interaction Systems;226
11.6.2;6.2 User Model Component of MIML;227
11.6.3;6.3 Functions for User Adaptation;227
11.7;7. Conclusions;228
11.8;Acknowledgments;229
11.9;Notes;229
11.10;References;229
12;Chapter 9 QUALITY OF EXPERIENCING MULTI-MODAL INTERACTION;231
12.1;1. Introduction;231
12.2;2. Advantages of Systems Providing Multi-Modal Interaction;232
12.2.1;2.1 Modality Relations;233
12.3;3. Quality of Experience;234
12.4;4. Audio-Video Quality Integration in AV-Transmission Services;235
12.4.1;4.1 Videotelephony;236
12.4.2;4.2 IP-Television;237
12.5;5. Quality of Embodied Conversational Agents;239
12.6;6. Quality of Systems with Multiple Input Modalities;241
12.6.1;6.1 Smart Office;242
12.6.2;6.2 Mobile;243
12.6.3;6.3 Summary;244
12.7;7. Conclusions;244
12.8;Acknowledgments;246
12.9;Notes;246
12.10;References;246
13;Chapter 10 DIALOGUE ACTS ANNOTATION TO CONSTRUCT DIALOGUE SYSTEMS FOR CONSULTING;249
13.1;1. Introduction;249
13.2;2. Kyoto Tour Guide Dialogue Corpus;251
13.3;3. Annotation of Communicative Function and Semantic Content in DA;254
13.4;4. SA Tags;254
13.4.1;4.1 Annotation Unit;254
13.4.2;4.2 Tag Specifications;256
13.4.2.1;4.2.1 General Layer.;256
13.4.2.2;4.2.2 Response Layer.;256
13.4.2.3;4.2.3 Check Layer.;257
13.4.2.4;4.2.4 Constrain Layer.;257
13.4.2.5;4.2.5 Action Discussion Layer.;257
13.4.2.6;4.2.6 Others Layer.;258
13.4.3;4.3 Evaluation of the Annotation;258
13.4.3.1;4.3.1 Distributional Statistics.;259
13.4.3.2;4.3.2 Inter-Annotator Agreement.;259
13.4.3.3;4.3.3 Analysis of the Occurrence Tendency during the Progress of the Episode.;260
13.4.4;4.4 Preliminary Experiment to Estimate SA Tags via SVM;262
13.5;5. Semantic Content Tags;263
13.5.1;5.1 Tag Specifications;264
13.5.2;5.2 Annotation of Semantic Content Tags;265
13.6;6. Usage of the Kyoto Tour Guide Corpus;267
13.6.1;6.1 Speech Recognition;267
13.6.2;6.2 Dialogue Management;267
13.6.3;6.3 Speech Synthesis;269
13.7;7. Conclusions;270
13.8;Notes;270
13.9;References;270
14;Chapter 11 ON THE USE OF N-GRAM TRANSDUCERS FOR DIALOGUE ANNOTATION;273
14.1;1. Introduction;273
14.2;2. The HMM-based Annotation Model;275
14.3;3. The NGT Annotation Model;278
14.4;4. Corpora;282
14.4.1;4.1 SwitchBoard Corpus;284
14.4.2;4.2 DIHANA Corpus;285
14.5;5. Experimental Results;286
14.6;6. Conclusions and Future Work;291
14.7;Acknowledgments;292
14.8;Notes;292
14.9;References;292
15;Index;295




