Chen / Jokinen | Speech Technology | E-Book | www.sack.de

Chen / Jokinen
Speech Technology
Theory and Applications

E-book, English, 331 pages
1st edition, 2010
ISBN: 978-0-387-73819-2
Publisher: Springer
Format: PDF
Copy protection: PDF watermark

This book gives an overview of research on speech technologies and their application in different areas. A distinguishing characteristic of the book is the authors' broad, multidisciplinary view across the many research areas involved. A central goal is to emphasize applications: user experience, human factors, and usability issues are the focus throughout.

Dr. Chen is an associate professor in the Computing Science Department at Chalmers
University of Technology, Sweden. She has worked in human factors and ergonomics
research for over 20 years and has published over 20 papers in cognitive science,
particularly on speech technology applications. Chen has over 20 years of teaching and
research experience in ergonomics, human factors, and human-computer interaction.
She has taught human cognition, human-computer interaction, usability and
user-centered design, and research methodology at the undergraduate and graduate levels.
For the past 8 years, her research interests have focused on speech and multimodal
interaction design in different applications.
Dr. Jokinen is a Professor of Language Technology at the University of Helsinki. She
has played a leading role in several academic and industrial research projects
concerning spoken dialogue systems, cooperative communication, adaptation, and
multimodality. She has published a large number of articles and papers, organized
workshops at major international conferences, and given several invited talks and
seminars. She is the secretary of SIGdial, the ISCA/ACL Special Interest Group on
Discourse and Dialogue.


Further Information & Material


Preface   5
Acknowledgments   10
Contents   11
Contributors   13
List of Acronyms   15
About the Authors   18
About the Editors   24
1 History and Development of Speech Recognition   25
  1.1 Introduction   25
  1.2 Five Decades of Progress in Speech Recognition   25
    1.2.1 The First-Generation Technology (1950s and 1960s)   26
    1.2.2 The Second-Generation Technology (Late 1960s and 1970s)   26
    1.2.3 The Third-Generation Technology (1980s)   28
    1.2.4 The Third Generation, Further Advances (1990s)   29
    1.2.5 The Third Generation, Further Advances (2000s)   30
    1.2.6 Summary of Technological Progress   31
    1.2.7 Changes in the Past Three Decades   33
  1.3 Research Issues toward the Fourth-Generation ASR Technology   34
    1.3.1 How to Narrow the Gap Between Machine and Human Speech Recognition   34
    1.3.2 Robust Acoustic Modeling   35
    1.3.3 Robust Language Modeling   37
    1.3.4 Speech Corpora   38
  1.4 Conclusion   39
  References   39
2 Challenges in Speech Synthesis   43
  2.1 Introduction   43
  2.2 Thousand Years of Speech Synthesis Research   45
    2.2.1 From Middle Ages Over Enlightenment to Industrial Revolution: Mechanical Synthesizers   45
    2.2.2 The 20th Century: Electronic Synthesizers   47
  2.3 The Many Hats of Speech Synthesis Challenges   48
    2.3.1 Evaluation, Standardization, and Scientific Exchange   48
    2.3.2 Techniques of Speech Synthesis   50
      2.3.2.1 Concatenative Synthesis   51
      2.3.2.2 HMM-Based Synthesis   52
      2.3.2.3 Voice Conversion   53
  2.4 Conclusion   54
  References   54
3 Spoken Language Dialogue Models   57
  3.1 Introduction   57
  3.2 Historical Overview   58
    3.2.1 Early Ideas of a Thinking Machine   59
    3.2.2 Experimental Prototypes and Dialogue Models   60
    3.2.3 Large-Scale Projects: From Written to Spoken Dialogues   61
    3.2.4 Dialogue Technology: Industrial Perspectives   63
    3.2.5 Current Trends: Towards Multimodal Intelligent Systems   65
  3.3 Dialogue Modelling   66
    3.3.1 Dialogue Management Models   66
    3.3.2 Discourse Modelling   67
      3.3.2.1 Top-Down Approach   67
      3.3.2.2 Dialogue Act and Plan-Based Approaches   68
      3.3.2.3 Bottom-Up Approach   70
    3.3.3 Conversational Principles   72
    3.3.4 HCI and Dialogue Models   74
  3.4 Conclusion   75
  References   77
4 The Industry of Spoken-Dialog Systems and the Third Generation of Interactive Applications   85
  4.1 Introduction   85
  4.2 A Change of Perspective   86
  4.3 Beyond Directed Dialog   88
  4.4 Architectural Evolution and Standards   89
  4.5 The Structure of the Spoken-Dialog Industry   93
  4.6 The Speech Application Lifecycle   94
  4.7 Speech 3.0: The Third Generation of Spoken-Dialog Systems   97
  4.8 Conclusions   99
  References   100
5 Deceptive Speech: Clues from Spoken Language   102
  5.1 Introduction   102
  5.2 Perceptual and Descriptive Studies of Deception   103
  5.3 Practitioners' Lore   106
  5.4 Computational Approaches to Deceptive Speech   107
    5.4.1 Lexical and Semantic Analysis   107
    5.4.2 Voice Stress Analysis   107
  5.5 Machine-Learning Approaches   108
  5.6 Conclusion   109
  References   109
6 Cognitive Approaches to Spoken Language Technology   112
  6.1 Introduction   112
    6.1.1 Limitations of Current Technology   112
    6.1.2 What Is Missing?   113
  6.2 Models of Natural Cognition   114
    6.2.1 Cognitive Science   115
    6.2.2 Hierarchical Control   116
    6.2.3 Emulation Mechanisms   117
    6.2.4 Mirror Neurons   117
  6.3 Artificial Cognitive Systems   119
    6.3.1 Embodied Cognition   119
    6.3.2 Grounding Language   120
  6.4 Roadmap for the Future   121
    6.4.1 The Way Forward?   121
    6.4.2 A New Scientific Discipline: Cognitive Informatics   123
  References   124
7 Expressive Speech Processing and Prosody Engineering: An Illustrated Essay on the Fragmented Nature of Real Interactive Speech   127
  7.1 Introduction   127
  7.2 Prosodic Information Exchange   128
    7.2.1 Natural Interactive Speech   128
    7.2.2 Two-Way Interactive Speech   130
    7.2.3 Speech Fragments   133
  7.3 Acoustic Correlates of Discourse-Related Non-verbal Speech Sounds   133
    7.3.1 Voice Quality, Prosody, and Affect   134
    7.3.2 Multi-speaker Variation in Prosody and Tone-of-Voice   135
  7.4 Technological Applications   138
    7.4.1 Discourse Flow and Prosody Engineering   139
    7.4.2 Sensing Affect; Detecting Changes in People from Variation in Their Speaking Style and Tone-of-Voice   139
    7.4.3 Toward the Synthesis of Expressive Speech   140
  7.5 Discussion   141
  7.6 Conclusion   142
  References   142
8 Interacting with Embodied Conversational Agents   144
  8.1 Introduction   144
  8.2 Types of Conversational Settings   145
    8.2.1 TV-Style Presenters   146
    8.2.2 Virtual Dialogue Partners   147
    8.2.3 Role-Plays and Simulated Conversations   147
    8.2.4 Multi-threaded Multi-party Conversation   148
  8.3 Dialogue Management   149
  8.4 Communicative Signals   151
  8.5 Emotional Signals   154
  8.6 Expressive Behaviours   155
  8.7 Perceptive Behaviours   157
  8.8 Social Talk   158
  8.9 Design Methodology for Modelling ECA Behaviours   160
  8.10 Evaluation of Verbal and Non-verbal Dialogue Behaviours   162
    8.10.1 Studies Focusing on the Relationship Between Verbal and Non-verbal Means   162
    8.10.2 Studies Investigating the Benefit of Empirically Grounded ECA Dialogue Behaviours   163
    8.10.3 Studies Investigating the Dialogue Behaviours of Humans Interacting with an ECA   163
    8.10.4 Studies Investigating Social Aspects of ECA Dialogue Behaviours   164
  8.11 Conclusion   165
  References   165
9 Multimodal Information Processing for Affective Computing   171
  9.1 Introduction   171
  9.2 Multimodal-Based Affective Human-Computer Interaction   172
    9.2.1 Emotional Speech Processing   172
    9.2.2 Affect in Facial Expression   174
    9.2.3 Affective Multimodal System   176
    9.2.4 Affective Understanding   177
  9.3 Projects and Applications   178
    9.3.1 Affective-Cognitive for Learning and Decision Making   178
    9.3.2 Affective Robot   178
    9.3.3 Oz   178
    9.3.4 Affective Facial and Vocal Expression   179
    9.3.5 Affective Face-to-Face Communication   179
    9.3.6 Humaine   179
  9.4 Research Challenges   180
    9.4.1 Cognitive Structure of Affects   180
    9.4.2 Multimodal-Based Affective Information Processing   181
    9.4.3 Affective Features Capturing in Real Environments   181
    9.4.4 Affective Interaction in Multi-agent Systems   182
  9.5 Conclusion   182
  References   183
10 Spoken Language Translation   187
  10.1 The Dream of the Universal Translator   187
  10.2 Component Technologies   188
    10.2.1 Speech Recognition Engines   188
    10.2.2 Translation Engines   189
    10.2.3 Synthesis   191
  10.3 Specific Systems   192
    10.3.1 SLT and MedSLT   194
    10.3.2 Phraselator   200
    10.3.3 Diplomat/Tongues   201
    10.3.4 S-MINDS   205
      10.3.4.1 ASR Component   205
      10.3.4.2 Translation Component   206
      10.3.4.3 N-Best Merging of Results   207
      10.3.4.4 User Interface Component   208
      10.3.4.5 Speech Synthesis Component   208
      10.3.4.6 Evaluations   208
  10.4 Further Directions   209
  References   210
11 Application of Speech Technology in Vehicles   214
  11.1 Introduction   214
  11.2 Complicated Vehicle Information Systems   215
  11.3 Driver Distraction Due to Speech Interaction   217
  11.4 Speech as Input/Output Device   219
    11.4.1 Noise Inside Vehicles   219
    11.4.2 Identify Suitable Functions   222
  11.5 Dialogue System Design   222
  11.6 In-Vehicle Research Projects   223
  11.7 Commercial In-Vehicle Dialogue Systems   225
  11.8 Multimodal Interaction   226
  11.9 Driver States and Traits, In-Vehicle Voice and Driving Behavior   227
    11.9.1 Driver States and Traits   227
    11.9.2 Emotions and Driving   228
    11.9.3 Age of Voice, Personality, and Driving   230
  11.10 Usability and Acceptance   233
  References   234
12 Spoken Dialogue Application in Space: The Clarissa Procedure Browser   239
  12.1 Introduction   239
  12.2 System Overview   241
    12.2.1 Supported Functionality   241
    12.2.2 Modules   242
  12.3 Writing Voice-Navigable Documents   243
    12.3.1 Representing Procedure-Related Discourse Context   244
  12.4 Grammar-Based Recognition   246
    12.4.1 Regulus and Alterf   246
    12.4.2 Using Regulus and Alterf in Clarissa   247
    12.4.3 Evaluating Speech Understanding Performance   249
  12.5 Rejecting User Speech   251
    12.5.1 The Accept/Reject Decision Task   252
    12.5.2 An SVM-Based Approach   253
      12.5.2.1 Choosing a Kernel Function   254
      12.5.2.2 Making the Cost Function Asymmetric   254
    12.5.3 Experiments   255
  12.6 Side-Effect Free Dialogue Management   256
    12.6.1 Side-Effect Free Dialogue Management   257
    12.6.2 Specific Issues   258
      12.6.2.1 "Undo" and "Correction" Moves   258
      12.6.2.2 Confirmations   259
      12.6.2.3 Querying the Environment   260
      12.6.2.4 Regression Testing and Evaluation   260
  12.7 Results of the On-Orbit Test   260
  12.8 Conclusion   262
    12.8.1 Procedures   262
    12.8.2 Recognition   262
    12.8.3 Response Filtering   262
    12.8.4 Dialogue Management   263
    12.8.5 General   263
      12.8.5.1 A Note on Versions   264
  Appendix: Detailed Results for System Performance   264
    The Recognition Task   264
    The Accept/Reject Task   266
    Kernel Types   266
    Asymmetric Error Costs   267
    Recognition Methods   267
  References   267
13 Military Applications: Human Factors Aspects of Speech-Based Systems   269
  13.1 Introduction   269
  13.2 The Military Domain   269
    13.2.1 Users   270
    13.2.2 Technology   271
    13.2.3 Environment   273
  13.3 Applications   275
    13.3.1 Air   275
    13.3.2 Land   277
    13.3.3 Sea   279
  13.4 General Discussion   280
    13.4.1 Users   280
    13.4.2 Technology   281
    13.4.3 Environment   282
  13.5 Future Research   282
    13.5.1 Challenges   282
    13.5.2 Recommendations for Future Research   284
  References   285
14 Accessibility and Design for All Solutions Through Speech Technology   289
  14.1 Introduction   289
    14.1.1 Text and Speech Media   290
    14.1.2 Multimedia   290
  14.2 Applications for Blind or Partially Sighted Persons   291
    14.2.1 Screen-reader   291
    14.2.2 Screen-readers' Technical Requirements   293
    14.2.3 Relationship with the TTS Module   294
    14.2.4 Audio-Browsing Tools   295
    14.2.5 General Purpose Speech-Enabled Applications   296
    14.2.6 Ambient and Security Problems for the Blind User   297
  14.3 Applications for the Mobility Impaired   298
  14.4 Applications for the Speech Impaired   302
  14.5 Applications for the Hearing Impaired   304
  14.6 Applications for the Elderly   305
  14.7 Accessibility and Application   307
    14.7.1 Navigation in Built Environments and Transportation   307
    14.7.2 Access to Complex Documents   309
    14.7.3 Applications for Instructional Games   312
    14.7.4 Accessibility to Ebooks   313
  14.8 Conclusion   315
  References   316
15 Assessment and Evaluation of Speech-Based Interactive Systems: From Manual Annotation to Automatic Usability Evaluation   318
  15.1 Introduction   318
  15.2 A Brief History of Assessment and Evaluation   319
    15.2.1 Performance and Quality   320
  15.3 Assessment of Speech-System Components   321
    15.3.1 Assessment of Speech Recognition   323
    15.3.2 Assessment of Speech and Natural Language Understanding   323
    15.3.3 Assessment of Dialog Management   324
    15.3.4 Assessment of Speech Output   325
  15.4 Evaluation of Entire Systems   326
    15.4.1 Detection and Classification of Interaction Problems   327
    15.4.2 Parametric Description of Interactions   328
    15.4.3 Subjective Quality Evaluation   328
    15.4.4 Usability Inspection   329
  15.5 Prediction of Quality Judgments   330
  15.6 Conclusions and Future Trends   331
    15.6.1 Multimodal, Adaptive, and Non-task-Oriented Systems   332
    15.6.2 Semi-automatic Evaluation   333
    15.6.3 Quality Prediction   334
  References   335
Index   340


