Sen / Dutta / Dey | Audio Processing and Speech Recognition | E-Book | www.sack.de

E-Book, English, 107 pages

Series: SpringerBriefs in Computational Intelligence

Sen / Dutta / Dey Audio Processing and Speech Recognition

Concepts, Techniques and Research Overviews
1st edition 2019
ISBN: 978-981-13-6098-5
Publisher: Springer Nature Singapore
Format: PDF
Copy protection: 1 - PDF Watermark


This book offers an overview of audio processing, including the latest advances in the methodologies used in audio processing and speech recognition. First, it discusses the importance of audio indexing and the classical information retrieval problem, and presents two major indexing techniques: Large Vocabulary Continuous Speech Recognition (LVCSR) and Phonetic Search. It then offers brief insights into the human speech production system and its modeling, which are required to produce artificial speech. It also discusses the various components of an automatic speech recognition (ASR) system. Describing the chronological developments in ASR systems, and briefly examining the statistical models used in ASR as well as the related mathematical deductions, the book summarizes a number of state-of-the-art classification techniques and their application in audio/speech classification. By providing insights into various aspects of audio/speech processing and speech recognition, this book appeals to a wide audience, from researchers and postgraduate students to those new to the field.




Soumya Sen is an Assistant Professor at A. K. Choudhury School of Information Technology, University of Calcutta. He received his Ph.D. (Tech) degree from the Department of Computer Science and Engineering at the same university in 2016. Before joining A. K. Choudhury School of Information Technology, he worked at IBM India Pvt. Ltd and RS Software. His industrial expertise includes ERP and data warehousing. His current research interests are data warehousing and OLAP tools, data mining, big data, service engineering, distributed databases, and machine learning. He has published one book and 70 research papers in peer-reviewed journals and international conferences, and has registered three patents in the USA, Japan and South Korea. Dr. Sen is a PC member and reviewer for numerous international conferences.

Anjan Dutta was born in Kolkata, India, in 1986. He received his B.Tech degree in Information Technology from West Bengal University of Technology in 2008 and his M.Tech in Information Technology from Calcutta University in 2011. He worked at IXIA Technologies LTD and TATA Consultancy Services Ltd. (TCSL) for over six years. Initially he worked as a protocol developer at IXIA Technologies LTD, where he worked on 3GPP wireless protocols. Thereafter he worked as an IT Analyst at TATA Consultancy Services Ltd. (TCSL) from July 2011 to July 2017. He is now employed as an Assistant Professor in the Department of Information Technology, Techno India College of Technology, India. He is an active researcher in big data, data mining, audio processing and audio classification.

Nilanjan Dey was born in Kolkata, India, in 1984. He received his B.Tech. degree in Information Technology from West Bengal University of Technology in 2005, his M.Tech. in Information Technology from the same university in 2011, and his Ph.D. in digital image processing from Jadavpur University, India, in 2015. In 2011, he was appointed as an Assistant Professor in the Department of Information Technology at JIS College of Engineering, Kalyani, India, followed by Bengal College of Engineering College, Durgapur, India, in 2014. He is now employed as an Assistant Professor in the Department of Information Technology, Techno India College of Technology, India. He is a visiting fellow of the University of Reading, UK. His research topics are signal processing, machine learning and information security. Dr. Dey is an Associate Editor of IEEE Access and is currently the Editor-in-Chief of the International Journal of Ambient Computing and Intelligence. He is also Series Co-editor of Advances in Ubiquitous Sensing Applications for Healthcare (AUSAH), Elsevier, and Springer Tracts in Nature-Inspired Computing (STNIC).


Further Information & Material


Preface
  Objective of the Book
  Organization of the Book
  Chapter 1: Audio Indexing
  Chapter 2: Speech Processing and Recognition System
  Chapter 3: Feature Extraction
  Chapter 4: Audio Classification
Contents
About the Authors
1 Audio Indexing
  1.1 Introduction
  1.2 Audio Indexing and Classic Information Retrieval Problem
  1.3 Large Vocabulary Continuous Speech Recognition (LVCSR)
    1.3.1 Recognition Errors and Vocabulary Limitations
    1.3.2 The Out-of-Vocabulary Problem
    1.3.3 Pros and Cons of LVCSR Speech Analytics
  1.4 Phonetic Search
    1.4.1 Phases of Phonetic Search
    1.4.2 Pros and Cons of Phonetic Search
  1.5 Comparison Between LVCSR and Phonetic Search
  1.6 Summary
  References
2 Speech Processing and Recognition System
  2.1 Introduction
  2.2 Human Speech Production System
    2.2.1 Speech Generation
    2.2.2 Speech Perception
    2.2.3 Voiced and Unvoiced Speech
    2.2.4 Model of Human Speech
  2.3 Automatic Speech Recognition System
    2.3.1 History of ASR
    2.3.2 Structure of an ASR System
    2.3.3 Neural Network and Speech Recognition System
    2.3.4 Pronunciation Model
    2.3.5 Language Model
    2.3.6 Central Decoder
  2.4 Summary
  References
3 Feature Extraction
  3.1 Introduction
  3.2 Basic Audio Features
    3.2.1 Pitch
    3.2.2 Timbral Features
    3.2.3 Rhythmic Features
    3.2.4 Inharmonicity
    3.2.5 Autocorrelation
    3.2.6 Other Features
    3.2.7 MPEG-7 Features
  3.3 Feature Extraction Techniques
    3.3.1 Linear Prediction Coding (LPC)
    3.3.2 Mel-Frequency Cepstral Coefficient (MFCC)
    3.3.3 Perceptual Linear Prediction (PLP)
    3.3.4 Discrete Wavelet Transform (DWT)
  3.4 Summary
  References
4 Audio Classification
  4.1 Introduction
  4.2 Classification Strategies
    4.2.1 k-Nearest Neighbors (k-NN)
    4.2.2 Naïve Bayes (NB) Classifier
    4.2.3 Decision Tree and Speech Classification
    4.2.4 Support Vector Machine (SVM) and Speech Classification
  4.3 Neural Network in Speech Classification
  4.4 Deep Neural Network in Speech Recognition and Classification
  4.5 Summary
  References
5 Conclusion


