Informatics and Machine Learning: From Martingales to Metaheuristics
Book, English, 592 pages, format (W × H): 152 mm × 229 mm, weight: 943 g
ISBN: 978-1-119-71674-7
Publisher: Wiley
Discover a thorough exploration of how to use computational, algorithmic, statistical, and informatics methods to analyze digital data
Informatics and Machine Learning: From Martingales to Metaheuristics delivers an interdisciplinary presentation on how to analyze any data captured in digital form. The book describes how readers can conduct analyses of text, general sequential data, experimental observations over time, stock market and econometric histories, or symbolic data, like genomes. It contains extensive sample code to demonstrate the concepts it covers and to assist with project work at various levels.
The book offers a complete presentation of the mathematical underpinnings of a wide variety of forms of data analysis and provides extensive examples of programming implementations. It is based on two decades' worth of the distinguished author’s teaching and industry experience.
- A thorough introduction to probabilistic reasoning and bioinformatics, including Python shell scripting to obtain data counts, frequencies, probabilities, and anomalous statistics, or for use with Bayes’ rule (brief illustrative sketches follow this list)
- An exploration of information entropy and statistical measures, including Shannon entropy, relative entropy, maximum entropy (maxent), and mutual information
- A practical discussion of ad hoc, ab initio, and bootstrap signal acquisition methods, with examples from genome analytics and signal analytics
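The first two bullets above move from raw symbol counts to frequencies, probabilities, and Bayes’ rule, and then to entropy-based measures. The short Python sketch below is a hypothetical illustration of those quantities on a small nucleotide string; it is not code from the book, and all function names are placeholders.

# Hypothetical illustration (not from the book): from symbol counts to
# frequencies/probabilities, Bayes' rule, Shannon entropy, and mutual information.
from collections import Counter
from math import log2

def probabilities(sequence):
    # Empirical probabilities from raw symbol counts.
    counts = Counter(sequence)
    total = sum(counts.values())
    return {symbol: n / total for symbol, n in counts.items()}

def shannon_entropy(probs):
    # H(X) = -sum_x p(x) log2 p(x), in bits.
    return -sum(p * log2(p) for p in probs.values() if p > 0)

def mutual_information(joint):
    # I(X;Y) = sum_{x,y} p(x,y) log2[ p(x,y) / (p(x) p(y)) ],
    # with the joint distribution given as {(x, y): p}.
    px, py = Counter(), Counter()
    for (x, y), p in joint.items():
        px[x] += p
        py[y] += p
    return sum(p * log2(p / (px[x] * py[y])) for (x, y), p in joint.items() if p > 0)

def bayes_posterior(prior, likelihood, evidence):
    # Bayes' rule: P(H | D) = P(D | H) P(H) / P(D).
    return likelihood * prior / evidence

sequence = "ATGCGATACGATTACAGGCAT"
probs = probabilities(sequence)
print("base probabilities:", probs)
print("entropy (bits per base):", round(shannon_entropy(probs), 3))
print("posterior:", bayes_posterior(0.01, 0.9, 0.05))  # made-up example numbers

Anomalous statistics can then be flagged as symbols, or symbol pairs, whose observed frequencies deviate strongly from a background model.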
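The last bullet above concerns signal acquisition. As a second hedged sketch (again not the book’s code), a simple threshold detector over an O(L) boxcar (running-mean) baseline, in the spirit of the tFSA spike detector and boxcar filter entries in Chapter 4, might look like this:

# Hypothetical illustration (not from the book): threshold-based spike detection
# against a linear-time boxcar (moving-average) baseline.
def boxcar_baseline(samples, width):
    # Running mean over a fixed window, computed in O(L).
    baseline, window_sum = [], 0.0
    for i, x in enumerate(samples):
        window_sum += x
        if i >= width:
            window_sum -= samples[i - width]
        baseline.append(window_sum / min(i + 1, width))
    return baseline

def detect_spikes(samples, width=5, threshold=3.0):
    # Flag sample indices deviating from the local baseline by more than `threshold`.
    baseline = boxcar_baseline(samples, width)
    return [i for i, (x, b) in enumerate(zip(samples, baseline)) if abs(x - b) > threshold]

signal = [0, 0.2, -0.1, 0.1, 8.0, 0.0, 0.3, -0.2, 7.5, 0.1]
print("spike indices:", detect_spikes(signal))  # prints [4, 8]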
Perfect for undergraduate and graduate students in machine learning and data analytics programs, Informatics and Machine Learning: From Martingales to Metaheuristics will also earn a place in the libraries of mathematicians, engineers, computer scientists, and life scientists with an interest in those subjects.
Subject Areas
- Medicine | Veterinary Medicine > Medicine | Public Health | Pharmacy | Dentistry > Medicine, Health Care > Medical Mathematics & Informatics
- Natural Sciences > Life Sciences > Applied Biology > Bioinformatics
- Medicine | Veterinary Medicine > Medicine | Public Health | Pharmacy | Dentistry > Medical Specialties > Imaging Procedures, Nuclear Medicine, Radiotherapy > Radiology, Imaging Procedures
- Mathematics | Computer Science > Mathematics > Stochastics > Mathematical Statistics
- Interdisciplinary > Sciences > Sciences: Research and Information > Data Analysis, Data Processing
Further Information & Material
Preface xv
1 Introduction 1
1.1 Data Science: Statistics, Probability, Calculus … Python (or Perl) and Linux 2
1.2 Informatics and Data Analytics 3
1.3 FSA-Based Signal Acquisition and Bioinformatics 4
1.4 Feature Extraction and Language Analytics 7
1.5 Feature Extraction and Gene Structure Identification 8
1.5.1 HMMs for Analysis of Information Encoding Molecules 11
1.5.2 HMMs for Cheminformatics and Generic Signal Analysis 11
1.6 Theoretical Foundations for Learning 13
1.7 Classification and Clustering 13
1.8 Search 14
1.9 Stochastic Sequential Analysis (SSA) Protocol (Deep Learning Without NNs) 15
1.9.1 Stochastic Carrier Wave (SCW) Analysis – Nanoscope Signal Analysis 18
1.9.2 Nanoscope Cheminformatics – A Case Study for Device “Smartening” 19
1.10 Deep Learning using Neural Nets 20
1.11 Mathematical Specifics and Computational Implementations 21
2 Probabilistic Reasoning and Bioinformatics 23
2.1 Python Shell Scripting 23
2.1.1 Sample Size Complications 33
2.2 Counting, the Enumeration Problem, and Statistics 34
2.3 From Counts to Frequencies to Probabilities 35
2.4 Identifying Emergent/Convergent Statistics and Anomalous Statistics 35
2.5 Statistics, Conditional Probability, and Bayes’ Rule 37
2.5.1 The Calculus of Conditional Probabilities: The Cox Derivation 37
2.5.2 Bayes’ Rule 38
2.5.3 Estimation Based on Maximal Conditional Probabilities 38
2.6 Emergent Distributions and Series 39
2.6.1 The Law of Large Numbers (LLN) 39
2.6.2 Distributions 39
2.6.3 Series 42
2.7 Exercises 42
3 Information Entropy and Statistical Measures 47
3.1 Shannon Entropy, Relative Entropy, Maxent, Mutual Information 48
3.1.1 The Khinchin Derivation 49
3.1.2 Maximum Entropy Principle 49
3.1.3 Relative Entropy and Its Uniqueness 51
3.1.4 Mutual Information 51
3.1.5 Information Measures Recap 52
3.2 Codon Discovery from Mutual Information Anomaly 58
3.3 ORF Discovery from Long-Tail Distribution Anomaly 66
3.3.1 Ab Initio Learning with smORFs, Holistic Modeling, and Bootstrap Learning 69
3.4 Sequential Processes and Markov Models 72
3.4.1 Markov Chains 73
3.5 Exercises 75
4 Ad Hoc, Ab Initio, and Bootstrap Signal Acquisition Methods 77
4.1 Signal Acquisition, or Scanning, at Linear Order Time-Complexity 77
4.2 Genome Analytics: The Gene-Finder 80
4.3 Objective Performance Evaluation: Sensitivity and Specificity 93
4.4 Signal Analytics: The Time-Domain Finite State Automaton (tFSA) 93
4.4.1 tFSA Spike Detector 95
4.4.2 tFSA-Based Channel Signal Acquisition Methods with Stable Baseline 98
4.4.3 tFSA-Based Channel Signal Acquisition Methods Without Stable Baseline 103
4.5 Signal Statistics (Fast): Mean, Variance, and Boxcar Filter 107
4.5.1 Efficient Implementations for Statistical Tools (O(L)) 109
4.6 Signal Spectrum: Nyquist Criterion, Gabor Limit, Power Spectrum 110
4.6.1 Nyquist Sampling Theorem 110
4.6.2 Fourier Transforms, and Other Classic Transforms 110
4.6.3 Power Spectral Density 111
4.6.4 Power-Spectrum-Based Feature Extraction 111
4.6.5 Cross-Power Spectral Density 112
4.6.6 AM/FM/PM Communications Protocol 112
4.7 Exercises 112
5 Text Analytics 125
5.1 Words 125
5.1.1 Text Acquisition: Text Scraping and Associative Memory 125
5.1.2 Word Frequency Analysis: Machiavelli’s Polysemy on Fortuna and Virtu 130
5.1.3 Word Frequency Analysis: Coleridge’s Hidden Polysemy on Logos 139
5.1.4 Sentiment Analysis 143
5.2 Phrases – Short (Three Words) 145
5.2.1 Shakespearean Insult Generation – Phrase Generation 147
5.3 Phrases – Long (A Line or Sentence) 150
5.3.1 Iambic Phrase Analysis: Shakespeare 150
5.3.2 Natural Language Processing 152
5.3.3 Sentence and Story Generation: Tarot 152
5.4 Exercises 153
6 Analysis of Sequential Data Using HMMs 155
6.1 Hidden Markov Models (HMMs) 155
6.1.1 Background and Role in Stochastic Sequential Analysis (SSA) 155
6.1.2 When to Use a Hidden Markov Model (HMM)? 160
6.1.3 Hidden Markov Models (HMMs) – Standard Formulation and Terms 161
6.2 Graphical Models for Markov Models and Hidden Markov Models 162
6.2.1 Hidden Markov Models 162
6.2.2 Viterbi Path 163
6.2.3 Forward and Backward Probabilities 164
6.2.4 HMM: Maximum Likelihood Discrimination 165
6.2.5 Expectation/Maximization (Baum–Welch) 166
6.3 Standard HMM Weaknesses and their GHMM Fixes 168
6.4 Generalized HMMs (GHMMs – “Gems”): Minor Viterbi Variants 171
6.4.1 The Generic HMM 171
6.4.2 pMM/SVM 171
6.4.3 EM and Feature Extraction via EVA Projection 172
6.4.4 Feature Extraction via Data Absorption (a.k.a. Emission Inversion) 174
6.4.5 Modified AdaBoost for Feature Selection and Data Fusion 176
6.5 HMM Implementation for Viterbi (in C and Perl) 179
6.6 Exercises 206
7 Generalized HMMs (GHMMs): Major Viterbi Variants 207
7.1 GHMMs: Maximal Clique for Viterbi and Baum–Welch 207
7.2 GHMMs: Full Duration Model 216
7.2.1 HMM with Duration (HMMD) 216
7.2.2 Hidden Semi-Markov Models (HSMM) with Side Information 220
7.2.3 HMM with Binned Duration (HMMBD) 224
7.3 GHMMs: Linear Memory Baum–Welch Algorithm 228
7.4 GHMMs: Distributable Viterbi and Baum–Welch Algorithms 230
7.4.1 Distributed HMM Processing via “Viterbi-Overlap-Chunking” with GPU Speedup 230
7.4.2 Relative Entropy and Viterbi Scoring 231
7.5 Martingales and the Feasibility of Statistical Learning (further details in Appendix) 232
7.6 Exercises 234
8 Neuromanifolds and the Uniqueness of Relative Entropy 235