E-book, English, 275 pages
Rebala / Ravi / Churiwala: An Introduction to Machine Learning
1st edition, 2019
ISBN: 978-3-030-15729-6
Publisher: Springer Nature Switzerland
Format: PDF
Copy protection: PDF watermark
Just like electricity, Machine Learning will revolutionize our lives in many ways - some of which are not even conceivable today. This book provides a thorough conceptual understanding of Machine Learning techniques and algorithms. Many of the mathematical concepts are explained in an intuitive manner. The book starts with an overview of machine learning and the underlying mathematical and statistical concepts before moving on to machine learning topics. It gradually builds up the depth, covering many of the present-day machine learning algorithms and ending with Deep Learning and Reinforcement Learning algorithms. The book also covers some of the popular Machine Learning applications. The material in this book is agnostic to any specific programming language or hardware, so readers can try these concepts on whichever platforms they are already familiar with.

- Offers a comprehensive introduction to Machine Learning, while not assuming any prior knowledge of the topic;
- Provides a complete overview of available techniques and algorithms in conceptual terms, covering various application domains of machine learning;
- Not tied to any specific software language or hardware implementation.
Gopinath Rebala is Chief Technical Officer at OpsMx, Inc. in San Ramon, California. Ajay Ravi is a Software Engineer at Intel, Inc. in San Jose, California. Sanjay Churiwala is Senior Director at Xilinx, in Hyderabad, India.
Further Information & Material
Preface  5
  Reading the Book  6
  Acknowledgments  7
Contents  8
List of Figures  16
List of Tables  19
Chapter 1: Machine Learning Definition and Basics  21
  1.1 Introduction  21
    1.1.1 Resurgence of ML  22
    1.1.2 Relation with Artificial Intelligence (AI)  23
    1.1.3 Machine Learning Problems  24
  1.2 Matrices  24
    1.2.1 Vectors and Tensors  25
    1.2.2 Matrix Addition (or Subtraction)  25
    1.2.3 Matrix Transpose  25
    1.2.4 Matrix Multiplication  26
      1.2.4.1 Multiplying with a Scalar  26
      1.2.4.2 Multiplying with Another Matrix  26
      1.2.4.3 Multiplying with a Vector  27
    1.2.5 Identity Matrix  27
    1.2.6 Matrix Inversion  27
    1.2.7 Solving Equations Using Matrices  28
  1.3 Numerical Methods  29
  1.4 Probability and Statistics  30
    1.4.1 Sampling the Distribution  31
    1.4.2 Random Variables  31
    1.4.3 Expectation  31
    1.4.4 Conditional Probability and Distribution  32
    1.4.5 Maximum Likelihood  32
  1.5 Linear Algebra  33
  1.6 Differential Calculus  34
    1.6.1 Functions  34
    1.6.2 Slope  34
  1.7 Computer Architecture  35
  1.8 Next Steps  36
Chapter 2: Learning Models  38
  2.1 Supervised Learning  38
    2.1.1 Classification Problem  39
    2.1.2 Regression Problem  39
  2.2 Unsupervised Learning  40
  2.3 Semi-supervised Learning  41
  2.4 Reinforcement Learning  41
Chapter 3: Regressions  43
  3.1 Introduction  43
  3.2 The Model  44
  3.3 Problem Formulation  44
  3.4 Linear Regression  45
    3.4.1 Normal Method  46
    3.4.2 Gradient Descent Method  48
      3.4.2.1 Determine the Slope at Any Given Point  49
      3.4.2.2 Initial Value  49
      3.4.2.3 Correction  50
      3.4.2.4 Learning Rate  50
      3.4.2.5 Convergence  52
      3.4.2.6 Alternate Method for Computing Slope  53
      3.4.2.7 Putting Gradient Descent in Practice  53
    3.4.3 Normal Equation Method vs Gradient Descent Method  54
  3.5 Logistic Regression  54
    3.5.1 Sigmoid Function  55
    3.5.2 Cost Function  56
    3.5.3 Gradient Descent  56
  3.6 Next Steps  57
  3.7 Key Takeaways  58
Chapter 4: Improving Further  59
  4.1 Nonlinear Contribution  59
  4.2 Feature Scaling  60
  4.3 Gradient Descent Algorithm Variations  61
    4.3.1 Cost Contour  61
    4.3.2 Stochastic Gradient Descent  62
      4.3.2.1 Convergence for Stochastic Gradient Descent  64
    4.3.3 Mini Batch Gradient Descent  65
    4.3.4 Map Reduce and Parallelism  66
    4.3.5 Basic Theme of Algorithm Variations  67
  4.4 Regularization  67
    4.4.1 Regularization for Normal Equation  70
    4.4.2 Regularization for Logistic Regression  70
    4.4.3 Determining Appropriate λ  70
      4.4.3.1 Cross Validation  71
      4.4.3.2 K-Fold Cross Validation  71
    4.4.4 Comparing Hypotheses  71
  4.5 Multi-class Classifications  72
    4.5.1 One-vs-All Classification  72
    4.5.2 SoftMax  73
      4.5.2.1 Basic Approach for SoftMax  73
      4.5.2.2 Loss Function  74
  4.6 Key Takeaways and Next Steps  74
Chapter 5: Classification  75
  5.1 Decision Boundary  75
    5.1.1 Nonlinear Decision Boundary  76
  5.2 Skewed Class  78
    5.2.1 Optimizing Precision vs Recall  79
    5.2.2 Single Metric  79
  5.3 Naïve Bayes' Algorithm  80
  5.4 Support Vector Machines  81
    5.4.1 Kernel Selection  84
Chapter 6: Clustering  85
  6.1 K-Means  85
    6.1.1 Basic Algorithm  86
    6.1.2 Distance Calculation  86
    6.1.3 Algorithm Pseudo Code  87
    6.1.4 Cost Function  87
    6.1.5 Choice of Initial Random Centers  88
    6.1.6 Number of Clusters  89
  6.2 K-Nearest Neighbor (KNN)  90
    6.2.1 Weight Consideration  91
    6.2.2 Feature Scaling  91
    6.2.3 Limitations  91
    6.2.4 Finding the Nearest Neighbors  92
  6.3 Next Steps  94
Chapter 7: Random Forests  95
  7.1 Decision Tree  95
  7.2 Information Gain  97
  7.3 Gini Impurity Criterion  104
  7.4 Disadvantages of Decision Trees  107
  7.5 Random Forests  107
    7.5.1 Data Bagging  108
    7.5.2 Feature Bagging  108
    7.5.3 Cross Validation in Random Forests  108
    7.5.4 Prediction  109
  7.6 Variable Importance  109
  7.7 Proximities  110
    7.7.1 Outliers  110
    7.7.2 Prototypes  111
  7.8 Disadvantages of Random Forests  111
  7.9 Next Steps  112
Chapter 8: Testing the Algorithm and the Network  113
  8.1 Test Set  113
  8.2 Overfit  114
  8.3 Underfit  114
  8.4 Determining the Number of Degrees  114
  8.5 Determining λ  115
  8.6 Increasing Data Count  116
    8.6.1 High Bias Case  116
    8.6.2 High Variance Case  117
  8.7 The Underlying Mathematics (Optional)  117
  8.8 Utilizing the Bias vs Variance Information  119
  8.9 Derived Data  119
  8.10 Approach  120
  8.11 Test Data  120
Chapter 9: (Artificial) Neural Networks  121
  9.1 Logistic Regression Extended to Form Neural Network  121
  9.2 Neural Network as Oversimplified Brain  123
  9.3 Visualizing Neural Network Equations  124
  9.4 Matrix Formulation of Neural Network  125
  9.5 Neural Network Representation  126
  9.6 Starting to Design a Neural Network  127
  9.7 Training the Network  128
    9.7.1 Chain Rule  129
    9.7.2 Components of Gradient Computation  130
    9.7.3 Gradient Computation Through Backpropagation  132
    9.7.4 Updating Weights  133
  9.8 Vectorization  134
  9.9 Controlling Computations  134
  9.10 Next Steps  134
Chapter 10: Natural Language Processing  135
  10.1 Complexity of NLP  135
  10.2 Algorithms  137
    10.2.1 Rule-Based Processing  137
    10.2.2 Tokenizer  137
    10.2.3 Named Entity Recognizers  138
    10.2.4 Term Frequency-Inverse Document Frequency (tf-idf)  139
    10.2.5 Word Embedding  140
    10.2.6 Word2vec  141
      10.2.6.1 Continuous Bag of Words  141
      10.2.6.2 Skip-Gram Model  142
Chapter 11: Deep Learning  144
  11.1 Recurrent Neural Networks  144
    11.1.1 Representation of RNN  146
    11.1.2 Backpropagation in RNN  149
    11.1.3 Vanishing Gradients  150
  11.2 LSTM  151
  11.3 GRU  153
  11.4 Self-Organizing Maps  155
    11.4.1 Representation and Training of SOM  155
Chapter 12: Principal Component Analysis  158
  12.1 Applications of PCA  159
    12.1.1 Example 1  159
    12.1.2 Example 2  159
  12.2 Computing PCA  160
    12.2.1 Data Representation  160
    12.2.2 Covariance Matrix  160
    12.2.3 Diagonal Matrix  161
    12.2.4 Eigenvector  161
    12.2.5 Symmetric Matrix  161
    12.2.6 Deriving Principal Components  161
    12.2.7 Singular Value Decomposition (SVD)  162
  12.3 Computing PCA  162
    12.3.1 Data Characteristics  162
    12.3.2 Data Preprocessing  163
    12.3.3 Selecting Principal Components  163
  12.4 PCA Applications  165
    12.4.1 Image Compression  165
    12.4.2 Data Visualization  166
  12.5 Pitfalls of PCA Application  168
    12.5.1 Overfitting  168
    12.5.2 Model Generation  169
    12.5.3 Model Interpretation  169
Chapter 13: Anomaly Detection  170
  13.1 Anomaly vs Classification  171
  13.2 Model  171
    13.2.1 Distribution Density  172
    13.2.2 Estimating Distribution Parameters  173
    13.2.3 Metric Value  173
    13.2.4 Finding ε  174
    13.2.5 Validating and Tuning the Model  174
  13.3 Multivariate Gaussian Distribution  175
    13.3.1 Determining Feature Mean  176
    13.3.2 Determining Covariance  176
    13.3.3 Computing and Applying the Metric  177
  13.4 Anomalies in Time Series  177
    13.4.1 Time Series Decomposition  178
    13.4.2 Time Series Anomaly Types  178
    13.4.3 Anomaly Detection in Time Series  180
      13.4.3.1 ARIMA  181
      13.4.3.2 Machine Learning Models  184
Chapter 14: Recommender Systems  185
  14.1 Features Known  186
    14.1.1 User's Affinity Toward Each Feature  186
  14.2 User's Preferences Known  187
    14.2.1 Characterizing Features  188
  14.3 Features and User Preferences Both Unknown  189
    14.3.1 Collaborative Filtering  189
      14.3.1.1 Basic Assumptions  189
      14.3.1.2 Parameters Under Consideration  189
      14.3.1.3 Initialize  190
      14.3.1.4 Iterate  190
      14.3.1.5 Cost Function  190
      14.3.1.6 Gradient Descent  191
    14.3.2 Predicting and Recommending  191
  14.4 New User  192
    14.4.1 Shortcomings of the Current Algorithm  193
    14.4.2 Mean Normalization  194
  14.5 Tracking Changes in Preferences  194
Chapter 15: Convolution  196
  15.1 Convolution Explained  196
  15.2 Object Identification Example  198
    15.2.1 Exact Shape Known  198
    15.2.2 Exact Shape Not Known  199
    15.2.3 Breaking Down Further  199
    15.2.4 Unanswered Questions  200
  15.3 Image Convolution  200
  15.4 Preprocessing  202
  15.5 Post-Processing  203
  15.6 Stride  204
  15.7 CNN  205
  15.8 Matrix Operation  206
  15.9 Refining the Filters  207
  15.10 Pooling as Neural Network  208
  15.11 Character Recognition and Road Signs  208
  15.12 ADAS and Convolution  208
Chapter 16: Components of Reinforcement Learning  210
  16.1 Key Participants of a Reinforcement Learning System  210
    16.1.1 The Agent  210
      16.1.1.1 Agent's Objective  212
      16.1.1.2 Rewards as Feedback for Agent  212
    16.1.2 The Environment  213
      16.1.2.1 Environment State Space  214
    16.1.3 Interaction Between Agent and Environment  214
  16.2 Environment State Transitions and Actions  215
    16.2.1 Deterministic Environment  215
    16.2.2 Stochastic Environment  216
    16.2.3 Markov States and MDP  217
  16.3 Agent's Objective  218
  16.4 Agent's Behavior  219
  16.5 Graphical Notation for a Trajectory  220
  16.6 Value Function  220
    16.6.1 State-Value Function  221
    16.6.2 Action-Value Function  221
  16.7 Methods for Finding Optimal Policies  223
    16.7.1 Agent's Awareness of MDP  223
      16.7.1.1 MDP Known  223
      16.7.1.2 MDP Unknown  224
      16.7.1.3 MDP Partially Known  224
    16.7.2 Model-Based and Model-Free Reinforcement Learning  225
    16.7.3 On-Policy and Off-Policy Reinforcement Learning  225
  16.8 Policy Iteration Method for Optimal Policy  225
    16.8.1 Computing Q-function for a Given Policy  226
    16.8.2 Policy Iteration  226
Chapter 17: Reinforcement Learning Algorithms  227
  17.1 Monte Carlo Learning  227
    17.1.1 State Value Estimation  228
    17.1.2 Action Value Estimation  229
  17.2 Estimating Action Values with TD Learning  229
  17.3 Exploration vs Exploitation Trade-Off  231
    17.3.1 ε-greedy Policy  231
  17.4 Q-learning  232
  17.5 Scaling Through Function Approximation  233
    17.5.1 Approximating the Q-function in Q-learning  234
  17.6 Policy-Based Methods  234
    17.6.1 Advantages of Policy Gradient Methods  235
    17.6.2 Parameterized Policy  235
    17.6.3 Training the Model  236
    17.6.4 Monte Carlo Gradient Methods  237
    17.6.5 Actor-Critic Methods  237
    17.6.6 Reducing Variability in Gradient Methods  238
  17.7 Simulation-Based Learning  239
  17.8 Monte Carlo Tree Search (MCTS)  241
    17.8.1 Search Tree  241
    17.8.2 Monte Carlo Search Tree  242
      17.8.2.1 Trajectory Values  243
      17.8.2.2 Backup Procedure  244
    17.8.3 MCTS Algorithm  245
      17.8.3.1 Selection Phase (aka Tree Phase)  245
      17.8.3.2 Expansion Phase  245
      17.8.3.3 Rollout Phase  246
      17.8.3.4 Backup Phase (aka Back Propagation Phase)  246
      17.8.3.5 Tree Policy  246
    17.8.4 Pseudo Code for MCTS Algorithm  247
    17.8.5 Parallel MCTS Algorithms  249
  17.9 MCTS Tree Values for Two-Player Games  249
  17.10 Alpha Zero  250
    17.10.1 Overview  250
      17.10.1.1 Value Function and Policy Network  250
      17.10.1.2 MCTS Search  251
      17.10.1.3 Self-Play and Training Data  251
      17.10.1.4 Iterative Improvement Loop  251
    17.10.2 Aspects of Alpha Zero  252
      17.10.2.1 Supervised Training  252
      17.10.2.2 Loss Function  253
    17.10.3 MCTS Search  253
      17.10.3.1 Node Value  253
      17.10.3.2 Selection Phase  254
      17.10.3.3 Expansion Phase  254
      17.10.3.4 Evaluation Phase (Replaces the Rollout Phase)  255
      17.10.3.5 Backup Phase  255
      17.10.3.6 Parallel Execution  255
Chapter 18: Designing a Machine Learning System  256
  18.1 Pipeline Systems  256
    18.1.1 Ceiling Analysis  257
  18.2 Data Quality  257
    18.2.1 Unstructured Data  259
    18.2.2 Getting Data  259
  18.3 Improvisations over Gradient Descent  260
    18.3.1 Momentum  260
    18.3.2 RMSProp  261
    18.3.3 ADAM (Adaptive Moment Estimation)  262
  18.4 Software Stacks  263
    18.4.1 TensorFlow  263
    18.4.2 MXNet  264
    18.4.3 PyTorch  265
    18.4.4 The Microsoft Cognitive Toolkit  265
    18.4.5 Keras  266
  18.5 Choice of Hardware  266
    18.5.1 Traditional Computer Systems  266
    18.5.2 GPU  267
    18.5.3 FPGAs  268
    18.5.4 TPUs  268
Bibliography  270
Index  271




