Swamynathan | Mastering Machine Learning with Python in Six Steps | E-Book | www.sack.de

E-book, English, 469 pages

Swamynathan Mastering Machine Learning with Python in Six Steps

A Practical Implementation Guide to Predictive Data Analytics Using Python
2nd edition
ISBN: 978-1-4842-4947-5
Publisher: Apress
Format: PDF
Copy protection: 1 - PDF watermark




Explore fundamental to advanced Python 3 topics in six steps, all designed to make you a worthy practitioner. This updated edition's approach is based on the "six degrees of separation" theory, which states that everyone and everything is a maximum of six steps away; accordingly, each topic is presented in two parts: theoretical concepts and practical implementation using suitable Python 3 packages.

You'll start with the fundamentals of the Python 3 programming language, the history and evolution of machine learning, and system development frameworks. Key data mining/analysis concepts, such as exploratory analysis, feature dimension reduction, regression, and time series forecasting, are covered as well, along with their efficient implementation in scikit-learn. You'll also learn commonly used model diagnostic and tuning techniques, including the optimal probability cutoff point for class creation, variance, bias, bagging, boosting, ensemble voting, grid search, random search, Bayesian optimization, and a noise reduction technique for IoT data. Finally, you'll review advanced text mining techniques, recommender systems, neural networks, deep learning, and reinforcement learning techniques and their implementation. All the code presented in the book is available as IPython notebooks, enabling you to try out these examples and extend them to your advantage.

What You'll Learn
Understand machine learning development and frameworks
Assess model diagnosis and tuning in machine learning
Examine text mining, natural language processing (NLP), and recommender systems
Review reinforcement learning and CNNs

Who This Book Is For

Python developers, data engineers, and machine learning engineers looking to expand their knowledge or move their career into the machine learning area.



Manohar Swamynathan is a data science practitioner and an avid programmer, with over 14 years of experience in various data science related areas, including data warehousing, business intelligence (BI), analytical tool development, ad hoc analysis, predictive modeling, data science product development, consulting, and formulating and executing analytics strategy. His career has covered the life cycle of data across domains such as US mortgage banking, retail/e-commerce, insurance, and industrial IoT. He has a bachelor's degree with a specialization in physics, mathematics, and computers, and a master's degree in project management. He currently lives in Bengaluru, the Silicon Valley of India.




Further Information & Material


1;Table of Contents;4
2;About the Author;10
3;About the Technical Reviewer;11
4;Acknowledgments;12
5;Introduction;13
6;Chapter 1: Step 1: Getting Started in Python 3;16
6.1;The Best Things in Life Are Free;16
6.2;The Rising Star;18
6.3;Choosing Python 2.x or Python 3.x;18
6.3.1;Windows;20
6.3.2;OSX;20
6.3.2.1;Graphical Installer;20
6.3.2.2;Command Line Installer;20
6.3.3;Linux;21
6.3.4;From Official Website;21
6.3.5;Running Python;21
6.4;Key Concepts;22
6.4.1;Python Identifiers;22
6.4.2;Keywords;22
6.4.3;My First Python Program;23
6.4.4;Code Blocks;23
6.4.4.1;Indentations;23
6.4.4.2;Suites;24
6.4.5;Basic Object Types;25
6.4.6;When to Use List, Tuple, Set, or Dictionary;28
6.4.7;Comments in Python;29
6.4.8;Multiline Statements;29
6.4.9;Multiple Statements on a Single Line;30
6.4.10;Basic Operators;30
6.4.10.1;Arithmetic Operators;31
6.4.10.2;Comparison or Relational Operators;32
6.4.10.3;Assignment Operators;34
6.4.10.4;Bitwise Operators;35
6.4.10.5;Logical Operators;37
6.4.10.6;Membership Operators;38
6.4.10.7;Identity Operators;39
6.4.11;Control Structures;39
6.4.11.1;Selections;40
6.4.11.2;Iterations;41
6.4.12;Lists;44
6.4.13;Tuples;48
6.4.14;Sets;52
6.4.14.1;Changing Sets in Python;56
6.4.14.2;Removing Items from Sets;57
6.4.14.3;Set Operations;57
6.4.14.4;Set Unions;57
6.4.14.5;Set Intersections;58
6.4.14.6;Set Difference;58
6.4.14.7;Set Symmetric Difference;59
6.4.14.8;Basic Operations;59
6.4.15;Dictionary;60
6.4.16;User-Defined Functions;66
6.4.16.1;Defining a Function;66
6.4.16.2;The Scope of Variables;68
6.4.16.3;Default Argument;69
6.4.16.4;Variable Length Arguments;69
6.4.17;Modules;70
6.4.18;File Input/Output;72
6.4.19;Opening a File;73
6.4.20;Exception Handling;74
6.5;Summary;79
7;Chapter 2: Step 2: Introduction to Machine Learning;80
7.1;History and Evolution;81
7.2;Artificial Intelligence Evolution;85
7.2.1;Different Forms;86
7.2.1.1;Statistics;87
7.2.1.1.1;Frequentist;88
7.2.1.1.2;Bayesian;88
7.2.1.1.3;Regression;89
7.2.1.2;Data Mining;90
7.2.1.3;Data Analytics;91
7.2.1.3.1;Descriptive Analytics;92
7.2.1.3.2;Diagnostic Analytics;93
7.2.1.3.3;Predictive Analytics;94
7.2.1.3.4;Prescriptive Analytics;94
7.2.1.4;Data Science;95
7.2.1.5;Statistics vs. Data Mining vs. Data Analytics vs. Data Science;97
7.3;Machine Learning Categories;97
7.3.1;Supervised Learning;98
7.3.2;Unsupervised Learning;99
7.3.3;Reinforcement Learning;99
7.4;Frameworks for Building ML Systems;100
7.4.1;Knowledge Discovery in Databases;101
7.4.1.1;Selection;101
7.4.1.2;Preprocessing;102
7.4.1.3;Transformation;102
7.4.1.4;Data Mining;103
7.4.1.5;Interpretation / Evaluation;103
7.4.2;Cross-Industry Standard Process for Data Mining;103
7.4.2.1;Phase 1: Business Understanding;104
7.4.2.2;Phase 2: Data Understanding;104
7.4.2.3;Phase 3: Data Preparation;105
7.4.2.4;Phase 4: Modeling;105
7.4.2.5;Phase 5: Evaluation;105
7.4.2.6;Phase 6: Deployment;105
7.5;SEMMA (Sample, Explore, Modify, Model, Assess);106
7.5.1;Sample;106
7.5.2;Explore;106
7.5.3;Modify;106
7.5.4;Model;106
7.5.5;Assess;107
7.6;Machine Learning Python Packages;108
7.6.1;Data Analysis Packages;109
7.6.1.1;NumPy;110
7.6.1.1.1;Array;110
7.6.1.1.2;Creating NumPy Array;111
7.6.1.1.3;Data Types;113
7.6.1.1.4;Array Indexing;114
7.6.1.1.5;Field Access;114
7.6.1.1.6;Basic Slicing;115
7.6.1.1.7;Advanced Indexing;118
7.6.1.1.8;Array Math;119
7.6.1.1.9;Broadcasting;123
7.6.1.2;Pandas;126
7.6.1.2.1;Data Structures;126
7.6.1.2.1.1;Series;126
7.6.1.2.1.2;DataFrame;127
7.6.1.2.2;Reading and Writing Data;127
7.6.1.2.3;Basic Statistics Summary;128
7.6.1.2.4;Viewing Data;129
7.6.1.2.5;Basic Operations;131
7.6.1.2.6;Merge/Join;133
7.6.1.2.7;Join;135
7.6.1.2.8;Grouping;137
7.6.1.2.9;Pivot Tables;138
7.6.1.3;Matplotlib;139
7.6.1.4;Using Global Functions;139
7.6.1.4.1;Customizing Labels;141
7.6.1.5;Object-Oriented;142
7.6.1.5.1;Line Plots Using ax.plot();143
7.6.1.5.2;Multiple Lines on the Same Axis;144
7.6.1.5.3;Multiple Lines on Different Axes;145
7.6.1.5.4;Control the Line Style and Marker Style;146
7.6.1.5.5;Line Style Reference;147
7.6.1.5.6;Marker Reference;148
7.6.1.5.7;Colormaps Reference;149
7.6.1.5.8;Bar Plots Using ax.bar();149
7.6.1.5.9;Horizontal Bar Charts Using ax.barh();150
7.6.1.5.10;Side by Side Bar Chart;152
7.6.1.5.11;Stacked Bar Example Code;153
7.6.1.5.12;Pie Chart Using ax.pie();154
7.6.1.5.13;Example Code for Grid Creation;155
7.6.1.5.14;Plotting Defaults;156
7.6.2;Machine Learning Core Libraries;157
7.7;Summary;158
8;Chapter 3: Step 3: Fundamentals of Machine Learning;159
8.1;Machine Learning Perspective of Data;159
8.1.1;Scales of Measurement;160
8.1.1.1;Nominal Scale of Measurement;160
8.1.1.2;Ordinal Scale of Measurement;161
8.1.1.3;Interval Scale of Measurement;161
8.1.1.4;Ratio Scale of Measurement;162
8.2;Feature Engineering;163
8.2.1;Dealing with Missing Data;164
8.2.2;Handling Categorical Data;164
8.2.3;Normalizing Data;166
8.2.4;Feature Construction or Generation;168
8.2.4.1;Exploratory Data Analysis;169
8.2.4.2;Univariate Analysis;170
8.2.4.3;Multivariate Analysis;172
8.2.4.3.1;Correlation Matrix;173
8.2.4.3.2;Pair Plot;174
8.2.4.3.3;Findings from EDA;175
8.3;Supervised Learning–Regression;177
8.3.1;Correlation and Causation;179
8.3.2;Fitting a Slope;180
8.3.3;How Good Is Your Model?;182
8.3.3.1;R-Squared for Goodness of fit;182
8.3.3.2;Root Mean Squared Error;184
8.3.3.3;Mean Absolute Error;184
8.3.3.4;Outliers;185
8.3.4;Polynomial Regression;187
8.3.5;Multivariate Regression;193
8.3.5.1;Multicollinearity and Variance Inflation Factor;194
8.3.5.2;Interpreting the Ordinary Least Squares (OLS) Regression Results;199
8.3.5.3;Regression Diagnostics;204
8.3.5.3.1;Outliers;204
8.3.5.3.2;Homoscedasticity and Normality;205
8.3.5.3.3;Overfitting and Underfitting;208
8.3.6;Regularization;208
8.3.7;Nonlinear Regression;212
8.4;Supervised Learning–Classification;213
8.4.1;Logistic Regression;214
8.4.2;Evaluating a Classification Model Performance;219
8.4.3;ROC Curve;221
8.4.4;Fitting Line;222
8.4.5;Stochastic Gradient Descent;224
8.4.6;Regularization;225
8.4.7;Multiclass Logistic Regression;228
8.4.7.1;Load Data;228
8.4.7.2;Normalize Data;229
8.4.7.3;Split Data;229
8.4.7.4;Training Logistic Regression Model and Evaluating;229
8.4.7.5;Generalized Linear Models;231
8.5;Supervised Learning–Process Flow;233
8.5.1;Decision Trees;234
8.5.1.1;How the Tree Splits and Grows;236
8.5.1.2;Conditions for Stopping Partitioning;236
8.5.1.2.1;Key Parameters for Stopping Tree Growth;239
8.5.2;Support Vector Machine;240
8.5.2.1;Key Parameters;240
8.5.3;k-Nearest Neighbors;244
8.5.4;Time-Series Forecasting;247
8.5.4.1;Components of Time Series;247
8.5.4.2;Autoregressive Integrated Moving Average (ARIMA);248
8.5.4.2.1;Running ARIMA Model;248
8.5.4.2.2;Checking for Stationarity;250
8.5.4.2.3;Autocorrelation Test;252
8.5.4.2.4;Build Model and Evaluate;253
8.5.4.2.5;Predicting Future Values;257
8.6;Unsupervised Learning Process Flow;258
8.6.1;Clustering;259
8.6.1.1;K-means;259
8.6.1.1.1;Limitations of K-means;260
8.6.1.1.2;Finding the Value of k;264
8.6.1.1.2.1;Elbow Method;264
8.6.1.1.2.2;Average Silhouette Method;266
8.6.1.2;Hierarchical Clustering;268
8.6.1.2.1;Key Parameters;268
8.6.2;Principal Component Analysis (PCA);271
8.7;Summary;275
9;Chapter 4: Step 4: Model Diagnosis and Tuning;277
9.1;Optimal Probability Cutoff Point;278
9.1.1;Which Error Is Costly?;282
9.2;Rare Event or Imbalanced Dataset;282
9.2.1;Which Resampling Technique Is the Best?;286
9.3;Bias and Variance;288
9.3.1;Bias;288
9.3.2;Variance;288
9.4;K-Fold Cross Validation;290
9.4.1;Stratified K-fold Cross-Validation;291
9.5;Ensemble Methods;294
9.5.1;Bagging;295
9.5.2;Feature Importance;298
9.5.3;RandomForest;299
9.5.4;Extremely Randomized Trees (ExtraTree);299
9.5.5;How Does the Decision Boundary Look?;300
9.5.6;Bagging—Essential Tuning Parameters;303
9.6;Boosting;303
9.6.1;Example Illustration for AdaBoost;304
9.6.1.1;Boosting Iteration 1;305
9.6.1.2;Boosting Iteration 2;305
9.6.1.3;Boosting Iteration 3;306
9.6.1.4;Final Model;306
9.6.2;Gradient Boosting;309
9.6.3;Boosting—Essential Tuning Parameters;312
9.6.4;Xgboost (eXtreme Gradient Boosting);313
9.6.5;Ensemble Voting—Machine Learning’s Biggest Heroes United;318
9.6.5.1;Hard Voting vs. Soft Voting;321
9.6.6;Stacking;322
9.7;Hyperparameter Tuning;326
9.7.1;GridSearch;326
9.7.2;RandomSearch;328
9.7.3;Bayesian Optimization;330
9.7.4;Noise Reduction for Time-Series IoT Data;333
9.8;Summary;336
10;Chapter 5: Step 5: Text Mining and Recommender Systems;338
10.1;Text Mining Process Overview;339
10.2;Data Assemble (Text);340
10.2.1;Social Media;342
10.3;Data Preprocessing (Text);347
10.3.1;Convert to Lower Case and Tokenize;347
10.3.1.1;Sentence Tokenizing;347
10.3.1.2;Word Tokenizing;348
10.3.2;Removing Noise;349
10.3.3;Part of Speech (PoS) Tagging;351
10.3.4;Stemming;353
10.3.5;Lemmatization;355
10.3.6;N-grams;358
10.3.7;Bag of Words;360
10.3.8;Term Frequency-Inverse Document Frequency (TF-IDF);363
10.4;Data Exploration (Text);364
10.4.1;Frequency Chart;365
10.4.2;Word Cloud;366
10.4.3;Lexical Dispersion Plot;367
10.4.4;Cooccurrence Matrix;368
10.5;Model Building;369
10.5.1;Text Similarity;370
10.5.2;Text Clustering;372
10.5.2.1;Latent Semantic Analysis (LSA);373
10.5.3;Topic Modeling;377
10.5.3.1;Latent Dirichlet Allocation;377
10.5.3.2;Nonnegative Matrix Factorization;379
10.5.4;Text Classification;380
10.5.5;Sentiment Analysis;382
10.5.6;Deep Natural Language Processing (DNLP);384
10.6;Word2Vec;386
10.7;Recommender Systems;388
10.7.1;Content-Based Filtering;389
10.7.2;Collaborative Filtering (CF);390
10.8;Summary;394
11;Chapter 6: Step 6: Deep and Reinforcement Learning;395
11.1;Artificial Neural Network (ANN);397
11.1.1;What Goes On Behind, When Computers Look at an Image?;398
11.1.1.1;Why Not a Simple Classification Model for Images?;399
11.1.2;Perceptron—Single Artificial Neuron;399
11.1.3;Multilayer Perceptrons (Feedforward Neural Network);402
11.1.3.1;Load MNIST Data;404
11.1.3.2;Key Parameters for Scikit-learn MLP;405
11.1.4;Restricted Boltzmann Machines (RBMs);408
11.1.5;MLP Using Keras;414
11.1.6;Autoencoders;419
11.1.6.1;Dimension Reduction Using an Autoencoder;420
11.1.6.2;Denoise Image Using an Autoencoder;425
11.1.7;Convolutional Neural Network (CNN);426
11.1.8;CNN on MNIST Dataset;435
11.1.8.1;Visualization of Layers;438
11.1.9;Recurrent Neural Network (RNN);440
11.1.9.1;Long Short-Term Memory (LSTM);441
11.1.10;Transfer Learning;445
11.2;Reinforcement Learning;450
11.3;Summary;454
12;Chapter 7: Conclusion;455
12.1;Tips;457
12.1.1;Start with Questions/Hypothesis, Then Move to Data!;457
12.1.2;Don’t Reinvent the Wheel from Scratch;458
12.1.3;Start with Simple Models;459
12.1.4;Focus on Feature Engineering;459
12.1.5;Beware of Common ML Imposters;460
12.2;Happy Machine Learning;460
13;Index;461


