Book, English, 416 pages, Format (W × H): 188 mm × 234 mm, Weight: 839 g
Handling Outliers and Anomalies in Data Science
ISBN: 978-1-394-29437-4
Publisher: Wiley
An essential guide for tackling outliers and anomalies in machine learning and data science.
In recent years, machine learning (ML) has transformed virtually every area of research and technology, becoming one of the key tools for data scientists. Robust machine learning is a new approach to handling outliers in datasets, an often-overlooked aspect of data science. Ignoring outliers can lead to poor business decisions, incorrect medical diagnoses, faulty conclusions, or misjudged feature importance, to name just a few consequences.
Fundamentals of Robust Machine Learning offers a thorough but accessible overview of this subject by focusing on how to properly handle outliers and anomalies in datasets. The book describes two main approaches: using outlier-tolerant ML tools, or removing outliers before applying conventional tools. Balancing theoretical foundations with practical Python code, it provides all the skills needed to enhance the accuracy, stability, and reliability of ML models.
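The two approaches can be sketched in a few lines of Python. This is a minimal illustration of the idea, not code from the book: it estimates the center of a small hypothetical sample containing one gross outlier, first with an outlier-tolerant estimator (the median), then by removing outliers with a simple MAD-based edit rule before applying a conventional estimator (the mean).

```python
import numpy as np

# Hypothetical 1-D sample with a single gross outlier appended.
data = np.array([9.8, 10.1, 10.0, 9.9, 10.2, 100.0])

# Approach 1: an outlier-tolerant estimator.
# The median is barely affected by the outlier.
robust_estimate = np.median(data)

# Approach 2: remove outliers first, then use a conventional tool.
# Here a simple 3-sigma edit rule on the median/MAD scale flags the
# outlier, and the mean is computed on the cleaned sample.
mad = np.median(np.abs(data - np.median(data)))
scale = 1.4826 * mad  # MAD rescaled to estimate the standard deviation
mask = np.abs(data - np.median(data)) <= 3 * scale
cleaned_mean = data[mask].mean()

print(robust_estimate, cleaned_mean)
```

Both routes land near the true center of the inliers (about 10), whereas the raw mean of the full sample would be pulled above 20 by the single outlier.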
Readers of Fundamentals of Robust Machine Learning will also find:
- A blend of robust statistics and machine learning principles
- Detailed discussion of a wide range of robust machine learning methodologies, from robust clustering, regression, and classification to neural networks and anomaly detection
- Python code with immediate application to data science problems
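As a flavor of the methodologies listed above, the following sketch (an illustrative assumption in the book's Python style, not its actual code) contrasts conventional z-score standardization with a robust median/IQR alternative of the kind covered in the dataset standardization chapter. A single outlier inflates the mean and standard deviation, compressing the inliers; centering on the median and scaling by the interquartile range preserves their spread.

```python
import numpy as np

# Hypothetical feature column with one outlier.
x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])

# Conventional standardization: the outlier inflates the mean and std,
# so the inliers are squeezed into a narrow band near zero.
z_conventional = (x - x.mean()) / x.std()

# Robust standardization: center on the median, scale by the IQR,
# so the inliers keep a meaningful spread and the outlier stands out.
q1, q3 = np.percentile(x, [25, 75])
z_robust = (x - np.median(x)) / (q3 - q1)

print(z_conventional.round(2))
print(z_robust.round(2))
```

On this sample the conventional z-score of the outlier is only about 2, hiding it inside a typical 3-sigma band, while the robust score makes it unmistakable.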
Fundamentals of Robust Machine Learning is ideal for undergraduate or graduate students in data science, machine learning, and related fields, as well as for professionals in the field looking to enhance their understanding of building models in the presence of outliers.
Further Information & Material
Preface xv
About the Companion Website xix
1 Introduction 1
1.1 Defining Outliers 2
1.2 Overview of the Book 3
1.3 What Is Robust Machine Learning? 3
1.3.1 Machine Learning Basics 4
1.3.2 Effect of Outliers 6
1.3.3 What Is Robust Data Science? 7
1.3.4 Noise in Datasets 7
1.3.5 Training and Testing Flows 8
1.4 Robustness of the Median 9
1.4.1 Mean vs. Median 9
1.4.2 Effect on Standard Deviation 10
1.5 ℓ1 and ℓ2 Norms 11
1.6 Review of Gaussian Distribution 12
1.7 Unsupervised Learning Case Study 13
1.7.1 Clustering Example 14
1.7.2 Clustering Problem Specification 14
1.8 Creating Synthetic Data for Clustering 16
1.8.1 One-Dimensional Datasets 16
1.8.2 Multidimensional Datasets 17
1.9 Clustering Algorithms 19
1.9.1 k-Means Clustering 19
1.9.2 k-Medians Clustering 21
1.10 Importance of Robust Clustering 22
1.10.1 Clustering with No Outliers 22
1.10.2 Clustering with Outliers 23
1.10.3 Detection and Removal of Outliers 25
1.11 Summary 27
Problems 28
References 34
2 Robust Linear Regression 35
2.1 Introduction 35
2.2 Supervised Learning 35
2.3 Linear Regression 36
2.4 Importance of Residuals 38
2.4.1 Defining Errors and Residuals 38
2.4.2 Residuals in Loss Functions 39
2.4.3 Distribution of Residuals 40
2.5 Estimation Background 42
2.5.1 Linear Models 42
2.5.2 Desirable Properties of Estimators 43
2.5.3 Maximum-Likelihood Estimation 44
2.5.4 Gradient Descent 47
2.6 M-Estimation 49
2.7 Least Squares Estimation (LSE) 52
2.8 Least Absolute Deviation (LAD) 54
2.9 Comparison of LSE and LAD 55
2.9.1 Simple Linear Model 55
2.9.2 Location Problem 56
2.10 Huber’s Method 58
2.10.1 Huber Loss Function 58
2.10.2 Comparison with LSE and LAD 63
2.11 Summary 64
Problems 64
References 67
3 The Log-Cosh Loss Function 69
3.1 Introduction 69
3.2 An Intuitive View of Log-Cosh 69
3.3 Hyperbolic Functions 71
3.4 M-Estimation 71
3.4.1 Asymptotic Behavior 72
3.4.2 Linear Regression Using Log-Cosh 74
3.5 Deriving the Distribution for Log-Cosh 75
3.6 Standard Errors for Robust Estimators 79
3.6.1 Example: Swiss Fertility Dataset 81
3.6.2 Example: Boston Housing Dataset 82
3.7 Statistical Properties of Log-Cosh Loss 83
3.7.1 Maximum-Likelihood Estimation 83
3.8 A General Log-Cosh Loss Function 84
3.9 Summary 88
Problems 88
References 93
4 Outlier Detection, Metrics, and Standardization 95
4.1 Introduction 95
4.2 Effect of Outliers 95
4.3 Outlier Diagnosis 97
4.3.1 Boxplots 98
4.3.2 Histogram Plots 100
4.3.3 Exploratory Data Analysis 101
4.4 Outlier Detection 102
4.4.1 3-Sigma Edit Rule 102
4.4.2 4.5-MAD Edit Rule 104
4.4.3 1.5-IQR Edit Rule 105
4.5 Outlier Removal 105
4.5.1 Trimming Methods 105
4.5.2 Winsorization 105
4.5.3 Anomaly Detection Method 106
4.6 Regression-Based Outlier Detection 107
4.6.1 LS vs. LC Residuals 108
4.6.2 Comparison of Detection Methods 109
4.6.3 Ordered Absolute Residuals (OARs) 110
4.6.4 Quantile–Quantile Plot 111
4.6.5 Quad-Plots for Outlier Diagnosis 113
4.7 Regression-Based Outlier Removal 114
4.7.1 Iterative Boxplot Method 114
4.8 Regression Metrics with Outliers 116
4.8.1 Mean Square Error (MSE) 117
4.8.2 Median Absolute Error (MAE) 118
4.8.3 MSE vs. MAE on Realistic Data 119
4.8.4 Selecting Hyperparameters for Robust Regression 120
4.9 Dataset Standardization 121
4.9.1 Robust Standardization 122
4.10 Summary 126
Problems 126
References 131
5 Robustne