E-book, English, 1062 pages
Theodoridis Machine Learning
1st edition, 2015
ISBN: 978-0-12-801722-7
Publisher: Elsevier Science & Techn.
Format: EPUB
Copy protection: 6 - ePub Watermark
A Bayesian and Optimization Perspective
Sergios Theodoridis is professor emeritus of machine learning and data processing with the National and Kapodistrian University of Athens, Greece. He is a Fellow of EURASIP and a Life Fellow of IEEE. He is the coauthor of the best-selling book Pattern Recognition, 4th edition, Academic Press, 2009, and of the book Introduction to Pattern Recognition: A MATLAB Approach, Academic Press, 2010.
Authors/Editors
Further information and material
Introduction
Abstract
This chapter serves as an introduction to the text and an overview of machine learning. It deals with two problems at the heart of machine learning and of the book: the classification and the regression tasks. The chapter also outlines the structure of the book and provides a road map for students and instructors. A summary of each chapter is provided. The first six chapters of the book deal with classical topics, while the remaining twelve cover more advanced techniques. Finally, the author offers suggestions on which chapters to cover based on the focus of the particular course.
Keywords
Machine learning
Statistical signal processing
Adaptive signal processing
Bayesian learning
Classification
Regression
Chapter Outline
1.1 What Machine Learning is About
1.1.1 Classification
1.1.2 Regression
1.1 What Machine Learning is About
Learning through personal experience and knowledge, which propagates from generation to generation, is at the heart of human intelligence. Also, at the heart of any scientific field lies the development of models (often, they are called theories) in order to explain the available experimental evidence at each time period. In other words, we always learn from data. Different data and different focuses on the data give rise to different scientific disciplines.
This book is about learning from data; in particular, our intent is to detect and unveil possible hidden structure and regularity patterns associated with the mechanism that generates the data. This information in turn helps our analysis and understanding of the nature of the data, which can be used to make predictions for the future. Besides modeling the underlying structure, a major direction of significant interest in Machine Learning is to develop efficient algorithms for designing the models and also for analysis and prediction. The latter part is gaining importance at the dawn of what we call the big data era, when one has to deal with massive amounts of data, which may be represented in spaces of very large dimensionality. Analyzing data for such applications demands algorithms that are computationally efficient and at the same time robust in their performance, because some of these data are heavily contaminated with noise and, in some cases, may have missing values.
Such methods and techniques have been at the center of scientific research for a number of decades in various disciplines, such as Statistics and Statistical Learning, Pattern Recognition, Signal and Image Processing and Analysis, Computer Science, Data Mining, Machine Vision, Bioinformatics, Industrial Automation, and Computer-Aided Medical Diagnosis, to name a few. In spite of the different names, there is a common corpus of techniques that are used in all of them, and we will refer to such methods as Machine Learning. This name has gained popularity over the last decade or so. The name suggests the use of a machine/computer to learn in analogy to how the brain learns and predicts. In some cases, the methods are directly inspired by the way the brain works, as is the case with neural networks, covered in Chapter 18.
Two problems at the heart of machine learning, which also comprise the backbone of this book, are the classification and the regression tasks.
1.1.1 Classification
The goal in classification is to assign an unknown pattern to one out of a number of classes that are considered to be known. For example, in X-ray mammography, we are given an image where a region indicates the existence of a tumor. The goal of a computer-aided diagnosis system is to predict whether this tumor corresponds to the benign or the malignant class. Optical character recognition (OCR) systems are also built around a classification system, in which the image corresponding to each letter of the alphabet has to be recognized and assigned to one of the twenty-six (for the Latin alphabet) classes; see Section 18.11 for a related case study. Another example is the prediction of the authorship of a given text. Given a text written by an unknown author, the goal of a classification system is to predict the author among a number of authors (classes); this application is treated in Section 11.15.
The first step in designing any machine learning task is to decide how to represent each pattern in the computer. This is achieved during the preprocessing stage; one has to “encode” related information that resides in the raw data (image pixels or strings of letters in the previous examples) in an efficient and information-rich way. This is usually done by transforming the raw data into a new space, with each pattern represented by a vector, x ∈ ℝ^l. This is known as the feature vector, and its l elements are known as the features. In this way, each pattern becomes a single point in an l-dimensional space, known as the feature space or the input space. We refer to this as the feature generation stage. Usually, one starts with some large number, K, of features and eventually selects the l most informative ones via an optimizing procedure known as the feature selection stage.
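The passage from K generated features to the l retained ones can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are made up, and the selection criterion (keep the features with the largest variance) is a simple stand-in chosen for this sketch, not the optimizing procedure the book has in mind.

```python
from statistics import pvariance

def select_features(patterns, l):
    """Keep the l features (columns) with the largest variance.

    Variance is a stand-in criterion assumed for this sketch only.
    """
    K = len(patterns[0])
    # Score each of the K candidate features across all patterns.
    scores = [pvariance([p[k] for p in patterns]) for k in range(K)]
    # Indices of the l highest-scoring features, kept in their original order.
    ranked = sorted(range(K), key=lambda k: scores[k], reverse=True)
    keep = sorted(ranked[:l])
    return keep, [[p[k] for k in keep] for p in patterns]

# Hypothetical data: 4 patterns, each described by K = 3 generated features.
patterns = [
    [1.0, 5.0, 0.5],
    [2.0, 5.0, 0.4],
    [3.0, 5.0, 0.6],
    [4.0, 5.0, 0.5],
]

keep, reduced = select_features(patterns, l=2)
print(keep)     # indices of the retained features
print(reduced)  # each pattern is now a point in a 2-dimensional feature space
```

The second feature is constant across all patterns, so it carries no information for telling them apart and is the one dropped here.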
Having decided upon the input space, in which the data are represented, one has to train a classifier. This is achieved by first selecting a set of data whose class is known, which comprises the training set. This is a set of pairs, (yn, xn), n = 1, 2, …, N, where yn is the (output) variable denoting the class to which xn belongs, and it is known as the corresponding class label; the class labels, y, take values over a discrete set, {1, 2, …, M}, for an M-class classification task. For example, for a two-class classification task, yn ∈ {−1, +1}. To keep our discussion simple, let us focus on the two-class case. Based on the training data, one then designs a function, f, which predicts the output label given the input; that is, given the measured values of the features. This function is known as the classifier. In general, we need to design a set of such functions.
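To make the training step concrete, here is a minimal sketch in Python. The text at this point does not say how f is obtained, so the perceptron rule used below is an assumption made for illustration, as are the function names and the tiny training set.

```python
def f(theta, x):
    """Linear function: theta[:-1] weights the features, theta[-1] is the offset."""
    return sum(t * xi for t, xi in zip(theta, x)) + theta[-1]

def classify(theta, x):
    """Predict the label in {-1, +1} from the sign of f(x)."""
    return 1 if f(theta, x) > 0 else -1

def train_perceptron(training_set, epochs=100):
    """Learn theta from pairs (y_n, x_n) with the perceptron rule (assumed here)."""
    l = len(training_set[0][1])
    theta = [0.0] * (l + 1)              # l weights plus one offset
    for _ in range(epochs):
        mistakes = 0
        for y, x in training_set:
            if y * f(theta, x) <= 0:     # wrong side of, or exactly on, the boundary
                for i in range(l):
                    theta[i] += y * x[i]
                theta[-1] += y
                mistakes += 1
        if mistakes == 0:                # all training points correctly classified
            break
    return theta

# Hypothetical two-class training set of pairs (y_n, x_n), with y_n in {-1, +1}.
training_set = [
    (+1, [2.0, 3.0]),
    (+1, [3.0, 3.5]),
    (-1, [-1.0, -2.0]),
    (-1, [-2.0, -1.0]),
]

theta = train_perceptron(training_set)
print(theta)
print([classify(theta, x) for _, x in training_set])
```

The perceptron rule only terminates with zero mistakes when the two classes are linearly separable, which holds for this toy set; the `epochs` cap guards against looping forever otherwise.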
Once the classifier has been designed, the system is ready for predictions. Given an unknown pattern, we form the corresponding feature vector, x, from the raw data, and we plug this value into the classifier; depending on the value of f(x) (usually on its sign, ŷ = sgn f(x)), the pattern is classified in one of the two classes. Figure 1.1 illustrates the classification task. Initially, we are given a set of points, each representing a pattern in the two-dimensional space (two features used, x1, x2). The stars belong to one class, say ω1, and the crosses to the other, ω2, in a two-class classification task. These are the training points. Based on these points, a classifier was learned; for our very simple case, this turned out to be a linear function,
$$f(x) = \theta_1 x_1 + \theta_2 x_2 + \theta_0, \tag{1.1}$$
whose graph, for all the points such that f(x) = 0, is the straight line shown in the figure. Then, we are given the point denoted by the red circle; this corresponds to the measured values from a pattern whose class is unknown to us. According to the classification system that we have designed, this point belongs to the same class as the points denoted by stars. Indeed, every point on one side of the straight line gives a positive value, f(x) > 0, and every point on the other side gives a negative value, f(x) < 0. The point denoted by the red circle results in f(x) > 0, as do all the star points, and it is therefore classified in the same class, ω1.
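The sign test around Eq. (1.1) can be checked numerically with a short Python sketch. The coefficient values and the test points below are hypothetical, picked only so that the arithmetic is easy to follow; they are not taken from Figure 1.1.

```python
def f(x, theta1=1.0, theta2=1.0, theta0=-3.0):
    """Evaluate the linear function of Eq. (1.1) with assumed coefficients."""
    x1, x2 = x
    return theta1 * x1 + theta2 * x2 + theta0

def classify(x):
    """Assign a class from the sign of f(x).

    Points exactly on the line f(x) = 0 are sent to omega_2 here; that is
    an arbitrary convention of this sketch.
    """
    return "omega_1" if f(x) > 0 else "omega_2"

unknown = (2.0, 2.5)        # plays the role of the point with unknown class
print(f(unknown))           # 2.0 + 2.5 - 3.0 = 1.5, so f(x) > 0
print(classify(unknown))    # omega_1

print(classify((0.5, 1.0))) # 0.5 + 1.0 - 3.0 = -1.5, so omega_2
print(f((1.0, 2.0)))        # 0.0: this point lies on the decision line
```

With these coefficients the decision line is x1 + x2 = 3, and the two half-planes on either side of it correspond to the two classes.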
This type of learning is known as supervised learning, since a set of training data with known labels is available. Note that the training data can be seen as the available previous experience, and based on this, one builds a model to make predictions for the future....




