E-book, English, 334 pages
Balasubramanian / Ho / Vovk: Conformal Prediction for Reliable Machine Learning
Theory, Adaptations and Applications
1st edition, 2014
ISBN: 978-0-12-401715-3
Publisher: Elsevier Science & Techn.
Format: EPUB
Copy protection: ePub watermark
The conformal predictions framework is a recent development in machine learning that can associate a reliable measure of confidence with a prediction in any real-world pattern recognition application, including risk-sensitive applications such as medical diagnosis, face recognition, and financial risk prediction. Conformal Predictions for Reliable Machine Learning: Theory, Adaptations and Applications captures the basic theory of the framework, demonstrates how to apply it to real-world problems, and presents several adaptations, including active learning, change detection, and anomaly detection. As practitioners and researchers around the world apply and adapt the framework, this edited volume brings together these bodies of work, providing a springboard for further research as well as a handbook for application in real-world problems.

- Understand the theoretical foundations of this important framework, which can provide a reliable measure of confidence with predictions in machine learning
- Be able to apply this framework to real-world problems in different machine learning settings, including classification, regression, and clustering
- Learn effective ways of adapting the framework to newer problem settings, such as active learning, model selection, or change detection
The Basic Conformal Prediction Framework
Vladimir Vovk, Computer Learning Research Centre, Department of Computer Science, Royal Holloway, University of London, United Kingdom
Abstract
The aim of this chapter is to give a gentle introduction to the method of conformal prediction. It defines conformal predictors and discusses their properties, leaving various extensions of conformal predictors for Chapter 2.
Keywords
Conformal Prediction; Validity; Efficiency; Classification; Regression; Exchangeability
Acknowledgments
I am grateful to Sasha Tsybakov for his advice. The empirical studies described in this chapter used the R language and the R package PredictiveRegression; I am grateful to Ilia Nouretdinov for cowriting (and writing the first version of) the package. This work was supported in part by the Cyprus Research Promotion Foundation (TPE/ORIZO/0609(BIE)/24) and EPSRC (EP/K033344/1).
The aim of this chapter is to give a gentle introduction to the method of conformal prediction. It will define conformal predictors and discuss their properties, leaving various extensions of conformal prediction for Chapter 2.
1.1 The Basic Setting and Assumptions
In the bulk of this chapter we consider the basic setting where we are given a training set of examples, and our goal is to predict a new example. We will assume that the examples are elements of an example space Z (formally, Z is assumed to be a measurable space, i.e., a set equipped with a σ-algebra). We always assume that Z contains more than one element, |Z| > 1, and that each singleton is measurable. The examples in the training set will usually be denoted z_1, …, z_l, and the example to be predicted (the test example) z_{l+1}. Mathematically the training set is a sequence, (z_1, …, z_l), not a set.
The basic setting might look restrictive, but later in this chapter we will see that it covers the standard problems of classification (Section 1.6) and regression (Section 1.7); we will also see that the algorithms developed for our basic setting can be applied in the online (Section 1.8) and batch (Section 2.4) modes of prediction.
We will make two main kinds of assumptions about the way the examples z_i, i = 1, …, l+1, are generated. Let us fix the size l ≥ 1 of the training set for now. Under the randomness assumption, the l+1 examples are generated independently from the same unknown probability distribution Q on Z. Under the exchangeability assumption, the sequence (z_1, …, z_{l+1}) is generated from a probability distribution P on Z^{l+1} that is exchangeable: for any permutation π of the set {1, …, l+1}, the distribution of the permuted sequence (z_{π(1)}, …, z_{π(l+1)}) is the same as the distribution of the original sequence (z_1, …, z_{l+1}). It is clear that the randomness assumption implies the exchangeability assumption, and in Section 1.5 we will see that the exchangeability assumption is much weaker. (On the other hand, in the online mode of prediction the difference between the two assumptions almost disappears, as we will see in Section 1.8.)
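A classical way to see that exchangeability is strictly weaker than randomness is drawing balls from an urn without replacement: every ordering of the draws is equally likely, so the sequence is exchangeable, yet the draws are clearly dependent. The sketch below (an illustration of this point, not code from the chapter; the urn contents are an arbitrary choice) verifies both properties exactly by enumeration.

```python
import itertools
from fractions import Fraction

urn = [0, 0, 1]  # hypothetical urn: two balls labelled 0, one labelled 1

# Probability of each full sequence of draws without replacement:
# every ordering of the urn is equally likely.
orderings = list(itertools.permutations(urn))
probs = {}
for seq in orderings:
    probs[seq] = probs.get(seq, Fraction(0)) + Fraction(1, len(orderings))

# Exchangeability: permuting any sequence leaves its probability unchanged.
for seq, p in probs.items():
    assert all(probs[perm] == p for perm in itertools.permutations(seq))

# But the draws are not independent: P(z_2 = 1 | z_1 = 1) = 0 while P(z_2 = 1) = 1/3.
p_z2 = sum(p for seq, p in probs.items() if seq[1] == 1)
p_z1_and_z2 = sum(p for seq, p in probs.items() if seq[:2] == (1, 1))
print(p_z2, p_z1_and_z2)  # 1/3 0
```

An i.i.d. sequence would instead have p_z1_and_z2 equal to the product of the marginals, so this urn distribution is exchangeable but does not satisfy the randomness assumption.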
The randomness assumption is a standard assumption in machine learning. Methods of conformal prediction, however, usually work for the weaker exchangeability assumption. In some important cases even the exchangeability assumption can be weakened; see, for example, Chapters 8 and 9 of [365] dealing with online compression modeling.
1.2 Set and Confidence Predictors
In this book we are concerned with reliable machine learning, and so consider prediction algorithms that output a set of elements of Z as their prediction; such a set is called a prediction set (or a set prediction). The statement implicit in a prediction set is that it contains the test example z_{l+1}, and the prediction set is regarded as erroneous if and only if it fails to contain z_{l+1}. We will be looking for a compromise between reliability and informativeness of the prediction sets output by our algorithms; an example of prediction sets we try to avoid is the whole of Z: it is absolutely reliable but not informative.
A set predictor is a function Γ that maps any sequence (z_1, …, z_l) ∈ Z^l to a set Γ(z_1, …, z_l) ⊆ Z and satisfies the following measurability condition: the set

{(z_1, …, z_{l+1}) | z_{l+1} ∈ Γ(z_1, …, z_l)} (1.1)

is measurable in Z^{l+1}.
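As a concrete, purely illustrative instance of this definition, the sketch below implements a set predictor for real-valued examples (Z = R), mapping a training sequence to an interval. The two-standard-deviation rule is an arbitrary heuristic of this example, with no validity guarantee attached.

```python
import statistics

def interval_predictor(training):
    """A set predictor for real-valued examples (illustrative heuristic):
    predict that z_{l+1} lies within 2 sample standard deviations of the
    training mean. The output interval is the prediction set."""
    m = statistics.mean(training)
    s = statistics.stdev(training)
    return (m - 2 * s, m + 2 * s)

lo, hi = interval_predictor([1.0, 2.0, 3.0, 4.0, 5.0])
print(lo, hi)  # an interval centred at the training mean 3.0
```

The prediction set here is the interval [lo, hi]; the error event of the text, z_{l+1} ∉ Γ(z_1, …, z_l), is simply the test example falling outside it.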
We will often consider nested families of set predictors depending on a parameter ε ∈ [0,1], which we call the significance level, reflecting the required reliability of prediction. Our parameterization of reliability will be such that smaller values of ε correspond to greater reliability. (This is just a convention: e.g., if we used the confidence level 1 − ε as the parameter, larger values of the parameter would correspond to greater reliability.)
Formally, a confidence predictor is a family (Γ^ε | ε ∈ [0,1]) of set predictors that is nested in the following sense: whenever 0 ≤ ε_1 ≤ ε_2 ≤ 1,

Γ^{ε_1}(z_1, …, z_l) ⊇ Γ^{ε_2}(z_1, …, z_l). (1.2)
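One convenient way to obtain a family satisfying (1.2) is to attach to every candidate example z a score p(z) ∈ [0,1] and define Γ^ε := {z : p(z) > ε}; nestedness then holds automatically, since raising ε can only shrink the set. The sketch below does this with a conformal-style p-value (anticipating Section 1.3); the nonconformity measure |z − mean of the rest| and the finite example space Z = {0, …, 9} are assumptions of this illustration, not of the chapter.

```python
def p_value(training, z):
    """Conformal-style p-value of candidate z: the fraction of examples in
    (z_1, ..., z_l, z) that are at least as nonconforming as z itself,
    with nonconformity measured as distance to the mean of the others."""
    seq = training + [z]
    def alpha(i):
        rest = seq[:i] + seq[i + 1:]
        return abs(seq[i] - sum(rest) / len(rest))
    a = [alpha(i) for i in range(len(seq))]
    return sum(ai >= a[-1] for ai in a) / len(seq)

def gamma(training, Z, eps):
    """The set predictor Gamma^eps induced by the p-values."""
    return {z for z in Z if p_value(training, z) > eps}

Z = range(10)
training = [4, 5, 5, 6]
sets = [gamma(training, Z, eps) for eps in (0.1, 0.3, 0.5)]
# Larger eps never enlarges the prediction set, as (1.2) requires:
assert sets[0] >= sets[1] >= sets[2]
```

The nesting needs no separate proof here: each z has a single fixed p-value, and the threshold test p(z) > ε is monotone in ε.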
1.2.1 Validity and Efficiency of Set and Confidence Predictors
The two main indicators of the quality of set and confidence predictors are what we call their validity (how reliable they are) and efficiency (how informative they are).1 We say that a set predictor Γ is exactly valid at a significance level ε ∈ [0,1] if, under any power probability distribution P = Q^{l+1} on Z^{l+1}, the probability of the event z_{l+1} ∉ Γ(z_1, …, z_l) that Γ makes an error is ε. However, it is obvious that the property of exact validity is impossible to achieve unless ε is either 0 or 1:
Proposition 1.1
No set predictor is exactly valid at any significance level ε ∈ (0,1).
Proof
Let Q be a probability distribution on Z that is concentrated at one point. Then any set predictor makes a mistake with probability either 0 or 1.
In Section 1.8 we will see that exact validity can be achieved using randomization.
The requirement that can be achieved (even if trivially) is that of "conservative validity." A set predictor Γ is said to be conservatively valid (or simply valid) at a significance level ε ∈ [0,1] if, under any power probability distribution P = Q^{l+1} on Z^{l+1}, the probability of z_{l+1} ∉ Γ(z_1, …, z_l) does not exceed ε. The trivial way to achieve this, for any ε ∈ [0,1], is to set Γ(z_1, …, z_l) := Z for all z_1, …, z_l. A confidence predictor (Γ^ε | ε ∈ [0,1]) is (conservatively) valid if each of its constituent set predictors Γ^ε is valid at the significance level ε. Conformal predictors will provide nontrivial conservatively, and in some sense almost exactly, valid confidence predictors. In the following chapter we will discuss other notions of validity.
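Conservative validity can be checked empirically by simulation. The sketch below (an illustration only: the i.i.d. distribution, the sample sizes, and the nonconformity measure |z − mean of the rest| are all arbitrary choices of this example) repeatedly draws a training set and a test example, forms a conformal-style prediction set at significance level ε = 0.2, and counts errors; the error frequency should not exceed ε by more than sampling noise, and for a conformal predictor is typically below it.

```python
import random

def p_value(training, z):
    """Conformal-style p-value of candidate z, with nonconformity
    measured as distance to the mean of the other examples."""
    seq = training + [z]
    def alpha(i):
        rest = seq[:i] + seq[i + 1:]
        return abs(seq[i] - sum(rest) / len(rest))
    a = [alpha(i) for i in range(len(seq))]
    return sum(ai >= a[-1] for ai in a) / len(seq)

random.seed(0)
Z = range(10)                    # assumed finite example space
eps, trials, errors = 0.2, 2000, 0
for _ in range(trials):
    data = [random.choice(Z) for _ in range(9)]   # 8 training + 1 test, i.i.d.
    training, test = data[:-1], data[-1]
    prediction_set = {z for z in Z if p_value(training, z) > eps}
    errors += test not in prediction_set          # error: set misses the test example
print(errors / trials)  # empirical error rate; should not materially exceed eps
```

That the rate usually lands well below ε rather than at it reflects the gap between conservative and exact validity discussed above; Section 1.8's randomization ("smoothing") is what closes this gap.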
By the efficiency of set and confidence predictors we mean the smallness of the prediction sets they output. This is a vague notion, but in any case it can be meaningful only if we impose some restrictions on the predictors that we consider. Without restrictions, the trivial set predictor Γ(z_1, …, z_l) := ∅ for all z_1, …, z_l, and the trivial confidence predictor Γ^ε(z_1, …, z_l) := ∅ for all z_1, …, z_l and ε, are the most efficient ones. We will be looking for the most efficient confidence predictors in the class of valid confidence predictors; different notions of validity (including "conditional validity" considered in the next chapter) and different formalizations of the notion of efficiency will lead to different solutions to this problem.
1.3 Conformal Prediction
Let l ∈ N, where N := {1, 2, …} is the set of natural numbers. A...