We fix a probability space $(\Omega,\mathcal{F},\mathbb{P})$. We denote by $2^{\mathbf{X}}$ the power set of $\mathbf{X}$, i.e. the set of all subsets of $\mathbf{X}$.

Definition 1.1: Let $\mathbf{X}, \mathbf{Y}$ be metric spaces and let $\mathbf{Z} = \mathbf{X}\times\mathbf{Y}$ be the corresponding product space. Let $N\in \mathbb{N}$ and let $\sigma = \{ z_{1},\ldots,z_{N}\} \in 2^{\mathbf{Z}}$ be an associated sample. For $\alpha\in (0,1)$, a conformal predictor of coverage $\alpha$ is a measurable function of the form:

\[\begin{aligned} \Gamma^{\alpha} \colon 2^{\mathbf{Z}}\times \mathbf{X} & \to 2^{\mathbf{Y}}\\ (\sigma, x) & \mapsto \Gamma_{x}^{\alpha}(\sigma), \end{aligned}\]

such that for a new point $z_{N+1} = (x_{N+1}, y_{N+1})\in \mathbf{Z}$ we have that:

\[\mathbb{P}\left(y_{N+1} \in \Gamma_{x_{N+1}}^{\alpha}(\sigma)\right) \geq 1 - \alpha.\]

Roughly speaking, a conformal predictor gives “bounds” on the prediction of an estimator built from the data $\sigma$: with probability at least $1-\alpha$, the true value $y_{N+1}$ lies in the inferred set $\Gamma_{x_{N+1}}^{\alpha}(\sigma)$. In order to build these sets, we assume that we have learned a regression function $R_{\sigma}\colon \mathbf{X}\to \mathbf{Y}$ from the data $\sigma$. Strictly speaking, the samples $\sigma$ are bags (multisets), whose exact definition is given in Vovk’s book, but we will not pursue this distinction. We choose to view datasets as mere finite (or even discrete) subsets of $\mathbf{Z}$ (thus elements in $2^{\mathbf{Z}}$).
A non-conformity score is any measurable function of the form:

\[\begin{aligned} A \colon 2^{\mathbf{Z}}\times \mathbf{Z} & \to \overline{\mathbb{R}}\\ (\sigma, z) & \mapsto A(\sigma,z). \end{aligned}\]

Example 1.1: Assume $\mathbf{X} = \mathbf{Y} = \mathbb{R}$ and let $\sigma = \{z_{1},\ldots,z_{N}\} = \{ (x_{1},y_{1}),\ldots,(x_{N},y_{N})\}$. A straightforward non-conformity score is given by the absolute value of the difference between prediction and actual value, for all $i=1,\ldots,N$:

\[A(\sigma,z_{i}) = |y_{i} - R_{\sigma}(x_{i})| = |y_{i} - \hat{y}_{i}|.\]
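The score of Example 1.1 can be computed directly once a regression function has been learned. A minimal sketch in Python follows; the least-squares line fit `fit_least_squares` stands in for an arbitrary learned $R_{\sigma}$ and is an illustrative assumption, not part of the definition:

```python
import numpy as np

def absolute_residual_score(z, regressor):
    """Non-conformity score of Example 1.1: A(sigma, z) = |y - R_sigma(x)|."""
    x, y = z
    return abs(y - regressor(x))

def fit_least_squares(sigma):
    """Illustrative choice of R_sigma: a least-squares line fitted to sigma."""
    xs = np.array([x for x, _ in sigma])
    ys = np.array([y for _, y in sigma])
    slope, intercept = np.polyfit(xs, ys, deg=1)
    return lambda x: slope * x + intercept

# A toy sample sigma = {(x_1, y_1), ..., (x_N, y_N)} close to the line y = x.
sigma = [(0.0, 0.1), (1.0, 0.9), (2.0, 2.1), (3.0, 2.9)]
R_sigma = fit_least_squares(sigma)
scores = [absolute_residual_score(z, R_sigma) for z in sigma]
```

The scores are the absolute residuals of the fitted regression, one per sample point; any other regressor could be substituted for the line fit without changing the definition of the score.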

Assume now that we have all the ingredients at hand, namely a non-conformity score $A$, a dataset $\sigma$ of size $N$ and a learned regression function $R_{\sigma}$. For a new input $x_{N+1}$ and a candidate output $y\in\mathbf{Y}$, write $\widetilde{\sigma} := \sigma \cup \{(x_{N+1}, y)\}$. We define the conformal prediction set as:

\[\Gamma_{x_{N+1}}^{\alpha}(\sigma) = \left\{y\in \mathbf{Y}: \frac{\# \{i=1,\ldots,N:\;A(\widetilde{\sigma},(x_{i},y_{i})) \geq A(\widetilde{\sigma},(x_{N+1},y))\}}{N} > \alpha \right\}.\]

It can be shown that, when the data points are exchangeable, this set satisfies the coverage property of Definition 1.1.
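The construction above can be sketched numerically by scanning a finite grid of candidate outputs $y$ and keeping those that pass the threshold test of the displayed definition. Everything here beyond that definition is an assumption for illustration: the least-squares regressor refitted inside `score`, the toy sample, and the grid `y_grid`.

```python
import numpy as np

def score(sample, z):
    """A(sample, z) = |y - R_sample(x)|, where R_sample is a least-squares
    line refitted on `sample` (an illustrative choice of regressor)."""
    xs = np.array([x for x, _ in sample])
    ys = np.array([y for _, y in sample])
    slope, intercept = np.polyfit(xs, ys, deg=1)
    x, y = z
    return abs(y - (slope * x + intercept))

def conformal_set(sigma, x_new, alpha, y_grid):
    """Evaluate Gamma^alpha_{x_new}(sigma) on a finite grid of candidate y."""
    N = len(sigma)
    kept = []
    for y in y_grid:
        sigma_tilde = sigma + [(x_new, y)]  # augmented sample sigma-tilde
        a_new = score(sigma_tilde, (x_new, y))
        a_train = [score(sigma_tilde, zi) for zi in sigma]
        # keep y when the fraction of training points that are at least as
        # non-conforming as (x_new, y) exceeds alpha
        if sum(a >= a_new for a in a_train) / N > alpha:
            kept.append(y)
    return kept

# Toy data close to the line y = x; the true value at x_new = 2.5 is near 2.5.
sigma = [(0.0, 0.1), (1.0, 1.1), (2.0, 1.9), (3.0, 3.0), (4.0, 4.1)]
region = conformal_set(sigma, x_new=2.5, alpha=0.2,
                       y_grid=np.linspace(0.0, 5.0, 101))
```

Note that the regressor is refitted on the augmented sample $\widetilde{\sigma}$ for every candidate $y$, which is what makes this "full" conformal construction expensive; in practice the grid and the refitting are the main computational cost.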