I'm glad to see conformal prediction getting a bit more exposure. The idea is quite interesting and reasonable (in terms of the assumptions you make), and in principle the transductive conformal approach can be used to turn any standard supervised learning algorithm into one that produces some kind of conformal confidence intervals. But, as Scott writes:
> Practically speaking, this kind of transductive prediction is computationally prohibitive
I tried to use this some years ago, with the simpler ridge regression conformal approach described in the book [1], when fitting empirical models to experimental data (few samples, very high cost of obtaining more, high-dimensional input space). In that setting it seemed desirable to produce some reasonable estimate of the uncertainty of the model fit without making a bunch of assumptions about the underlying relationship.
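For readers who haven't seen it, the transductive ("full") conformal idea can be sketched in a few lines. This is my own brute-force illustration, not code from the book: for each candidate label on a grid, refit the ridge model on the training data augmented with the test point carrying that candidate label, and keep the candidate if the test point's residual isn't among the most extreme. The grid choice and padding here are heuristics of mine, and the intercept is omitted for brevity.

```python
import numpy as np

def full_conformal_ridge(X, y, x_new, alpha=0.1, lam=1.0, y_grid=None):
    """Brute-force full (transductive) conformal interval for ridge
    regression, using |residual| as the nonconformity score."""
    if y_grid is None:
        # Candidate labels: a grid spanning the observed range (heuristic).
        lo, hi = y.min(), y.max()
        pad = (hi - lo) or 1.0
        y_grid = np.linspace(lo - pad, hi + pad, 200)
    Xa = np.vstack([X, x_new])
    accepted = []
    for y_cand in y_grid:
        ya = np.append(y, y_cand)
        # Ridge fit on the augmented sample.
        w = np.linalg.solve(Xa.T @ Xa + lam * np.eye(X.shape[1]), Xa.T @ ya)
        scores = np.abs(ya - Xa @ w)
        # Conformal p-value: fraction of scores at least as extreme
        # as the test point's score.
        p = np.mean(scores >= scores[-1])
        if p > alpha:
            accepted.append(y_cand)
    return (min(accepted), max(accepted)) if accepted else (np.nan, np.nan)
```

The refit-per-candidate loop is exactly where the computational cost Scott mentions comes from; for ridge specifically the book shows this can be done in closed form, but the brute-force version makes the mechanism clearer.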
> There are a number of ad hoc ways of generating confidence intervals using resampling methods and generating a distribution of predictions.
In practice I ended up doing something ad hoc -- bootstrap sample and fit a bunch of decision trees, then back some kind of crude confidence interval out of the distribution of resulting predictions. I think I ended up preferring this over a conformal regularised linear model approach because the trees seemed better able to model the actual relationship than whatever simple family of linear model we were using (probably just degree-2 polynomials in the raw input values; there wasn't really enough data for the number of dimensions to support much else).
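The ad hoc recipe above is roughly a bootstrap percentile interval, which can be sketched as follows. This is illustrative rather than the exact code we used, and to keep it self-contained I've swapped the decision trees for a one-nearest-neighbour regressor as the base learner; unlike conformal prediction it carries no formal coverage guarantee.

```python
import numpy as np

def bootstrap_interval(X, y, x_new, fit, n_boot=200, alpha=0.1, seed=0):
    """Crude bootstrap percentile interval: refit the base learner on
    resampled data and read an interval off the spread of predictions."""
    rng = np.random.default_rng(seed)
    n = len(y)
    preds = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # sample rows with replacement
        predict = fit(X[idx], y[idx])     # refit on the bootstrap sample
        preds.append(predict(x_new))
    return np.quantile(preds, [alpha / 2, 1 - alpha / 2])

def fit_1nn(X, y):
    """One-nearest-neighbour regressor, a stand-in for the decision trees."""
    return lambda x: y[np.argmin(np.linalg.norm(X - x, axis=1))]
```

Note this only refits the model `n_boot` times in total, versus once per candidate label per test point for the transductive conformal approach, which is why it was the more practical option.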
I've never read up on the non-transductive (split/inductive) approaches to conformal prediction, so it'll be interesting to go through some of the references from this post.
[1] Algorithmic learning in a random world - Vovk, Gammerman, Shafer http://www.alrw.net/