Wednesday, July 9, 2008

Learning paradox

Consider the standard PAC learning model. We have a concept class C over an instance space X, as well as a probability distribution P on X. After observing a labeled sample of size m and finding an f in C consistent with this sample, we know that with high probability the generalization error of f is bounded by something like

(*) sqrt(d / m) + confidence term,

where d is the VC-dimension of the concept class C [the confidence term isn't the interesting part here -- it appears in all generalization bounds of this type, and goes to zero as 1/sqrt(m)]. Intuitively, this bound says that if we managed to explain the observations by picking a model from a "simple" class, we can be confident that our predictions will be good. Explanations by more complex models are less informative. In the trivial case, C = 2^X and VC-dim(C) = |X| -- so achieving zero sample error tells us nothing about generalization error.
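Here is a tiny numerical look at how a bound of the form (*) behaves (the sample size, the confidence parameter, and the use of sqrt(ln(1/delta)/m) as a stand-in for the confidence term are illustrative choices of mine, not the exact constants of any particular theorem):

```python
# For a fixed sample size, the bound loosens as the VC-dimension grows,
# and becomes vacuous once d is comparable to m.
import math

m, delta = 1000, 0.05                          # sample size and confidence parameter (assumed)
confidence_term = math.sqrt(math.log(1 / delta) / m)

for d in (1, 10, 100, 1000):                   # VC-dimensions of increasingly rich classes
    bound = math.sqrt(d / m) + confidence_term
    print(f"d = {d:>4}:  bound ~ {bound:.3f}")

# d =    1:  bound ~ 0.086
# d =   10:  bound ~ 0.155
# d =  100:  bound ~ 0.371
# d = 1000:  bound ~ 1.055
```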

However, one might reason as follows. Suppose I receive a labeled sample and find a classifier f in C that achieves zero sample error. Now what's stopping me from re-defining my concept class? I'll define a new concept class C' = {f} that contains a single concept -- f! Its VC-dimension is trivially zero. As far as my prediction is concerned, it doesn't matter whether I learned using C or C' -- in either case, I'll pick the classifier f. But the VC-dimension of C might be large, giving me a poor generalization guarantee (*), while VC-dim(C') = 0, which is as good as things can be!
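The tension can be made concrete with a small simulation (the numbers, the uniform distribution over a finite X, and the "memorize and default to 0" learner are all my own illustrative assumptions): take the target concept to be a uniformly random labeling, which certainly lies in C = 2^X, fit it perfectly on the sample, and compare the two bounds with the error on unseen points.

```python
# Sketch of the scenario: a learner that memorizes the sample achieves zero
# sample error, so the post-hoc class C' = {f} has VC-dimension 0.
import math
import random

random.seed(0)

N, m, delta = 10_000, 200, 0.05                       # |X|, sample size, confidence (assumed)
truth = {x: random.randint(0, 1) for x in range(N)}   # target concept in C = 2^X

sample = random.sample(range(N), m)                   # observed instances
f = {x: truth[x] for x in sample}                     # f: memorize the sample

def predict(x):
    return f.get(x, 0)                                # predict 0 on unseen points

sample_error = sum(predict(x) != truth[x] for x in sample) / m    # 0 by construction
true_error   = sum(predict(x) != truth[x] for x in range(N)) / N  # ~0.49

conf = math.sqrt(math.log(1 / delta) / m)             # stand-in confidence term, ~0.12
bound_C      = math.sqrt(N / m) + conf                # d = VC-dim(C)  = |X|: ~7.2, vacuous
bound_Cprime = math.sqrt(0 / m) + conf                # d = VC-dim(C') = 0:   ~0.12

print(sample_error, true_error, bound_C, bound_Cprime)
```

With these numbers, the bound computed from C' promises error around 0.12, while the error on unseen points is close to one half.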

What's wrong with this reasoning?