Given the abundance of rigorous treatments of LDP and COM referenced above, I'll focus instead on the "intuitive flavor" of the two approaches. First, the similarities. Both LDP and COM allow one to prove Laws of Large Numbers (LLN). For our handwavy purposes, an LLN is any claim asserting the convergence (weak, almost sure, or in probability) of a sequence of random variables to their common expected value.

Quoting Dembo and Zeitouni (p. 4):

> The large deviation principle [...] characterizes the limiting behavior, as $\epsilon \to 0$, of a family of probability measures $\{\mu_\epsilon\}$ on $(\mathcal{X}, \mathcal{B})$, in terms of a *rate function*.

Letting $\hat{S}_n$ be the empirical mean of $n$ independent, one-dimensional standard Gaussian random variables, Dembo and Zeitouni give a simple calculation (1.1.3) showing that

$$\lim_{n \to \infty} \frac{1}{n} \log \mathbb{P}\big(|\hat{S}_n| \geq \delta\big) = -\frac{\delta^2}{2},$$

which they summarize as

> The "typical value" of $\hat{S}_n$ is [...] of the order $1/\sqrt{n}$, but with small probability (of the order of $e^{-n\delta^2/2}$), $|\hat{S}_n|$ takes relatively large values.

Note that this result, on its own, does not tell you how large of a sample you need (i.e., how big to make $n$) to ensure, with probability $1 - \epsilon$, that $|\hat{S}_n| \leq \delta$.
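Since $\hat{S}_n \sim N(0, 1/n)$ here, the tail probability has a closed form, so one can watch the LDP limit emerge numerically. A minimal Python sketch (function name is mine):

```python
import math

def log_tail_prob_over_n(n, delta):
    # S_n ~ N(0, 1/n), so P(|S_n| >= delta) = erfc(delta * sqrt(n) / sqrt(2)).
    p = math.erfc(delta * math.sqrt(n) / math.sqrt(2))
    return math.log(p) / n

delta = 1.0
for n in [10, 100, 1000]:
    print(n, log_tail_prob_over_n(n, delta))
# The values increase toward the LDP rate -delta^2/2 = -0.5 as n grows.
```

The convergence is slow (the $1/\sqrt{n}$-order prefactor in the Gaussian tail only washes out in the limit), which is itself a hint that the LDP statement is genuinely asymptotic.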

To get such a *finite sample* guarantee (as opposed to an asymptotic limit), we need a COM result. As a slight technicality, let us truncate those Gaussians to the interval $[-1, 1]$ and renormalize (anyone who suspects foul play may be referred to Samuel Kutin's extension of McDiarmid's inequality). Then a simple application of Hoeffding's bound gives

$$\mathbb{P}\big(|\hat{S}_n| \geq \delta\big) \leq 2 e^{-n\delta^2/2}.$$

Note that the asymptotics match those of the LDP, but now the guarantee holds for any $n$: with probability at least $1 - 2e^{-n\delta^2/2}$, the empirical mean will not deviate from the expected value by more than $\delta$. Note, however, that we do not have a *rate* of convergence (in probability) of a random variable to its mean; all we have is an *upper bound*.
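The Hoeffding bound is easy to sanity-check by simulation. A minimal Python sketch (the truncated Gaussians are sampled by rejection; the parameters $n = 50$, $\delta = 0.2$ and all names are mine):

```python
import math
import random

def truncated_gaussian():
    # Standard Gaussian conditioned on [-1, 1], via rejection sampling.
    while True:
        x = random.gauss(0.0, 1.0)
        if abs(x) <= 1.0:
            return x

def deviation_frequency(n, delta, trials=2000):
    # Empirical estimate of P(|S_n| >= delta) for the truncated mean.
    hits = 0
    for _ in range(trials):
        s = sum(truncated_gaussian() for _ in range(n)) / n
        if abs(s) >= delta:
            hits += 1
    return hits / trials

random.seed(0)
n, delta = 50, 0.2
bound = 2 * math.exp(-n * delta**2 / 2)  # Hoeffding for [-1, 1]-valued variables
freq = deviation_frequency(n, delta)
print(f"empirical {freq:.4f} <= bound {bound:.4f}")
```

As expected, the empirical frequency sits well below the bound; Hoeffding knows nothing about the (small) variance of the truncated Gaussian, which is why the bound is loose here.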

I hope these two examples illustrate the different flavors of the two approaches. COM tends to give sharper upper bounds on deviation probabilities. These have the advantage of being true for all (excepting at most finitely many) values of $n$, and the disadvantage of providing next to no information on the lower bounds. LDP gives the asymptotic rate of convergence (both upper and lower bounds), but does not (at least without some additional work) yield finite-sample bounds.

It is little wonder, then, that COM has found many more applications in computer science (e.g., randomized algorithms) and statistical machine learning, than LDP. The user wants to know how long to run your algorithm to achieve a specified precision at a given confidence level; order-of-magnitude information for "typical" values is usually insufficient.
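This "how long to run it" question is answered by inverting the Hoeffding bound: requiring $2e^{-n\delta^2/2} \leq \epsilon$ gives $n \geq (2/\delta^2)\ln(2/\epsilon)$. A minimal Python sketch (function name is mine):

```python
import math

def sample_size(delta, eps):
    # Smallest n with 2*exp(-n*delta^2/2) <= eps, i.e.
    # n >= (2/delta^2) * ln(2/eps), for [-1, 1]-valued variables.
    return math.ceil(2.0 / delta**2 * math.log(2.0 / eps))

# E.g. precision 0.1 at 95% confidence:
print(sample_size(0.1, 0.05))  # -> 738
```

Note the benign dependence on the confidence level: $n$ grows only logarithmically in $1/\epsilon$, while precision is the expensive part, at $1/\delta^2$.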

Having given an informal "definition" of LDP, let me close with an equally informal discussion of COM, this time quoting myself:

> A common way of summarizing the phenomenon is to say that in a high-dimensional space, almost all of the probability is concentrated around any set whose measure is at least $1/2$. Another way is to say that any "sufficiently continuous" function is tightly concentrated about its mean.

I realize this isn't terribly informative (or at all original), but the linked paper does attempt a crude survey (with an emphasis on passing from independent sequences to those with weak dependence). A word of caution for those attempting to visualize "measure concentration": imagining a narrowly peaked, one-dimensional distribution is tempting but wrong. This is a genuinely high-dimensional phenomenon. The only "visual" intuition I can offer is that the (Lebesgue) volume of the $n$-dimensional Euclidean unit ball goes to zero with $n$, while the volume of the cube $[-1, 1]^n$ (in which the ball is tightly inscribed) blows up exponentially. This means that in high dimensions, most of a set's content is concentrated near its boundary -- something that is certainly not true in low dimensions. Anyone who's intrigued by this (that's everyone, right?) should read Milman's "Surprising geometric phenomena in high dimensional convexity theory".
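The ball-versus-cube comparison is easy to compute from the closed-form volume $\pi^{n/2} / \Gamma(n/2 + 1)$ of the $n$-dimensional unit ball. A minimal Python sketch (function name is mine):

```python
import math

def unit_ball_volume(n):
    # Lebesgue volume of the n-dimensional Euclidean unit ball:
    # pi^(n/2) / Gamma(n/2 + 1).
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)

for n in [2, 10, 50]:
    ball = unit_ball_volume(n)
    cube = 2.0 ** n  # volume of [-1, 1]^n, in which the ball is inscribed
    print(n, ball, cube, ball / cube)
# The ball's volume tends to 0 while the cube's blows up exponentially,
# so the ratio vanishes super-exponentially.
```

Already at $n = 50$ the ball occupies a vanishingly small fraction of its bounding cube: essentially all of the cube's volume lives in its corners, near the boundary.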

## 2 comments:

By an eerie coincidence, I stumbled upon

http://yaroslavvb.blogspot.com/2006/05/curse-of-dimensionality-and-intuition.html

shortly after making that post. Indeed, when high dimensionality refers to the ambient space of the data, it is a curse for curve-fitting (regression). In my post, "dimensionality" refers to sample size, and is not a curse but a boon.

My browser (Chrome) does not show your math formulas. Do I need to install a plug-in or something?
