Given the abundance of rigorous treatments of LDP and COM referenced above, I'll focus instead on the "intuitive flavor" of the two approaches. First, the similarities. Both LDP and COM allow one to prove Laws of Large Numbers (LLN). For our handwavy purposes, an LLN is any claim asserting the convergence (weak, almost sure, in probability) of the empirical averages of a sequence of random variables to their common expected value.
Quoting Dembo and Zeitouni (p. 4):
The large deviation principle [...] characterizes the limiting behavior, as $\epsilon \to 0$, of a family of probability measures $\{\mu_\epsilon\}$ on $(\mathcal{X}, \mathcal{B})$, in terms of a rate function.
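In symbols, the statement is roughly that the family $\{\mu_\epsilon\}$ satisfies the LDP with rate function $I$ if, for every measurable set $\Gamma$,

$$ -\inf_{x \in \Gamma^\circ} I(x) \;\le\; \liminf_{\epsilon \to 0} \epsilon \log \mu_\epsilon(\Gamma) \;\le\; \limsup_{\epsilon \to 0} \epsilon \log \mu_\epsilon(\Gamma) \;\le\; -\inf_{x \in \bar{\Gamma}} I(x), $$

where $\Gamma^\circ$ and $\bar{\Gamma}$ denote the interior and closure of $\Gamma$, respectively.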


Letting $\hat{S}_n = \frac{1}{n}\sum_{i=1}^n X_i$ be the empirical mean of $n$ i.i.d. standard Gaussians, we have $\hat{S}_n \sim N(0, 1/n)$, and hence for every $\delta > 0$,

$$ \lim_{n \to \infty} \frac{1}{n} \log \mathbb{P}\left( |\hat{S}_n| \ge \delta \right) = -\frac{\delta^2}{2}, $$

which they summarize as

The "typical value" of $\hat{S}_n$ is [...] of the order $1/\sqrt{n}$, but with small probability (of the order of $e^{-n\delta^2/2}$), $|\hat{S}_n|$ takes relatively large values.

Note that this result, on its own, does not tell you how large of a sample you need (i.e., how big to make $n$).
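Here is a quick numerical sketch of that last point; it uses nothing beyond the exact formula $\mathbb{P}(|\hat{S}_n| \ge \delta) = 2\Phi(-\delta\sqrt{n})$, where $\Phi$ is the standard normal CDF, and simply shows how the normalized log-probability approaches the LDP rate (and how far off it is for small $n$):

# Compare (1/n) log P(|S_hat_n| >= delta) with the LDP rate -delta^2/2
# for the Gaussian empirical mean, using P(|S_hat_n| >= delta) = 2*Phi(-delta*sqrt(n)).
from math import log, sqrt
from scipy.stats import norm

delta = 0.1
for n in [10, 100, 1_000, 10_000, 100_000, 1_000_000]:
    log_p = log(2) + norm.logcdf(-delta * sqrt(n))
    print(f"n = {n:>9d}   (1/n) log P = {log_p / n:+.6f}   LDP rate = {-delta**2 / 2:+.6f}")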



To get such a finite sample guarantee (as opposed to an asymptotic limit), we need a COM result. As a slight technicality, let us truncate those Gaussians to the interval $[-1, 1]$; by symmetry their common mean is still $0$, and Hoeffding's inequality gives, for every $\delta > 0$ and every $n$,

$$ \mathbb{P}\left( |\hat{S}_n| \ge \delta \right) \le 2 e^{-n\delta^2/2}. $$

Note that the asymptotics match those of the LDP, but we can guarantee that for any finite $n$ (not just asymptotically), $|\hat{S}_n| < \delta$ holds with probability at least $1 - 2e^{-n\delta^2/2}$.
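As a quick sanity check, here is a small simulation sketch of this bound for a few finite values of $n$, using scipy's truncnorm for the truncation to $[-1, 1]$ (the numbers are only illustrative):

# Check the finite-sample bound P(|S_hat_n| >= delta) <= 2*exp(-n*delta^2/2)
# for standard Gaussians truncated to [-1, 1].
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(0)
delta, trials = 0.25, 20_000

for n in [10, 50, 200]:
    samples = truncnorm.rvs(-1, 1, size=(trials, n), random_state=rng)
    s_hat = samples.mean(axis=1)
    empirical = np.mean(np.abs(s_hat) >= delta)
    bound = 2 * np.exp(-n * delta**2 / 2)
    print(f"n = {n:>4d}   empirical = {empirical:.5f}   bound = {bound:.5f}")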


I hope these two examples illustrate the different flavors of the two approaches. COM tends to give sharper upper bounds on deviation probabilities. These have the advantage of being true for all (excepting at most finitely many) values of $n$.

It is little wonder, then, that COM has found many more applications than LDP in computer science (e.g., randomized algorithms) and statistical machine learning. The user wants to know how long to run an algorithm to achieve a specified precision at a given confidence level; order-of-magnitude information for "typical" values is usually insufficient.
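For instance, inverting the finite-sample bound from the truncated-Gaussian example above turns "precision $\delta$ at confidence $1 - \alpha$" directly into a sample size; a minimal sketch:

# Smallest n with 2*exp(-n*delta^2/2) <= alpha, i.e. n >= (2/delta^2) * ln(2/alpha).
from math import ceil, log

def sample_size(delta: float, alpha: float) -> int:
    """n large enough that |S_hat_n| < delta with probability at least 1 - alpha."""
    return ceil(2.0 / delta**2 * log(2.0 / alpha))

print(sample_size(delta=0.1, alpha=0.01))  # precision 0.1 at 99% confidence -> 1060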
Having given an informal "definition" of LDP, let me close with an equally informal discussion of COM, this time, quoting myself:
A common way of summarizing the phenomenon is to say that in a high-dimensional space, almost all of the probability is concentrated around any set whose measure is at least $1/2$. Another way is to say that any "sufficiently continuous" function is tightly concentrated about its mean.

I realize this isn't terribly informative (or at all original), but the linked paper does attempt a crude survey (with an emphasis on passing from independent sequences to those with weak dependence). A word of caution for those attempting to visualize "measure concentration": imagining a narrowly peaked, one-dimensional distribution is tempting but wrong. This is a genuinely high-dimensional phenomenon. The only "visual" intuition I can offer is that the (Lebesgue) volume of the high-dimensional unit ball is concentrated in a thin shell near its surface.
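As a toy illustration of the "any set of measure at least $1/2$" statement for one specific set, take the half-cube $A = \{x \in \{0,1\}^n : \sum_i x_i \le n/2\}$ under the uniform measure on the Hamming cube: nearly every point lies within a few $\sqrt{n}$ bit flips of $A$, even though the cube's diameter is $n$. A quick simulation sketch:

# The Hamming distance from x to A = {sum(x) <= n/2} is max(sum(x) - n/2, 0),
# and sum(x) ~ Binomial(n, 1/2) under the uniform measure, so the distance
# can be sampled directly without storing any bit vectors.
import numpy as np

rng = np.random.default_rng(0)
n, trials = 10_000, 1_000_000

ones = rng.binomial(n, 0.5, size=trials)
dist = np.maximum(ones - n // 2, 0)

for t in [1, 2, 3]:
    radius = t * int(np.sqrt(n))
    frac = np.mean(dist <= radius)
    print(f"fraction within {radius} bit flips of A (cube diameter {n}): {frac:.4f}")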



2 comments:
By an eerie coincidence, I stumbled upon
http://yaroslavvb.blogspot.com/2006/05/curse-of-dimensionality-and-intuition.html
shortly after making that post. Indeed, when high dimensionality refers to the ambient space of the data, it is a curse for curve-fitting (regression). In my post, "dimensionality" refers to sample size, and is not a curse but a boon.
My browser (Chrome) does not show your math formulas. Do I need to install a plug-in or something?