Check it out.
UPDATE:
Here's another one.
I suppose I should say a word or two to get you to click. Both involve automata and probability, but in rather different ways. I know the suspense must be killing you by now -- all the answers are only a click away!..
UPDATE II: That first paper has a mistake, discovered by Dana Angluin and Lev Reyzin. It has been pulled from submission. The mistake seems fixable and possibly already fixed. Developing...
Wednesday, July 8, 2009
Monday, June 1, 2009
A reverse Jensen inequality for exponentials
Let x_1, ... x_n be real numbers and t_1, ... t_n be nonnegative numbers summing to 1.
A trivial consequence of Jensen's inequality is that
exp(t_1 x_1 + ... + t_n x_n) <= t_1 exp(x_1) + ... + t_n exp(x_n).
I claim that in the other direction, we have the following:
t_1 exp(x_1) + ... + t_n exp(x_n) <= exp( diam(x) + t_1 x_1 + ... + t_n x_n)
where diam(x) = max_i x_i - min_i x_i is the diameter of {x_1, ..., x_n}.
Any proof ideas? (Unlike my previous goof-ups, I can actually prove this one.)
A trivial consequence of Jensen's inequality is that
exp(t_1 x_1 + ... + t_n x_n) <= t_1 exp(x_1) + ... + t_n exp(x_n).
I claim that in the other direction, we have the following:
t_1 exp(x_1) + ... + t_n exp(x_n) <= exp( diam(x) + t_1 x_1 + ... + t_n x_n)
where diam(x) = max_i x_i - min_i x_i is the diameter of {x_1, ..., x_n}.
Any proof ideas? (Unlike my previous goof-ups, I can actually prove this one.)
Sunday, May 17, 2009
concentration of exchangeable processes
Having thoroughly embarrassed myself in the previous post, let me shoot for something half-way redeeming.
Let X_1, X_2, ... be a sequence of {0,1}-valued random variables. We say that the process {X_i} is exchangeable if every finite-dimensional joint distribution is invariant under the permutation of indices. The de Finetti theorem characterizes the exchangeable processes as mixtures (possibly uncountable) of Bernoulli processes. Note that this is only true for infinite sequences -- there are well-known examples of finite exchangeable sequences that cannot be extended even by 1 (see Diaconis for a fascinating discussion).
Now exchangeable processes are in general not concentrated. Indeed, the measure P on {0,1}^n with P(00...0)=P(11...1)=1/2 (i.e., a mixture of two degenerate Bernoulli processes) is exchangeable but not concentrated.
However, I claim the following:
1. if P is a mixture of Bernoulli processes whose biases are bounded within
0<=a<=b<=1, and f:{0,1}^n -> R is 1-Lipschitz w.r.t. the normalized Hamming metric,
we have
P(f - Ef > t) <= exp(-2n(t-d)^2)
where t < d =b-a
and the bias of a Bernoulli process is P(X)=1. Note that this recovers the ordinary McDiarmid bound, for which b=a, trivially.
2. if P only assigns positive probability to sequences with at most k ones,
we have
P(f - Ef > t) <= exp(-2n(t/2k)^2) ) for any f:{0,1}^n -> R that is 1-Lipschitz w.r.t. the normalized Hamming metric.
The proofs are standard and apparently not worthy of writing up -- but let me know if you try it and get stuck. Have these appeared in literature?
Let X_1, X_2, ... be a sequence of {0,1}-valued random variables. We say that the process {X_i} is exchangeable if every finite-dimensional joint distribution is invariant under the permutation of indices. The de Finetti theorem characterizes the exchangeable processes as mixtures (possibly uncountable) of Bernoulli processes. Note that this is only true for infinite sequences -- there are well-known examples of finite exchangeable sequences that cannot be extended even by 1 (see Diaconis for a fascinating discussion).
Now exchangeable processes are in general not concentrated. Indeed, the measure P on {0,1}^n with P(00...0)=P(11...1)=1/2 (i.e., a mixture of two degenerate Bernoulli processes) is exchangeable but not concentrated.
However, I claim the following:
1. if P is a mixture of Bernoulli processes whose biases are bounded within
0<=a<=b<=1, and f:{0,1}^n -> R is 1-Lipschitz w.r.t. the normalized Hamming metric,
we have
P(f - Ef > t) <= exp(-2n(t-d)^2)
where t < d =b-a
and the bias of a Bernoulli process is P(X)=1. Note that this recovers the ordinary McDiarmid bound, for which b=a, trivially.
2. if P only assigns positive probability to sequences with at most k ones,
we have
P(f - Ef > t) <= exp(-2n(t/2k)^2) ) for any f:{0,1}^n -> R that is 1-Lipschitz w.r.t. the normalized Hamming metric.
The proofs are standard and apparently not worthy of writing up -- but let me know if you try it and get stuck. Have these appeared in literature?
Wednesday, May 13, 2009
anti-concentration inequality?
Consider the following (at this point, conjectured) anti-concentration inequality:
Let X be a random variable with |X| <= 1, E[X] = 0, Var[X] = s^2.
Then for all t>0, I claim
P( |X| < st) <= s+t.
Is this true? Is this known?
Update: the condition |X|<=1 is unnecessary.
Update II: I don't know what I was smoking. Note to self -- NEVER post as you're running off to the gym. I figured out a simple counterexample while lifting, pretty much along the lines of Konstantin's.
Update III: Last attempt at embarrassing myself (today) -- really. How about the following.
|X| <= 1, E[X] = 0, Var[X] = v, 0 < t < 1.
Then:
P( |X| < tv) <= 1-tv/2.
Any takers?
Update IV: Mark Rudelson kindly replied to my email with a very simple proof:
for any t>0
v= E|X|^2 \le P(|X|>t) + t^2,
since |X| \le 1. Apply this to t=(v/2)^{1/2}.
I feel quite stupid for missing this, especially since it's the standard Markov/Chebyshev trick. An overall embarrassing post -- let's strike it from our collective conscience (but keep the bound, of course).
Let X be a random variable with |X| <= 1, E[X] = 0, Var[X] = s^2.
Then for all t>0, I claim
P( |X| < st) <= s+t.
Is this true? Is this known?
Update: the condition |X|<=1 is unnecessary.
Update II: I don't know what I was smoking. Note to self -- NEVER post as you're running off to the gym. I figured out a simple counterexample while lifting, pretty much along the lines of Konstantin's.
Update III: Last attempt at embarrassing myself (today) -- really. How about the following.
|X| <= 1, E[X] = 0, Var[X] = v, 0 < t < 1.
Then:
P( |X| < tv) <= 1-tv/2.
Any takers?
Update IV: Mark Rudelson kindly replied to my email with a very simple proof:
for any t>0
v= E|X|^2 \le P(|X|>t) + t^2,
since |X| \le 1. Apply this to t=(v/2)^{1/2}.
I feel quite stupid for missing this, especially since it's the standard Markov/Chebyshev trick. An overall embarrassing post -- let's strike it from our collective conscience (but keep the bound, of course).
Monday, May 4, 2009
Tighter Hamming ball volume?
Let C(n,k) be the binomial coefficient "n choose k". We want to bound the sum of these, with k ranging from 0 to d. This is equivalent to bounding the volume of the n-dimensional Hamming ball of radius d. Denote this number by B(n,d).
A well-known bound (appearing, for example, in the Sauer-Shelah-VC lemma) is
B(n,d) <= (en/d)^d .
We are pretty sure that it can be sharpened, for example by writing a (1/2) in front of the RHS. (We actually think it can be sharpened by better than a constant.)
Thus has to be known. Anyone have a reference? Thanks in advance!
A well-known bound (appearing, for example, in the Sauer-Shelah-VC lemma) is
B(n,d) <= (en/d)^d .
We are pretty sure that it can be sharpened, for example by writing a (1/2) in front of the RHS. (We actually think it can be sharpened by better than a constant.)
Thus has to be known. Anyone have a reference? Thanks in advance!
Friday, April 24, 2009
2 inequalities
1.
Let mu and nu be two Bernoulli measures on {0,1}, with biases p and q, respectively. Thus,
mu({1})=p=1-mu({0})
nu({1})=q=1-nu({0})
Let P and Q be the n-fold products of mu and nu, respectively. Thus, P and Q are two probability measures on {0,1}^n.
Show that
||P-Q|| <= n|p-q|
where ||.|| is the total variation norm (and <= is a poor man's way of writing \leq in plain text).
2.
Now assume that q = 1/2; thus Q is the uniform measure on {0,1}^n.
Show that
||P-Q|| <= C n^(1/2) |p-1/2|
where C>0 is some fixed universal constant. (Hint: Pinsker's inequality might come in handy.)
Are these novel or already known?
Let mu and nu be two Bernoulli measures on {0,1}, with biases p and q, respectively. Thus,
mu({1})=p=1-mu({0})
nu({1})=q=1-nu({0})
Let P and Q be the n-fold products of mu and nu, respectively. Thus, P and Q are two probability measures on {0,1}^n.
Show that
||P-Q|| <= n|p-q|
where ||.|| is the total variation norm (and <= is a poor man's way of writing \leq in plain text).
2.
Now assume that q = 1/2; thus Q is the uniform measure on {0,1}^n.
Show that
||P-Q|| <= C n^(1/2) |p-1/2|
where C>0 is some fixed universal constant. (Hint: Pinsker's inequality might come in handy.)
Are these novel or already known?
Thursday, December 18, 2008
Back by popular demand
My anxious readership has been flooding me with emails, demanding to know if I'm still alive and why I quit blogging. (Just kidding. Is anyone still reading this thing?)
A good chunk of my time has been occupied by administrative activity (job search), as well as personal matters (both good and bad).
I regularly attend two courses: one by Gideon Schechtman and one by Itai Benjamini.
Here is a nice "paradox" from Gideon's first lecture. Consider the 2x2 square = [-1,1]^2. In each quadrant, inscribe a unit circle. Let S_2 be the largest circle about the origin not intersecting any of the inscribed unit circles. Let R_2 be its radius; compute R_2 (it's easy!).
Now repeat the same in n dimensions: divide [-1,1]^n into 2^n orthants, inscribe a unit ball in each one, and let R_n be the radius of the maximal ball about the origin that does not intersect any of the inscribed balls. It shouldn't take you more than 2 minutes to come up with a formula for R_n -- the basic 2-dimensional intuition carries over to higher dimensions.
However, to anyone not familiar with high-dimensional geometric phenomena, something very surprising should happen. I won't give it a away here, but feel free to discuss in the comments. This sort of "paradox" probably has a name -- anyone know what it is?
A good chunk of my time has been occupied by administrative activity (job search), as well as personal matters (both good and bad).
I regularly attend two courses: one by Gideon Schechtman and one by Itai Benjamini.
Here is a nice "paradox" from Gideon's first lecture. Consider the 2x2 square = [-1,1]^2. In each quadrant, inscribe a unit circle. Let S_2 be the largest circle about the origin not intersecting any of the inscribed unit circles. Let R_2 be its radius; compute R_2 (it's easy!).
Now repeat the same in n dimensions: divide [-1,1]^n into 2^n orthants, inscribe a unit ball in each one, and let R_n be the radius of the maximal ball about the origin that does not intersect any of the inscribed balls. It shouldn't take you more than 2 minutes to come up with a formula for R_n -- the basic 2-dimensional intuition carries over to higher dimensions.
However, to anyone not familiar with high-dimensional geometric phenomena, something very surprising should happen. I won't give it a away here, but feel free to discuss in the comments. This sort of "paradox" probably has a name -- anyone know what it is?
Subscribe to:
Posts (Atom)