Tuesday, July 24, 2007

A Chernoff paradox?

The multiplicative Chernoff bound states that if X is the average of n independent, 0-1 random variables X_i, each with mean p, then

P(|X-p| > tp) < 2exp(-npt^2 / 3).

In other words, the empirical average of independent binary random variables is exponentially concentrated about their true mean. The probability that we're off by a muliplicative factor of t is subgaussian in t. There is a catch, however: rare events are difficult to estimate. Thus, if p is tiny, the bound becomes quite loose. Indeed, if you're flipping a coin whose probability of heads is 10^-6, you can easily see a thousand flips without a single head.

But what about the following "trick" to beat Chernoff? Since X_i=1 is a rare event, let us define Y_i = 1-X_i. Now Y, the average of Y_1,...,Y_n, has expected value p'=1-p, which is large if p is small. So instead of obtaining a loose bound for X in terms of p, we obtain a good bound for Y in terms of p' -- yet X and Y are related in a very simple way. Does this trick really work or have I cheated somewhere?

2 comments:

Anonymous said...

$tp'$ is much larger than $tp$. So the bound that you get is weaker in this sense.

Aryeh said...

RD -- yes, that's certainly true. Since my question was loosely posed, there is no one right answer. The way I'd phrase it is that we want to bound the ratio of the random variable to its mean.

Let's take the example of the coin whose probability of heads is 10^-6. We flip it 1000 times and observe no heads. So we deviate from the mean by a ratio of
|0-p|/p = 1, which is huge. If we were looking at tails instead of heads, however, we'd observe 1000 tails. In this case,
|1-p'|/p' ~ p is tiny, but we're computing an entirely different ratio!

The above observation is vacuous and trivial, and to make a non-trivial point I have no choice but to point the readers to my universal kernel paper. There, in Sec. 5, I compute a certain quantity via stochastic simulation and use the multiplicative Chernoff bound to argue that the calculation is efficient. If I tried to use the above "trick", I'd get a totally useless bound. Do people see why?