Sunday, February 4, 2007

Concentration of Boolean functions?

In the course of giving an invited talk at Mehryar Mohri's NYU machine learning class this fall, I gave the example of the parity function f:{-1,1}^n -> {-1,1} to illustrate how the concentration phenomenon can break down without a Lipschitz condition. The natural metric here is normalized Hamming and it's immediate that the parity function is not Lipschitz with respect to this metric (or rather its Lipschitz constant is 2n). On the other hand, under the uniform distribution on the discrete cube {-1,1}^n, we have
P(f=-1)=P(f=1)=1/2, while E[f]=0, which means that f is not concentrated about its mean (or any other constant).

Someone in the audience (unfortunately, I don't remember who) asked about the majority function. Totally in improvisation mode, I blurted out that perhaps the majority function is "almost Lipschitz" and one might be able to apply Kutin's generalization of McDiarmid's inequality. [Regarding the latter, I'm surprised this fine paper isn't getting more attention or seeing good applications. Or is it?]

Fast forward to this spring, when I'm sitting in on Ryan O'Donnell's very enjoyable class. We're seeing all sorts of characterizations of properties of Boolean functions in terms of their Fourier coefficients and related concepts. Given my one-track mind, I keep wondering if one can say something about the concentration of Boolean functions with low energy or degree.

Having thought about it for a bit, I see my hopes were naive. Take the simplest nontrivial function -- a dictator (that is, f:{-1,1}^n -> {-1,1} whose values depend only on a fixed single bit). As shown in homework 1, dictators have a rather simple Fourier decomposition. However, for a dictator, we have P(f=-1)=P(f=1)=1/2 and E[f]=0, so we can forget about concentration.

What about majority? The majority function has a more complex Fourier structure (could this be a future homework problem?). Some quick numerical trials indicate that when f:{-1,1}^n -> {-1,1} is the majority function, we have E[f] tending to 0 and P(f=-1), P(f=1) tending to 1/2. I haven't proved this (another possible exercise?), but if it's true then I hope whoever asked me that question reads this post!

A couple of afterthoughts. (1) A recent inequality of Kim and Vu allows one to prove concentration of polynomials with nonnegative coefficients. Since the Fourier expansion is a multivariate polynomial, one might be able to apply Kim-Vu to certain classes of Boolean functions. Is there a characterization of the Boolean functions having nonnegative Fourier coefficients?

(2) I was at first excited to see the Poincaré inequalities, as they are one method for proving concentration. But one must be careful, as Poincaré and log-Sobolev inequalities are really asserting a property of the probability measure, not any specific function. In particular, the Poincaré inequality on the discrete cube is telling us that the product measure on {-1,1}^n has exponential concentration with respect to the Hamming metric. But we already knew a stronger fact (subgaussian tails) via Chernoff bounds!

Update Feb 6: I feel a bit silly running numerics on the majority function and suggesting the observation that E[f] tends to 0 and P(f=-1), P(f=1) tend to 1/2 as an exercise. It's obvious.
Update II: An obvious point that I forgot to make is that it's a bit vacuous to talk about concentration of {-1,1} valued functions; this only makes sense if f_n:{-1,1}^n -> {-1,1} is approaching a constant. The original concentration question was actually about real-valued functions on the cube, but the simple reasoning above show that small Fourier coefficients do not imply concentration.