Let be a finite set and consider two probability measures (distributions) on , and . One can define many notions of "distance" between and , but a particularly important one (at least in my work) is the variational distance. By definition,
-- that is, we maximize over all subsets of the difference of the measures assigned by the two distributions. It is a well-known fact, routinely left as an exercize for the reader, that
where is just the norm of the difference , viewed as a vector in .
It is also well-known that
where the infimum is taken over all the distributions on , having marginals and , respectively, and the random variables are and are distributed and .
Let us define the (unnormalized) measure as the pointwise minimum of and . A somewhat lesser-known fact is that
(I have not seen this mentioned anywhere, but can't imagine that I'm the first one observing this). It easily follows that if and are minorized by some probability measure -- i.e., there is an for which holds pointwise, we have
Finally, here is a relation that has a chance of being novel. Consider two finite sets , with probability measures on and on . We will write for the product measure on (and similarly for ). Then
To the best of my knowledge, this "tensorization" result was first proved here, but I'd be very grateful if anyone would bring an earlier reference to my attention.
None of these are difficult to prove, but if pressed, how would you do it? I have a simple technique, based on a linear programming principle, that yields all of these pretty much effortlessly (see Lemma 2.6 in the linked paper for the idea). Any other simple proofs out there?