Forget global warming and intelligent design. The real debate is whether or not (strong) AI is possible. Now, this isn't much of a debate for me or most people I encounter in my academic circles. We read Dennett and Hofstadter while still in diapers, and use "dualist" as some sort of slur. Of course, occasional dissent does creep in. At a recent open house for newly admitted PhD students, I was talking to several colleagues and was surprised when one -- a respected machine learning theorist -- was willing to stick his neck out against AI. Without quite committing to dualism, his argument (if I recall correctly) was along the lines of Penrose's: if I may paraphrase, consciousness is just too freaky to be explained by classical mechanics, and so must be swept under the quantum gravity rug. My friend Gahl (I surmise from our conversation) is basically being indoctrinated to reject strong AI in her freshman class.

So, I thought I'd use this space to give a strong AI opponent (or several) an opportunity to defend their views. Tell us why machines will never think or be conscious. Is it the missing ghost? Something magical about neural tissue? A mis-application of Gödel's theorem?

Have I caricatured your stance? Here's your chance to set the record straight!

## Saturday, March 31, 2007

## Monday, March 26, 2007

### Is math tautological?

Scott throws the following teaser:


One example is the surprisingly common view that "all mathematical propositions are tautologies," and therefore can't convey any new information -- and of course I can't help but take the bait. As you surely recall from this discussion, I'm a firm Platonist:

Pythagoras's theorem is a statement about objects that have no width, mass, or time duration. It is not a statement about depressions in sand, sticks, or strings. [...] The fact that in a right triangle, the sum of the squares of the legs equals the square of the hypotenuse was true long before Pythagoras or even planet Earth was around; that it was discovered by some humans (long before Pythagoras, actually) has no bearing on its validity.

However, I had also dug myself into a bit of a hole:

Yes, the boundary between "discovery" and "invention" is indeed blurry; I am not sure I can give a meaningful answer to whether chess was invented or discovered.

And now, thanks to Scott, I think I can dig myself out of that hole. We are going to define two realms: E (for Euclid) and B (for Borges). E contains all the mathematical "tautologies". Thus, if you seed E with the definition of a group, E will also contain all the facts about groups, including the theorems we've discovered, ones we've yet to discover, ones we'll never discover, and ones that are true but unprovable. B is a much more boring set -- it is the collection of all possible statements, true and false, about anything. It includes a statement and proof of Pythagoras's theorem (and its negation with a false proof), a description of the game of chess (and its infinite variations), as well as lots of pure gibberish.

Now I can make a meaningful distinction between invention and discovery. We discover elements of E, but invent elements of B. We discover mathematical truths, but invent proof techniques. The game of chess belongs squarely in B, and thus is an invention.

And what bearing does this have on Scott's comment? Well, E consists of self-contained truths, or tautologies. We can only access a tautology via a proof. The heart of math isn't making true statements, it's finding clever proofs!

### Paul Cohen ז"ל + language, human and otherwise

The discussion following Scott's post on the passing of Paul Cohen has segued into learnability of automata and languages, a topic I take a keen interest in. I've left some comments. Check it out!

## Thursday, March 22, 2007

### Geometry question resolved + debriefing

First, I should note that the margin question came up in my work with Corinna Cortes and Mehryar Mohri; see our ALT'06 and (to appear) COLT'07 papers.

Next, a shameful admission: I seem to have forgotten how logical contrapositives work. As Avrim reminds me, the perceptron algorithm will terminate in O(1/gamma^2) steps if the data are separable by a margin of at least gamma. Let me quote Avrim:


what you show is that any separator must have w_n that is exponential in w_1, and that furthermore, flipping the sign of w_1 makes the separator inconsistent. That implies that for any consistent separator, there is an exponentially small change in its angle that makes it inconsistent, which means it has an exponentially small margin.

Finally, a bit of philosophy of math. One should resist the temptation to run numerical simulations, taking the time to think things through with pen and paper first.
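For concreteness, the mistake bound Avrim invokes -- at most (R/gamma)^2 perceptron updates when the data are separable by margin gamma, with R the largest point norm -- can be sanity-checked in a few lines. This is a Python sketch; the toy data set and the margin threshold gamma = 0.1 are mine, not from the post:

```python
import numpy as np

def perceptron_mistakes(X, y, max_epochs=1000):
    """Run the classical perceptron; return (number of updates, weights)."""
    w = np.zeros(X.shape[1])
    mistakes = 0
    for _ in range(max_epochs):
        clean = True
        for x, label in zip(X, y):
            if label * np.dot(w, x) <= 0:   # misclassified (or on boundary)
                w += label * x
                mistakes += 1
                clean = False
        if clean:                           # a full pass with no errors
            break
    return mistakes, w

# Toy data separable through the origin with margin > 0.1 (gamma = 0.1).
rng = np.random.default_rng(0)
w_star = np.array([1.0, 1.0]) / np.sqrt(2)
pts = rng.uniform(-1, 1, size=(200, 2))
m = pts @ w_star
keep = np.abs(m) > 0.1
X, y = pts[keep], np.sign(m[keep])

mistakes, w = perceptron_mistakes(X, y)
R = np.max(np.linalg.norm(X, axis=1))       # here R <= sqrt(2)
assert mistakes <= (R / 0.1) ** 2           # the O(1/gamma^2) bound
```

The contrapositive reading is exactly Avrim's point: if the perceptron provably makes exponentially many mistakes on a data set, that data set cannot have more than an exponentially small margin.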

## Wednesday, March 21, 2007

### Progress on the margin problem

Avrim Blum alerts me to something I should've remembered from his class -- one can force a perceptron to make exponentially many mistakes (i.e., achieve an exponentially small margin) with a properly concocted decision list. In fact, digging up my solution to his '04 final exam problem, here is such an example. Assuming my readers think in MATLAB (much as Scott thinks in BASIC), consider the following function:

```matlab
function [X,Y] = badmargin(n)
% Decision-list style data set conjectured to force an
% exponentially small max-margin.
X = eye(n);
for k = 2:2:n
    % add a shifted identity block: X(i, i+k-1) = 1
    X(1:1+n-k, k:n) = X(1:1+n-k, k:n) + eye(n-k+1);
end
X = [[X; zeros(1,n)] ones(n+1,1)];   % extra all-zeros row; constant-1 column
Y = (-ones(n+1,1)).^([2:n+2]');      % alternating labels +1, -1, ...
return
```

Here's the output [X Y] for n=9:

```
1 1 0 1 0 1 0 1 0 1  1
0 1 1 0 1 0 1 0 1 1 -1
0 0 1 1 0 1 0 1 0 1  1
0 0 0 1 1 0 1 0 1 1 -1
0 0 0 0 1 1 0 1 0 1  1
0 0 0 0 0 1 1 0 1 1 -1
0 0 0 0 0 0 1 1 0 1  1
0 0 0 0 0 0 0 1 1 1 -1
0 0 0 0 0 0 0 0 1 1  1
0 0 0 0 0 0 0 0 0 1 -1
```

(the vectors x in {0,1}^9 are treated as rows, with the constant-1 coordinate and the label y = +/-1 appended at the end).

Using badmargin(n), for n=2 to 20, to generate labeled data sets and running SVM on these, we get a plot of the max-margin against n which is highly suggestive of exponential decay. It remains to actually prove that this explicit construction achieves exponentially small max-margin. Any takers?
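For readers who don't think in MATLAB, here is a NumPy port of badmargin (a sketch; only the indexing conventions differ):

```python
import numpy as np

def badmargin(n):
    """NumPy port of the MATLAB badmargin(n): a decision-list style
    data set conjectured to force an exponentially small max-margin."""
    X = np.eye(n)
    for k in range(2, n + 1, 2):              # k = 2, 4, 6, ...
        m = n - k + 1
        X[:m, k - 1:] += np.eye(m)            # shifted identity: X(i, i+k-1) = 1
    X = np.vstack([X, np.zeros((1, n))])      # extra all-zeros example
    X = np.hstack([X, np.ones((n + 1, 1))])   # constant-1 (bias) coordinate
    Y = (-1.0) ** np.arange(2, n + 3)         # alternating labels +1, -1, ...
    return X, Y

X, Y = badmargin(9)
assert X.shape == (10, 10) and Y.shape == (10,)
```

Feeding these sets to a hard-margin SVM (e.g., scikit-learn's SVC with a very large C) should reproduce the decay in the plot.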


## Tuesday, March 20, 2007

### Elementary Maurey's theorem + ode to interaction

Maurey's theorem (rather, one of them) states that if P is the Haar (i.e., normalized counting) measure on S_n (the group of all permutations on n elements) and d is the normalized Hamming metric on this group, then


P(f-Ef > t) <= exp(-nt^2 / 8),

where f:S_n -> R is 1-Lipschitz w.r.t. d. See Schechtman's paper for a proof; it is done via the notion of metric space length.

In our conversation at Berkeley, Jim Pitman suggested a simple way to parametrize uniformly random permutations as an independent process. At each step i, 1<=i<=n, you uniformly pick an integer between 1 and i as the location in which to insert the next element. Thus we can recover Maurey's theorem in a simpler way, and with better constants, via McDiarmid's inequality:

P(f-Ef > t) <= exp(-2nt^2),

where P and f are as above. At first I couldn't believe it would be this simple, but Gideon tells me that I'm right.

The greater moral of this story is that I should talk to people more. I can't overstate the value of this blog as a research tool -- look no further than the high quality discussions here, here, here, and here (and this isn't counting the email exchanges that some posts generate). But you can't count on the experts to always stumble on your blog post -- nothing beats talking to one in person!
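Pitman's insertion process is easy to sketch in code (Python; the empirical uniformity check at the end is mine):

```python
import random
from collections import Counter
from itertools import permutations

def random_perm_by_insertion(n, rng=random):
    """Pitman's independent-process parametrization: at step i, insert
    element i into one of i uniformly chosen positions, so the n steps
    are independent with 1 * 2 * ... * n = n! equally likely outcomes."""
    perm = []
    for i in range(1, n + 1):
        perm.insert(rng.randrange(i), i)   # positions 0 .. i-1
    return tuple(perm)

# Distinct choice sequences yield distinct permutations, so the output
# is uniform on S_n; empirically, all 6 elements of S_3 show up equally often.
counts = Counter(random_perm_by_insertion(3) for _ in range(60000))
assert set(counts) == set(permutations((1, 2, 3)))
```

The point for McDiarmid is precisely that the n insertion choices are independent coordinates determining the permutation.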

## Monday, March 12, 2007

### Miscellanea

## Tuesday, March 6, 2007

### Geometry of DFAs

As before, define A_n to be the set of all deterministic finite-state automata (DFAs) on n states over some fixed alphabet Sigma. As usual, we require that every DFA start in state 1, meaning that q_0 = 1. Define the following metric on A_n: for A, B in A_n, let A Delta B be their symmetric-difference DFA -- meaning that A Delta B accepts precisely the strings that are either in L(A) or in L(B) but not in both. Constructing such an automaton is easy, as is minimizing it. So, assuming A Delta B is in minimized form, define d(A,B) = log |A Delta B|, where |M| is the number of states in a DFA M. It is straightforward to verify that d is a valid (pseudo-)metric on A_n.
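The symmetric-difference automaton is just the standard product construction. Here is a Python sketch (the dictionary representation and the two example DFAs are mine, for illustration only):

```python
from itertools import product

ALPHABET = "ab"

def accepts(M, word):
    """Run DFA M (a dict with 'start', 'acc', 'delta') on word."""
    s = M["start"]
    for c in word:
        s = M["delta"][s, c]
    return s in M["acc"]

def symdiff_dfa(A, B):
    """Product construction: state (p, q) runs A and B in parallel,
    accepting exactly when one of the two accepts (XOR)."""
    sA = {s for s, _ in A["delta"]}
    sB = {s for s, _ in B["delta"]}
    delta = {((p, q), c): (A["delta"][p, c], B["delta"][q, c])
             for p in sA for q in sB for c in ALPHABET}
    acc = {(p, q) for p in sA for q in sB
           if (p in A["acc"]) != (q in B["acc"])}
    return {"start": (A["start"], B["start"]), "acc": acc, "delta": delta}

# Example DFAs (mine): A accepts strings with an even number of a's,
# B accepts strings ending in b.
A = {"start": 0, "acc": {0},
     "delta": {(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1}}
B = {"start": 0, "acc": {1},
     "delta": {(0, "a"): 0, (1, "a"): 0, (0, "b"): 1, (1, "b"): 1}}

D = symdiff_dfa(A, B)
for length in range(5):                 # check all words up to length 4
    for t in product(ALPHABET, repeat=length):
        w = "".join(t)
        assert accepts(D, w) == (accepts(A, w) != accepts(B, w))
```

The metric in the post counts states of the *minimized* symmetric-difference DFA; a minimization step (Hopcroft's or Moore's algorithm) would follow the product construction and is omitted here.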


This notion came up in my conversation with Jason Franklin, who was looking for a natural metric on automata, for security-related reasons. I suggested the metric above, and I'm wondering if it's appeared anywhere in literature. It's a natural measure of the complexity of the language on which two DFAs disagree.

Any finite metric space has a well-defined

*length*; the definition may be found in Schechtman's paper. Let (X, d) be any finite metric space. For a vector a = (a_1, ..., a_m) of positive numbers, define an a-partition sequence of X to be a sequence P_0, P_1, ..., P_m of partitions of X, with P_0 = {X} and P_m the partition of X into singletons, such that each P_k refines P_{k-1} and whenever A, B in P_k are contained in a common element of P_{k-1}, there is a bijection f: A -> B such that d(x, f(x)) <= a_k for all x in A.

The length of X is then defined to be the infimum of (a_1^2 + ... + a_m^2)^(1/2) over all a-partition sequences.
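As a sanity check on this definition (the example is mine, not from Schechtman's paper), the cube {0,1}^m under the normalized Hamming metric has length at most 1/sqrt(m):

```latex
% Let P_k partition X = \{0,1\}^m according to the first k coordinates.
% Sibling blocks A, B \in P_k differ only in coordinate k, and flipping
% that coordinate is a bijection moving each point by exactly 1/m, so
% a_k = 1/m works for every k:
\mathrm{length}(X) \le \Big( \sum_{k=1}^{m} a_k^2 \Big)^{1/2}
  = \Big( m \cdot \frac{1}{m^2} \Big)^{1/2} = \frac{1}{\sqrt{m}} .
```

Plugged into Schechtman's length-based concentration result, this recovers concentration for the Hamming cube at the familiar exp(-c m t^2) rate.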

Bounding the length of (A_n, d) has immediate applications to concentration of Lipschitz functions under the normalized counting measure on A_n, as explained here.

I am interested in computing (really, upper-bounding) the length of (A_n, d). Any takers?

## Friday, March 2, 2007

### Geometry question

Consider the binary hypercube B_d = {0,1}^d naturally embedded in R^d. I take some hyperplane H and slice B_d into two pieces, labeling the corners on one side of H positive and on the other side negative. Then I forget about H and find the maximum-margin hyperplane W that separates the positive corners from the negative ones. Define rho(d) to be the worst-case (i.e., smallest) margin obtainable in this way for dimension d. Formally,

rho(d) = min_{all hyperplanes H} max_{all hyperplanes W that induce the same dichotomy as H} (the margin attained by W).
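To make the inner quantity concrete, here is a Python sketch (the example slicing hyperplane is mine) of the margin attained by a given W on a hyperplane-induced dichotomy of B_3. Finding the max-margin W itself is a quadratic program, e.g., a hard-margin SVM:

```python
import numpy as np
from itertools import product

def margin(w, b, X, y):
    """Geometric margin attained by the hyperplane w.x = b on the
    labeled points (X, y): min over points of y * (w.x - b) / ||w||."""
    return np.min(y * (X @ w - b)) / np.linalg.norm(w)

# Corners of B_3, labeled by the slicing hyperplane H: x1 + x2 + x3 = 1.5.
X = np.array(list(product([0, 1], repeat=3)), dtype=float)
w_H, b_H = np.ones(3), 1.5
y = np.sign(X @ w_H - b_H)

# H itself attains 0.5 / sqrt(3) on its own dichotomy; the max-margin W
# for this dichotomy can only do better.
assert abs(margin(w_H, b_H, X, y) - 0.5 / np.sqrt(3)) < 1e-12
```

rho(d) then asks how badly the *best* such W can be forced to do, over all choices of the slicing hyperplane H.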

How does rho(d) behave? I recall some numerical simulations a couple of months ago suggesting that it can be exponentially small, on the order of 2^{-d}. Anybody out there aware of any known results? This would be **really good** to know!

## Thursday, March 1, 2007

### Another crack at old Georg

Just as our faith in Cantor grew in the last post, doubt is once again starting to creep in. I'm talking about the well-ordering principle, which says, unsurprisingly enough, that any set can be well-ordered. What this means for the reals is that there is a total ordering relation (which most emphatically is **not** the usual "less than" relation on R) -- let's denote it by x <' y -- that well-orders the reals. This relation induces the ordering

a <' b <' c <' ...

and every real number r will eventually appear in this chain.

Now the $1.64 question is: why can't we apply Cantor's diagonal argument to this well-ordering to construct a number that doesn't appear in the chain?

### Go home, Cantor!

Cantor's proof of the uncountability of the reals goes like this. Every real number in (0,1) has a unique nonterminating binary expansion [every word there is important; nonterminating means not ending in all zeros]. You show that the members of (0,1) are not countable by contradiction: if there were an exhaustive enumeration (i.e., a 1-to-1 mapping between (0,1) and the naturals), we could always construct a new number, missing from the list, by flipping the first bit of the first one, the 2nd bit of the 2nd one, and so on. Anyone seeing this for the first time should go look it up.

I have a student who insists he's broken Cantor's argument. Take the following list, he says:

.01111111....

.01011111....

.00111111....

.00011111....

.00001111....

.00000111...

and so on. If you construct a number by flipping the bits on the diagonal, you get

.1000000... = .0111111...

which is the first number on the list. Can someone explain why Cantor's theorem still holds? Ideally, the explanation would be provided by the student himself. What do you say, "D"? I gave you a hint after class...
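The student's diagonal flip can be checked mechanically on truncated expansions (a Python sketch; the truncation to 8 bits is mine):

```python
def diagonal_flip(rows):
    """Cantor's construction on a (truncated) list of binary expansions:
    produce a string differing from the k-th row in its k-th bit."""
    return "".join("1" if rows[k][k] == "0" else "0"
                   for k in range(len(rows)))

# The student's list, truncated to 8 bits each:
rows = ["01111111", "01011111", "00111111", "00011111",
        "00001111", "00000111", "00000011", "00000001"]
flipped = diagonal_flip(rows)
assert flipped == "10000000"   # i.e., .1000... = .0111..., the first row's value
```

The code confirms the student's computation; the question of why Cantor's theorem nonetheless survives is left, as in the post, to "D".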

