Tuesday, March 6, 2007

Geometry of DFAs

As before, define to be the set of all deterministic finite-state automata (DFAs) on states over some fixed alphabet . As usual, we require that every DFA start in state 1, meaning that . Define the following metric on : for , let be their symmetric-difference DFA -- meaning that accepts precisely the strings that are either in or in but not in both. Constructing such an automaton is easy, as is minimizing it. So, assuming is in minimized form, define , where is the number of states in a DFA. It is straightforward to verify that is a valid (pseudo-)metric on .

This notion came up in my conversation with Jason Franklin, who was looking for a natural metric on automata, for security-related reasons. I suggested the metric above, and I'm wondering if it's appeared anywhere in literature. It's a natural measure of the complexity of the language on which two DFAs disagree.

Any finite metric space has a well-defined length; the definition may be found in Schechtman's paper. Let be any finite metric space. For , define an -partition sequence of to be a sequence

of partitions of such that refines and whenever and, there is a bijection such that for all .
The length of is defined to be the infimum over all -partition sequences.

Bounding the length of has immediate applications to -concentration under the normalized counting measure on , as explained here.

I am interested in computing (really, upper-bounding) the length of . Any takers?

1 comment:

Aryeh said...

Note that length is distinct from diameter; the latter upper-bounds the former. An obvious bound on the diameter of DFA(n) is n^2-1 and I suspect it's tight. To get anything useful, we'd need the length of the metric space to be significantly smaller than the diameter (say,
len/diam is o(1/sqrt(n))