As
before, define

to be the set of all deterministic finite-state automata (DFAs) on

states over some fixed alphabet

. As usual, we require that every DFA start in state 1, meaning that

. Define the following metric on

: for

, let

be their symmetric-difference DFA -- meaning that

accepts precisely the strings that are either in

or in

but not in both. Constructing such an automaton is easy, as is minimizing it. So, assuming

is in minimized form, define

, where

is the number of states in a DFA. It is straightforward to verify that

is a valid (pseudo-)metric on

.
This notion came up in my conversation with
Jason Franklin, who was looking for a natural metric on automata, for security-related reasons. I suggested the metric above, and I'm wondering if it's appeared anywhere in literature. It's a natural measure of the complexity of the language on which two DFAs disagree.
Any finite metric space has a well-defined
length; the definition may be found in Schechtman's
paper. Let

be any finite metric space. For

, define an

-partition sequence of

to be a sequence

of partitions of

such that

refines

and whenever

and

, there is a bijection

such that

for all

.
The length

of

is defined to be the infimum over all

-partition sequences.
Bounding the length of

has immediate applications to

-concentration under the normalized counting measure on

, as explained
here.
I am interested in computing (really, upper-bounding) the length of

. Any takers?
1 comment:
Note that length is distinct from diameter; the latter upper-bounds the former. An obvious bound on the diameter of DFA(n) is n^2-1 and I suspect it's tight. To get anything useful, we'd need the length of the metric space to be significantly smaller than the diameter (say,
len/diam is o(1/sqrt(n))
).
Post a Comment