The semester is over, the grades should be in by now, and most of my students from FLAC are graduating. One of the components of the course was a research project, during which the instructors mentor the students on an individual basis. This has been quite a rewarding experience, since I managed to get four of my students interested in some deep and fascinating problems in automata theory, with connections to my own work. I note that all the students I mentored did a fine job and a couple of them taught me new things. But in this post, I'd like to showcase the projects with the strongest connection to automata theory and machine learning.
Jeremiah Blocki is that brave soul who took on the notorious problem #3. First, here is that long-overdue writeup where I define the universal regular kernel. In his paper, Jeremiah gives closed-form expressions for K_n(x,y) for short x and y, as well as proving some simple lower bounds on the complexity of K_n.
Vinay Chaudhary became interested in my work with Corinna Cortes and Mehryar Mohri on learning piecewise testable languages with the subsequence kernel. Aside from understanding our somewhat involved word-algebraic proof of linear separability, Vinay had to learn a whole bunch of subtle machine-learning notions, essentially on his own. Having gained a command of margins, support vectors, embeddings and kernels, he embarked on an investigation of the empirical margin of piecewise testable languages. Vinay produced a excellent piece of research, with some tantalizing leads.
Matthew Danish considered a problem that I'd attempted many moons ago and put aside -- namely, one of characterizing the languages that one obtains by taking finite Boolean combinations of modular n-grams. (A modular n-gram is the set of all strings in which a given contiguous substring occurs k mod n times.) Matt also had to master abstruse concepts such as syntactic monoids and morphisms, and produced a solid paper.
Jonah Sherman decided to aim high and attempt the star-height problem. I remember how mesmerised I was by this problem when I first encountered it some four years ago. When Jonah had asked me about its importance, I replied that if we can't answer such natural questions about regular languages then we are in dire need of better tools! That was good enough for him, and he dove into some rather dense semigroup theory, even rediscovering the rather nontrivial result that all languages recognized by finite commutative monoids have star height of at most one. Impressive work done by a college junior, check it out!