This is prompted by the Weekly Standard piece on Samuel Johnson. The author, Jack Lynch, credits Johnson (among other things) with taking the trouble to define such "obvious words" as "take" and "get" -- which lexicographers before him had neglected. The segue into philosophy of language is inevitable: do words have "primitive", "irreducible" definitions? Or are dictionary entries necessarily circular, since we can only define words in terms of other words? Since much ink has already been spilled on these matters (cf. Wittgenstein, Chomsky and David Lewis), let us frame the question more operationally. Can a computer program, in principle, learn to reason about the world (on a par with humans) given only text as its input and output? Or must it necessarily have sensory perception in order to become sentient?
My training at Bell Labs, where I was most heavily influenced by Daniel Lee, leads me to conjecture that a large enough corpus of text, with sufficiently clever statistical processing, is enough to train a machine that would pass the Turing test. Supporting evidence: congenitally blind people are able to reason about colors perfectly well, without having any qualitative experience of the phenomenon. The linguist J.R. Firth would seem to agree with the possibility of "stand-alone" semantics: "You shall know a word by the company it keeps".
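To make Firth's slogan a bit more concrete, here is a minimal sketch of one thing "the company a word keeps" can mean computationally: represent each word by counts of its neighbors, and compare words by the cosine of those count vectors. Everything in it is an assumption made for illustration (the toy corpus, the window size of 2, and the function names `cooccurrence_vectors` and `cosine`), and it is of course nowhere near the "sufficiently clever statistical processing" conjectured above.

```python
from collections import Counter, defaultdict
from math import sqrt


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * v[term] for term, count in u.items())
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy corpus, invented purely for illustration.
corpus = [
    "the sky is blue today",
    "the sea is blue today",
    "the sky is grey today",
    "the dog chased the cat",
    "the cat chased the mouse",
]

vectors = cooccurrence_vectors(corpus)
sim_colors = cosine(vectors["blue"], vectors["grey"])  # two color words
sim_mixed = cosine(vectors["blue"], vectors["cat"])    # a color word and an animal
print(sim_colors > sim_mixed)
```

On this corpus "blue" and "grey" share all of their neighbors while "blue" and "cat" share none, so the two color words come out as more similar without the program ever having seen a color.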
Taking five minutes to ponder the mysteries of language reminds me why I left Natural Language Processing, after a few brief forays. The problem is just too hard, and our current tools too primitive. I realized that until we develop more powerful mathematical tools, NLP research will be plagued by the sad fact that ad-hoc heuristic hacks tend to outperform elegant, clean, principled models. Of course, I am quite out of it as far as recent developments go -- and would be happy if someone would set me straight. I know that elegant, principled models exist for document classification and text translation. I also know that if the state of the art is to be judged by Google's automatic translator, then there is, ahem, much room for improvement.
I prefer to be a producer of formal theorems and a consumer of NLP products (and judging by my list of rejected NLP paper submissions, my preference is in line with that of the community). Of course, no one can stop me from dabbling in language as a hobby, which I regularly do. Want the etymology of an obscure (but known!) Indo-European or Semitic root? Want a tip on picking out a good dictionary? You've come to the right place. Regarding the latter: I can size up a dictionary in a matter of minutes, and my intuition has yet to mislead me. Always look up slang and, yes, vulgarities -- any lexicographer who pretends that certain words don't exist isn't worthy of the title. (Russian joke: "Mama, what's an 'ass'?" -- "There's no such word, sonny." -- "Strange -- the ass exists, but the word doesn't?..") Our final conclusion is that Russians have a joke for most occasions.