Chatbots as Translation

I got a translation program based on deep neural networks (http://opennmt.net/ if anyone wants to play along at home). I’m training it on a corpus of my previous text chats. The “source language” is everything that everyone has said to me, and the “target language” is what I said in return. The resulting model should end up translating from things that someone says to appropriate responses. My endgame is to hook it up to an instant messaging client, so people can chat with a bot that poses as me.
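The data wrangling side of this is pretty mundane. Here’s a minimal sketch (not my actual script) of turning a chat log into the aligned source/target text files that OpenNMT-style tools train on, where line N of the source file pairs with line N of the target file; the chat_log.tsv file, the “me” sender label, and the pairing rule are all made up for illustration:

    # Build a parallel corpus from a hypothetical chat log with one message
    # per line, formatted as sender<TAB>text.
    ME = "me"  # hypothetical label for my own messages

    src_lines, tgt_lines = [], []
    pending_incoming = []  # everything said to me since my last reply

    with open("chat_log.tsv", encoding="utf-8") as f:
        for line in f:
            sender, _, text = line.rstrip("\n").partition("\t")
            if sender == ME:
                if pending_incoming:
                    src_lines.append(" ".join(pending_incoming))  # the "source language"
                    tgt_lines.append(text)                        # the "target language"
                    pending_incoming = []
            else:
                pending_incoming.append(text)

    with open("src-train.txt", "w", encoding="utf-8") as f:
        f.write("\n".join(src_lines) + "\n")
    with open("tgt-train.txt", "w", encoding="utf-8") as f:
        f.write("\n".join(tgt_lines) + "\n")

From there, the usual OpenNMT preprocessing and training steps take over.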

This has a couple of problems. The main one is that statistical translation generally works by converting the input language into some abstract form that represents a meaning (I’m handwaving so hard here that I’m about to fly away) and then re-representing that meaning in the output language. The underlying idea is that there is some mathematical function that converts a string in the input language to a string in the output language while preserving its meaning, and that function is what the neural network learns. Since what someone says and what I respond with have different but related meanings, this isn’t really a translation problem.
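To make the handwaving slightly more concrete, here’s a toy PyTorch sketch of that shape (the sizes and names are invented, and real translation models are far more elaborate): the encoder squeezes the incoming message down to a vector, which stands in for the “abstract form that represents a meaning”, and the decoder re-expands that vector into a sequence in the output language.

    import torch
    import torch.nn as nn

    SRC_VOCAB, TGT_VOCAB, EMB, HID = 1000, 1000, 64, 128  # toy sizes

    class TinySeq2Seq(nn.Module):
        def __init__(self):
            super().__init__()
            self.src_emb = nn.Embedding(SRC_VOCAB, EMB)
            self.encoder = nn.GRU(EMB, HID, batch_first=True)
            self.tgt_emb = nn.Embedding(TGT_VOCAB, EMB)
            self.decoder = nn.GRU(EMB, HID, batch_first=True)
            self.out = nn.Linear(HID, TGT_VOCAB)

        def forward(self, src_tokens, tgt_tokens):
            # Encode: the final hidden state is the learned "meaning" vector.
            _, meaning = self.encoder(self.src_emb(src_tokens))
            # Decode: regenerate a sequence in the target language from it.
            dec_out, _ = self.decoder(self.tgt_emb(tgt_tokens), meaning)
            return self.out(dec_out)  # scores over the target vocabulary per position

    model = TinySeq2Seq()
    src = torch.randint(0, SRC_VOCAB, (1, 7))  # a fake 7-token incoming message
    tgt = torch.randint(0, TGT_VOCAB, (1, 5))  # a fake 5-token reply
    print(model(src, tgt).shape)               # torch.Size([1, 5, 1000])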

The other problem comes up in the second part of this plan, which is to train the network on a question-and-answer corpus. At its most abstract, a question is a request to provide knowledge that is absent, and an answer is an expression of the desired knowledge. “Knowledge”, in this case, refers to factual data. One could argue that by training on a Q&A corpus, the network encodes that knowledge within itself, as the weights of the transformation function, and so “knows” at least the things it has been trained on. This “knowing” is very different from the subjective experience of knowing that humans have, but given that consciousness and subjective experience may very well be epiphenomenal, maybe it has some similarities.

Unfortunately, this starts to fall apart when the deep network attempts to generalize. Generalization, in this case, means producing outputs in response to inputs that are not part of the training data set. If one trains a neural network for a simple temperature control task, where the input is a temperature and the output is how far to open a coolant valve, the training data might look like this:

Temperature    Valve Position
  0            0.0 (totally closed)
 10            0.1
 20            0.2
 30            0.3
 40            0.4
 50            0.5 (half open)
 60            0.6
 70            0.7
 80            0.8
 90            0.9
100            1.0 (fully open)

So far, so good. This is a pretty simple function to learn: the valve position is 0.01 * Temperature. The generalization comes in when the system is presented with a temperature that isn’t in the training data set, like 43.67 degrees, which one would hope results in a valve setting of 0.4367 or thereabouts. There is a problem that temperatures below zero or above 100 degrees ask the valve to be more than completely shut or more than fully open, but we can just assume that the valve has end stops and doesn’t do that, rather than trying to automatically add a second valve and open that too.
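Here’s a quick sketch of that example, using a single-weight “network” fit by gradient descent instead of a real deep net, just to show the interpolation and the end-stop clamping (the learning rate and iteration count are arbitrary):

    # Fit the training table with a one-weight model: valve = w * temperature.
    import numpy as np

    temps = np.arange(0, 101, 10, dtype=float)   # 0, 10, ..., 100
    valve = temps / 100.0                        # 0.0, 0.1, ..., 1.0

    w = 0.0                                      # start off knowing nothing
    for _ in range(2000):                        # gradient descent on squared error
        grad = np.mean(2 * (w * temps - valve) * temps)
        w -= 1e-4 * grad

    print(w)                         # ~0.01, i.e. valve = 0.01 * Temperature
    print(w * 43.67)                 # ~0.4367, a sensible interpolation
    print(np.clip(w * 120.0, 0, 1))  # out-of-range input, held at the end stop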

The problem comes when we start generalizing across questions and answers. Assume there is some question in the training data that asks “My husband did this or that awful thing, should I leave him?” and the answer is mostly along the lines of “Yes, bail on that loser!”, and another question that asks “My husband did some annoying but not really awful thing, should I leave him?” and the answer is “No, concentrate on the good in your relationship and talk to him to work through it.” These are reasonable things to ask, and reasonable responses.

Now imagine that there is a new question. The deep network does its input mapping to the space of questions, and the result (handwaved down to a single value for explanation purposes) falls somewhere between the representations of the “awful thing” question and the “annoying thing” question. Clearly, the result should fall somewhere between “DTMFA” and “Stick together”, but “Hang out near him” isn’t really good advice and “Split custody of the kids and pets, but keep living together” seems like bizarro-world nonsense. There isn’t really a mathematical mapping for the midrange here. Humans have knowledge about how human relationships work, and models of how people act in them, that we use to reason about relationships and offer advice. This kind of knowing is not something deep networks do (and it’s not something anyone is even claiming they do), so I expect that there will be a lot of hilarious failures in this range.
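To put some toy numbers on it, imagine the learned representations boiled down to two-dimensional vectors (entirely made up here). Interpolating halfway between the two questions gives an answer vector halfway between the two answers, which worked fine for the valve, but there’s no guarantee that the halfway point decodes back into anything a human would recognize as advice:

    # Completely made-up 2-D "meaning" vectors standing in for the network's
    # learned representations of the two training questions and answers.
    import numpy as np

    q_awful,    a_awful    = np.array([1.0, 0.0]), np.array([1.0, 0.0])  # "bail on that loser"
    q_annoying, a_annoying = np.array([0.0, 1.0]), np.array([0.0, 1.0])  # "talk it through"

    # A new question lands halfway between the two training questions...
    q_new = 0.5 * q_awful + 0.5 * q_annoying

    # ...so a smooth mapping produces an answer halfway between the two training
    # answers. For the valve, "halfway" meant something; here the decoder has to
    # turn [0.5, 0.5] back into words, and "half leave him, half stay" is not
    # coherent advice.
    a_new = 0.5 * a_awful + 0.5 * a_annoying
    print(q_new, a_new)              # [0.5 0.5] [0.5 0.5]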

Ultimately, this is what I’m hoping for. I’m doing this for the entertainment value of having something that offers answers to questions, but doesn’t really have any idea what you’re asking for or about, and so comes up with sequences of words that seem statistically related to the question. We (humans) ascribe meaning to words. The deep network doesn’t. It performs math on representations of sequences of bytes. That the sequences have meaning to us doesn’t even enter into the calculations. As a result, its output has flaws that our knowledge allows us to perceive and reflect on.

Plus, I’m planning to get my Q&A corpus from Yahoo Answers, so not only will the results be indicative of a lack of knowing (in the human sense), they’ll also be amazingly low quality and poorly spelled.