Month: December 2011
I’m not a novelist. I can write, and with sufficient editing, I can even write passable English text, but it’s not really something that I’ve put a lot of effort into, and so not something that I’m good at. My real time investment in writing has been writing software, which is read only by compilers and interpreters, and so what it lacks in excitement, narrative, and plausible dialog, it makes up for in conciseness and precision.
My ideas for National Novel Writing Month (November, for those as don’t know) then, are coding a program that writes novels, and coding a program that takes novels as inputs and generates interactive fiction (text adventure games) from them. Both of these are problems with infinite hair, and are arguably AI-Hard problems (that is, problems whose solution is on par, difficulty-wise, with creating a human-level general-purpose AI). On the other hand, writing a good novel is probably also quite hard.
I think the most reasonable approach for the novel generator would be a recursively-defined novel description language, which selects from tropes and plot stubs, generates characters, and so forth, based on relatively simple rules. The complexity would come from applying the rules over and over, so a simple quest to throw a ring into a volcano grows branches on branches on branches until it is One Damn Thing After Another Until All The Orcs Are Dead. The goal of the program would be to use generative content and emergent behavior to do most of the writing, and leave me to fill out the turns of phrase and details (or generate them a la Dwarf Fortress, which menaces with spikes of ivory). Done badly, this would read like a Mad Lib. Done well, it would read like a Mad Lib filled out by people who don’t say “dongs” every time they are asked for a noun.
Making interactive fiction (IF) out of novels would be substantially harder. The novel parser would have to read English, which is actually quite a trick. English has multiple words for one meaning and multiple meanings for one word, highly flexible structure, and counts on the reader to sort it all out. On top of that, most of the awesome tricks one can pull in English are more a matter of exploiting shared cultural context with your reader than they are particular sequences of words. If I wrote such a parser in one month, or at all, a lot of linguistics researchers would be out of work.
Assume I went for one tiny part of the problem: identifying the locations in the novel. The same place might be described as “where that party was”, “Joe’s house”, “the darkened house”, and “a pit of iniquity”. Only the events of the novel link them, and so the program would have to determine that these totally different words referred to the same place. The best approximation I could likely come up with is identifying all the things in the novel that sound like places, and then performing some sort of clustering based on what words are mentioned close to mentions of those places. This would likely lead to a bunch of spurious places getting generated, and real places getting overlooked. There is an entire company, called Metacarta, that did this sort of analysis on much more constrained data sets, and even then it was a difficult problem for a team of people who were likely smarter than me.
However, doing a good job of adapting novels to interactive fiction might not be the best approach. It might be better to get a rough cut of the software together to do anything at all, and gradually improve it until it writes things that are playable curios, rather than detailed simulations of well-loved novels. It wouldn’t be a matter of playing through “A Game of Thrones” so much as it would be “poking around in a demented dreamscape based loosely on ‘A Game of Thrones'”.
This is actually related to another idea that I had, which is sort of a rails shooter based on the consequences of shooting things. You play through the game, riding the rails and shooting enemies of varying levels of craftiness and menace. When you reach the end of the level, you just loop through it again, passing the bodies of everything you killed, and getting another shot at everything you missed. This repeats until you have killed everything in the game world, whereupon you continue to loop, passing through scenes of slaughter as the heroic music fades and is replaced with silence and the buzzing of flies. Perhaps, if you let it run long enough, the dead bodies would rot to skeletons. Now, of course, I’ve spoiled it for you, but I’ll probably never get around to writing it, so at least you’ve had the idea.
I finished writing the chatbot that I was working on. It consists of a set of scripts to prepare the data, and another script that listens for incoming messages and responds. You can get the code and an overview of how it works here.
Obviously, I’m not publishing my chat logs. Use your own. It is designed to work with Pidgin’s HTML-like format for chat logs, but it could be modified to work on almost any corpus. I really should clean up things like the string cleaning routines, but it worked for class, and that’s what actually matters.