Category: Artificial Intelligence
The Cocktail Embedding
I got into cocktail making during the pandemic lockdowns. I have a little library of cocktail books, pretty heavy on the Death and Company books, and assorted Tiki nonsense. Once you make a bunch of cocktails, you start to get an idea of what sort of structure they have, and so where you can swap things out or make substitutions. For instance, if there’s a sour/acid component like lemon juice, there’s likely to be an approximately equal volume of sweetener like orgeat or simple syrup. Another thing is that there are common backbones with embellishments on them to make different drinks. A Margarita is tequila, triple sec, and lime juice. If you don’t have tequila, use cognac and you have a Sidecar. A Daiquiri is pretty similar, but with rum as the spirit and simple syrup as the sweetener. Swap whiskey for the rum and you have a Whiskey Sour (although probably with lemon, not lime). Swap gin for the rum, and it’s a Gimlet. Put some mint and Angostura bitters in there, and you’re drinking a Southside (and it’s way better).
At any rate, there are certain regularities in the structure of drinks. Part of this is due to what’s palatable, part of it is probably fashion, and there is also history and economics in there. Saint Germain is lovely stuff, but it’s also relatively modern, so you won’t find it in old cocktail books. There also aren’t a ton of scotch cocktails because it has historically been viewed as A) not for mixing (aside from a few drops of Scottish spring water, maybe), and B) expensive. Not putting expensive ingredients in cocktails is for weirdos; the stuff was made to be drunk, maybe drink it ya goober.
However, none of this historical or cultural context can be rolled into questionable machine learning plans and schemes, so I’ve recently been pillaging cocktail recipe web sites with web spiders. I have thousands of recipes, and I’m planning to use them as a data set to train ML models to do a couple of different things.
One basic thing is clustering. Cocktails can be imagined as points in a high-dimensional space. The number of dimensions is quite large, because each ingredient is a dimension, and the amount of each ingredient is the distance from the origin in that dimension. One thing that I’m probably going to have to do in the interests of comparisons is normalizing the units of the cocktails, since some are in ounces, some are in milliliters, and some of the ingredients are in amounts like “barspoons” and “dashes”. That transformation will put them all in the same coordinate space. Normalizing them so that the ingredients are proportional to the total drink then converts them to points on the unit simplex (every coordinate between zero and one, all of them summing to one). Because Cartesian distance is pretty ill-behaved in high dimensions, cosine similarity will probably be a better measure for clustering, but it will be interesting to see what comes out from that process.
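For the similarity measure, here’s a minimal sketch in plain Python, treating each cocktail as a sparse ingredient vector. The specs and amounts below are illustrative, not from any particular book:

```python
from math import sqrt

def cosine_similarity(a, b):
    """Cosine similarity between two sparse ingredient vectors,
    represented as dicts of ingredient -> amount."""
    shared = set(a) & set(b)
    dot = sum(a[k] * b[k] for k in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Illustrative specs, amounts in ounces.
margarita = {"tequila": 2.0, "triple sec": 1.0, "lime juice": 1.0}
daiquiri = {"rum": 2.0, "simple syrup": 0.75, "lime juice": 1.0}
```

Note that swapping the base spirit zeroes out a whole dimension, so a Margarita and a Daiquiri only overlap on lime juice here even though their structures rhyme. That’s part of why collapsing analogous-spirit dimensions (discussed below) matters so much.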
Before I can do any of that, though, I have to clean up the data, and that’s turning out to be its own interesting problem. One aspect of it is that liquors are themselves kind of complicated. Gin, for example, is pretty variable in taste. Hendricks Gin is fine stuff, but if you don’t want your drink to taste like cucumber, don’t use it. Most gin has piney/citrus notes, but the other herbs and spices in it cause a lot of variation around that theme. Some cocktail recipes (specs, in the industry) call for a specific spirit. Some web sites complicate this by allowing you to enter your bar’s contents, and then swapping analogous spirits out for the ones you already have. This works for some spirits, sort of, but it has limits.
As a result, one of the more complicated parts of normalizing the data set is going to be figuring out where the boundaries are between types of alcohol that can be substituted for each other, and types that can’t. This is a form of dimensionality reduction, because collapsing all of, for example, the different brands of gin into a single “gin dimension” will remove dimensions from the dataset. More importantly, it removes dimensions that are effectively the same dimension, but would be extremely sparse, with each brand probably only represented a handful of times.
The problem is, not all spirits in a class are analogous. Kentucky bourbon and Islay scotch are exactly the same thing, in the sense that they are fermented from a grain mash, distilled to somewhere around 80% alcohol, and then barrel aged. Say that in Scotland or Kentucky, though, and you best start running. Bourbon requires 51% or more corn in the mash bill (list of grains in the mash), and is aged in new containers made of oak and charred on the inside. Once the barrel is used, it can’t be reused for bourbon, and so there are a lot of them going cheap. This is why you can’t throw a bottlecap without hitting a local microbrewery producing bourbon-barrel-aged something-or-other. Legally, whiskey imported from outside the USA can’t be sold as “Bourbon”. Scotch mash starts with malted barley, but then breaks down further into single malts, blended malts, and a bunch of other categories, depending on how the mash was set up and whether it was mixed after aging. You don’t have to char the barrels, and you can reuse them. In fact, some scotches are aged in barrels that previously held sherry or port. As a result of all this messing around, the resulting spirits taste wildly different. Maker’s Mark has a sweet, honey/floral and vanilla thing going on, while Laphroaig tastes like you’re watching an evening storm roll in on a cold, seaweed-strewn ocean beach while someone sets tires on fire to keep warm. Even within scotches, there’s a ton of variation, so if a spec calls for Cragganmore vs. something like Lagavulin, they’re aiming for a particular flavor profile.
My instinct at this point is to split or lump scotches by region, other whiskeys by bourbon/rye split, and tequilas by age (blanco/reposado/añejo). Gins are even weirder, so I’m not sure what to do there, aside from keeping flavored ones (usually orange or lime, although Hendrick’s cucumber also counts) distinct from “plain” ones. Rums are probably the weirdest of all, despite all being distilled sugarcane spirit. There are different styles of still, different aging processes, and the large number of fairly weird flavored things out there, like Malibu (which I might just delete on sight for being vile). Vodka is boring, it’s largely required to be flavorless, although there are texture and smoothness aspects to it. Where it’s going to be a problem is, again, flavored nonsense.
In terms of tackling the project, the first thing I’m going to do is get the various files into a standard format for representing ingredients and their amounts. At that point, I can also standardize the amount units, probably to ounces since that’s what my jiggers are calibrated in.
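The unit conversion is mostly a lookup table. A sketch follows; the barspoon and dash factors are assumptions (they vary from source to source), so treat those numbers as placeholders:

```python
# Rough conversion factors to ounces. The barspoon and dash values
# are assumptions -- different books disagree on both.
TO_OUNCES = {
    "oz": 1.0,
    "ml": 1.0 / 29.5735,
    "barspoon": 1.0 / 6.0,  # ~5 ml
    "dash": 1.0 / 32.0,     # ~0.9 ml
}

def to_ounces(amount, unit):
    """Convert an (amount, unit) pair from a scraped recipe to ounces."""
    try:
        return amount * TO_OUNCES[unit]
    except KeyError:
        raise ValueError(f"no conversion for unit: {unit!r}")
```

Anything that raises here (“1 egg white”, “3 raspberries”) gets flagged for manual handling rather than silently guessed at.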
Once everything is standardized in representation, I can also unify the datasets and create a derived dataset with the amounts converted to proportions, but this raises another problem: cocktail names. I feel like that is actually two problems: variations of a cocktail, and people reusing names for different drinks. There is a cocktail in the Savoy Cocktail Book that is called an Aviation, and has most of the ingredients that are also in the Death and Company Aviation, but the amounts are different, and the D&C Aviation is a far superior drink. I fully expect, however, that there are different drinks with the same or very similar names in the dataset. At the moment, I’m thinking that the solution here is not to treat names as special, and especially not to use them as keys in any kind of data structure.
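Concretely, that means storing recipes as a list of records rather than a name-keyed dict, so two different Aviations can coexist, and deriving proportions per record. A sketch (the amounts shown are made up for illustration, not the actual Savoy or D&C specs):

```python
def to_proportions(ingredients):
    """Convert {ingredient: ounces} to {ingredient: fraction of total volume}."""
    total = sum(ingredients.values())
    return {name: amt / total for name, amt in ingredients.items()}

# A list of records, not a dict keyed on name: two "Aviation" entries
# can coexist. Amounts below are placeholders, not real specs.
recipes = [
    {"name": "Aviation", "source": "Savoy",
     "ingredients": {"gin": 2.0, "lemon juice": 0.5, "maraschino": 0.5}},
    {"name": "Aviation", "source": "D&C",
     "ingredients": {"gin": 2.0, "lemon juice": 0.75,
                     "maraschino": 0.5, "creme de violette": 0.25}},
]
```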
What I Want to Do with the Data
One thing that a statistical model of cocktail ingredients is good for is reverse-engineering the ingredient amounts from cocktails at a bar. Bars will typically list the ingredients, but not the amounts, and so if you want to replicate the drink at home, you’re out of luck. Given a list of ingredients, it should be possible to predict the likely proportions of them in a cocktail. As mentioned previously, sour/acid components frequently have a balancing sweet ingredient and so on. I suspect that the solution to this is some kind of Bayesian constraint solver, where it has to pick out the amounts that are most likely, conditioned on the other amounts and ingredients, and with the constraint that the total volume doesn’t go over what you can fit in a cocktail glass. If you want to drop the constraint, just work with proportions and then put in 1oz of base spirit and calculate the others from there.
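As a stand-in for a proper Bayesian solver, here is a much cruder baseline to start from: average each ingredient’s proportion over the recipes that contain it, then scale everything off a fixed base-spirit pour. The function names and the 0.1 fallback prior for unseen ingredients are my own placeholders, not a real conditional model:

```python
from collections import defaultdict

def mean_proportions(dataset):
    """Average proportion of each ingredient, over recipes containing it."""
    totals, counts = defaultdict(float), defaultdict(int)
    for recipe in dataset:
        total = sum(recipe.values())
        for name, amt in recipe.items():
            totals[name] += amt / total
            counts[name] += 1
    return {name: totals[name] / counts[name] for name in totals}

def guess_amounts(ingredients, priors, base_spirit, base_oz=1.0):
    """Guess amounts for a bar-menu ingredient list by scaling mean
    proportions so the base spirit lands at base_oz ounces."""
    props = {name: priors.get(name, 0.1) for name in ingredients}
    scale = base_oz / props[base_spirit]
    return {name: round(p * scale, 2) for name, p in props.items()}
```

This ignores conditioning entirely (lemon juice gets the same guess whether or not orgeat is present), which is exactly the gap the Bayesian version would close.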
Another possible application is finding the drinks that don’t exist, but should, or the drinks that really really shouldn’t exist. A drink that doesn’t exist but should is one whose hypervector representation puts it in a void in a cluster. I think the representation of the problem there is that it should maximize dissimilarity with everything in the cluster, while not being so unlike them that it ends up in another cluster. Drinks that shouldn’t exist are ones that maximize distance from any cluster, subject to the constraint that the total volume is still reasonable. The volume constraint is because one gallon of lime juice with a pint of bitters is one way to end up very far from all other drinks along the “lime juice” dimension. Another constraint is that a drink probably shouldn’t have more than 5-8 ingredients in it, although the exact number is something I will probably derive from the data.
An even sillier proposition is to train a neural network to produce names from recipes, or vice versa. This is silly because it creates a latent space that continuously maps sequences of English letters to plausible combinations of cocktail ingredients, and so can also create a cocktail for “England” or “Beatrix Potter”. It would be interesting to see if a “Southcar” or a “Sideside” were similar to each other or to the Southside and Sidecar. Going the other way is also amusing, because it can name cocktails based on ingredients. I’m slightly curious how the name of a Margarita changes as you add increasing amounts of blackberry liqueur to it.
Chatbots as Translation
I got a translation program based on deep neural networks (http://opennmt.net/ if anyone wants to play along at home). I’m training it on a corpus of my previous text chats. The “source language” is everything that everyone has said to me, and the “target language” is what I said in return. The resulting model should end up translating from things that someone says to appropriate responses. My endgame is to hook it up to an instant messaging client, so people can chat with a bot that poses as me.
This has a couple of problems. The main one is that statistical translation generally works by converting the input language into some abstract form that represents a meaning (I’m handwaving so hard here that I’m about to fly away) and then re-representing that meaning in the output language. Essentially, the overall concept is that there is some mathematical function that converts a string in the input language to a string in the output language while preserving meaning, and that function is what is learned by the neural network. Since what someone says and what I respond with have different, but related meanings, this isn’t really a translation problem.
The other problem comes up when I do the second part of this plan, which is to train the network with a question and answer corpus. At its most abstract, a question is a request to provide knowledge which is absent, and an answer is an expression of the desired knowledge. “Knowledge”, in this case, refers to factual data. One could attempt an argument that by training on a Q&A corpus, the network is encoding the knowledge within itself, as the weights used in the transformation function. As a result, the network “knows” at least the things that it has been trained on. This “knowing” is very different from the subjective experience of knowing that humans have, but given the possibility that consciousness and subjective experience may very well be an epiphenomenon, maybe it has some similarities.
Unfortunately, this starts to fall apart when the deep network attempts to generalize. Generalization, in this case, is producing outputs in response to inputs that are not part of the training input data set. If one trains a neural network for a simple temperature control task, where the input is a temperature, and the output is how far to open a coolant valve, the training data might look like this:
| Temperature | Valve Position |
|---|---|
| 0 | 0 (totally closed) |
| 10 | 0.1 |
| 20 | 0.2 |
| 30 | 0.3 |
| 40 | 0.4 |
| 50 | 0.5 (half open) |
| 60 | 0.6 |
| 70 | 0.7 |
| 80 | 0.8 |
| 90 | 0.9 |
| 100 | 1.0 (fully open) |
So far, so good. This is a pretty simple function to learn, the valve position is 0.01 * Temperature. The generalization comes in when the system is presented with a temperature that isn’t in the training data set, like 43.67 degrees, which one would hope results in a valve setting of 0.4367 or thereabouts. There is a problem that temperatures less than zero or greater than 100 degrees result in asking the valve to be more than completely shut, or more than fully open, but we can just assume that the valve has end stops and just doesn’t do that, rather than trying to automatically add a second valve and open that too.
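The valve controller, end stops included, is small enough to sketch directly. This substitutes a least-squares line fit for the neural network, which is fair here since the target function is linear anyway:

```python
def fit_line(points):
    """Least-squares slope and intercept for (x, y) pairs --
    a stand-in for 'training' on the table above."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in points) / \
            sum((x - mean_x) ** 2 for x, _ in points)
    return slope, mean_y - slope * mean_x

# The training table: temperature -> valve position, 0 to 100 in steps of 10.
training = [(t, t / 100.0) for t in range(0, 101, 10)]
slope, intercept = fit_line(training)

def valve_position(temp):
    # End stops: the valve can't be more than fully open or fully shut.
    return min(1.0, max(0.0, slope * temp + intercept))
```

Between the training points it generalizes exactly as hoped (43.67 degrees gives about 0.4367), and the clamp handles the out-of-range cases instead of asking for a second valve.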
The problem comes when we start generalizing across questions and answers. Assume there is some question in the training data that asks “My husband did this or that awful thing, should I leave him?” and the answer is mostly along the lines of “Yes, bail on that loser!”, and another question that asks “My husband did some annoying but not really awful thing, should I leave him?” and the answer is “No, concentrate on the good in your relationship and talk to him to work through it.” These are reasonable things to ask, and reasonable responses. Now imagine that there is a new question. The deep network does its input mapping to the space of questions, and the result (handwaved down to a single value for explanation purposes) falls somewhere between the representations for the “awful thing” question and the “annoying thing” question. Clearly, the result should fall somewhere between “DTMFA” and “Stick together”, but “Hang out near him” isn’t really good advice and “Split custody of the kids and pets, but keep living together” seems like bizarro-world nonsense. There isn’t really a mathematical mapping for the midrange here. Humans have knowledge about how human relationships work, and models of how people act in them, that we use to reason about relationships and offer advice. This kind of knowing is not something deep networks do (and it’s not even something that anyone is trying to claim that they do), so I expect that there will be a lot of hilarious failures in this range.
Ultimately, this is what I’m hoping for. I’m doing this for the entertainment value of having something that offers answers to questions, but doesn’t really have any idea what you’re asking for or about, and so comes up with sequences of words that seem statistically related to it. We (humans) ascribe meaning to words. The deep network doesn’t. It performs math on representations of sequences of bytes. That the sequences have meaning to us doesn’t even enter into the calculations. As a result, its output has flaws that our knowledge allows us to perceive and reflect on.
Plus, I’m planning to get my Q&A corpus from Yahoo Answers, so not only will the results be indicative of a lack of knowing (in the human sense), they’ll also be amazingly low quality and poorly spelled.
Bad ideas for NaNoWriMo
I’m not a novelist. I can write, and with sufficient editing, I can even write passable English text, but it’s not really something that I’ve put a lot of effort into, and so not something that I’m good at. My real time investment in writing has been writing software, which is read only by compilers and interpreters, and so what it lacks in excitement, narrative, and plausible dialog, it makes up for in conciseness and precision.
My ideas for National Novel Writing Month (November, for those as don’t know) then, are coding a program that writes novels, and coding a program that takes novels as inputs and generates interactive fiction (text adventure games) from them. Both of these are problems with infinite hair, and are arguably AI-Hard problems (that is, problems whose solution is on par, difficulty-wise, with creating a human-level general-purpose AI). On the other hand, writing a good novel is probably also quite hard.
I think the most reasonable approach for the novel generator would be a recursively-defined novel description language, which selects from tropes and plot stubs, generates characters, and so forth, based on relatively simple rules. The complexity would come from applying the rules over and over, so a simple quest to throw a ring into a volcano grows branches on branches on branches until it is One Damn Thing After Another Until All The Orcs Are Dead. The goal of the program would be to use generative content and emergent behavior to do most of the writing, and leave me to fill out the turns of phrase and details (or generate them a la Dwarf Fortress, which menaces with spikes of ivory). Done badly, this would read like a Mad Lib. Done well, it would read like a Mad Lib filled out by people who don’t say “dongs” every time they are asked for a noun.
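A toy version of that recursive description language might look like the sketch below. The grammar is made up, and a real one would need far more symbols and rules, but it shows the expand-until-terminal mechanic:

```python
import random

# A toy "novel description language": uppercase symbols expand
# recursively until only plain text remains. The tropes and plot
# stubs here are invented placeholders.
GRAMMAR = {
    "STORY": ["HERO must QUEST, but OBSTACLE.",
              "HERO must QUEST. Then HERO must QUEST."],
    "HERO": ["a reluctant farmhand", "an exiled heir", "a retired thief"],
    "QUEST": ["destroy ARTIFACT", "recover ARTIFACT",
              "escort HERO across the mountains"],
    "ARTIFACT": ["a cursed ring", "the last seed of a dead forest"],
    "OBSTACLE": ["all the orcs are not yet dead", "winter comes early"],
}

def expand(symbol, rng, depth=0, max_depth=8):
    """Recursively expand a symbol, preserving attached punctuation."""
    core = symbol.strip(".,")
    if core not in GRAMMAR or depth >= max_depth:
        return symbol
    tail = symbol[len(core):]
    choice = rng.choice(GRAMMAR[core])
    expanded = " ".join(expand(w, rng, depth + 1, max_depth)
                        for w in choice.split())
    return expanded + tail

story = expand("STORY", random.Random(0))
```

The branches-on-branches effect comes from rules that reference themselves indirectly (a QUEST can contain a HERO, who could be given quests of their own), with the depth cap standing in for editorial restraint.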
Making interactive fiction (IF) out of novels would be substantially harder. The novel parser would have to read English, which is actually quite a trick. English has multiple words for one meaning and multiple meanings for one word, highly flexible structure, and counts on the reader to sort it all out. On top of that, most of the awesome tricks one can pull in English are more a matter of exploiting shared cultural context with your reader than they are particular sequences of words. If I wrote such a parser in one month, or at all, a lot of linguistics researchers would be out of work.
Assume I went for one tiny part of the problem: identifying the locations in the novel. The same place might be described as “where that party was”, “Joe’s house”, “the darkened house”, and “a pit of iniquity”. Only the events of the novel link them, and so the program would have to determine that these totally different words referred to the same place. The best approximation I could likely come up with is identifying all the things in the novel that sound like places, and then performing some sort of clustering based on what words are mentioned close to mentions of those places. This would likely lead to a bunch of spurious places getting generated, and real places getting overlooked. There is an entire company, called MetaCarta, that did this sort of analysis on much more constrained data sets, and even then it was a difficult problem for a team of people who were likely smarter than me.
However, doing a good job of adapting novels to interactive fiction might not be the best approach. It might be better to get a rough cut of the software together to do anything at all, and gradually improve it until it writes things that are playable curios, rather than detailed simulations of well-loved novels. It wouldn’t be a matter of playing through “A Game of Thrones” so much as it would be “poking around in a demented dreamscape based loosely on ‘A Game of Thrones’”.
This is actually related to another idea that I had, which is sort of a rails shooter based on the consequences of shooting things. You play through the game, riding the rails and shooting enemies of varying levels of craftiness and menace. When you reach the end of the level, you just loop through it again, passing the bodies of everything you killed, and getting another shot at everything you missed. This repeats until you have killed everything in the game world, whereupon you continue to loop, passing through scenes of slaughter as the heroic music fades and is replaced with silence and the buzzing of flies. Perhaps, if you let it run long enough, the dead bodies would rot to skeletons. Now, of course, I’ve spoiled it for you, but I’ll probably never get around to writing it, so at least you’ve had the idea.
Chatbot is done
I finished writing the chatbot that I was working on. It consists of a set of scripts to prepare the data, and another script that listens for incoming messages and responds. You can get the code and an overview of how it works here.
Obviously, I’m not publishing my chat logs. Use your own. It is designed to work with Pidgin’s HTML-like format for chat logs, but it could be modified to work on almost any corpus. I really should clean up things like the string cleaning routines, but it worked for class, and that’s what actually matters.
OpenCV and finding rectangles
I have been working, on and off, on a computer vision application that will recognize a card from a certain game in an image, figure out what the card is, and add it to a database. I had some luck in earlier versions of the code looking for template matches to try to find distinctive card elements, but that fails if the card is scaled or skewed, and it rapidly becomes too processor-heavy if there are many templates to match. Recently, at work, I have had even more opportunity to play with OpenCV (a computer vision library), and have found a few blogs and tricks that might help me out.
The first blog shows how to pick out a Sudoku puzzle from a picture. The most important part is finding the corners of the puzzle, as after that, it can be mapped to a square with a perspective transform. I can do a similar trick, only I’ll be mapping to a rectangle. Since corner-finding is kind of scale-invariant (the corner of something is a corner at any scale), this will let me track a card pretty easily.
I think that I can actually use OpenCV’s contour finding to get most of the edges of the card, and then the Hough transform to get the corner points. I may even be able to get away with using just contour finding, getting the bounding rectangle of each contour, and checking that it has something like the proper aspect ratio. This will work in the presence of cards that are rotated, but fails on perspective-related skewing.
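The contour-finding and bounding boxes would come from OpenCV (`cv2.findContours`, `cv2.boundingRect`, or `cv2.minAreaRect` for rotated rectangles), but the aspect-ratio check at the end of that pipeline is plain arithmetic. A Magic card is about 63 × 88 mm, so a sketch of the filter might look like:

```python
CARD_RATIO = 63.0 / 88.0  # Magic cards are roughly 63 x 88 mm

def looks_like_card(width, height, tolerance=0.1):
    """Check a rectangle's aspect ratio against a card's.

    Orientation-agnostic: taking min/max means a landscape-oriented
    card passes just as well as a portrait one. The tolerance is a
    guess to be tuned against real captures.
    """
    if width <= 0 or height <= 0:
        return False
    ratio = min(width, height) / max(width, height)
    return abs(ratio - CARD_RATIO) <= tolerance
```

For rotated cards, feeding this the side lengths from `cv2.minAreaRect` rather than an axis-aligned bounding box keeps the ratio meaningful; perspective skew still defeats it, as noted above.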
This StackOverflow post has a nice approach to getting the corners of a rectangle that has some rotation and perspective skew.
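The heart of that approach is ordering the four corner points consistently (top-left, top-right, bottom-right, bottom-left) before handing them to `cv2.getPerspectiveTransform`. The usual sum/difference trick is pure arithmetic, sketched here without OpenCV:

```python
def order_corners(pts):
    """Order four (x, y) points as top-left, top-right, bottom-right,
    bottom-left. In image coordinates (y grows downward), the top-left
    corner has the smallest x + y, the bottom-right the largest, the
    top-right the smallest y - x, and the bottom-left the largest."""
    tl = min(pts, key=lambda p: p[0] + p[1])
    br = max(pts, key=lambda p: p[0] + p[1])
    tr = min(pts, key=lambda p: p[1] - p[0])
    bl = max(pts, key=lambda p: p[1] - p[0])
    return [tl, tr, br, bl]
```

With the corners ordered, `cv2.getPerspectiveTransform` against the corners of a destination rectangle, followed by `cv2.warpPerspective`, maps the skewed card to an upright one.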
Once I have the card located, I’m going to throw a cascade of classifiers at it and try something like AdaBoost to get a good idea of which card it is. Some of the classifiers are simple, things like determining the color of the front of the card. Others may actually pull in a bit of OCR or template-based image recognition on (tiny) subsections of the card. Since I will actually know the card border at this point, I can scale the templates to match the card, and get solid matches fast.
More on recognizing Magic cards
Computer vision is hard.
I have code written that detects edges in a video image, picks pairs of edges that are both within a certain ratio of lengths relative to each other and within a specific angle of each other. I’ll post the code up once I have pushing to my github account set up correctly. It’s actually pretty poor at locating cards in a video image, but it is a demo of how to use some of the UI and feature recognition stuff from OpenCV.
From here, I have a few ideas about how to proceed. The first is to have the program display a rectangle and ask the user to align the card to that rectangle. This means the program will not have to find the card in the image, and can focus on recognizing parts of the image. This will use template-based recognition to pick out mana symbols, and should be pretty simple. The classifier that tries to pick out colors will be even easier, as it will just select a sample region, blur it, and match it to precomputed colors. This can be calibrated for specific lighting conditions for each run.
An even less hands-on approach was suggested by my co-worker Eric. The cards could be displayed by a machine that essentially deals one card at a time. The machine would have a magazine of cards, and would photograph a card, deal it into a hopper, photograph the next card, and so forth, until it was empty. This would have the problem that it wouldn’t be something anyone could download and use, as it would require a special card-handling machine, but the software end would be compatible with the user-based registration approach described above, so someone with a lot of patience could use the manual method.
Another approach would be to throw all the cards on a flatbed scanner (possibly with a funny-colored, e.g. red, bed lid), do template matching to locate card borders, and then segment the images based on that. An ADF scanner with a business card attachment could probably also make short work of a modestly-sized set of cards.
Magic Card Recognizer
I used to play the collectible card game (CCG) Magic:The Gathering. Like most CCGs, Magic has a large set of different cards that players can use to build decks for playing games. This is both fun, as it means new cards will allow new play types and strategies, and annoying because of the artificial rarity of some of the cards. I don’t have a lot of people to play with, so I am planning to sell my cards.
I will probably make more money selling the cards as individual cards (“singles”) than I would get by selling the whole set. However, that means that I need to know how many of each card I have. Given that I probably have upwards of 8,000 cards, I don’t want to sit down and type in the name of each card. It would be better if I could have a computer program do it for me, so I’m working on writing one. The rest of this article uses jargon from Magic and computer vision, so it may be a little incomprehensible to people who are not Magic-playing computer vision nerds.
The program will take an image using a web cam and look for two straight edges, probably using some form of edge detection or a Hough transform. Once it has the edges, it will look for two edges whose ratio of lengths is the same as a Magic card. The edges must share an endpoint, so that the system can tell they are the edges of the same object. The area inside the rectangle that has those lines as its edges is the card.
Once the card is detected, the simplest thing to do is to match the card image against card images stored in a massive database of all card images. Unfortunately, there are over 11,000 unique cards (11,458 as of Feb 2009), which would make for a processor-intensive comparison process.
My plan to circumvent this is to have the program get the casting cost of the card by using processing techniques similar to face detection. The most useful technology to detect mana symbols is probably feature-based template matching. Feature-based template matching allows the computer to pick out a region of a picture that matches another, smaller image, even in the presence of some distortion. Mana symbols haven’t changed significantly since the development of the game, so they should be easy to pick out.
I can also get the color of the card by selecting a region of border, blurring it to remove any texture, and comparing the color to a set of swatches. I’ve done this sort of comparison before, by measuring distance in RGB color space, and it can be done quickly and effectively. The main possible pitfall is poor lighting shifting the color of the object, but I can at least arrive at a probabilistic result based on the RGB distance. Combining the estimated color of the card and the casting cost will allow me to significantly reduce the set of possible pictures that the card image needs to be matched against.
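A sketch of that swatch comparison follows. The RGB values here are invented placeholders; real swatches would be sampled from actual card borders under each run’s lighting:

```python
from math import sqrt

# Hypothetical reference swatches for card border colors; a real run
# would calibrate these from sample cards under the current lighting.
SWATCHES = {
    "white": (230, 225, 205),
    "blue":  (40, 80, 160),
    "black": (25, 25, 25),
    "red":   (190, 45, 35),
    "green": (35, 110, 60),
}

def rgb_distance(a, b):
    """Euclidean distance between two (r, g, b) triples."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_color(sample):
    """Return (name, distance) of the swatch closest to a blurred sample."""
    name = min(SWATCHES, key=lambda n: rgb_distance(sample, SWATCHES[n]))
    return name, rgb_distance(sample, SWATCHES[name])
```

Returning the distance along with the name is what makes the result probabilistic-flavored: a large distance even to the nearest swatch means the lighting is probably off and the match should carry low confidence.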
There is also the question of building the database of card images, but I believe I can do that by screen-scraping the web site of the company that makes Magic Cards. I won’t be able to distribute the database with my program, as it will contain copies of copyrighted data, but I can distribute the screen-scraping script.
I may also be able to recognize features like card text, but that will rely on very good lighting and very good cameras. I would prefer that this program work with a moderately high-quality webcam, so that it will be useful to people other than me.
The recognizer will try to build a list of cards that it thinks matches, ordered by the confidence of the match. If the user rejects the first card, it will present the next, until it runs out of guesses. If the user accepts the guess, the recognizer will add that card to a database of all the cards the user owns. In this manner, the user can build a database of cards by simply showing the cards to a computer.