Drinking at Cloud Scale

I may have at least a partial solution for representing cocktails that solves some of the problems I was complaining about in my previous post. I actually got into it as part of a work project, but the project is suspended and I want to do something with the technique, so y’all are hearing about it here. This post uses MathJax to render LaTeX math notation inline, because I’m a huge nerd and it makes the explanation easier.

To recap, a cocktail can sort-of be thought of as a point in a hyperspace, where each dimension is an ingredient, and how much of each ingredient is in the drink is the distance along that dimension. Since we live in a low-dimensional space (3D), let’s consider the Margarita. Classically, that’s a 3:2:1 ratio of tequila:triple sec:lime juice (fresh; this drink showcases the lime as much as the liquor). Move 3 units along the tequila dimension, 2 units along the triple sec dimension, and 1 unit along the lime dimension. A Daiquiri also has three ingredients, but the ratios are a little different: 2 units rumward, 1 unit towards lime, and 3/4 of a unit along simple syrup.

Between those two drinks, though, there are 5 distinct ingredients, so the space they live in is at least 5-D. They are, however, kind of close in it, because they’re both 1 lime unit out from the origin. There are a lot of good reasons not to use Euclidean distance in high-D spaces, and annoyingly, “high” kicks in well below what most people who do this sort of thing would call high. Aggarwal, Hinneburg, and Keim (2001) use 20 dimensions for their empirical tests. Twenty? Gotta pump those numbers up! Those are rookie numbers in this racket.
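Just to make that concrete, here’s a quick numpy sketch of those two drinks in the shared 5-D ingredient space (the vectors are my own toy encoding of the ratios above, measured in “parts”):

```python
import numpy as np

# Dimensions: tequila, triple sec, lime, white rum, simple syrup
margarita = np.array([3.0, 2.0, 1.0, 0.0, 0.0])
daiquiri  = np.array([0.0, 0.0, 1.0, 2.0, 0.75])

# Two ways of asking "how close are these drinks?"
euclidean = float(np.linalg.norm(margarita - daiquiri))
cosine = float(margarita @ daiquiri /
               (np.linalg.norm(margarita) * np.linalg.norm(daiquiri)))
```

The Euclidean distance comes out around 4.19 parts and the cosine similarity around 0.11: the shared lime unit is the only overlap, and everything else pulls the two drinks apart.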

However, there are two problems.

Problem the first: the variation within a spirit is complex. It might be representable as a hierarchy, but determining where the levels split is complicated. This might actually be a job for the Gini index, which is a measure of class purity typically used for deciding where to put the splits in decision trees. Mostly, I don’t want to sit down and hand-write a booze ontology myself.
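For reference, since the Gini index came up: here’s a minimal sketch of Gini impurity. The rum labels and the split are entirely made up, just to show the computation:

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions.
    0.0 means a perfectly pure group; higher means more mixed."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

# Hypothetical split of some rums by style label:
left = ["jamaican", "jamaican", "jamaican"]             # pure group
right = ["spanish", "spanish", "agricole", "agricole"]  # 50/50 mix
```

A decision-tree learner would pick the split whose child groups have the lowest (size-weighted) impurity; here the left group scores 0.0 and the right scores 0.5.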

Problem the second: there are a lot of things that aren’t measured in the same units. Alcohol and mixers are measured in more-or-less continuous graduations of volume. Egg whites are measured in integers, and it’s a pretty rare drink that has more than one (yes, yes, eggnog). Mint leaves, cinnamon sticks, orange slices, and other garnishes and muddled ingredients are measured in integer counts or units like “sprigs”. As a result, the dimensions don’t use comparable units. I could convert an egg white to a volume, but the volume that muddling 5 mint leaves and then straining them out adds to a drink is trivial. Nonetheless, the mint is important.

One possible solution is to represent the drinks using a Vector Symbolic Architecture (VSA). VSAs use vectors of a fixed length to represent semantic atoms, and can compose those vectors using a number of operations to represent data and perform logical operations.

One weird thing to note here is that this is a holographic, non-localized way of representing ingredients, while the typical each-ingredient-is-a-dimension representation is a localized one: it uses one specific location in the vector per ingredient. The VSA way of representing an ingredient uses an entire N-element vector, where N is typically in the thousands, and something that has multiple ingredients is the combination of all of those vectors. In general, there are two ways to combine vectors in a VSA: bundling, which is sort of like addition, and binding, which is sort of like multiplication. As a result, the VSA is effectively an algebra over an embedding.
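Here’s a minimal sketch of those two operations over random bipolar (+1/−1) vectors, in the MAP-style flavor of VSA where binding is elementwise multiplication and bundling is a majority vote. The ingredient names are just hypothetical atoms:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10_000

# Random bipolar atoms for three hypothetical ingredients
tequila, triple_sec, lime = (rng.choice([-1, 1], N) for _ in range(3))

def bundle(*vs):
    # Bundling: elementwise majority vote (sign of the sum).
    # An odd number of bipolar inputs never ties.
    return np.sign(np.sum(vs, axis=0))

def bind(a, b):
    # Binding: elementwise product, which is its own inverse
    # for bipolar vectors (b * b is all ones)
    return a * b

def sim(a, b):
    # Cosine similarity for bipolar vectors
    return float(np.mean(a * b))

margarita = bundle(tequila, triple_sec, lime)  # similar to each ingredient
pair = bind(tequila, lime)                     # dissimilar to both inputs
```

Note the complementary behavior: the bundle stays measurably similar to each of its components, while the bound pair looks like noise next to either input. But because binding is self-inverse, binding `pair` with `lime` again recovers the `tequila` vector exactly.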

Because the vectors are a non-localized representation, checking similarity is a little more complicated. Typically, cosine similarity is used for floating-point vectors, and Hamming distance is used for bipolar vectors. High-dimensional vectors have the property that any two randomly chosen ones will have very, very low cosine similarity, so you can represent all your semantic atoms by randomly generating vectors. More interestingly, you can intentionally create vectors with varying degrees of similarity to each other (Rachkovskij, Slipchenko, Kussul, and Baidyk, 2005). That way, the light rum and dark rum vectors can be designed to be more similar to each other than to vodka.
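A sketch of that designed-similarity idea, using the simplest construction I can think of: derive related vectors by flipping a random fraction of a base vector’s elements. (The real methods, like Rachkovskij et al.’s, are more principled; this is just to show the effect.)

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10_000

def sim(a, b):
    # Cosine similarity for bipolar vectors
    return float(np.mean(a * b))

def flip(v, frac):
    # Derive a related vector by flipping a random fraction of elements
    mask = rng.random(N) < frac
    return np.where(mask, -v, v)

rum = rng.choice([-1, 1], N)     # base atom for the rum family
light_rum = flip(rum, 0.10)      # 10% flips: still strongly rum-like
dark_rum = flip(rum, 0.10)
vodka = rng.choice([-1, 1], N)   # unrelated atom
```

The two rums end up clearly similar to each other and to the base rum, while vodka is near-orthogonal to all three.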

Even more interestingly, there are ways to use fractional binding in FHRR (Fourier Holographic Reduced Representations) to represent the prevalence of a thing in a multiset, and so affect how similar the multiset vector is to the vector for that thing. Combining that with the compositional nature of the VSA operations, it should be possible to use VSA vectors to represent the liquor ontology. For example, you can have rum. You can have a dark rum. You can have a dark Jamaican pot still rum, and so forth. Each liquor’s representation could be a bundle of a bag of words describing that liquor’s flavor, bundled in turn with its base type, possibly weighting the base type heavily through fractional binding. That would ensure that all the rums are most similar to the base rum, but also that rums with similar flavor descriptions would have similar vector representations.
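To illustrate fractional binding itself, here’s a sketch using vectors of random unit phasors, which is what FHRR vectors are made of. Raising a phasor vector to a real-valued power is the fractional bind, and similarity falls off smoothly as the exponents get further apart:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10_000

# An FHRR atom: a vector of random complex unit phasors
theta = rng.uniform(-np.pi, np.pi, N)
p = np.exp(1j * theta)

def fpow(v, k):
    # Fractional binding: raise each phasor to a real power k,
    # i.e. multiply each phase angle by k
    return np.exp(1j * k * np.angle(v))

def sim(a, b):
    # FHRR similarity: real part of the normalized inner product
    return float(np.real(np.mean(a * np.conj(b))))

sim_near = sim(fpow(p, 1.0), fpow(p, 1.25))  # close exponents
sim_far = sim(fpow(p, 1.0), fpow(p, 2.5))    # distant exponents
```

Nearby exponents produce high similarity and distant ones produce roughly none, which is exactly the knob you’d want for encoding “how much of this thing is in the multiset”.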

This brings us to a point that makes VSAs spectacular for building neuro-symbolic AIs: the vectors can be learned from data instead of hand-engineered. You might expect that if I want this rum ontology, I’d have to sit down and come up with all of the vectors myself, like some computationalist peasant. Instead, I suspect that a lot of the flavor descriptions could automatically be pulled from marketing materials on liquor sales websites and fed through an LLM or some more conventional NLP pipeline to produce the features that form the description of each liquor. Simply building these liquor vector representations is useful, because the inter-vector similarity points the way to substitutions. Strega is sweeter than yellow Chartreuse, but it’s close enough (and cheaper). Admittedly, there’s some appeal to studying all the boozes in order to handcraft a fine, artisanal ontology, but I’m a mortal man, and humanity has been distilling since at least the 8th century AD. The boozes have a head start, and if this project has taught me anything at all, it’s how to be decent at pillaging websites with Python scripts.

To represent cocktails, the ingredients would be bound with amounts and bundled to produce the final cocktail representation. This is where the VSA comes in handy for representing amounts. Because of the compositional nature of the representation, “1 dash of Angostura” becomes a vector of the same length as “2 oz of gin” or, even more flexibly, “garnish with an orange”, and each vector contributes to the recipe’s similarity to other recipes that share that ingredient (or even operation, in the case of garnishing). I think, with careful design of the representation, it would be possible to create a vector of ingredients with no amounts, pass it to a vector DB to pull the closest match, and then get the amounts of those ingredients back out via unbinding, but I still have a lot of work to do to ensure that the representation can answer the questions I would have about a cocktail.
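As a proof-of-concept of that last idea, here’s a sketch in the same FHRR style: a toy two-ingredient recipe where each (hypothetical) ingredient atom is bound to a shared “amount” base raised to its quantity, and the quantity of one ingredient is recovered by unbinding it and scanning candidate exponents:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 10_000

def atom():
    # A fresh FHRR atom: random complex unit phasors
    return np.exp(1j * rng.uniform(-np.pi, np.pi, N))

def fpow(v, k):
    # Fractional binding: raise each phasor to a real power k
    return np.exp(1j * k * np.angle(v))

def sim(a, b):
    # FHRR similarity: real part of the normalized inner product
    return float(np.real(np.mean(a * np.conj(b))))

gin, vermouth, amount = atom(), atom(), atom()

# A toy Martini: 2 parts gin, 1 part vermouth. Each ingredient is
# bound (elementwise product) to the shared "amount" base raised to
# its quantity, and the bound pairs are bundled (summed).
recipe = gin * fpow(amount, 2.0) + vermouth * fpow(amount, 1.0)

# Unbind gin: what remains is approximately amount^2.0, plus
# cross-talk noise from the vermouth term.
probe = recipe * np.conj(gin)

# Scan candidate quantities and take the best match
candidates = np.arange(0.0, 3.01, 0.25)
sims = [sim(probe, fpow(amount, k)) for k in candidates]
recovered = float(candidates[int(np.argmax(sims))])
```

With 10,000 dimensions the cross-talk is small enough that the scan lands on 2.0, the gin quantity that went in. A real design would need to settle on units and resolution, but the round trip does work.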