Month: October 2012
I am building a drink-dispensing robot. It has 5 pumps internally, so I want to find the set of five liquids that will produce the largest variety of mixed drinks. To do this, I’m going to need a huge set of drink recipes. I got a bunch from a cocktail database that esquire maintains, but the biggest list I’m aware of is The Webtender. Unfortunately, that database isn’t in a form that permits me to make queries to find out what is the maximal set of drinks that I can make, given the constraint of 5 liquid ingredients. So instead of the web interface, I want the raw data.
This means I want to download every single drink recipe from The Webtender. The URLs there are of the form http://www.webtender.com/db/drink/6217, so the obvious thing to do would be to write up a little bash oneliner that just wgets each of the files in turn. It would probably look something like this:
for (( i=1; i <= 6217; i++ )); do wget http://www.webtender.com/db/drink/$i; done
But it may be that The Webtender issues a 403 Forbidden error if you use wget, probably to prevent just this sort of hijinks. Unfortunately for them, wget can be configured to claim to be something else. These instructions provide the config file to cause wget to claim to be Mozilla on Windows NT, which The Webtender should have no problem with.
For my next trick, I'll use BeautifulSoup to turn the HTML files from the webtender and Esquire into a SQL database, and probably perform some form of normalization on the data, such as making all the measurements be in the same units.