Category: Scripting

Download ALL The Music

Given a file containing a list of songs, one per line, in the format “Artist – Song Title”, download the audio of the first YouTube video link on a Google search for that song. This is quite useful if you want the MP3 for every song you ever gave a thumbs up on Pandora. On my computer, this averages about 4 songs a minute.

The Requests API and BeautifulSoup make writing screenscrapers and automating the web really clean and easy.

#!/usr/bin/python

# Takes a list of titles of songs, in the format "artist - song" and searches for each
# song on google. The first youtube link is passed off to youtube-dl to download it and 
# get the MP3 out. This doesn't have any throttling because (in theory) the conversion step
# takes enough time to provide throttling. 

import requests
import re
from BeautifulSoup import BeautifulSoup
from subprocess import call

def queryConverter(videoURL):
	call(["youtube-dl", "--extract-audio",  "--audio-format", "mp3", videoURL])

def queryGoogle(songTitle):
	reqPreamble = "https://www.google.nl/search"
	reqData = {'q':songTitle}
	r = requests.get(reqPreamble, params=reqData)
	if r.status_code != 200:
		print "Failed to issue request to {0}".format(r.url)
	else:
		bs = BeautifulSoup(r.text)
		tubelinks = bs.findAll("a", attrs={'href':re.compile("watch")})
		if len(tubelinks) > 0:
			vidUrl = re.search("https[^&]*", tubelinks[0]['href'])
			vidUrl = requests.utils.unquote(vidUrl.group(0))
			return vidUrl
		else:
			print "No video for {0}".format(songTitle)

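if __name__=="__main__":
	with open("./all_pandora_likes", 'r') as inFile:
		for line in inFile:
			# Drop the trailing newline and skip blank lines before searching
			songTitle = line.strip()
			if not songTitle:
				continue
			videoURL = queryGoogle(songTitle)
			if videoURL is not None:
				queryConverter(videoURL)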
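
The least obvious step in the script is pulling the video URL out of the search result link. Here is a minimal sketch of just that step, written for Python 3 rather than the Python 2 of the script above. It assumes the link looks like a redirect of the form "/url?q=<escaped URL>&...", which is what the regular expression in the script implies; the function name and the sample link are made up for illustration.

```python
import re
from urllib.parse import unquote


def extract_video_url(href):
    """Return the unescaped https URL embedded in a redirect link, or None."""
    # Grab everything from "https" up to the next "&", which is where the
    # extra tracking parameters of the redirect link begin.
    match = re.search(r"https[^&]*", href)
    if match is None:
        return None
    # The embedded URL is percent-encoded, so "%3F" becomes "?" and so on.
    return unquote(match.group(0))


# Hypothetical link, shaped like the hrefs the script expects to see.
sample = "/url?q=https://www.youtube.com/watch%3Fv%3Dabc123&sa=U&ved=0"
print(extract_video_url(sample))
```

Returning None when nothing matches means the caller can skip that song instead of crashing on a missing match.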

PDB for n00bs

PDB is the python debugger, which is very handy for debugging scripts. I use it two ways.

If I’m having a problem with the script, I’ll put in the line

import pdb; pdb.set_trace()

just before where the problem occurs. Once the pdb line is hit, I get the interactive debugger and can start stepping through the program and seeing where it blows up, and what variables are getting set to before that happens.

However, I recently found a very handy second way. I was debugging a script with a curses interface, which cleans up when it exits. Unfortunately, that cleanup means that my terminal gets wiped when something crashes, so instead of a stack trace, I just get dumped back to the terminal when something goes wrong, with no information at all left on the screen.

Invoking the script with

python -m pdb ./my_script.py

gets me the postmortem debugger, so when something goes wrong, the program halts and I get the interactive debugger and some amount of stack trace. It’s messy looking because of curses, but I can at least see what is going on.
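
If changing how the script is launched isn't convenient, roughly the same effect can be had from inside the program. This is only a sketch: run_with_postmortem and the debugger parameter are names invented here, while pdb.post_mortem is the standard library call that opens the debugger on a traceback.

```python
import pdb
import sys
import traceback


def run_with_postmortem(main, debugger=pdb.post_mortem):
    """Run main(); on an uncaught exception, print it and open the debugger."""
    try:
        return main()
    except Exception:
        # Show the stack trace first, then hand the traceback object to the
        # debugger so the failing frame can be inspected interactively.
        traceback.print_exc()
        debugger(sys.exc_info()[2])
        return None


def crashing_main():
    # Stand-in for the real program.
    raise ValueError("something went wrong")


if __name__ == "__main__":
    run_with_postmortem(crashing_main)
```

The debugger argument exists only so the wrapper can be exercised without an interactive session; in normal use the default is what you want.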

Playatech started charging for their plans

Unfortunately for burners, you can no longer download Playatech’s plans for their furniture without paying them first. They used to offer the plans as free downloads, and then asked that you donate some small amount if you used them.

Unfortunately for Playatech, they left all the PDFs in a world-readable directory. The command line below gets the index of that directory, finds all the lines with “pdf” in them, gets the file names out using cut, and then downloads each file.

for file in `wget -qO- http://playatech.com/wp-content/uploads/2013/05/ | grep pdf | cut -d '>' -f 2 | cut -d '"' -f 2`; do wget http://playatech.com/wp-content/uploads/2013/05/$file; done

Flickr Downloadr that really works

Not my work. Get it here.

It does exactly what it says on the tin. This is letting me close a years-old open loop I had, which is that Flickr had a lot of my photos, but sucked so bad that I didn’t want to reward them with money in order to get my photos back.

As soon as the download is done, that Flickr account is toast.

Well THAT'S messy

for file in ../connections_2014-10-7-1*; do conn="-c ../connections_"`echo $file | cut -d "_" -f 2`; types="-t ../neuron_types_"`echo $file | cut -d "_" -f 2`; locs="-l ../locations_"`echo $file | cut -d "_" -f 2`; ./pickle_to_json.py $conn $types $locs; done

For all the connection files that were generated today, create three variables called “conn”, “types”, and “locs” that have a command line switch path in them generated from a fixed prefix and a cut from the connection file name. Then invoke the script “pickle_to_json.py” with those variables as arguments.

Effectively, the connection, neuron type, and location files are all related by their date, so this makes a single JSON file out of the multiple files. I just didn’t want to run pickle_to_json.py a bunch of times by hand, as that seemed error-prone.
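
For comparison, the same name juggling might look like this in Python. It is a sketch under the same assumption the one-liner makes, namely that the part of the file name after the first underscore is shared by all three files. The function name and the ".pickle" extension in the example are invented for illustration.

```python
def build_args(conn_file):
    """Build the pickle_to_json.py arguments for one connections file."""
    # Same as `cut -d "_" -f 2`: take the second underscore-separated field.
    suffix = conn_file.split("_")[1]
    return [
        "-c", "../connections_" + suffix,
        "-t", "../neuron_types_" + suffix,
        "-l", "../locations_" + suffix,
    ]


# Hypothetical file name; the real ones may have a different extension.
print(build_args("../connections_2014-10-7-12.pickle"))
```

The resulting list could be passed to subprocess.call along with the script path, one call per connections file.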

Splitting a CSV file into a bunch of columns

awk -F, '{for(i=1;i<=NF;i++){print $i > "sample"i".csv"}}' yourfile.csv

Does what it says on the tin. Splits your CSV file into a bunch of files, one for each column of the original file. Found here.

I’m using this to pull single channels out of a 60 channel file full of recorded neuron voltages, which I’m then throwing through a little filter test program that I whipped up using this filter library. My main goal is getting rid of 60Hz line noise, but the fluorescent bulbs in the room apparently also make noise at 180Hz and 300Hz.
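
If awk isn't handy, a rough Python equivalent of the column split is below. It is a sketch, not a drop-in replacement: the function name and the prefix parameter are made up here, and unlike the awk version it uses the csv module, so quoted fields containing commas are kept together.

```python
import csv


def split_csv_columns(in_path, prefix="sample"):
    """Write column i of in_path to '<prefix><i>.csv', one value per line."""
    handles = {}
    try:
        with open(in_path, newline="") as src:
            for row in csv.reader(src):
                for i, value in enumerate(row, start=1):
                    # Open each output file the first time its column appears.
                    if i not in handles:
                        handles[i] = open("{0}{1}.csv".format(prefix, i), "w")
                    handles[i].write(value + "\n")
    finally:
        for handle in handles.values():
            handle.close()
    # Report which column numbers were written.
    return sorted(handles)
```

Calling split_csv_columns("yourfile.csv") writes sample1.csv, sample2.csv, and so on into the current directory, matching the file names the awk command produces.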

Useful mencoder invocation

mencoder -nosound mf://*.jpg -mf w=1280:h=800:type=jpg:fps=30 -ovc lavc -lavcopts vcodec=mpeg4:vbitrate=2400:mbd=2:keyint=132:v4mv:vqmin=3:lumi_mask=0.07:dark_mask=0.2:mpeg_quant:scplx_mask=0.1:tcplx_mask=0.1:naq -o output_filename.avi

Turns all the JPEG files in the directory you are currently in into a nice quality MPEG-4/AVI file. The width and height in the options after -mf should be changed to match the images. This command line also works for PNG files if you replace both instances of “jpg” with “png”.

Naming things and "ImportError: No module named msg"

I’m using ROS at school for a project. Part of the project is to detect someone’s hand with a camera, so I’m just looking for a patch of “skin colored*” pixels. ROS organizes software as packages, with nodes in them, and messages that the nodes use to communicate with each other.

For my system, I had a package called “hand_detector” with a source file called “hand_detector.py” and a message type called “hand”. ROS generates the messages, which I then import into my python code with the line:

from hand_detector.msg import *

This gets me the error message: “ImportError: No module named msg”

The reason for this is that python searches the same directory as the executing script for imports before it goes looking anywhere else. Since the file hand_detector.py is the executing script, and is naturally in the directory with itself, python finds it there, imports it into itself, and then tries to find a module called “msg” within hand_detector.py. There’s no .msg in there, so I get the error.

The moral of the story here is don’t name your package and the script in it the same thing. Once I converted the script to just “detector.py”, the problem went away.
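
The shadowing can be reproduced without ROS. The sketch below builds two throwaway directories, one holding a plain hand_detector.py and one holding a real hand_detector package with a msg subpackage, then puts the script's directory first on the search path the way Python does for the executing script. The function name and directory layout are invented for the demonstration, and the exact error text differs between Python versions.

```python
import importlib
import os
import sys
import tempfile


def demo_shadowing():
    """Return the import error caused by a script shadowing a package."""
    script_dir = tempfile.mkdtemp()
    pkg_root = tempfile.mkdtemp()
    # A plain module with the same name as the package.
    open(os.path.join(script_dir, "hand_detector.py"), "w").close()
    # The real package, with the msg subpackage the import wants.
    msg_dir = os.path.join(pkg_root, "hand_detector", "msg")
    os.makedirs(msg_dir)
    open(os.path.join(pkg_root, "hand_detector", "__init__.py"), "w").close()
    open(os.path.join(msg_dir, "__init__.py"), "w").close()

    saved_path = list(sys.path)
    # The script's own directory is searched before anything else.
    sys.path[:0] = [script_dir, pkg_root]
    importlib.invalidate_caches()
    try:
        importlib.import_module("hand_detector.msg")
        return "imported"
    except ImportError as err:
        return str(err)
    finally:
        # Undo the changes so the demo leaves the interpreter as it found it.
        sys.path[:] = saved_path
        for name in [n for n in sys.modules
                     if n == "hand_detector" or n.startswith("hand_detector.")]:
            del sys.modules[name]


print(demo_shadowing())
```

Renaming the plain module in the first directory makes the import succeed, which is the same fix as renaming the script.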

*I’m somewhat concerned that I wrote a “white people detector”, as it’s really just thresholding the H part of the HSV color space and counting pixels. Other color spaces may be better for this, but this doesn’t have to be perfect. I just don’t want the robot to be a dick to black people.

Command Line Audio Editing With Sox

For my ritual spoken word software piece, I recruited a bunch of my friends to say the text of the ritual. Each “stanza” of the ritual has a call and a response, so I broke each recording up into individual clips for each call and response. That gave me about 28 files per person, and over 100 clips total.

The different participants all recorded on different hardware, and at different volume levels. I also wasn’t super-precise about trimming the clips, so each file had silence at the beginning.

This left me with two problems: some participants were much softer than others, and some of the clips lagged each other, which made for bad chorus effects.

To trim the clips, I used sox, a Linux tool for manipulating sounds, with the command:

for file in *.wav; do sox $file $file.wav silence 1 0.1 2%; done

This results in a file named foo.wav.wav for each foo.wav file in the directory, so I cleaned up with:

rename -f 's/\.wav\././g' *

Note that this scribbles over the originals, so keep backups. I’m glad I did, because 2% turned out to be a little aggressive, and trimmed off the beginning of clips starting with an “ma-” sound, such as “make us a…”. This is likely because the sound faded in slowly, and so got counted as part of the noise rather than the beginning of a sound.

There is useful documentation for the sox silence filter here.
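
To see why a slow fade-in gets clipped, here is a simplified model of threshold-based trimming. This is not sox's actual algorithm, just a sketch of the idea: the function name, the sample values, and the min_run parameter (standing in for the 0.1 second duration) are assumptions made for illustration.

```python
def trim_leading_silence(samples, threshold=0.02, min_run=3):
    """Drop leading samples until min_run consecutive ones reach threshold."""
    run = 0
    for i, sample in enumerate(samples):
        if abs(sample) >= threshold:
            run += 1
            if run >= min_run:
                # Keep everything from the start of the loud run onward.
                return samples[i - min_run + 1:]
        else:
            run = 0
    # Nothing ever got loud enough, so the whole clip counts as silence.
    return []


# A made-up clip that fades in: the quiet first samples are real sound,
# but they sit below the 2% threshold and get trimmed with the silence.
fade_in = [0.005, 0.01, 0.015, 0.02, 0.03, 0.04]
print(trim_leading_silence(fade_in, threshold=0.02, min_run=2))
```

Lowering the threshold keeps more of the fade-in, at the cost of keeping more background noise at the start of the clip.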

Turning the volume up on the files was done with:

for file in *.wav; do sox $file $file.wav gain -l 8; done

and another pass of rename, as above. Adjust the “8” up or down to suit your needs. Positive numbers make it louder, negative make it quieter.

If you want to preview a sox effect, just replace “sox” in the command with “play”, and leave off the output file. For example,

play myfile.wav gain -l 8

will play myfile.wav with increased gain, but won’t change the file.
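
The number given to gain is in decibels, so it helps to know what it does to the raw amplitude. The sketch below uses the standard decibel formula for amplitude, factor = 10^(dB/20); the function name is made up, and it ignores the limiter that the -l option adds.

```python
def db_to_amplitude_factor(db):
    """Convert a gain in decibels to a multiplier on sample amplitude."""
    return 10 ** (db / 20.0)


# A gain of 8 dB multiplies the amplitude by roughly 2.5,
# and a gain of 0 dB leaves it unchanged.
print(db_to_amplitude_factor(8))
print(db_to_amplitude_factor(0))
```

A handy rule of thumb from the same formula is that about 6 dB doubles the amplitude and about -6 dB halves it.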

We Make Ritual Noise

For a festival that I attend, I’m writing a soundscape in Boodler to provide the vocal component for a ritual. Here, I’m going to note what I need to do to run Boodler on my laptop, which I’ll have at the festival.

The main thing is that Boodler seems to default to OSS, and I use PulseAudio, so to invoke the ritual, you need to run:

boodler -o pulse --external disturbingrelics com.gizmosmith.disturbingrelics/Example

The -o option tells it to use PulseAudio, --external makes it load from a directory instead of a .boop package (handy for testing), and the rest is the agent to run.

To organize all the sound clips I’m using, I have a Boodler package for each person’s reading of the ritual script. The script is in a call and response format, with 14 calls and 14 responses, so each package has 28 audio clips, one for each call and one for each response. I named all the clips “call_N_…” and “response_N_…” (for N in 1..14) so that the program can figure out the call/response pairs by name.
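
Pairing clips by name could look something like the sketch below. It is not the code from the soundscape itself: the function name and the sample file names are invented, and it assumes only the “call_N_…” and “response_N_…” naming scheme described above.

```python
import re

# Matches "call_3_..." or "response_3_..." and captures the kind and number.
CLIP_RE = re.compile(r"^(call|response)_(\d+)_")


def pair_clips(names):
    """Return (call, response) name pairs, ordered by stanza number."""
    stanzas = {}
    for name in names:
        match = CLIP_RE.match(name)
        if match is None:
            continue  # Not a clip that follows the naming scheme.
        kind, number = match.group(1), int(match.group(2))
        stanzas.setdefault(number, {})[kind] = name
    # Only keep stanzas where both halves are present.
    return [
        (stanzas[n]["call"], stanzas[n]["response"])
        for n in sorted(stanzas)
        if "call" in stanzas[n] and "response" in stanzas[n]
    ]


# Hypothetical clip names for one reader.
clips = ["response_2_sage", "call_1_sage", "call_2_sage", "response_1_sage"]
print(pair_clips(clips))
```

Sorting on the integer stanza number rather than the name keeps stanza 10 from sorting ahead of stanza 2.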

Each package starts out as a directory with the 28 files and a metadata file in it. For the directory “sage”, I create the package with:

boodle-mgr --import create sage

and then install it with:

boodle-mgr install ./com.gizmosmith.sage.1.0.boop