Category: Scripting

Color All The FloatCanvas Objects!

I’m drawing stuff over a background image of the ocean, which looks bluish-green. Naturally, bluish-greens and some grays don’t have enough contrast to stand out, so they kind of get lost in the image.

I initially tried calculating luminance from RGB triplets, which does work for readability, but I was using the web readability thresholds for the luminance ratio (4.5 for normal, 7 for high contrast), and it didn’t work very well because my background is sort of middling in luminance, so it punched out most of the middle range of colors, resulting in everything being either dark and similar-looking, or light and similar-looking.

I switched to RGB color distance (which, yes, isn’t perceptually flat, but this isn’t really a color-sensitive application, beyond not making them too similar). In order to figure out where my threshold should be, I wanted to get a list of all the wxPython named colors, and what they looked like, and their distance from “cadetblue”, which is about the same color as the average of my ocean background.

#!/usr/bin/python

# Generate a page that previews all the colors from WX
import wx

app = wx.App(False)

import wx.lib.colourdb as wxColorDB

def get_d(colorname1, colorname2="cadetblue"):
    target_color = wx.Colour(colorname1).Get()
    source_color = wx.Colour(colorname2).Get()
    # Euclidian distance
    d = math.sqrt(sum([pow(x[0] - x[1], 2) for x in zip(target_color[:3], source_color)]))
    return d
    
print("<html>")
print("  <body>")

colors = list(set(wxColorDB.getColourList()))
colors.sort(key=lambda c: get_d(c, "white"))

for color in colors:
    cRGB = wx.Colour(color).Get()
    print("    <div style=\"white-space: nowrap\">")
    print(f"      <div style=\"display: inline-block; width:300px\">{color}</div>")
    print(f"      <div style=\"display: inline-block;height:30px;overflow:auto;background-color:rgb({cRGB[0]},{cRGB[1]},{cRGB[2]}); width:100px\"> </div>")
    d = get_d(color, "white")
    print(f"      <div style=\"display: inline-block\">{d}</div>")
    print("    </div>")

print("  </body>")
print("</html>")

That script generates an HTML page with the colors in it, ranked by distance from the given color name. I picked white for the version published here, but as you can see, the default is “cadetblue”. If you pick a color name WX doesn’t know, you’re going to have a bad time.

A distance of 80 seemed to work pretty well for me, so as a rule of thumb, 80 units of color distance gets you a distinct color in 8-bit-per-color RGB color space.

There are, of course, some problems to be aware of. For instance, distance folds hue and value together, so getting brighter or darker and remaining the same hue can make up a lot of that 80 units, without necessarily getting good contrast.

TF-IDF in Python

I am trying to process a bunch of text generated by human users to figure out what they are talking about, in the context of an experiment where there were robots a person could be controlling, a few target areas the robots could move to, and a few things they could move, like a crate. One thing that occurred to me was to use TF-IDF, which, given a text and a collection of texts, tells you what words in the text are relatively unusual to that particular text, compared to the rest of the collection.

It turns out that that’s not really what I wanted, because the words that are “unusual” are not really the ones that the particular text is about.

select(1.836) all(3.628) red(2.529) robots(1.517) ((1.444) small(5.014) drag(2.375) near(3.915) lhs(4.321) of(1.276) screen(2.018) )(1.444)

This is a sentence from the collection, and the value after each word is the TF-IDF score for that word. The things I care about are that it’s about robots, and maybe a little that it’s about the left hand side of the screen. “Robots” actually got a pretty low score (barely more than “of”), but “small” got the highest score in the sentence.

At any rate, this is how I did my TF-IDF calculation. Add documents to the object with add_text, get scores with get_tfidf. It uses NLTK for tokenization, you could also use t.split(” “) to break strings up on spaces.

class TFIDF(object):
  def __init__(self):
    #Count of docs containing a word
    self.doc_counts = {}
    self.docs = 0.0

  def add_text(self, t):
    #We're adding a new doc
    self.docs += 1.0
    #Get all the unique words in this text
    uniques = list(set(nltk.word_tokenize(t)))
    for u in uniques:
      if u in self.doc_counts.keys():
        self.doc_counts[u] += 1
      else:
        self.doc_counts[u] = 1

  def get_tfidif(self, t):
    word_counts = {}
    #Count occurances of each word in this text
    words = nltk.word_tokenize(t)
    for w in words:
      if w in word_counts.keys():
        word_counts[w] += 1
      else:
        word_counts[w] = 1
    #Calculate the TF-IDF for each word
    tfidfs = []
    for w in words:
      #Word count is either 0 (It's in no docs), or the count
      w_docs = 0
      if w in self.doc_counts.keys():
        w_docs = self.doc_counts[w]

      #the 1 is to avoid div/zero for previously unseen words
      idf = math.log(self.docs/(1+w_docs))
      tf = word_counts[w]
      tfidfs.append((w, tf * idf))
    return tfidfs

 

Download ALL The Music

Given a file containing a list of songs, one per line, in the format “Artist – Song Title”, download the audio of the first youtube video link on a Google search for that song. This is quite useful if you want to the MP3 for every song you ever gave a thumbs up on Pandora. On my computer, this averages about 4 songs a minute.

The Requests API and BeautifulSoup make writing screenscrapers and automating the web really clean and easy.

#!/usr/bin/python

# Takes a list of titles of songs, in the format "artist - song" and searches for each
# song on google. The first youtube link is passed off to youtube-dl to download it and 
# get the MP3 out. This doesn't have any throttling because (in theory) the conversion step
# takes enough time to provide throttling. 

import requests
import re
from BeautifulSoup import BeautifulSoup
from subprocess import call

def queryConverter(videoURL):
	call(["youtube-dl", "--extract-audio",  "--audio-format", "mp3", videoURL])

def queryGoogle(songTitle):
	reqPreamble = "https://www.google.nl/search"
	reqData = {'q':songTitle}
	r = requests.get(reqPreamble, params=reqData)
	if r.status_code != 200:
		print "Failed to issue request to {0}".format(r.url)
	else:
		bs = BeautifulSoup(r.text)
		tubelinks = bs.findAll("a", attrs={'href':re.compile("watch")})
		if len(tubelinks) > 0:
			vidUrl = re.search("https[^&]*", tubelinks[0]['href'])
			vidUrl = requests.utils.unquote(vidUrl.group(0))
			return vidUrl
		else:
			print "No video for {0}".format(songTitle)

if __name__=="__main__":
	with open("./all_pandora_likes", 'r') as inFile:
		for line in inFile:
			videoURL = queryGoogle(line)
			if videoURL is not None:
				queryConverter(videoURL)

PDB for n00bs

PDB is the python debugger, which is very handy for debugging scripts. I use it two ways.

If I’m having a problem with the script, I’ll put in the line

import pdb; pdb.set_trace()

just before where the problem occurs. Once the pdb line is hit, I get the interactive debugger and can start stepping through the program and seeing where it blows up, and what variables are getting set to before that happens.

However, I recently found a very handy second way. I was debugging a script with a curses interface, which cleans up when it exits. Unfortunately, that cleanup means that my terminal gets wiped when something crashes, so instead of a stack trace, I just get dumped back to the terminal when something goes wrong, with no information at all left on the screen.

Invoking the script with

python -m pdb ./my_script.py

gets me the postmortem debugger, so when something goes wrong, the program halts and I get the interactive debugger and some amount of stack trace. It’s messy looking because of curses, but I can at least see what is going on.

Playatech started charging for their plans

Unfortunately for burners, you can no longer download Playatech’s plans for their furniture without paying them first. They used to offer the plans as free downloads, and then asked that you donate some small amount if you used them.

Unfortunately for Playatech, they left all the PDFs in a world-readable directory. The command line below gets the index of that directory, finds all the lines with “pdf” in them, gets the file names out using cut, and then downloads each file.

for file in `wget -qO- http://playatech.com/wp-content/uploads/2013/05/ | grep pdf | cut -d ‘>’ -f 2 | cut -d ‘”‘ -f 2`; do wget http://playatech.com/wp-content/uploads/2013/05/$file; done

Flickr Downloadr that really works

Not my work. Get it here.

It does exactly what it says on the tin. This is letting me close a years-old open loop I had, which is that Flickr had a lot of my photos, but sucked so bad that I didn’t want to reward them with money in order to get my photos back.

As soon as the download is done, that Flickr account is toast.

Well THAT'S messy

for file in ../connections_2014-10-7-1*; do conn="-c ../connections_"`echo $file | cut -d "_" -f 2`; types="-t ../neuron_types_"`echo $file | cut -d "_" -f 2`; locs="-l ../locations_"`echo $file | cut -d "_" -f 2`; ./pickle_to_json.py $conn $types $locs; done

For all the connection files that were generated today, create three variables called “conn”, “types”, and “locs” that have a command line switch path in them generated from a fixed prefix and a cut from the connection file name. Then invoke the script “pickle_to_json.py” with those variables as arguments.

Effectively, the connection, neuron type, and location files are all related by their date, so this makes a single JSON file out of the multiple files. I just didn’t want to run pickle_to_json.py a bunch of times by hand, as that seemed error-prone.

Splitting a CSV file into a bunch of columns

awk -F, '{for(i=1;i<=NF;i++){print $i > "sample"i".csv"}}' yourfile.csv

Does what is says on the tin. Splits your CSV file into a bunch of files, one for each column of the original files. Found here.

I’m using this to pull single channels out of a 60 channel file full of recorded neuron voltages, which I’m then throwing through a little filter test program that I whipped up using this filter library. My main goal is getting rid of 60Hz line noise, but the fluorescent bulbs in the room apparently also make noise at 180Hz and 300Hz.

Useful mencoder invocation

mencoder -nosound mf://*.jpg -mf w=1280:h=800:type=jpg:fps=30 -ovc lavc -lavcopts vcodec=mpeg4:vbitrate=2400:mbd=2:keyint=132:v4mv:vqmin=3:lumi_mask=0.07:dark_mask=0.2:mpeg_quant:scplx_mask=0.1:tcplx_mask=0.1:naq -o output_filename.avi

Turns all the JPEG files in the directory you are currently in into a nice quality MPEG-4/AVI file. The width and height in the options after -mf should be changed to match the images. This command line also works for PNG files if you replace both instances of “jpg” with “png”.

Naming things and "ImportError: No module named msg"

I’m using ROS at school for a project. Part of the project is to detect someone’s hand with a camera, so I’m just looking for a patch of “skin colored*” pixels. ROS organizes software as packages, with nodes in them, and messages that the nodes use to communicate with each other.

For my system, I had a package called “hand_detector” with a source file called “hand_detector.py” and a message type called “hand”. ROS generates the messages, which I then import into my python code with the line:

from hand_detector.msg import *

This gets me the error message: “ImportError: No module named msg”

The reason for this is that python searches the same directory as the executing script for imports before it goes looking anywhere else. Since the file hand_detector.py is the executing script, and is naturally in the directory with itself, python finds it there, imports it into itself, and then tries to find a module called “msg” within hand_detector.py. There’s no .msg in there, so I get the error.

The moral of the story here is don’t name your package and the script in it the same thing. Once I converted the script to just “detector.py”, the problem went away.

*I’m somewhat concerned that I wrote a “white people detector”, as it’s really just thresholding the H part of the HSV color space and counting pixels. Other color spaces may be better for this, but this doesn’t have to be perfect. I just don’t want the robot to be a dick to black people.