Category: Computer Vision

ROS and OpenCV will fite u, m8

I recently wanted to do some computer vision stuff using OpenFace, which is a collection of face-processing computer vision algorithms and tools to use them. It uses OpenCV 3.0.something, which uses, among other things, vtk6, and friends libvtk6-dev and python-vtk6.

Normally, this wouldn’t be a problem, but I use ROS Indigo, as does the lab I work in. ROS Indigo uses some previous version of vtk, and so attempting to install OpenCV 3.0 blows away my ROS install, and makes apt freak out when I try to install it again. The actual error was something like “you have broken held packages”, only I didn’t actually have held packages OR broken packages.

Apt just gives up at this point. Aptitude, on the other hand, proposes removing the offending VTK packages and proceeding with the ROS install. Only time will tell if I’ve trashed my OpenCV install, but if I have, I can just go back to an older OpenCV version.

Structured Light Scanner

2015-10-05 23.15.31

I got the projector back in the spring, planning to do some projection mapping. I got distracted by some stuff for a festival, but some co-workers and I were talking about structured light scanning recently, so I threw this together.

For software, I’m probably going to do the calibration stuff in Python and OpenCV. I’d like the first demo to be perspective correction, to project on non-flat surfaces. Second will probably be virtual lighting.

That's Not Helping, Python.

[ERROR] [WallTime: 1384556730.822164] bad callback: >
Traceback (most recent call last):
File "/opt/ros/groovy/lib/python2.7/dist-packages/rospy/", line 681, in _invoke_callback
File "./", line 81, in callback
File "./", line 68, in getBoard
newImg = cv2.warpPerspective(cvImg, perMat)
TypeError: Required argument 'dsize' (pos 3) not found

[ERROR] [WallTime: 1384556763.964408] bad callback: >
Traceback (most recent call last):
File "/opt/ros/groovy/lib/python2.7/dist-packages/rospy/", line 681, in _invoke_callback
File "./", line 81, in callback
File "./", line 68, in getBoard
newImg = cv2.warpPerspective(cvImg, perMat, newImg.shape)
TypeError: function takes exactly 2 arguments (3 given)

Well which is it? Exactly two arguments, or the third positional argument is required?

The real problem is that the shape of newImg is a 3-tuple, and warpPerspective expects a 2-tuple. This StackExchange post hipped me to the bug.

A Framework for Flow-based Coding

Normally, I think visual presentations of programming languages, such as Labview, are more a problem than a solution, but there is one case where I don’t think that’s the case: when the program is operating on a stream of data, and the program should be able to be changed without stopping the stream. The canonical use case for this, in my opinion, is realtime operation on a video stream. In this case, you can watch the video output change in realtime as operations are added, removed, and modified.

What I was hoping to do, at some point, is to write a set of operations for video that are wrappers for e.g. GStreamer, and by putting them in a visual programming framework. That would give me a set of VJing operations that can be played with in realtime to do things like chromakeying a live video stream on human skin colors or dropping swirling masks on all the faces detected in the video stream.

Pyqtgraph has a lot of promise as a framework for this. My main desire for the framework is that I don’t have to deal with things like handling mouse clicks, and can just get a description of the pipeline and shove data through it. It may be someone already did this, but is dead, so maybe it doesn’t matter if they did.

Accidental Aesthetics

Works from a “school” of art share some common elements. Looking at paintings by Dali, Magritte, and Breton, one can say that they share something that is not shared with a Monet. People not trained in the academic study of art might have a hard time naming or articulating that quality, but it is definitely present.

The artists named above are all painters. If one wants to get truly pedantic, it’s possible to claim that their works all have the common quality “flat surface covered by pigments mixed with a binder”. The actual common quality is more a matter of their treatment of form, especially in relation to the expected juxtaposition of forms in the real world, and their engagement with the representation of the unconscious world, that is to say, the realms of dream, delusion, and insanity, as well as direct handling of the duality of representation and reality.

From the fact that this common quality does not directly relate to the material used, we can infer that there can exist works that do not use the same material, and yet have the same quality. This inference is supported by the existance of surrealist sculpture.

However, some materials and creative processes force a certain common developmental aesthetic. Three cases of a unified aesthetic that is incidental to the product, but nonetheless shared, are: the textures used in 3D modeling, the debug output of computer vision systems, and the appearance of DIY/prototyped electromechanical devices from the current generation of hacker spaces.

These aesthetics are unified within themselves, but they are not of a piece with each other. Textures adopt the form that they do because the technology demands it. The technology is defined, and the aesthetic is fully constrained by it. Computer vision systems develop their aesthetic because they must map the world through the system’s understanding into a form that is understood by the human user. The technology is not fully defined, but the system is confined on on three fronts: The input of the real world, the representation available in the system, and what users can “read” in realtime. Prototyped devices have the fewest constraints. The technology is incompletely defined, and the form of it is also undefined, so it is shaped by expedience and available tools. It is the most accidental aesthetic, because it is the one that forms when no other aesthetic is selected.


This is an example of a texture for a human head from here. The distortion would be corrected by remapping onto a model of a human head.

Textures are the most rigidly constrained accidental aesthetic. This description comes from a common modeling file format, but the technology is similar across many modeling processes. The model consists of three files. The first file, the model file, describes the 3D points that make up the surfaces of the model. It also includes a reference to the second file, which is a material file. The material file describes a set of materials that the object is made of, and how light interacts with them. Each material may refer to a third file, which is the texture. A texture is a flat image file. Regions of the flat image file are mapped onto surfaces of the model by a one-to-one (usually) mapping from vertices on the model to vertices on the texture. The vertices on the texture define a shape which is then “cut out” and “applied” to the corresponding shape on the model. Because of the way this works, and the tools used to create this mapping, the texture is frequently a flat representation of the 3D object, in much the way a map of the earth is a flat representation of the 3D world.

Altering the texture would result in changes to its display on the model, so the texture is completely constrained by the model. Because it is a flat image file, the texture is also constrained in the ways that it can be displayed to the user. Because of this complete constraint, the textures display a very strong unity of aesthetic.

Robot readable world from Timo on Vimeo.

Robot Readable World is a compilation of the debugging output of computer vision algorithms. The computer system operates on the video stream to produce data streams which are not visible to humans. These video outputs are intended to allow human debuggers to determine what the system “sees”, that is, to map the data structures into human-readable form and present it mixed with the incoming images so that the person can relate from real objects to the system’s “perception”. Because these are merely explanations of the state of the system, rather than a key part of its functioning, they can be altered and rearranged to provide the maximally useful representation for human readers. The data underneath may not change, but the presentation can be altered.

As a result, these systems are unconstrained at at least one end, the presentation to the user. However, they are constrained at the other end to operate on images. The images are in turn, constrained by the postions and relations of objects in the real world. A computer vision system that operates in a made-up or simulated environment would have no practical use to humans unless they also inhabited that environment. This is not to say that this is not done, as vision approaches could be used in video games, but it is less likely.


This dog treat dispenser is an example of the third accidental aesthetic: the design of DIY electronics. Some hallmarks of this aesthetic are the exposed circuit boards, the surface texturing of 3D printed or laser cut (in this case, 3D printed) parts, visible and accessible wiring, and the use of visible, commercially available screws and other connectors.


This project, a controller for a coffee roaster, has the same aesthetic, despite being constructed by a different person, unknown to the maker of the dog treat dispenser.

This is the least constrained of the three accidental aesthetics. The maker can choose the parts used to create the device, and the form of the finished device. However, the tools available to the user to create the device will drive certain decisions in its eventual form. A 3D printer provides a way to quickly create certain forms, but has a distinct material, texture, and color for those forms. Laser cutting allows a form to be built from layers of flat materials, but again, some building techniques work better than others. Off the shelf commercial components have to be connected together, which leads to visible wires. All of these decisions, to print or not print, laser or not laser, wire or make PCBs have a bias in them that each artist/creator navigates, and the sequence of the decisions leads to a particular aesthetic for the piece.

More Deskewing Rectangles with OpenCV

Someone recently posted to a mailing list that I read, asking for a program that can rotate and crop a lighter-colored rectangle on a black background. Since this fits in with the program I’m working on to locate, rotate, and deskew an image of a card in a photo, I figured I’d give it a shot. I’m planning to do both rotation and deskewing, but for now I’m only doing the rotation.

I’ve installed the latest OpenCV (version 2.3, as of this writing), and started learning the new Python bindings.  There seem to be a lot of cases where functions want a numpy array as an argument, but the functions don’t return numpy arrays, so there is a lot of fiddling around with the results of functions to get something that can be passed to another function.

On top of that, the return values of functions are poorly or rarely documented. Python’s duck-typing can help out a little in these cases, but if you call cv2.minAreaRect, you get something like ((233, 432), (565, 420), -0.2343), which is described in the documentation with the single word “retval”.  It would be helpful to have a way to find out that the first tuple is the center, the second is the width and height, and the third is the tilt (in degrees) of the “first” edge off of horizontal. That tuple makes just as much sense, but is wrong, when interpreted as the top left and bottom right corners and a tilt measured in radians.

Also, the “first” edge is seemingly arbitrary, or at least I can’t find any documentation describing it. This means that the same rectangle could be off by 0.4 degrees or by -89.6 degrees, depending on if the first edge is a horizontal or vertical edge. One thing that may be helpful is this stackoverflow post. Since the rectangle is defined as points, I can reshuffle the points to get them in a consistent order, and then get the angle off of horizontal for a consistent edge. That then goes into producing the transformation matrix for affine transforms (e.g. rotation and deskewing).

The minAreaRect() call gets me the angle I can use to rotate the image, and this trick should get me the four corners of the perspective-skewed image that can be straightened out to get the squared image.

OpenCv and finding rectangles

I have been working, on and off, on a computer vision application that will recognize a card from a certain game in an image, figure out what the card is, and add it to a database. I had some luck in earlier versions of the code looking for template matches to try to find distinctive card elements, but that fails if the card is scaled or skewed, and it rapidly becomes too processor-heavy if there are many templates to match. Recently, at work, I have had even more opportunity to play with OpenCV (a computer vision library), and have found a few blogs and tricks that might help me out.

The first blog shows how to pick out a Sudoku puzzle from a picture. The most important part is finding the corners of the puzzle, as after that, it can be mapped to a square with a perspective transform. I can do a similar trick, only I’ll be mapping to a rectangle. Since corner-finding is kind of scale-invariant (the corner of something is a corner at any scale), this will let me track a card pretty easily.

I think that I can actually use OpenCV’s contour finding to get most of the edges of the card, and then the Hough transform to get the corner points. I may even be able to get away with using just contour finding, getting the bounding rectangle of each contour, and checking that it has something like the proper aspect ratio. This will work in the presence of cards that are rotated, but fails on perspective-related skewing.

This StackOverflow post has a nice approach to getting the corners of a rectangle that has some rotation and perspective skew.

Once I have the card located, I’m going to throw a cascade of classifiers at it and try something like AdaBoost to get a good idea of which card it is. Some of the classifiers are simple, things like determining the color of the front of the card. Others may actually pull in a bit of OCR or template-based image recognition on (tiny) subsections of the card. Since I will actually know the card border at this point, I can scale the templates to match the card, and get solid matches fast.

Displaying contours in OpenCV

Suppose you have used FindContours to find the outlines of things in an image, and now want to color each one a different color, perhaps in order to create a seed map for the watershed algorithm. This gets you an image with each area that FindContours found in a different color:

#Showing the contours.
#contour_list is the output of cv.FindContours(), degapped is the image contours were found in
contour_img = cv.CreateImage(cv.GetSize(degapped), IPL_DEPTH_8U, 3)
contour = contour_list
while contour.h_next() != None:
    color = get_rand_rgb(80,255)
    holecolor = get_rand_rgb(80,255)
    cv.DrawContours(contour_img, contour, color, holecolor, -1, CV_FILLED)
    contour = contour.h_next()
show_img(contour_img, "contours " + str(iteration))

The real key here is iterating over the list of contours using h_next(), which took me longer than it should have to find.

The show_img() function is effectively just printf for OpenCV images, and looks like this:

def show_img(img, winName):
    #Debugging printf for images!
    cv.ShowImage(winName, img)

The function get_rand_rgb() gets a random RGB color with values for each color set to an integer in the range you pass it, like so:

def get_rand_rgb(min, max):
    return (random.randint(min,max),random.randint(min,max),random.randint(min,max))

I used 80,255 to get bright colors.

Magic Card Code

Now available at my GitHub repo. Note that using these scripts is probably a horrible violation of Wizards of the Coast’s ToS for their website, and can probably get you banned.

More on recognizing Magic cards

Computer vision is hard.

I have code written that detects edges in a video image, picks pairs of edges that are both within a certain ratio of lengths relative to each other and within a specific angle of each other. I’ll post the code up once I have pushing to my github account set up correctly. It’s actually pretty poor at locating cards in a video image, but it is a demo of how to use some of the UI and feature recognition stuff from OpenCV.

From here, I have a few ideas about how to proceed. The first is to have the program display a rectangle and ask the user to align the card to that rectangle. This means the program will not have to find the card in the image, and can focus on recognizing parts of the image. This will use template-based recognition to pick out mana symbols, and should be pretty simple. The classifier that tries to pick out colors will be even easier, as it will just select a sample region, blur it, and match it to precomputed colors. This can be calibrated for specific lighting conditions for each run.

An even less hands-on approach was suggested by my co-worker Eric. The cards could be displayed by a machine that essentially deals one card at a time. The machine would have a magazine of cards, and would photograph a card, deal it into a hopper, photograph the next card, and so forth, until it was empty. This would have the problem that it wouldn’t be something anyone could download and use, as it would require a special card-handling machine, but the software end would be compatible with the user-based registration approach described above, so someone with a lot of patience could use the manual method.

Another approach would be to throw all the cards on a flatbed scanner (possibly with a funny-colored (e.g. red) bed lid, do template matching to locate card borders, and then segment the images based on that. An ADF scanner with a business card attachment could probably also make short work of a modestly-sized set of cards.