Month: October 2011
Someone recently posted to a mailing list that I read, asking for a program that can rotate and crop a lighter-colored rectangle on a black background. Since this fits in with the program I’m working on to locate, rotate, and deskew an image of a card in a photo, I figured I’d give it a shot. I’m planning to do both rotation and deskewing, but for now I’m only doing the rotation.
I’ve installed the latest OpenCV (version 2.3, as of this writing), and started learning the new Python bindings. There seem to be a lot of cases where functions want a numpy array as an argument, but the functions don’t return numpy arrays, so there is a lot of fiddling around with the results of functions to get something that can be passed to another function.
On top of that, the return values of functions are poorly or rarely documented. Python’s duck-typing can help out a little in these cases, but if you call cv2.minAreaRect, you get something like ((233, 432), (565, 420), -0.2343), which is described in the documentation with the single word “retval”. It would be helpful to have a way to find out that the first tuple is the center, the second is the width and height, and the third is the tilt (in degrees) of the “first” edge off of horizontal. That tuple makes just as much sense, but is wrong, when interpreted as the top left and bottom right corners and a tilt measured in radians.
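For the record, here is how that retval unpacks, field by field; the sample numbers are the ones quoted above, and the field meanings are the ones described in this paragraph rather than anything the docs state:

```python
# Unpacking the undocumented "retval" of cv2.minAreaRect.
# The sample numbers are the ones quoted above, not real output.
rect = ((233, 432), (565, 420), -0.2343)

(center_x, center_y), (width, height), angle = rect
# (center_x, center_y): center of the rectangle
# (width, height):      lengths of the sides
# angle:                tilt of the "first" edge off horizontal, in degrees
```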
Also, the “first” edge is seemingly arbitrary, or at least I can’t find any documentation describing it. This means that the same rectangle could be reported as off by 0.4 degrees or by -89.6 degrees, depending on whether the first edge happens to be a horizontal or a vertical edge. One thing that may be helpful is this Stack Overflow post. Since the rectangle is defined as points, I can reshuffle the points into a consistent order, and then get the angle off of horizontal for a consistent edge. That then goes into producing the transformation matrix for affine transforms (e.g. rotation and deskewing).
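A sketch of that angle cleanup, assuming the usual minAreaRect convention that the reported angle lies in (-90, 0]: if the reported edge is closer to vertical than horizontal, fold the angle back toward zero and swap the side lengths, so the same physical rectangle always yields the same small tilt.

```python
def normalize_rect(rect):
    # rect is the ((cx, cy), (w, h), angle) tuple from cv2.minAreaRect.
    # If the "first" edge was the vertical one, the angle comes back
    # near -90; shift it by 90 degrees and swap width and height so a
    # nearly-upright rectangle always reports a small angle.
    (cx, cy), (w, h), angle = rect
    if angle < -45.0:
        angle += 90.0
        w, h = h, w
    return ((cx, cy), (w, h), angle)
```

With this, the -89.6 degree report from the text comes back as roughly +0.4 degrees with the sides swapped, matching the other way the same rectangle could have been reported.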
The minAreaRect() call gets me the angle I can use to rotate the image, and this trick should get me the four corners of the perspective-skewed image that can be straightened out to get the squared image.
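As a sketch of the rotation step, this builds a 2x3 affine matrix from a center, an angle in degrees, and a scale, using the same formula OpenCV documents for cv2.getRotationMatrix2D; the result is what would then be handed to cv2.warpAffine. The helper for applying it to a point is mine, added here just to make the sketch checkable.

```python
import math

def rotation_matrix_2d(center, angle_deg, scale=1.0):
    # Rotate counter-clockwise by angle_deg about center, then scale;
    # same formula OpenCV documents for cv2.getRotationMatrix2D.
    cx, cy = center
    a = math.radians(angle_deg)
    alpha = scale * math.cos(a)
    beta = scale * math.sin(a)
    return [
        [alpha, beta, (1 - alpha) * cx - beta * cy],
        [-beta, alpha, beta * cx + (1 - alpha) * cy],
    ]

def apply_affine(m, point):
    # Apply a 2x3 affine matrix to a single (x, y) point.
    x, y = point
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])
```

The center point stays fixed under the transform, which is exactly what you want when straightening a card in place.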
I have been working, on and off, on a computer vision application that will recognize a card from a certain game in an image, figure out what the card is, and add it to a database. I had some luck in earlier versions of the code looking for template matches to try to find distinctive card elements, but that fails if the card is scaled or skewed, and it rapidly becomes too processor-heavy if there are many templates to match. Recently, at work, I have had even more opportunity to play with OpenCV (a computer vision library), and have found a few blogs and tricks that might help me out.
The first blog shows how to pick out a Sudoku puzzle from a picture. The most important part is finding the corners of the puzzle, as after that, it can be mapped to a square with a perspective transform. I can do a similar trick, only I’ll be mapping to a rectangle. Since corner-finding is kind of scale-invariant (the corner of something is a corner at any scale), this will let me track a card pretty easily.
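A sketch of that corner-to-rectangle mapping, using numpy to solve the same eight-equation linear system that cv2.getPerspectiveTransform solves; the corner coordinates below are made up for illustration.

```python
import numpy as np

def perspective_matrix(src, dst):
    # Solve for the 3x3 homography H (with H[2][2] = 1) that maps each
    # of the four src corners to the corresponding dst corner -- the
    # same system cv2.getPerspectiveTransform solves.
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -x * u, -y * u])
        A.append([0, 0, 0, x, y, 1, -x * v, -y * v])
        b.extend([u, v])
    h = np.linalg.solve(np.array(A, dtype=float), np.array(b, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)

def warp_point(H, point):
    # Map one point through the homography (divide out the w term).
    p = H @ np.array([point[0], point[1], 1.0])
    return (p[0] / p[2], p[1] / p[2])

# Hypothetical corners of a skewed card, mapped onto a 300x420 rectangle:
corners = [(110, 80), (400, 120), (430, 500), (90, 460)]
target = [(0, 0), (300, 0), (300, 420), (0, 420)]
H = perspective_matrix(corners, target)
```

In the real pipeline, H would go to cv2.warpPerspective to resample the whole card image, not just individual points.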
I think that I can actually use OpenCV’s contour finding to get most of the edges of the card, and then the Hough transform to get the corner points. I may even be able to get away with using just contour finding, getting the bounding rectangle of each contour, and checking that it has something like the proper aspect ratio. This will work in the presence of cards that are rotated, but fails on perspective-related skewing.
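The aspect-ratio check is simple enough to sketch. The target ratio here assumes a standard trading-card size of roughly 63 x 88 mm, and the tolerance is a made-up placeholder; both would need tuning against real photos.

```python
def looks_like_card(width, height, target_ratio=63.0 / 88.0, tolerance=0.15):
    # Compare a bounding rectangle's aspect ratio against the expected
    # card ratio, ignoring orientation (a rotated card swaps w and h).
    # target_ratio (63x88 mm card) and tolerance are assumptions.
    if width == 0 or height == 0:
        return False
    ratio = min(width, height) / float(max(width, height))
    return abs(ratio - target_ratio) <= tolerance
```

This rejects square-ish and very elongated contours while accepting a card at either orientation.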
This StackOverflow post has a nice approach to getting the corners of a rectangle that has some rotation and perspective skew.
Once I have the card located, I’m going to throw a cascade of classifiers at it and try something like AdaBoost to get a good idea of which card it is. Some of the classifiers are simple, things like determining the color of the front of the card. Others may actually pull in a bit of OCR or template-based image recognition on (tiny) subsections of the card. Since I will actually know the card border at this point, I can scale the templates to match the card, and get solid matches fast.
Suppose you have used FindContours to find the outlines of things in an image, and now want to color each one a different color, perhaps in order to create a seed map for the watershed algorithm. This gets you an image with each area that FindContours found in a different color:
#Showing the contours.
#contour_list is the output of cv.FindContours(), degapped is the image contours were found in
contour_img = cv.CreateImage(cv.GetSize(degapped), cv.IPL_DEPTH_8U, 3)
contour = contour_list
while contour is not None:
    color = get_rand_rgb(80, 255)
    holecolor = get_rand_rgb(80, 255)
    cv.DrawContours(contour_img, contour, color, holecolor, -1, cv.CV_FILLED)
    contour = contour.h_next()
show_img(contour_img, "contours " + str(iteration))
The real key here is iterating over the list of contours using h_next(), which took me longer than it should have to find.
The show_img() function is effectively just printf for OpenCV images, and looks like this:
def show_img(img, winName):
    #Debugging printf for images!
    cv.NamedWindow(winName)
    cv.ShowImage(winName, img)
    cv.WaitKey(0)
    cv.DestroyWindow(winName)
The function get_rand_rgb() gets a random RGB color with values for each color set to an integer in the range you pass it, like so:
def get_rand_rgb(min, max):
    return (random.randint(min, max),
            random.randint(min, max),
            random.randint(min, max))
I used 80,255 to get bright colors.