Category: Computer Vision

On Looking into Mouse Sensors

That is to say, the sensors from optical mice, rather than a sensor intended to detect the small rodent.

I have 10 boards from optical mice and three desoldered sensors. Among that bunch there are two common IC packages, a 16-pin staggered DIP (4 units) and an 8-pin staggered DIP (6 units). There is also a 20-pin staggered DIP, and two 12-pin DIP packages.

Most of the chips were made by Agilent or by Avago, a spin-off of Agilent that eventually bought Broadcom and started operating under that name. A couple are from At Lab, or as they style themselves, “@lab”.

The chip interfaces are very heterogeneous. Some of them output PS2 data and clock signals directly, making them essentially fully integrated mouse ICs. Others output quadrature signals for x and y motion.

I had high hopes for using these mouse sensors for a couple of hacks. They are essentially optical flow processors, so you can use them to get velocity from the observed motion of stationary objects seen from a moving platform, assuming you know how far away the objects are (and so get odometry for a moving robot by watching the ground roll by). The inverse also works: you can get how far away an object is, assuming it is stationary and you know your own speed (for height-over-ground detection on a drone, for example).
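
A minimal sketch of both relationships, assuming a simple pinhole model with the flow already converted to pixels per second and a known focal length in pixels (the function names are just for illustration, not any sensor's API):

# Pinhole relationships behind the two hacks above: image flow u = f * v / Z,
# so v = u * Z / f and Z = f * v / u.

def ground_speed(flow_px_per_s, height_m, focal_px):
    """Odometry case: known height over a stationary surface gives platform speed (m/s)."""
    return flow_px_per_s * height_m / focal_px

def height_over_ground(flow_px_per_s, speed_m_per_s, focal_px):
    """Altitude case: known platform speed over stationary ground gives height (m)."""
    return speed_m_per_s * focal_px / flow_px_per_s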

Ultimately, though, I don’t think this stash of ICs is going to do the job I want. What I want is something I can drop into projects I’m working on, and reverse engineering each of these, finding the datasheets for ICs old enough to support PS2 protocol, and so forth, would be its own hassle of a project. USB optical mice are $7 or so, so I can’t really justify the effort to get these working, sort out optics for them, etc.

On top of that, drone optical flow sensors with the optics already sorted are like $10-20, so for that use case, I can just buy the part. For robot odometry, I can use the same part, or put optics on a USB mouse that can actually plug into a recent computer, instead of decoding quadrature or PS2.

It feels kind of weird to pick up one of my old projects that I had been kind of looking forward to, and realize that it’s simply not useful or interesting, but I guess that’s just how it goes. At least I can free up that parts drawer now!

Let’s make a 3D rendering of a forest!

Let’s do it with a video shot from a GoPro by someone walking around a camp site!

I’m not actually sure this is all that easy, or even possible. The main way to do this sort of rendering from a monocular camera is structure from motion (SfM), which relies on SIFT features in the image. From what I recall of SIFT features, I’m not 100% convinced that they’re going to track very well from frame to frame, because the frames are mostly full of trees, and leaves look a lot like each other. On the other hand, there’s plenty of texture in the image, so maybe it will work just fine.

I’m planning to use VisualSFM, for which there are instructions online. Those instructions are for Ubuntu 12.04, but I’m on 16.04, so some modifications are required. I got and ran the latest CUDA installer, using the local install rather than the network one, because NVidia broke the network one somehow; I suspect a bad URL in the step where you install the keys.

I got what I think are the dependencies with the command

sudo apt-get install libgtk2.0-dev glew-utils libdevil-dev libboost-all-dev libatlas-cpp-0.6-dev libatlas-dev imagemagick libatlas

The build for VSfM was so quick I thought something had gone wrong, but it did build.

The SiftGPU link in the instructions is dead, but this looks legit. The make process threw some warnings at me, but no errors.

Rather than copying the library once it’s built, I linked it with ln -s ../../SiftGPU/bin/libsiftgpu.so ./libsiftgpu.so; we’ll see if this comes back to haunt me later.

I got PBA from the suggested place, and made it without adding the include for stdlib.h, since it’s 1.05 rather than 1.04 now, and perhaps they fixed that. It seems good, as it built (although with some warnings).

I went ahead and applied the mylapack hack to pmvs-2, and it seems to have gone well:

cd pmvs-2/program/main/
# Stash the prebuilt mylapack.o so "make clean" doesn't delete it
mv mylapack.o mylapack.o.bak
make clean
# Put it back, then rebuild everything else against it
mv mylapack.o.bak mylapack.o
make depend
make

Graclus 1.2 built just fine (again with warnings, though). Read the README; it explains how to set the number of bits used, and unless you’re on a laughably ancient machine, the value you want is 64.

I installed CMVS from here; you want the one called cmvs-fix2.tar.gz, because the non-fix versions seem to have issues finding lapack.

After that I had to reboot, because everything that built was linked against CUDA and video drivers that I wasn’t running, and so it all crashed when it tried to render anything in X. After rebooting, everything came up fine and I can run VisualSFM, so the next part of this will be getting the video out of the GoPro and feeding it to VisualSFM.
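
When I get there, one likely approach is dumping still frames with OpenCV. A minimal sketch, assuming the clip is called gopro.mp4 and that keeping every 15th frame (roughly 2 fps from ~30 fps footage) is a reasonable spacing for SfM:

import cv2
import os

# Write every 15th frame of the clip to frames/ as JPEGs for VisualSFM.
if not os.path.isdir("frames"):
    os.makedirs("frames")
cap = cv2.VideoCapture("gopro.mp4")
count = 0
saved = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if count % 15 == 0:
        cv2.imwrite("frames/frame_%05d.jpg" % saved, frame)
        saved += 1
    count += 1
cap.release()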

Configuring video cropping and resizing with ROS’s image_proc

The existing documentation on this seemed a little sparse, so I figured I’d post up an example based on what I figured out for my project.

I have a 1024×768 sensor_msgs/Image coming out of image rectification on the topic /overhead_cam/image_rect, with camera info on the predictably-named topic /overhead_cam/camera_info. What I want is to crop off the top 120 px of that image so that it’s the right aspect ratio to get scaled to 1680×1050 without distortion (technically, scaling is a distortion, but it’s the only distortion I want). Then I want to scale it up to 1680×1050.

<!-- Video cropping -->
<node pkg="nodelet" type="nodelet" args="standalone image_proc/crop_decimate" name="crop_img">
  <param name="x_offset" type="int" value="0" />
  <param name="y_offset" type="int" value="120" />
  <param name="width" type="int" value="1024" />
  <param name="height" type="int" value="684" />

  <!-- remap input topics -->
  <remap from="camera/image_raw" to="overhead_cam/image_rect_color"/>
  <remap from="camera/image_info" to="overhead_cam/camera_info"/>

  <!-- remap output topics -->
  <remap from="camera_out/image_raw" to="camera_crop/image_rect_color"/>
  <remap from="camera_out/image_info" to="camera_crop/camera_info"/>
</node>

<!-- Video resizing -->
<node pkg="nodelet" type="nodelet" args="standalone image_proc/resize" name="resize_img">
  <!-- remap input topics -->
  <remap from="image" to="camera_crop/image_rect_color"/>
  <remap from="camera_info" to="camera_crop/camera_info"/>

  <!-- remap output topics -->
  <remap from="resize_image/image" to="camera_resize/image_rect_color"/>
  <remap from="resize_image/camera_info" to="camera_resize/camera_info"/>
</node>

<!-- Dynamic reconfigure the resizing nodelet -->
<node name="$(anon dynparam)" pkg="dynamic_reconfigure" type="dynparam" args="set_from_parameters resize_img">
  <param name="use_scale" type="int" value="0" />
  <param name="width" type="int" value="1680" />
  <param name="height" type="int" value="1050" />
 </node>

The first bit runs the crop_decimate nodelet as a standalone node to do the cropping. It can also decimate (shrink images by dropping rows/columns), but I’m not doing that here.

The second bit starts up the video resizing nodelet as standalone, and the third bit fires off a dynamic reconfigure signal to set the image size. If you set use_scale to true, it will scale an image by some percentage, which isn’t quite what I wanted.

I imagine it’s possible to fold these into image rectification as nodelets (image rectification is itself a debayer nodelet and two rectify nodelets, one for the mono image and one for the color image), but I didn’t look into that because this worked for me.


ROS and OpenCV will fite u, m8

I recently wanted to do some computer vision stuff using OpenFace, which is a collection of face-processing computer vision algorithms and tools to use them. It uses OpenCV 3.0.something, which uses, among other things, vtk6 and its friends libvtk6-dev and python-vtk6.

Normally, this wouldn’t be a problem, but I use ROS Indigo, as does the lab I work in. ROS Indigo uses some previous version of vtk, and so attempting to install OpenCV 3.0 blows away my ROS install, and makes apt freak out when I try to install it again. The actual error was something like “you have broken held packages”, only I didn’t actually have held packages OR broken packages.

Apt just gives up at this point. Aptitude, on the other hand, proposes removing the offending VTK packages and proceeding with the ROS install. Only time will tell if I’ve trashed my OpenCV install, but if I have, I can just go back to an older OpenCV version.

Structured Light Scanner


I got the projector back in the spring, planning to do some projection mapping. I got distracted by some stuff for a festival, but some co-workers and I were talking about structured light scanning recently, so I threw this together.

For software, I’m probably going to do the calibration stuff in Python and OpenCV. I’d like the first demo to be perspective correction, to project on non-flat surfaces. Second will probably be virtual lighting.

That's Not Helping, Python.

[ERROR] [WallTime: 1384556730.822164] bad callback: >
Traceback (most recent call last):
  File "/opt/ros/groovy/lib/python2.7/dist-packages/rospy/topics.py", line 681, in _invoke_callback
    cb(msg)
  File "./board_finder.py", line 81, in callback
    self.finder.getBoard(data)
  File "./board_finder.py", line 68, in getBoard
    newImg = cv2.warpPerspective(cvImg, perMat)
TypeError: Required argument 'dsize' (pos 3) not found

[ERROR] [WallTime: 1384556763.964408] bad callback: >
Traceback (most recent call last):
  File "/opt/ros/groovy/lib/python2.7/dist-packages/rospy/topics.py", line 681, in _invoke_callback
    cb(msg)
  File "./board_finder.py", line 81, in callback
    self.finder.getBoard(data)
  File "./board_finder.py", line 68, in getBoard
    newImg = cv2.warpPerspective(cvImg, perMat, newImg.shape)
TypeError: function takes exactly 2 arguments (3 given)

Well, which is it? Does the function take exactly two arguments, or is the third positional argument required?

The real problem is that the shape of newImg is a 3-tuple, and warpPerspective expects a 2-tuple. This StackExchange post hipped me to the bug.
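
For reference, a minimal sketch of the fix: dsize wants a (width, height) pair, while a numpy shape is (rows, cols, channels), so take the first two entries and swap them (the image and matrix here are stand-ins for the ones in board_finder.py):

import cv2
import numpy as np

# Stand-ins for the image and perspective matrix from board_finder.py
cvImg = np.zeros((480, 640, 3), np.uint8)
perMat = np.eye(3, dtype=np.float32)

# dsize is (width, height), not the numpy (rows, cols, channels) shape
h, w = cvImg.shape[:2]
newImg = cv2.warpPerspective(cvImg, perMat, (w, h))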

A Framework for Flow-based Coding

Normally, I think visual presentations of programming languages, such as LabVIEW, are more of a problem than a solution, but there is one situation where I don’t think that’s true: when the program is operating on a stream of data, and the program should be able to be changed without stopping the stream. The canonical use case for this, in my opinion, is realtime operation on a video stream. In this case, you can watch the video output change in realtime as operations are added, removed, and modified.

What I was hoping to do, at some point, is write a set of video operations that wrap e.g. GStreamer and put them in a visual programming framework. That would give me a set of VJing operations that can be played with in realtime to do things like chromakeying a live video stream on human skin tones or dropping swirling masks on all the faces detected in the stream.

Pyqtgraph has a lot of promise as a framework for this. My main desire for the framework is that I don’t have to deal with things like handling mouse clicks, and can just get a description of the pipeline and shove data through it. It may be that someone has already done this, but firtree.org is dead, so maybe it doesn’t matter if they did.
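
A minimal sketch of the kind of thing I mean, using pyqtgraph’s Flowchart module; the GaussianFilter node is just a stand-in for a real video operation (and may need scipy), and the data is a random array rather than video frames:

import numpy as np
import pyqtgraph as pg
from pyqtgraph.flowchart import Flowchart

# A flowchart with one input and one output terminal; the GUI lets you rewire
# nodes while data keeps flowing in through setInput().
app = pg.mkQApp()
fc = Flowchart(terminals={'dataIn': {'io': 'in'}, 'dataOut': {'io': 'out'}})
fc.widget().show()  # call app.exec_() to keep the editor open interactively

# Wire a built-in filter node between the terminals as a placeholder for an
# actual video operation (chromakey, face masking, etc.).
node = fc.createNode('GaussianFilter', pos=(0, 0))
fc.connectTerminals(fc['dataIn'], node['In'])
fc.connectTerminals(node['Out'], fc['dataOut'])

# Shove some data through the pipeline and read the result.
fc.setInput(dataIn=np.random.normal(size=100))
print(fc.output())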

Accidental Aesthetics

Works from a “school” of art share some common elements. Looking at paintings by Dali, Magritte, and Breton, one can say that they share something that is not shared with a Monet. People not trained in the academic study of art might have a hard time naming or articulating that quality, but it is definitely present.

The artists named above are all painters. If one wants to get truly pedantic, it’s possible to claim that their works all have the common quality “flat surface covered by pigments mixed with a binder”. The actual common quality is more a matter of their treatment of form, especially in relation to the expected juxtaposition of forms in the real world, and their engagement with the representation of the unconscious world, that is to say, the realms of dream, delusion, and insanity, as well as direct handling of the duality of representation and reality.

From the fact that this common quality does not directly relate to the material used, we can infer that there can exist works that do not use the same material and yet have the same quality. This inference is supported by the existence of surrealist sculpture.

However, some materials and creative processes force a certain common developmental aesthetic. Three cases of a unified aesthetic that is incidental to the product, but nonetheless shared, are: the textures used in 3D modeling, the debug output of computer vision systems, and the appearance of DIY/prototyped electromechanical devices from the current generation of hacker spaces.

These aesthetics are unified within themselves, but they are not of a piece with each other. Textures adopt the form that they do because the technology demands it. The technology is defined, and the aesthetic is fully constrained by it. Computer vision systems develop their aesthetic because they must map the world through the system’s understanding into a form that is understood by the human user. The technology is not fully defined, but the system is confined on three fronts: the input of the real world, the representation available in the system, and what users can “read” in realtime. Prototyped devices have the fewest constraints. The technology is incompletely defined, and the form of it is also undefined, so it is shaped by expedience and available tools. It is the most accidental aesthetic, because it is the one that forms when no other aesthetic is selected.

[Image: paladin_head]

This is an example of a texture for a human head from here. The distortion would be corrected by remapping onto a model of a human head.

Textures are the most rigidly constrained accidental aesthetic. This description comes from a common modeling file format, but the technology is similar across many modeling processes. The model consists of three files. The first file, the model file, describes the 3D points that make up the surfaces of the model. It also includes a reference to the second file, which is a material file. The material file describes a set of materials that the object is made of, and how light interacts with them. Each material may refer to a third file, which is the texture. A texture is a flat image file. Regions of the flat image file are mapped onto surfaces of the model by a one-to-one (usually) mapping from vertices on the model to vertices on the texture. The vertices on the texture define a shape which is then “cut out” and “applied” to the corresponding shape on the model. Because of the way this works, and the tools used to create this mapping, the texture is frequently a flat representation of the 3D object, in much the way a map of the earth is a flat representation of the 3D world.

Altering the texture would result in changes to its display on the model, so the texture is completely constrained by the model. Because it is a flat image file, the texture is also constrained in the ways that it can be displayed to the user. Because of this complete constraint, the textures display a very strong unity of aesthetic.

[Video: Robot Readable World, from Timo on Vimeo]

Robot Readable World is a compilation of the debugging output of computer vision algorithms. The computer system operates on the video stream to produce data streams which are not visible to humans. These video outputs are intended to allow human debuggers to determine what the system “sees”, that is, to map the data structures into human-readable form and present them mixed with the incoming images so that the person can relate real objects to the system’s “perception”. Because these are merely explanations of the state of the system, rather than a key part of its functioning, they can be altered and rearranged to provide the maximally useful representation for human readers. The data underneath may not change, but the presentation can be altered.

As a result, these systems are unconstrained at at least one end, the presentation to the user. However, they are constrained at the other end to operate on images. The images are, in turn, constrained by the positions and relations of objects in the real world. A computer vision system that operates in a made-up or simulated environment would have no practical use to humans unless they also inhabited that environment. This is not to say that it is never done, as vision approaches could be used in video games, but it is less likely.

[Image: dog_treat_dispenser]

This dog treat dispenser is an example of the third accidental aesthetic: the design of DIY electronics. Some hallmarks of this aesthetic are the exposed circuit boards, the surface texturing of 3D printed or laser cut (in this case, 3D printed) parts, visible and accessible wiring, and the use of visible, commercially available screws and other connectors.

[Image: Nahman_laser_box4]

This project, a controller for a coffee roaster, has the same aesthetic, despite being constructed by a different person, unknown to the maker of the dog treat dispenser.

This is the least constrained of the three accidental aesthetics. The maker can choose the parts used to create the device, and the form of the finished device. However, the tools available to the maker to create the device will drive certain decisions in its eventual form. A 3D printer provides a way to quickly create certain forms, but has a distinct material, texture, and color for those forms. Laser cutting allows a form to be built from layers of flat materials, but again, some building techniques work better than others. Off-the-shelf commercial components have to be connected together, which leads to visible wires. All of these decisions (to print or not print, laser or not laser, wire or make PCBs) have a bias in them that each artist/creator navigates, and the sequence of the decisions leads to a particular aesthetic for the piece.

More Deskewing Rectangles with OpenCV

Someone recently posted to a mailing list that I read, asking for a program that can rotate and crop a lighter-colored rectangle on a black background. Since this fits in with the program I’m working on to locate, rotate, and deskew an image of a card in a photo, I figured I’d give it a shot. I’m planning to do both rotation and deskewing, but for now I’m only doing the rotation.

I’ve installed the latest OpenCV (version 2.3, as of this writing) and started learning the new Python bindings. There seem to be a lot of cases where functions want a numpy array as an argument but don’t return numpy arrays, so there is a lot of fiddling with the results of one function to get something that can be passed to another.

On top of that, the return values of functions are poorly or rarely documented. Python’s duck-typing can help out a little in these cases, but if you call cv2.minAreaRect, you get something like ((233, 432), (565, 420), -0.2343), which is described in the documentation with the single word “retval”.  It would be helpful to have a way to find out that the first tuple is the center, the second is the width and height, and the third is the tilt (in degrees) of the “first” edge off of horizontal. That tuple makes just as much sense, but is wrong, when interpreted as the top left and bottom right corners and a tilt measured in radians.

Also, the “first” edge is seemingly arbitrary, or at least I can’t find any documentation describing it. This means that the same rectangle could be off by 0.4 degrees or by -89.6 degrees, depending on if the first edge is a horizontal or vertical edge. One thing that may be helpful is this stackoverflow post. Since the rectangle is defined as points, I can reshuffle the points to get them in a consistent order, and then get the angle off of horizontal for a consistent edge. That then goes into producing the transformation matrix for affine transforms (e.g. rotation and deskewing).
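
A minimal sketch of that angle cleanup (the long-edge convention here is my own choice, not something the documentation promises, and the usage at the bottom is hypothetical):

import cv2

def normalized_rect(points):
    """Return (center, (long_side, short_side), angle) from cv2.minAreaRect,
    with the angle measured off horizontal for the long edge, so equivalent
    rectangles don't come back as 0.4 vs -89.6 degrees."""
    (cx, cy), (w, h), angle = cv2.minAreaRect(points)
    if w < h:
        w, h = h, w
        angle += 90.0
    return (cx, cy), (w, h), angle

# Hypothetical usage: rotate the image so the rectangle sits square.
# (cx, cy), _, angle = normalized_rect(contour)
# M = cv2.getRotationMatrix2D((cx, cy), angle, 1.0)
# squared = cv2.warpAffine(img, M, (img.shape[1], img.shape[0]))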

The minAreaRect() call gets me the angle I can use to rotate the image, and this trick should get me the four corners of the perspective-skewed image that can be straightened out to get the squared image.

OpenCV and finding rectangles

I have been working, on and off, on a computer vision application that will recognize a card from a certain game in an image, figure out what the card is, and add it to a database. I had some luck in earlier versions of the code looking for template matches to try to find distinctive card elements, but that fails if the card is scaled or skewed, and it rapidly becomes too processor-heavy if there are many templates to match. Recently, at work, I have had even more opportunity to play with OpenCV (a computer vision library), and have found a few blogs and tricks that might help me out.

The first blog shows how to pick out a Sudoku puzzle from a picture. The most important part is finding the corners of the puzzle, as after that, it can be mapped to a square with a perspective transform. I can do a similar trick, only I’ll be mapping to a rectangle. Since corner-finding is kind of scale-invariant (the corner of something is a corner at any scale), this will let me track a card pretty easily.
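
A minimal sketch of that mapping, with made-up corner coordinates and an arbitrary 250×350 output size:

import cv2
import numpy as np

# A placeholder image and four detected corner points, ordered top-left,
# top-right, bottom-right, bottom-left; the numbers are made up.
img = np.zeros((600, 480, 3), np.uint8)
corners = np.float32([[102, 87], [410, 95], [398, 520], [95, 505]])

# Map those corners onto an upright 250x350 rectangle to square up the card.
target = np.float32([[0, 0], [250, 0], [250, 350], [0, 350]])
perMat = cv2.getPerspectiveTransform(corners, target)
card = cv2.warpPerspective(img, perMat, (250, 350))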

I think that I can actually use OpenCV’s contour finding to get most of the edges of the card, and then the Hough transform to get the corner points. I may even be able to get away with using just contour finding, getting the bounding rectangle of each contour, and checking that it has something like the proper aspect ratio. This will work in the presence of cards that are rotated, but fails on perspective-related skewing.
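
A minimal sketch of that contour-plus-aspect-ratio filter; the card aspect ratio and tolerance are placeholders, and the findContours return signature varies between OpenCV versions:

import cv2

CARD_ASPECT = 63.0 / 88.0  # short side / long side; a placeholder, not the real card size
TOLERANCE = 0.1

def card_candidates(gray):
    """Return rotated rects (from minAreaRect) whose aspect ratio is roughly
    card-like. Handles rotated, but not perspective-skewed, cards."""
    _, thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
    # OpenCV 2.x returns (contours, hierarchy); 3.x puts the image in front.
    contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    candidates = []
    for c in contours:
        rect = cv2.minAreaRect(c)
        w, h = rect[1]
        if min(w, h) < 20:
            continue  # ignore tiny specks
        ratio = min(w, h) / max(w, h)
        if abs(ratio - CARD_ASPECT) < TOLERANCE:
            candidates.append(rect)
    return candidates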

This StackOverflow post has a nice approach to getting the corners of a rectangle that has some rotation and perspective skew.

Once I have the card located, I’m going to throw a cascade of classifiers at it and try something like AdaBoost to get a good idea of which card it is. Some of the classifiers are simple, things like determining the color of the front of the card. Others may actually pull in a bit of OCR or template-based image recognition on (tiny) subsections of the card. Since I will actually know the card border at this point, I can scale the templates to match the card, and get solid matches fast.