Curious Effects Getting List Extents

I have a program that gets a list of GPS waypoints and wants to figure out their bounding box. The naive way [1] to do this is to find the maximum and minimum latitude and longitude, and use the maximums as one corner and the minimums as the other corner.

Off the top of my head, I can think of two ways to do this. The first is to iterate over the list of waypoints, comparing each one to the maximum and minimum so far and updating as I go. The list has N points and I have to look at all of them, so that's O(N). So far so good.

The other way is to use list comprehensions to pull the latitudes and longitudes out as separate lists, and then call max() and min() on each of those. I would assume that each list comprehension is O(N), and that each call to max() or min() is also O(N), since they have to look through the whole list to find the extreme value. That makes six passes over the data (two comprehensions, two max() calls, two min() calls), so this should be the slower way to do it.

It turns out, not so much.

I ran the code below on Repl.it and found that the list comprehension version usually ranges from very slightly faster to twice as fast. Occasionally the 10,000-point case is slower, but not every time.

import random 
from timeit import default_timer as timer

#Try some different sizes of lists
for jj in [10, 100, 1000, 10000, 100000, 1000000]:

  #Set up N waypoints
  waypoints = []
  for ii in range(jj):
    lat = (random.random() * 360) - 180
    lon = (random.random() * 360) - 180
    waypoints.append({"lat":lat, "lon":lon})

  start = timer()

  # One loop
  maxLat = maxLon = -float("inf")
  minLat = minLon = float("inf")
  for point in waypoints:
      lat = float(point["lat"])
      if lat < minLat:
          minLat = lat
      if lat > maxLat:
          maxLat = lat
      lon = float(point["lon"])
      if lon < minLon:
          minLon = lon
      if lon > maxLon:
          maxLon = lon

  mid = timer()

  # List comprehensions
  lons = [float(point["lon"]) for point in waypoints]
  lats = [float(point["lat"]) for point in waypoints]
  minLat1 = min(lats)
  minLon1 = min(lons)
  maxLat1 = max(lats)
  maxLon1 = max(lons)

  end = timer()

  #Print the results
  print(f"{jj} points")
  print(f"  first way {mid-start}")
  print(f"  second way {end-mid}")
  print(f"  speedup {(mid-start)/(end-mid)}")
  assert(minLat == minLat1)
  assert(maxLat == maxLat1)
  assert(minLon == minLon1)
  assert(maxLon == maxLon1)

So why is it faster? Clearly, I'm assuming something wrong. I suspect the main thing I'm getting wrong is assuming that the constant 6 multiplier on the O(N) matters. It probably doesn't, which is why we typically drop constant multipliers in runtime comparisons. List comprehensions and max()/min() over iterables are almost certainly calls into a very fast C implementation, and are just so much faster than my loop in Python that doing six passes instead of one doesn't really matter.

Another thing I'm assuming is that max() and min() are implemented as linear searches over the iterable. It's entirely possible that lists store references to their maximum and minimum values and just return those when asked, rather than going looking. I doubt it, since the overhead on removing an element would be large [2], but it is possible.

I haven’t looked into either of these assumptions, since timing the runs answered the question I had (“Which is faster?”), and the follow-on question (“Why?”) isn’t useful to me at this time.

[1] It does some stupid stuff around the poles and across the international date line, for example.

[2] You’ve removed the largest element. What is the new largest? Time to go searching…

Alternatively, the list could be implemented as a parallel-linked list, where one set of links is the elements in their list order and the other set is the elements in their sorted order, but then the list [1, 3, 5, "pickles", <built-in method foo of Bar object at 0x1f34b881>, 6] doesn't have well-defined links for the sorted order.

Drag and Drop Python Objects in WxPython

I’m working on a UI for a system that has agents, which have some qualities, and units, which have multiple agents. I want to be able to display each unit as a list, and drag agents from that list to other units, to reassign them.

There are a lot of examples of drag and drop with text and files, because WxPython already provides drop targets for text and files. One way this could be implemented is to serialize the dragged objects to JSON, drop them as text, and then deserialize them on the other side. That has some disadvantages, notably that you end up restricted to what can be represented in JSON. I wanted to pass Python objects, so I used pickle.

What I eventually came up with is below. It uses ObjectListView, which is a great wrapper around WxPython’s ListCtrl. On drag, the selected items of the source list are pickled and passed through the Wx drag and drop mechanism. When they are dropped on another ObjectListView, they are then unpickled and added to that ObjectListView (OLV), and removed from the source OLV.

One thing this code does leave up to the programmer is ensuring that what goes in on one side of the drag and drop is the same as what is expected out on the other side. Another, slightly more subtle point is that this calls pickle on the drop data, so it would be possible for a malicious program to generate hostile pickle data and let you drag it from its UI window to my script's OLV, whereupon it unpickles into something nasty.

That said, if your attacker is sitting at your computer, launching malicious programs and dragging and dropping stuff out of them, you have already lost, and should probably invest in better door locks.
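If you do want to harden this, pickle supports restricting what it will instantiate. Below is a minimal sketch of that idea; the AgentUnpickler name and its whitelist are my own invention for illustration (not part of WxPython), and it assumes the drag data only ever contains Agent objects pickled from this same script, so the module is __main__.

import io
import pickle

class AgentUnpickler(pickle.Unpickler):
    # Refuse to construct anything but the classes we expect.
    # Hypothetical hardening sketch; adjust the whitelist to taste.
    ALLOWED = {("__main__", "Agent")}

    def find_class(self, module, name):
        if (module, name) in self.ALLOWED:
            return super(AgentUnpickler, self).find_class(module, name)
        raise pickle.UnpicklingError("Refusing to unpickle %s.%s" % (module, name))

def safe_loads(data):
    return AgentUnpickler(io.BytesIO(data)).load()

Swapping safe_loads for pickle.loads in OnData below would at least keep a hostile drag source from instantiating arbitrary classes, though it does nothing about malformed Agent contents.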

#!/usr/bin/env python
import wx
import pickle
from ObjectListView import ObjectListView, ColumnDefn  #pip install objectlistview
import wx.lib.scrolledpanel as scrolled

# UI with a few list boxes that can be drag/dropped between, and have title bars

class Agent(object):
    # Model of a single agent, has:
    # identifier (string?)
    # range (km)
    # speed (kph)
    # capacity (integer)
    def __init__(self, ident, range, speed, capacity):
        self.ident = ident
        self.range = range
        self.speed = speed
        self.capacity = capacity

    def __repr__(self):
        return f"<Agent: {self.ident}>"

# Drag and drop Drop target that supports receiving pickled 
# python data structures and doing something with them. 
class GenericDropTarget(wx.DropTarget):
    def __init__(self, object):
        super(GenericDropTarget, self).__init__()
        self.object = object

        self.data = wx.CustomDataObject("PickleData")
        self.SetDataObject(self.data)

    def OnData(self, x, y, defResult):
        #print(f"OnData({x},{y})")
        
        if self.GetData():
            # unpickle data and do something with it
            pickled_stuff = self.data.GetData()
            cukes = pickle.loads(pickled_stuff)

            # TODO We are making the assumption that a "PickleData"
            # actually only has a list of Agents in it. 
            # Add some checking before making this a real thing, or
            # limit the type to a more-specific format like "AgentList"
            self.object.AddObjects(cukes)

        return defResult

    def OnDrop(self, x, y):
        #print(f"OnDrop({x},{y})")
        return True

    def OnDragOver(self, x, y, defResult):
        #print(f"OnDragOver({x},{y})")
        return defResult

    def OnEnter(self, x, y, defResult):
        #print(f"OnEnter({x},{y})")
        return defResult

class UnitPanel(wx.Panel):
 
    def __init__(self, parent, unitName="No name set"):
        wx.Panel.__init__(self, parent=parent, id=wx.ID_ANY)

        self.dataOlv = ObjectListView(self, wx.ID_ANY, style=wx.LC_REPORT|wx.SUNKEN_BORDER)

        self.dataOlv.SetColumns([
            ColumnDefn("ID", "left", -1, "ident", minimumWidth=100),
            ColumnDefn("Range", "right", -1, "range", minimumWidth=60),
            ColumnDefn("Speed", "right", -1, "speed", minimumWidth=60),
            ColumnDefn("Capacity", "right", -1, "capacity", minimumWidth=60)
        ])
 
        self.agents = []
        self.dataOlv.SetObjects(self.agents)
        self.dataOlv.Bind(wx.EVT_LIST_BEGIN_DRAG, self.OnDragInit)
        
        # Set up a drop target on the listview
        dt = GenericDropTarget(self.dataOlv) 
        self.dataOlv.SetDropTarget(dt) 

        # Set up a title for this box
        self.unitLabel = wx.StaticText(self, id=wx.ID_ANY, label=unitName)

        mainSizer = wx.BoxSizer(wx.VERTICAL)
        mainSizer.Add(self.unitLabel, proportion=0, flag=wx.ALL, border=5)
        mainSizer.Add(self.dataOlv, proportion=1, flag=wx.ALL|wx.EXPAND, border=5)
        self.SetSizer(mainSizer)

    def populate(self, units):
        self.agents = []
        for unit in units:
            a = Agent(unit["ident"], unit["range"], unit["speed"], unit["capacity"])
            self.dataOlv.AddObject(a)

    def OnDragInit(self, event): 
        # Get all the selected items from this list
        selected = self.dataOlv.GetSelectedObjects()

        # Pickle them and put them in a custom data object
        pickled_selection = pickle.dumps(selected)
        drag_obj = wx.CustomDataObject("PickleData")
        drag_obj.SetData(pickled_selection) 

        #Create a drag and drop source from this ObjectListView
        src = wx.DropSource(self.dataOlv) 
        src.SetData(drag_obj)
        print("Drag started")
        result = src.DoDragDrop(wx.Drag_DefaultMove) 
        if result == wx.DragCopy:
            # We don't copy because agents are hardware
            self.dataOlv.RemoveObjects(selected)
        elif result == wx.DragMove:
            # Remove the data from here, add it to another list
            self.dataOlv.RemoveObjects(selected)
        else:
            # Default, do nothing
            print("Nothing, nothing, nothing at all")
        

class AssetFrame(wx.Frame):
    def __init__(self):
        wx.Frame.__init__(self, parent=None, id=wx.ID_ANY, 
                          title="ObjectListView Demo", size=(800, 600))
        
        self.panel = scrolled.ScrolledPanel(self, id=wx.ID_ANY)

        self.mainSizer = wx.BoxSizer(wx.VERTICAL)
        self.panel.SetSizer(self.mainSizer)
        self.panel.SetupScrolling()
        self.Show()

    def populate(self, config):
        for unit in config["units"]:
            unit_panel = UnitPanel(self.panel, unitName=unit["name"])
            unit_panel.populate(unit["agents"])
            self.mainSizer.Add(unit_panel)

 
if __name__ == "__main__":
    app = wx.App(False)

    # We're going to need to be able to populate the frame from the config,
    # so represent the data as a data structure and initialize with that
    config = {"units": [{"name": "Unimatrix 01",
                        "agents": [
                            {"ident": "7 of 9", "range": 10, "speed": 10, "capacity": 12},
                            {"ident": "Third of 5", "range": 10, "speed": 10, "capacity": 12}
                        ]},
                        {"name": "Unit 2",
                        "agents": [
                            {"ident": "u2s1", "range": 10, "speed": 10, "capacity": 12},
                            {"ident": "u2a2", "range": 10, "speed": 10, "capacity": 12},
                            {"ident": "u2a3", "range": 10, "speed": 10, "capacity": 12}
                        ]},
                        {"name": "Unit Foo",
                        "agents": [
                            {"ident": "bar", "range": 10, "speed": 10, "capacity": 12},
                            {"ident": "baz", "range": 10, "speed": 10, "capacity": 12},
                            {"ident": "quux", "range": 10, "speed": 10, "capacity": 12}
                        ]},
                        {"name": "Enterprise",
                        "agents": [
                            {"ident": "Galileo", "range": 10, "speed": 10, "capacity": 12},
                            {"ident": "Copernicus", "range": 10, "speed": 10, "capacity": 12}
                        ]},
                        {"name": "GSV Insufficient Gravitas",
                        "agents": [
                            {"ident": "GCU Nervous Energy", "range": 1000000, "speed": 3452334, "capacity": 13452342},
                            {"ident": "GCU Grey Area", "range": 1000000, "speed": 234523546, "capacity": 234562312}
                        ]}] 
              }

    frame = AssetFrame()
    frame.populate(config)
    app.MainLoop()

I called the unpickled data “cukes” because this is demo code, and I was being silly. A cucumber is, after all, what you get when you undo the process of making pickles. You may want to change that if you copy/paste this into production code.

A Flat Earth Would Be Odd

I'm not sure what "flat earthers" actually believe, but assuming the world is literally flat leads to some interesting results. For the sake of keeping things simple, let's assume that the world is flat like a coin: locally bumpy, but overall shaped like a disk. This assumption is based on what the word "flat" means; a saddle isn't flat, so a hyperbolic-curved earth doesn't count as "flat" in any reasonable way. Further, let's assume that it keeps the same surface area that the, shall we say, "conventional" model claims it has. This assumption lets us avoid the absurdities [1] required to preserve distances, and so travel times, while unwrapping a sphere to cover a disk.

These assumptions tell us how big the disk is. The earth's surface area is allegedly 196.9 million square miles. A disk's surface area is given by πr², so solve for r and you wind up with a radius of 7.916 million miles. This may present astronavigation problems, as the moon is only 238,900 miles away, so it might hit the disk… if the moon were real! [2]

Ok, so we all live on a disk that’s about 16 million miles across. Because I’m comically egotistical, I’m going to say that my home town is in the middle of the disk, equidistant from every edge.

Now we come to an interesting point: How thick is the disk? Let’s assume that we believe in gravity. The gravity in my home town is one G, and it causes objects to fall down, which is to say “towards the surface of the earth”. Under the conventional model, this is because the earth is under me, and so the gravity caused by the large mass of the earth exerts a force on objects above it. Now there are two ways we can go: either the earth is made of the stuff that it is observed to be made of, and has the density it is generally observed to have, or it’s made out of something far more dense. All that this really varies is how thick the disk is under my home town. To have 1G there, using the conventional materials, requires at least the alleged thickness of the round earth, which is to say about 7,917 miles. Using something denser makes it thinner, without affecting the gravity, but there are limits on how dense matter can get.

Ok, so far, so good. However, we’ve set a little bit of a trap for ourselves here. Everywhere you go on the surface of the earth, the gravity is about 1G, so everywhere you go, the disk earth has to be about 8k miles thick. In my home town, at the center of the disk, this isn’t a problem, because the gravity in all the other directions balances out.

What’s that you say? Yes, gravity in other directions. You see, we’re talking about a disk that measures about 8k miles thick and 16M miles across. If you’re in the center, there are equal amounts of disk around you in all directions, so the pull in all the other directions balances out. If you’re off center, there’s more mass on one side of you than on the other, and so there is a component to the gravitational pull that isn’t straight down, and is unbalanced by the ratio of how much mass is on each side.

Initially, this would probably be pretty subtle. Things would fall a little bit in the direction of my home town, but more or less straight down. Friction would suffice to keep things on surfaces, but round things would always roll towards my home town. It would only be a little harder to walk away from my home town than towards it. I doubt it would affect which way water goes down the toilet all that badly, although the water would pile up on one side of the bowl. However, as you got towards the edge, things would get FUCKING DIRE.

How dire? Well, let's look at the volume of that disk earth, which is properly a cylinder now that we know how thick it is. A cylinder that's 16,000,000 miles across and 8,000 miles thick has a volume of 6,430,000,000,000,000,000 cubic miles. The conventional model earth has a volume of 260,000,000,000 (that is to say, 260 billion) cubic miles, and exerts a gravitational pull of 1G when you're on the surface, which is to say that all of it is under you. When you're on the edge of the disk earth, which is to say that almost all of it is, say, east of you, it exerts a gravitational pull of (around) 24,730,769.2308G. So to understand the force exerted on some poor schmuck who happens to get teleported from my town to the literal eastern edge of the world, assume that the ISO standard schmuck weighs 150 lbs. He will suddenly experience a thrust of about four billion pounds of force to the west. For comparison, a Saturn V rocket generated about 7.5 million (with an "m") pounds of thrust, which is roughly 1/500th of what people living on the edge would experience as a consequence of just being there.

Unfortunately, since some of that force (at least 1G of it, thanks to the thickness of the disk) is downwards, towards the center of mass of the disk, rather than the location of my home town, the schmuck is going to hit the ground going absurdly fast and get spread all over it.

But wait, what is that ground made of? We said earlier that this disk is made out of normal earth stuff, which you can go out and observe to be mostly silicate-based rocks. The rocks under my home town have about 8 million miles of rock around them, pressing in towards the center of mass of the disk. Extremely weird stuff happens to matter under those kinds of pressures. Hydrogen (theoretically) becomes a metal. Atomic nuclei get mashed into each other.

That said, my home town is going to have other problems. For example, all the water in the world, and all the air, and all the stuff that’s far enough away to experience mostly-sideways gravity, is all going to flow towards the center of mass of the disk, and some of it will be coming in very fast. Since my town is barely above sea level under the conventional model, I think it’s going to get both extremely hot, due to the abrupt change in pressure, and rather wet, although possibly not before the disaster of degenerate matter that’s forming under it gets to the surface.

Alright flat-earthers, you got my home town into this, you get it out. Why are none of these effects observed on the ostensibly flat earth that we live on?

Well, maybe it's not 8,000 miles thick. Maybe it is in fact quite thin, and They can manipulate gravity to provide ~1G everywhere you go. They put some thickness everywhere that anyone decides to dig, or anywhere a tree falls over, but everywhere else it's… 1 inch thick. One inch is pretty thin, but that still puts the volume of the disk around 99,804,672,000,000,000 cubic miles, and so the inwards G-force experienced by someone on the edge at around 383,864.123G. This is still a troubling amount of force, but clearly if They can provide 1G over the whole surface of the earth, They can sort this out too.

That said, how long have They been doing this? If the earth has always been a disk, then obviously They were doing it before we evolved, so they’re not human. If They did it recently, why did no one notice the change? Was that the “road work” that was making my commute to work slow this morning? Either way, this moves “Them” from the category of “Federal/NWO government conspiracy to hide the truth, man!” to “Capricious god or gods with odd senses of humor”. At that point, there’s no use arguing what shape the world is, because it might be different tomorrow.


[1] These absurdities extend from the simple “everyone who has ever traveled is part of the conspiracy and lies about how long it takes” to the complex “THEY (It’s always ‘they’, innit?) can alter spacetime to slow or speed up travel”.

[2] Spoiler alert: it is.

Let’s make a 3D rendering of a forest!

Let’s do it with a video shot from a GoPro by someone walking around a camp site!

I’m not actually sure this is all that easy, or even possible. The main way to do this sort of rendering from a monocular camera is structure from motion (SfM), which relies on SIFT features in the image, and from what I recall of SIFT features, I’m not 100% convinced that they’re going to track very well from frame to frame, because the frames are mostly full of trees, and leaves look a lot like each other. On the other hand, there’s plenty of texture in the image, so maybe it will work just fine.

I'm planning to use VisualSFM, for which there are instructions online. Those instructions are for Ubuntu 12.04, but I'm on 16.04, so some modifications were required. I got and ran the latest CUDA installer, using the local install rather than the network one, because NVidia broke the network one somehow; I suspect a bad URL in the step where you install the keys.

I got what I think are the dependencies with the command

sudo apt-get install libgtk2.0-dev glew-utils libdevil-dev libboost-all-dev libatlas-cpp-0.6-dev libatlas-dev imagemagick libatlas

The build for VSfM was so quick I thought something had gone wrong, but it did build.

The SiftGPU link in the instructions is dead, but this looks legit. The make process threw some warnings at me, but no errors.

Rather than copying the library once it's built, I linked it with ln -s ../../SiftGPU/bin/libsiftgpu.so ./libsiftgpu.so; we'll see if this comes back to haunt me later.

I got PBA from the suggested place, and made it without adding the include for stdlib.h, since it’s 1.05 rather than 1.04 now, and perhaps they fixed that. It seems good, as it built (although with some warnings).

I went ahead and applied the mylapack hack to pmvs-2; it seems to have gone well.

cd pmvs-2/program/main/
mv mylapack.o mylapack.o.bak
make clean
mv mylapack.o.bak mylapack.o
make depend
make

Graclus 1.2 built just fine (again with warnings, though). Read the readme; it explains how to set the number of bits used, and if you're not on a laughably ancient machine, the value you want is 64.

I installed the cmvs from here; you want the one called cmvs-fix2.tar.gz, because the non-fix versions seem to have issues finding lapack.

After that I had to reboot, because everything that built was linked against CUDA and video drivers that I wasn’t running, and so it all crashed when it tried to render anything in X. After rebooting, everything came up fine and I can run VisualSFM, so the next part of this will be getting the video out of the GoPro and feeding it to VisualSFM.
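As a head start on that next part: VisualSFM wants still images rather than video, so something like the OpenCV sketch below would do for pulling frames out of the GoPro footage. The filename and the every-15th-frame stride are my own placeholder choices, not anything VisualSFM requires.

import os
import cv2  # pip install opencv-python

# Save every 15th frame of the GoPro video as a numbered JPEG
# so VisualSFM can ingest the result as an image set.
os.makedirs("frames", exist_ok=True)
cap = cv2.VideoCapture("gopro_campsite.mp4")
frame_num = saved = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if frame_num % 15 == 0:
        cv2.imwrite("frames/frame_%05d.jpg" % saved, frame)
        saved += 1
    frame_num += 1
cap.release()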

Configuring video cropping and resizing with ROS’s image_proc

The existing documentation on this seemed a little sparse, so I figured I’d post up an example based on what I figured out for my project.

I have a 1024×768 sensor_msgs/Image coming out of image rectification on the topic /overhead_cam/image_rect, with camera info on the predictably-named topic /overhead_cam/camera_info. What I want is to crop off the top 120 px of that image so that it’s the right aspect ratio to get scaled to 1680×1050 without distortion (technically, scaling is a distortion, but it’s the only distortion I want). Then I want to scale it up to 1680×1050.

<!-- Video cropping -->
<node pkg="nodelet" type="nodelet" args="standalone image_proc/crop_decimate" name="crop_img">
  <param name="x_offset" type="int" value="0" />
  <param name="y_offset" type="int" value="120" />
  <param name="width" type="int" value="1024" />
  <param name="height" type="int" value="684" />

  <!-- remap input topics -->
  <remap from="camera/image_raw" to="overhead_cam/image_rect_color"/>
  <remap from="camera/image_info" to="overhead_cam/camera_info"/>

  <!-- remap output topics -->
  <remap from="camera_out/image_raw" to="camera_crop/image_rect_color"/>
  <remap from="camera_out/image_info" to="camera_crop/camera_info"/>
</node>

<!-- Video resizing -->
<node pkg="nodelet" type="nodelet" args="standalone image_proc/resize" name="resize_img">
  <!-- remap input topics -->
  <remap from="image" to="camera_crop/image_rect_color"/>
  <remap from="camera_info" to="camera_crop/camera_info"/>

  <!-- remap output topics -->
  <remap from="resize_image/image" to="camera_resize/image_rect_color"/>
  <remap from="resize_image/camera_info" to="camera_resize/camera_info"/>
</node>

<!-- Dynamic reconfigure the resizing nodelet -->
<node name="$(anon dynparam)" pkg="dynamic_reconfigure" type="dynparam" args="set_from_parameters resize_img">
  <param name="use_scale" type="int" value="0" />
  <param name="width" type="int" value="1680" />
  <param name="height" type="int" value="1050" />
 </node>

The first bit does the cropping, running the crop_decimate nodelet as a standalone node. That nodelet can also decimate (shrink images by deleting rows/columns), but I'm not using that here.

The second bit starts up the video resizing nodelet as standalone, and the third bit fires off a dynamic reconfigure signal to set the image size. If you set use_scale to true, it will scale an image by some percentage, which isn’t quite what I wanted.

I imagine it’s possible to fold these into image rectification as nodelets (image rectification is itself a debayer nodelet and two rectify nodelets, one for the mono image and one for the color image), but I didn’t look into that because this worked for me.
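As a quick sanity check that the crop and resize are doing what I think, a throwaway rospy listener (mine, not part of image_proc) can print the dimensions coming out of the resize nodelet:

import rospy
from sensor_msgs.msg import Image

# Print the size of each frame published by the resize nodelet
def report_size(msg):
    rospy.loginfo("resized image is %dx%d", msg.width, msg.height)

rospy.init_node("size_check")
rospy.Subscriber("/camera_resize/image_rect_color", Image, report_size)
rospy.spin()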


In which Ellipses are Dealt with Handily

I'm making a user interface where people can draw a lasso select on an image, and I want to see whether certain points in the image are within that lasso select. Now that I think about it, I could probably have done some form of polygon membership test (e.g. raycasting) for each of the points I wanted to check, with the points of the lasso forming the polygon, but that has a few problems. One of them is that the ends of the lasso stroke can overlap, although that could be handled by e.g. a convex hull. In fact, I may end up doing that, because the current approach can't handle an arbitrarily-shaped lasso selection.

What I ended up doing as a first cut, however, has some other useful properties. I had the software find a minimum enclosing ellipse of the points for the lasso selection, and then tested the points I wanted checked to see if they were inside the ellipse.


import math

import numpy as np

def min_bounding_oval(points, tolerance=0.01):
  #Create a numpy 2xN matrix from a list of 2D points
  P = np.matrix(points).T

  # Pseudocode from https://stackoverflow.com/questions/1768197/bounding-ellipse
  # Pythonification by me
  # Input: A 2xN matrix P storing N 2D points
  #        and tolerance = tolerance for error.
  # Output: The equation of the ellipse in the matrix form,
  #         i.e. a 2x2 matrix A and a 2x1 vector c representing
  #         the center of the ellipse.

  # Dimension of the points
  d = 2
  # Number of points
  N = len(points)

  # Add a row of 1s to the 2xN matrix P - so Q is 3xN now.
  Q = np.vstack([P, np.ones((1, N))])

  # Initialize
  count = 1
  err = 1
  # u is an Nx1 vector where each element is 1/N
  u = (1.0 / N) * np.ones((N, 1))

  # Khachiyan Algorithm
  while err > tolerance:
    X = Q * np.diagflat(u) * Q.T

    M = np.diagonal(Q.T * X.I * Q)

    # Find the value and location of the maximum element in the vector M
    maximum = M.max()
    j = np.argmax(M)

    # Calculate the step size for the ascent
    step_size = (maximum - d - 1) / ((d + 1) * (maximum - 1))

    # Calculate new_u: multiply all the elements of u by (1 - step_size)...
    new_u = (1 - step_size) * u

    # ...and increment the jth element of new_u by step_size
    new_u[j] = new_u[j] + step_size

    # The error is the square root of the SSD between new_u and u
    err = math.sqrt(((new_u - u) ** 2).sum())

    # Increment count and replace u
    count = count + 1
    u = new_u

  # Put the elements of the vector u into the diagonal of a matrix
  # U with the rest of the elements as 0
  U = np.diagflat(u)

  # Compute the A-matrix
  A = (1.0 / d) * (P * U * P.T - (P * u) * (P * u).T).I

  # And the center
  c = P * u

  return [A, c]

That gets me A, a 2×2 matrix representing the ellipse, and c, a 2×1 matrix representing the center of the ellipse.

To check if a point is in the ellipse, some fiddling has to be done to A.

#Convert A to a whitening matrix W = A**(1/2), so that points
#inside the ellipse map to points inside the unit circle
w, v = np.linalg.eig(A)
D = np.diagflat(w)
W = v * np.sqrt(D) * v.I

#Subtract the center of the ellipse and use the whitening matrix
#tag_x and tag_y are the x and y positions of the point to check
p = np.matrix([[tag_x],[tag_y]])
p_center = p - c
p_white = W * p_center

#Check if the whitened point is in the ellipse
if np.linalg.norm(p_white) <= 1:
    return True #or whatever you want to do with it

Note that I haven’t proven this code correct or anything, just run it and had it work for me.
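For completeness, here is roughly how the pieces fit together; the lasso points and the test point below are made up for illustration:

import numpy as np

# A rough lasso stroke (made-up data) and a point to test
lasso = [(0, 0), (4, 1), (5, 4), (2, 6), (-1, 3)]
tag_x, tag_y = 2.0, 3.0

A, c = min_bounding_oval(lasso)

# Whiten the test point and check it against the unit circle, as above
w, v = np.linalg.eig(A)
W = v * np.sqrt(np.diagflat(w)) * v.I
p_white = W * (np.matrix([[tag_x], [tag_y]]) - c)
print("inside" if np.linalg.norm(p_white) <= 1 else "outside")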

Chatbots as Translation

I got a translation program based on deep neural networks (http://opennmt.net/ if anyone wants to play along at home). I’m training it on a corpus of my previous text chats. The “source language” is everything that everyone has said to me, and the “target language” is what I said in return. The resulting model should end up translating from things that someone says to appropriate responses. My endgame is to hook it up to an instant messaging client, so people can chat with a bot that poses as me.

This has a couple of problems. The main one is that statistical translation generally works by converting the input language into some abstract form that represents a meaning (I’m handwaving so hard here that I’m about to fly away) and then re-representing that meaning in the output language. Essentially, the overall concept is that there is some mathematical function that converts a string in the input language to a string in the output language while preserving meaning, and that function is what is learned by the neural network. Since what someone says and what I respond with have different, but related meanings, this isn’t really a translation problem.

The other problem comes up when I do the second part of this plan, which is to train the network with a question and answer corpus. At its most abstract, a question is a request to provide knowledge which is absent, and an answer is an expression of the desired knowledge. “Knowledge”, in this case, refers to factual data. One could attempt an argument that by training on a Q&A corpus, the network is encoding the knowledge within itself, as the weights used in the transformation function. As a result, the network “knows” at least the things that it has been trained on. This “knowing” is very different from the subjective experience of knowing that humans have, but given the possibility that consciousness and subjective experience may very well be an epiphenomenon, maybe it has some similarities.

Unfortunately, this starts to fall apart when the deep network attempts to generalize. Generalization, in this case, is producing outputs in response to inputs that are not part of the training input data set. If one trains a neural network for a simple temperature control task, where the input is a temperature, and the output is how far to open a coolant valve, the training data might look like this:

Temperature   Valve Position
0             0 (totally closed)
10            0.1
20            0.2
30            0.3
40            0.4
50            0.5 (half open)
60            0.6
70            0.7
80            0.8
90            0.9
100           1.0 (fully open)

So far, so good. This is a pretty simple function to learn: the valve position is 0.01 * Temperature. The generalization comes in when the system is presented with a temperature that isn't in the training data set, like 43.67 degrees, which one would hope results in a valve setting of 0.4367 or thereabouts. There is a problem that temperatures less than zero or greater than 100 degrees ask for the valve to be more than completely shut, or more than fully open, but we can just assume that the valve has end stops and doesn't do that, rather than trying to automatically add a second valve and open that too.
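As a toy demonstration of that generalization, a least-squares line fit can stand in for the network here, since the function being learned is linear anyway:

import numpy as np

# Training data: temperature -> valve position
temps = np.arange(0, 101, 10)
valves = temps * 0.01

# Fit valve = m * temp + b, the "function" the network would learn
m, b = np.polyfit(temps, valves, 1)

# Generalizing to an unseen temperature works nicely...
print(m * 43.67 + b)  # ~0.4367, as hoped

# ...but outside the training range it happily extrapolates nonsense
print(m * 140.0 + b)  # ~1.4, more than fully open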

The problem comes when we start generalizing across questions and answers. Assume there is some question in the training data that asks "My husband did this or that awful thing, should I leave him?" and the answer is mostly along the lines of "Yes, bail on that loser!", and another question that asks "My husband did some annoying but not really awful thing, should I leave him?" and the answer is "No, concentrate on the good in your relationship and talk to him to work through it." These are reasonable things to ask, and reasonable responses. Now imagine that there is a new question. The deep network does its input mapping to the space of questions, and the result (handwaved down to a single value for explanation purposes) falls somewhere between the representations of the "awful thing" question and the "annoying thing" question. Clearly, the result should fall somewhere between "DTMFA" and "Stick together", but "Hang out near him" isn't really good advice, and "Split custody of the kids and pets, but keep living together" seems like bizarro-world nonsense. There isn't really a mathematical mapping for the midrange here. Humans have knowledge about how human relationships work, and models of how people act in them, that we use to reason about relationships and offer advice. This kind of knowing is not something deep networks do (and it's not even something anyone is trying to claim they do), so I expect there will be a lot of hilarious failures in this range.

Ultimately, this is what I’m hoping for. I’m doing this for the entertainment value of having something that offers answers to questions, but doesn’t really have any idea what you’re asking for or about, and so comes up with sequences of words that seem statistically related to it. We (humans) ascribe meaning to words. The deep network doesn’t. It performs math on representations of sequences of bytes. That the sequences have meaning to us doesn’t even enter into the calculations. As a result, its output has flaws that our knowledge allows us to perceive and reflect on.

Plus, I’m planning to get my Q&A corpus from Yahoo Answers, so not only will the results be indicative of a lack of knowing (in the human sense), they’ll also be amazingly low quality and poorly spelled.


TF-IDF in Python

I am trying to process a bunch of text generated by human users to figure out what they are talking about, in the context of an experiment where there were robots a person could be controlling, a few target areas the robots could move to, and a few things they could move, like a crate. One thing that occurred to me was to use TF-IDF, which, given a text and a collection of texts, tells you which words in the text are unusually frequent in that particular text compared to the rest of the collection.

It turns out that that’s not really what I wanted, because the words that are “unusual” are not really the ones that the particular text is about.

select(1.836) all(3.628) red(2.529) robots(1.517) ((1.444) small(5.014) drag(2.375) near(3.915) lhs(4.321) of(1.276) screen(2.018) )(1.444)

This is a sentence from the collection, and the value after each word is the TF-IDF score for that word. The things I care about are that it’s about robots, and maybe a little that it’s about the left hand side of the screen. “Robots” actually got a pretty low score (barely more than “of”), but “small” got the highest score in the sentence.

At any rate, this is how I did my TF-IDF calculation. Add documents to the object with add_text, and get scores with get_tfidf. It uses NLTK for tokenization; you could also use t.split(" ") to break strings up on spaces.

import math

import nltk  # pip install nltk

class TFIDF(object):
  def __init__(self):
    #Count of docs containing a word
    self.doc_counts = {}
    self.docs = 0.0

  def add_text(self, t):
    #We're adding a new doc
    self.docs += 1.0
    #Get all the unique words in this text
    uniques = set(nltk.word_tokenize(t))
    for u in uniques:
      if u in self.doc_counts:
        self.doc_counts[u] += 1
      else:
        self.doc_counts[u] = 1

  def get_tfidf(self, t):
    word_counts = {}
    #Count occurrences of each word in this text
    words = nltk.word_tokenize(t)
    for w in words:
      if w in word_counts:
        word_counts[w] += 1
      else:
        word_counts[w] = 1
    #Calculate the TF-IDF for each word
    tfidfs = []
    for w in words:
      #Doc count is either 0 (it's in no docs) or the stored count
      w_docs = 0
      if w in self.doc_counts:
        w_docs = self.doc_counts[w]

      #the 1 is to avoid div/zero for previously unseen words
      idf = math.log(self.docs / (1 + w_docs))
      tf = word_counts[w]
      tfidfs.append((w, tf * idf))
    return tfidfs
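Used like so, with a few made-up documents (this assumes NLTK's tokenizer models are installed, i.e. nltk.download('punkt')):

tfidf = TFIDF()
tfidf.add_text("select all red robots")
tfidf.add_text("move the crate to the target area")
tfidf.add_text("drag the small robots near the lhs of the screen")

#Score a text against the collection
for word, score in tfidf.get_tfidf("select all red robots"):
  print(word, score)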


IoTiocy

I may be the first to come up with this terrible neologism, but the time for it is certainly at hand. There have been a number of high-profile incidents where the security of stuff that was connected to the Internet was neglected in the rush to get it connected, or highly dubious design decisions were made in order to get an IoT “thing” to market.

The term originally came up when I was discussing this widget with a friend of mine. The NodeMCU boards are pretty sweet, and are based on the ESP-8266, which is also pretty sweet. The motor driver chip they used is from the 1970s, and there are far better ones available. The VNH2SP30 is a beast, and can supply 10x the current of the L293D, although you’d need two of them for dual motors, so there may be a horses for courses argument to be made there.

The DDoS that took out Krebs On Security, though, doesn’t really have a similar argument going for it. IoT security is hard because the devices don’t have a great interface for users to tighten up their security settings (if indeed, they have any security settings), and users don’t expect to have to tighten up the security of their appliances. As a result, insecure IoT stuff just kind of hangs out in dubious parts of the web, waiting for someone to make it a questionable offer of employment.

To the people who make things incomplete, weak, or insecure, I dedicate this new word:

IoTiot noun, informal

  1. A person or company that creates IoT devices lacking security, solid design, or even a purpose, usually in order to make a quick grab for cash.
  2. Any such device, once it starts behaving as it was designed (which is to say “badly”).

Similarly: IoTiotic, IoTiots, IoTiocy


The ancient Roman computer

I was recently assembling a slide deck on assistive technology and the sort of gradient that exists between assistive devices and human enhancement. We already have the technology to build exoskeletons that can give their user increased strength, or allow them to work longer with less fatigue (albeit only in specific types of tasks). These are physical enhancements, in the same way that reading glasses on a person with normal vision act as magnifiers and give them better close-in vision. The existence of physical enhancements raises the question of whether we can also have cognitive enhancements.

Donald Norman describes the use of Arabic numerals as a "cognitive artifact", a tool that we use because it makes mathematical operations easier. One alternative in Western civilization is the Roman numeral system, which uses letters to represent values. The Roman numerals are I (1), V (5), X (10), L (50), C (100), D (500), and M (1000). So far so good, but Roman numerals are not a place-based system. Instead, the value of a number is a summation based on rules:

  1. Read from left to right, summing up values.
  2. If the number you are looking at is bigger than the number to its right, add it to the total.
  3. If the number you are looking at is smaller than the number to its right, subtract the number you’re looking at from the one to its right and add that.

So VI is 6 (5 + 1), but IV is 4 (5 - 1), and VC is 95. The current year is MMXVII (1000 + 1000 + 10 + 5 + 1 + 1), which is OK to decode, but 1998 was MCMXCVIII, or 1000 + (1000 - 100) + (100 - 10) + 5 + 1 + 1 + 1. Looking at the Arabic expansion, the -100 and +100 cancel out, so it could be reduced to 1000 + 1000 - 10 + 5 + 1 + 1 + 1, so MXMVIII. There's a trick to make addition of Roman numerals easier, which Norman describes, but doing multiplication is something of a nightmare.
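Those three rules translate almost directly into code. Here's a minimal sketch (my own, using the loose subtractive rules above, so it happily accepts non-standard forms like VC and MXMVIII):

VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}

def roman_to_int(s):
    total = 0
    for i, ch in enumerate(s):
        value = VALUES[ch]
        # A numeral smaller than its right neighbor gets subtracted,
        # which is equivalent to the pairwise "subtract and add" rule
        if i + 1 < len(s) and value < VALUES[s[i + 1]]:
            total -= value
        else:
            total += value
    return total

print(roman_to_int("MMXVII"))     # 2017
print(roman_to_int("MCMXCVIII"))  # 1998
print(roman_to_int("MXMVIII"))    # also 1998
print(roman_to_int("VC"))         # 95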

I recently got to thinking about writing a Roman numeral library for a programming language, so that you could put this sort of thing in a program. This has no doubt been done already, so I’m not going to redo it, but my first thought was that I’d just write a translation from Roman to Arabic numerals (which pretty much all programming languages use already) and back, and then I’d do the math in Arabic numerals. That’s not a Roman numeral library, though. That’s an Arabic library with a Roman presentation layer.

Unfortunately, on most computer architectures, there’s no other way to do it. The binary ALU in processors is a place-value based system. It’s essentially Arabic numerals for people with one finger, where you carry to the next higher place as soon as you have more than one of something. An implementation of Roman numerals on pretty much any CPU will be turned into an Arabic-like representation before it gets operated on.

This got me thinking about what it would be like to attempt to build an ALU, or a simple handheld four-function calculator, that operated on Roman numerals natively. That is to say, the voltages in the various "bits" would represent I, V, X, L, C, D, and M, instead of a place-based binary representation. This immediately runs into a LOT of problems. For starters, the cool thing about binary is that you operate the transistors in switching mode, either on or off, rather than in linear mode. Current either flows with minimal resistance, or doesn't flow. In linear mode, to get multiple voltages, the transistors act as resistors, dissipating some of the power as heat.

After that, you have to deal with context-based values, rather than place-based values. Imagine that we have an adder with two inputs, A and B, and we feed it A = I, B = V. Clearly, the output has to be, uh… well, it has to be 4, but there isn't a numeral for that. There's no single-digit Roman numeral that is 4. But let's say that we don't care about not having a representation, and that this is just an analog computer. And, so that our heads don't explode trying to keep things straight, let's say that the voltage for '4' is a little less than the voltage for V, but 4 times the voltage for I. This is nice because the voltage for IIII is the same as the voltage for IV. So far so good, but now we give our adder A = V, B = I, and it has to put out '6', which we also don't have a representation for. We're clever, and it's analog: the voltage that represents '6' is the voltage for 'V' plus the voltage for 'I'. So if A > B, the output is A + B; if A < B, the output is B - A; and if A = B, the output is A + B (VV is 10, I suppose, but so is X). This is all doable, I'm sure, probably with op-amps, some of which would be acting as comparators, but it's complex. More complexity means more transistors and components, and so more waste heat and more chances to get the wiring wrong.

Annoyingly, that's not the end of our problems. In the last couple of paragraphs, there are some ambiguous representations of numbers. VV is 10, but so is X. In Arabic numerals, 9 is 9, and 9 is the only digit that's 9. Arabic also has the nice property that you can guess about how big the representation of the result is, based on the size of the operands. There's no way to add two four-digit numbers (assuming they're both positive) and get a two-digit number, or a twelve-digit number. Adding II to MXMVIII, though, gets you MM. Multiply two Arabic numbers, and the result will be about the length of the two numbers stuck end to end, +/-1. Division will be the difference in length of the numbers (I'm handwaving here and ignoring fractions). Multiply two Roman numerals, and your guess is as good as mine. (As Norman points out, the written length of an Arabic numeral is also a rough gauge of its value: 1999 (length 4) is bigger than 322 (length 3), but MXMVIII (length 7) is less than MM (length 2).) There may be some regularities, but I'm not about to sit down and work them out. If anyone does, the bound on the Roman computer word size would be whatever number is longest while still less than whatever MACHINE_MAX_INT you feel like implementing in the silicon-space- and power-hungry representation that the machine uses. As a result, a natively Roman calculator would potentially have very long "words" or "bytes", much of which would go unused most of the time.

Roman numerals also didn't have a standard representation for fractions, although the Romans apparently sometimes used a duodecimal system for them (yes, two different number systems, one for integers and one for fractions). Roman numerals also had no formal representation at all for zero, although the Romans clearly understood the idea of zero (you have ten dinars, and you owe me ten dinars; I collect your debt; how many tunics can you buy now?). They used "nulla" or "nihil" sometimes, and represented it with an N, but they wouldn't have written 500 as VNN. They also didn't represent negative numbers at all, although there were probably cases where they understood a subtraction to result in something that wasn't even zero (you have 10 dinars and owe me 15 dinars; I collect your debt; clearly you don't have -5 dinars, but I also clearly don't consider myself fully repaid).

I’ve heard that the Roman conception of math and numbers was highly geometric, rather than arithmetical, so they did have things like the square root of 2, which is the length of the diagonal of a unit square (you know, a square with sides of length I). Pythagoras figured that one out, at roughly the same time people were using “nulla”. Instead of running on abstractions of the representations of numbers, a properly Native Roman Computer might have been a mechanical device, a calculator that operated by using finely crafted physical representations of circles and triangles to perform its operations. Interestingly, the closest thing we have to a calculator from that time period is exactly that.