Curious Effects Getting List Extents

Posted in Uncategorized - May 16, 2019 - 0 Comment

I have a program that gets a list of GPS waypoints, and wants to figure out their bounding box. The naive way[1] to do this is find the maximum and minimum latitude and longitude, and use the maxes as one corner and the minimums as the other corner.

Off the top of my head, I can think of two ways to do this: Iterate the list of waypoints, comparing to the max and minimum so far, and updating as I go. The list has N points, I have to look at all of them, so O(N), so far so good.

The other way to do it is to do list comprehensions to get the latitudes and longitudes as separate lists, and then call max() and min() on each of those. I would assume that each list comprehension is O(N), and each call to max() or min() is also O(N), since they have to look through the whole list to find the maximum or minimum, and so it is 6 loops over the list (2 comprehensions, 2 max() calls, 2 min() calls), and so this is the slower way to do it.

It turns out, not so much.

I ran the code below on Repl.it and got, usually, the list comprehension version being just very slightly faster to twice as fast. Occasionally, the 10,000 instance case is slower, but not all the time.

import random 
from timeit import default_timer as timer

#Try some different sizes of lists
for jj in [10, 100, 1000, 10000, 100000, 1000000]:

  #Set up N waypoints
  waypoints = []
  for ii in range(jj):
    lat = (random.random() * 360) - 180
    lon = (random.random() * 360) - 180
    waypoints.append({"lat":lat, "lon":lon})

  start = timer()

  # One loop
  maxLat = maxLon = -float("inf")
  minLat = minLon = float("inf")
  for point in waypoints:
      lat = float(point["lat"])
      if lat < minLat:
          minLat = lat
      if lat > maxLat:
          maxLat = lat
      lon = float(point["lon"])
      if lon < minLon:
          minLon = lon
      if lon > maxLon:
          maxLon = lon

  mid = timer()

  # List comprehensions
  lons = [float(point["lon"]) for point in waypoints]
  lats = [float(point["lat"]) for point in waypoints]
  minLat1 = min(lats)
  minLon1 = min(lons)
  maxLat1 = max(lats)
  maxLon1 = max(lons)

  end = timer()

  #Print the results
  print(f"{jj} points")
  print(f"  first way {mid-start}")
  print(f"  second way {end-mid}")
  print(f"  speedup {(mid-start)/(end-mid)}")
  assert(minLat == minLat1)
  assert(maxLat == maxLat1)
  assert(minLon == minLon1)
  assert(maxLon == maxLon1)

So why is it faster? Clearly, I’m assuming something wrong. I suspect the main thing that I’m assuming wrong is that the constant 6 multiplied by the O(N) matters. It probably doesn’t, and that’s why we typically drop the constant multipliers in runtime comparisons. It’s likely that list comprehensions and max()/min() of iterables are calls to a very fast C implementation, and are just so much faster than my loop in Python that the fact that I’m doing 6 iterations doesn’t really matter.

Another thing that I’m assuming is that max and min are implemented as linear searches over iterables. It’s entirely possible that iterables store references to their maximum and minimum values, and just return that when asked, rather than going looking. I doubt it, since the overhead on removing an element would be large [2], but it is possible.

I haven’t looked into either of these assumptions, since timing the runs answered the question I had (“Which is faster?”), and the follow-on question (“Why?”) isn’t useful to me at this time.

[1] It does some stupid stuff around the poles and across the international date line, for example.

[2] You’ve removed the largest element. What is the new largest? Time to go searching…

Alternatively, the list could be implemented as a parallely-linked-list, where one set of links is the elements in their list order, and the other set is the elements in their sorted order, but then the list [1, 3, 5, “pickles”, <built-in method foo of Bar object at 0x1f34b881>, 6] doesn’t have well-defined links for the sorted order.

Gizmosmith