The Surveying Problem Part 1: The Problem

During the technical component of an interview for a software developer position I was posed the following problem:

You are working for a company that is responsible for processing survey data for an Oil and Gas company. You are given data that summarizes whether oil was found at a certain grid reference. The assumption is that if oil is found in contiguous grid references (or grid elements), then they form a single oil reservoir. The objective is to write a program that will determine the number and locations of the reservoirs.

The Algorithm

The interview went pretty well, and I managed to write something within 10-15 minutes or so that I think did the job. The coding exercise was done with a shared Google Doc, so I never actually got the chance to run the code, but my interviews seemed happy with the solution. The algorithm I came up with looked something like this:

  1. Create a new active reservoir
  2. For each element in grid:
    1. If the element is in the checked_locations set, go to the next element
    2. Add the element to checked_locations
    1. Add the element to the currently active reservoir
    2. Check if there are any neighbors to this element
      1. If yes: go to 2 (passing in the list of neighbors as the grid)
    1. If we are not currently recursing:
      1. Add the reservoir to the result set
      2. Create a new active reservoir

The Discussion

When I was asked to defend my code, I said I was initially hesitant about going down a recursive route, because Python is generally not a great language for writing functional-style recursive functions. However, a purely iterative approach would mean having to run an arbitrary number of times to see if there were any adjacent reservoirs that could be joined, which also isn’t ideal.

As I said, my interviewers seemed happy with my solution, so I assume this was the solution they had in mind. However, I really wasn’t satisfied with the recursive approach, and I suggested a graph-based approach instead. The crux of the problem is taking a grid of data and understanding the connectivity, which is what graphs are intended to capture. I have a bit of experience with the iGraph and NetworkX Python libraries and knew that a graph-based solution would need probably half as many lines and probably be as performant as the recursive approach, if not more so.

The discussion went along the lines of trading one form of complexity for another. I argued on the side of trying to shift the complexity into the standard library, even potentially at the expense of performance. Yes, a full graph library is maybe overkill for a relatively simple problem. However, I’m always in favor of writing (and maintaining!) as little code as possible.

Outcome

I think the discussion was well-received, but alas I didn’t end up getting the job (probably more to do with COVID-19 than anything else). As a coding exercise, I decided to revisit the code to see firstly if my algorithm above would really work, and how much more/less performant a graph-based approach would be. Expect articles in the near future that show the code I have written for each approach and an analysis of the pros and cons of each!