Version 0.2 Nov. 2007

Below is a sketch of a "simple," but not state-of-the-art algorithm for this problem:

Maintain an undirected graph with three operations:
     addArc( node1, node2 ) :Arc
     deleteArc( arc )
     areConnected( node1, node2 ) :Boolean
Here "connected" means directly or indirectly. The algorithm should be "on-line," "incremental", or "dynamic," i.e. all three operations should be reasonably fast when one is doing arbitrary sequences of adds, and deletes and tests.

The state of the art around 1990 was O( sqrt n ); now it seems to be O( (log n)^2 ). These links will get you to a relevant paper:

"Poly-logarithmic deterministic fully-dynamic algorithms for connectivity, minimum spanning tree, 2-edge, and biconnectivity". Jacob Holm, Kristian de Lichtenberg, Mikkel Thorup.
STOC '98, Journal of the ACM (JACM) July 2001.

The idea sketched here is probably O( log(n) * sqrt(n) ) for the hardest operation (delete).

Given the complexity of my solution and the others I've seen, though, I guess I would go for something specialized if I had a small or well-behaved application that needed it.

For instance, if you never delete links, there's a nice, simple solution that is guaranteed O( log n ) and amortized better than that. It's the deletes that make the problem hard.

This method is "simple" if you already understand 1 and 2 dimensional enfilades (they are balanced trees that allow easy rearrangement, among other things). Here's a tutorial on enfilades starting with the 1D case:

Here's a page on a 2D enfilade that's very similar to the one used here:
The section "How it Does it" describes the basics of 2D enfilades, assuming you're familiar with the 1D case.

The Ginsu Scrambled Eggs Algorithm

Definitions: a "component" is a set of nodes that are all connected to each other directly or indirectly by arcs between nodes in the set. A "subcomponent" is a subset of the nodes in a component, that forms a component by itself.

1) Keep all the nodes in a balanced, hierarchical component-subcomponent tree. (Since there can be separate components, this is a forest stuffed into a tree-shaped container.) The arrangement of the tree allows an integer address to be assigned to each node, and a contiguous range of addresses to be assigned to each component and subcomponent. Maintaining the tree with those properties involves rearranging it, which rearranges the address space.

2) Keep the arcs as bits in a sparse square matrix, addressed on both sides by the address space created by the component tree. This matrix is symmetrical across the diagonal, but that's not a big problem, so let's ignore that issue altogether.

Every time the component tree is rearranged, the arc matrix is sliced along both dimensions and rearranged. For example, if we swap regions b and c (this is the Ginsu part):

   aa | ab | ac | ad           \        aa | ac | ab | ad
  ----+----+----+----         | \      ----+----+----+----
   ba | bb | bc | bd    ------   \      ca | cc | cb | cd
  ----+----+----+----   ______   /     ----+----+----+----
   ca | cb | cc | cd          | /       ba | bc | bb | bd
  ----+----+----+----          /       ----+----+----+----
   da | db | dc | dd                    da | dc | db | dd

Now, the arcs between any two subcomponents correspond to the bits in a rectangular region of the matrix.

areConnected( node1, node2 ) amounts to finding out which component each of the two nodes lives in. Iff it's the same, they're connected. Answering this question requires walking up a balanced tree, O( log n ).

Adding an arc requires modifying and maintaining the data structures in a way that's "fairly obvious" if you understand enfilades and the problem. It may cause two components to become connected, or two sub-subcomponents to become connected within the same subcomponent, which requires moving one address block next to another, rearranging the matrix once, which is probably O( sqrt n ).

Deleting an arc can cause a subcomponent to become disconnected from its tree-neighbors, and then there is a need to find out what else, if anything, it's still connected to. This amounts to a search in a rectangular region of the matrix, which is probably O( sqrt n ). But the disconnection may propogate upward in the tree, so the total cost of an arc deletion is probably O( (log n) * (sqrt n) ).

Steve Witham <sw at remove this tiac dot net>