This post is about computing the nearest common ancestor. It is the result of a month or so of reading papers and implementation. Although there has been a lot of research into the problem, there are no implementations online of the two main algorithms presented. Worse, examples in the papers are practically useless. For example, the Alstrup-Gavoille-Kaplan-Rauhe (AGKR) algorithm was first described in a technical report in August 2001, but it did not contain an example. In 2002, it was presented at a conference but again it did not contain an example. Finally, in 2004, it was published in a peer-reviewed journal and contained an example, albeit still incomplete.
Nearest common ancestor
Imagine meeting another person who has the same last name as you and suggests the possibility that you are related. You and your acquaintance create your own family trees and ask whether you have any common ancestors. If you are related, which relative is the most recent common ancestor? This is the problem of the nearest common ancestor (NCA), which is also known as the least common ancestor, or lowest common ancestor, first presented by Aho et al. [1].
The nearest common ancestor problem begins with a rooted general tree, G = (V, E, r ∈ V), and two nodes in the tree. The set of ancestors of node is the set of nodes in the parent chain that starts at the node and ends at the root. The set of ancestors a(x) of a node x ∈ V is
a(x) = (x = r) ? {x} : {x} ∪ a(Parent(x))
The proper ancestors of a node is the set of nodes in the parent chain that starts at the parent of the node and ends at the root, i.e., the set of ancestors for the node but not including the node itself. The set of proper ancestors p(x) of a node x ∈V is
p(x) = (x = r) ? ∅ : {Parent(x)} ∪ p(Parent(x))
The set of common ancestors ca(x, y) of node x, y ∈V is
ca(x, y) = a(x) ∩ a(y)
The nearest common ancestor nca(x, y) = w if w is a common ancestor of x and y with maximal depth d(w). So, if ca(x, y) is the set of nodes v1, v2, v3, …, vk, then vk is the least common ancestor if d(v1) < d(v2) < d(v3) < … < d(vk).
nca(x, y) = vk where vk ∈ ca(x, y) = { v1, v2, v3, …, vk } and d(vn) < d(vk) for n ≠ k
In Figure 1, the ancestors of e is the set {a, b, e}, and the ancestors of m is the set {a, d, m}. The proper ancestors of e is the set {a, b}, and the proper ancestors of m is the set {a, d}. The common ancestors of nodes e and m is {a}; the nearest common ancestor of nodes e and m is node a.
The nearest common ancestor problem has many different applications: compiler construction [1], string search algorithms [2], and evolutionary trees [3]. It also has applications in graph drawing in the Walker algorithm [4] and planarity testing [5, 6], which we will discuss later on.
Figure 1. A rooted general tree with ancestors of two leafs.
Simple solution for the nearest common ancestor
The simplest solution to the nearest common ancestor problem is computed using two stacks. In a loop, the ancestors for one node are pushed onto one stack. In a second loop, the ancestors for the other node are pushed onto the other stack. Finally, the top entries of each stack are popped and compared. If the top entries are the same, then the ancestors are the same, and we note the ancestor in a temporary variable. However, if the top entries differ, the nearest common ancestor has been found, and it is retained in the temporary variable.
While this algorithm works fine for many applications, the issue can become probative in computational time when the nearest common ancestor is computed for many different node pairings. It computes the nearest common ancestor in O(n) time.
Implementation in C#
1 public Vertex NcaNaive(Vertex x, Vertex y)
2 {
3 Stack<Vertex> s1 = new Stack<Vertex>();
4 Stack<Vertex> s2 = new Stack<Vertex>();
5 Vertex lca = null;
6
7 do
8 {
9 s1.Push(x);
10 x = x.Parent;
11 } while (x != null);
12 do
13 {
14 s2.Push(y);
15 y = y.Parent;
16 } while (y != null);
17 while (s1.Top() == s2.Top())
18 {
19 lca = s1.Pop();
20 s2.Pop();
21 }
22 return lca;
23 }
Each “do-while” block (lines 7-11 and 12-16) computes a list of ancestors in order from a node to the root of the tree. The “while” block at the end of the method (lines 17-21) compares each list of ancestors, processing each node from the root to the children until there is a difference in the ancestor list. The last node that is equal is the nearest common ancestor, and is the temporary variable lca.
Schieber-Vishkin solution for the nearest common ancestor
The Schieber-Vishkin algorithm is probably the first solution for the nearest common ancestor that is considered efficient, simple, and understandable. It is an extension of an earlier but more complicated solution by Harel and Tarjan [7]. It computes the nearest common ancestor in O(1) time after preprocessing the tree once in O(n) time. Both of these solutions are based on the reduction of the tree into a complete binary tree that has been labeled with an inorder numbering, and is also known as a “sideways heap” [8]. The complete binary tree has several interesting properties which we will now illustrate by example (see Figure 2).
Figure 2. A full binary tree of 31 nodes, inorder numbering.
The first property concerns the node inorder numbering and the depth of the node in the complete binary tree. If the label of a particular node in represented as a binary integer, the level of the node in the tree can be determined by the position of the rightmost 1-bit in the label. In Figure 2, node 19 has the binary representation 10011, and it is a leaf node because the rightmost 1-bit is in the 1’s position. Node 20 has the binary representation 10100, and it is two levels up from the leaves of the tree because the rightmost 1-bit is in the 4’s position. We define the height of a node relative to the leaves of the tree, height(v) as the following function:
height(v) = ⌊ log2(v & –v) ⌋
where & is the bitwise-and binary operator, and ⌊ x ⌋ is the integer floor function.
Using this formula, height(3) = ⌊ log2(00011 & -00011) ⌋ = ⌊ log2(00011 & -11101) ⌋ = ⌊ log2(00001) ⌋ = 0 (i.e., it is a leaf node). The height(20) = ⌊ log2(10100 & -10100) ⌋ = ⌊ log2(10100 & -01100) ⌋ = ⌊ log2(00100) ⌋ = 2 (i.e., two nodes up from a leaf node).
A second property concerns the ancestor of a node and the difference in the height between the node and its ancestor. Given a node v and the difference in the height h, the ancestor is
a(v, h) = ((v >> h) | 1) << h
Several other examples: the ancestor of node 3 with height h = 4 is a(3,0) = ((3 >> 4) | 1) << 4 = ((0) | 1) << 4 = 1 << 4 = 16; the ancestor of node 3 (00011) with height h = 1 is a(3,1) = ((3 >> 1) | 1) << 1 = ((1) | 1) << 1 = 1 << 1 = 2; and, the ancestor of node 3 with height h = 0 is a(3,0) = ((3 >> 0) | 1) << 0 = ((3) | 1) << 0 = 3 << 0 = 3.
Because of these two properties, it is possible to compute the nearest common ancestor for two nodes in the complete binary tree for any two nodes x and y using the following formula:
Let h = ⌊ log2(x & –y) ⌋ where the label for x is less than or equal to the label for y, then
NCAcbt(x, y) = ((x >> h) | 1) << h
We now turn our attention back to the nearest common ancestor problem for a general tree. Given a tree T = (V, E, r ∈V), the Schieber-Vishkin algorithm requires the following functions to be computed in a preprocessing step:
1) Preorder: V → ℕ
Function Preorder(v) maps nodes of the tree into integers that are the preorder numbers of the nodes in the tree.
2) Max: V → ℕ
Function Max(v) maps nodes of the tree into the maximum preorder number of a node in the subtree for v.
3) Inlabel: V → ℕ
Function Inlabel(v), known as the inlabel of a node, maps nodes of the tree into integer labels which correspond to the inorder numbers for nodes in the complete binary tree. Function Inlabel(v) is defined as:
Inlabel(v) = ⌊ log2(Preorder(v) & – Max(v)) ⌋
4) Ascendant: V → ℕ
Function Ascendant(v), known as the ascendant of a node, maps nodes of the tree into integers. The ascendant is a bit map defined according to the following expression:
Ascendant(v) = Ascendant(Parent(v)) | (Inlabel(v) & -Inlabel(v)), where Ascendant(r) = 0
The idea of the Ascendant(v) mapping is that it encodes the inlabel numbers of all ancestors of node v. A 1-bit in Ascendant(v) encodes the height of an ancestor of the node; a 0-bit indicates there is no ancestor at that level in the complete binary tree.
5) Head: ℕ → V
Function Head(v), known as the head of a node, maps an integer label that is the result of a simple calculation for the nearest common ancestor in the complete binary tree into a node in the general tree. Function Head(v) is defined as:
Head(x) = v such that Inlabel(v) ≠ Inlabel(Parent(v)) ∩ Inlabel(v) = x
Another way to understand this definition is that Head(x) is the node in the general tree that has the node that is an ancestor of x but has a different inlabel value as x.
The Inlabel(v) relationship maps a node in the general tree to a node in the complete binary tree. It is important to understand that the mapping Inlabel(v) is not a unique mapping, i.e., two nodes in the general tree can map to the same node in the complete binary tree. In order to uniquely identify the nearest common ancestor of two nodes in the general tree, Ascendant(v), and Head(v) are needed. Inlabel(v) maintains two properties which are crucial for the algorithm to compute the nearest common ancestor. First, Inlabel(v) partitions paths in the general tree, which are known as inlabel paths. All of these paths partition the general tree into distinct paths. This property is known as the path-partition property. Second, a node d that is a descendant of node p in the general tree map to nodes Inlabel(d) and Inlabel(p) in the complete binary tree, respectively, such that Inlabel(d) is a descendant of (or the same as) Inlabel(p). This is known as the descendance-preservation property.
The preprocessing step can be done in two passes over the general tree in a depth-first search. During the traversal, the values for Preorder(v), Max(v), Inlabel(v), and Head(v) are computed. Preorder(v) can be computed using a global variable for the node visited, incremented after associating the value with the node. Max(v) is computed after traversing a node’s entire subtree, associating the value of the global variable (minus 1 since it is post-incremented). After computing Preorder(v) and Max(v), Inlabel(v) can then be computed using its definition because all dependencies have been computed. Head(v) can be computed in a couple different ways. However, one method is to assign the node to an array indexed by the Inlabel(v) value. As the depth-first search returns, the array entry is replaced with nodes in a particular inlabel path higher up in the general tree. In the second pass, function Ascendant(v) can be computed, using information from the previous pass (using Ascendant(Parent(v)) and Inlabel(v)).
The nearest common ancestor is computed in the following algorithm.
// Let x and y be the two nodes in the general tree.
// Compute the height of the nearest common ancestor in the complete binary tree:
let height(x,y) = ⌊ log2(Inlabel(x) & -Inlabel(y)) ⌋
// Note, if Inlabel(x) is greater than Inlabel(y), then reverse the assignments of x and y.
// Compute the true height of the nearest common ancestor in the general tree:
let k = Ascendant(x) & Ascendant(y) & (1 << height(x, y)), and trueheight(x, y) = ⌊ log2(k & -k) ⌋.
// Compute two candidates for the nearest common ancestor.
let j = ((Inlabel(x) >> trueheight(x, y)) | 1) << trueheight(x, y).
if j = Inlabel(x) then let xˆ = x
else
let L = ⌊ log2(Ascendant(x) & ((1 << trueheight(x, y)) - 1) ⌊
x̂ = Parent(Head((Inlabel(x) >> L) | 1) << L))
if j = Inlabel (y) then let ŷ = y
else
let L = ⌊ log2(Ascendant(y) & ((1 << trueheight(x, y)) - 1) ⌋
ŷ = Parent(Head((Inlabel(y) >> L) | 1) << L))
let NCASV(x,y) = z, where z = xif Preorder( x̂ ) < Preorder( ŷ ), else ŷ.
asd
Implementation in C#
1 using System;
2 using System.Collections.Generic;
3 using System.Text;
4
5 namespace GeneralTreeLayout
6 {
7 class NCA
8 {
9 Tree tree;
10 Dictionary<Vertex, int> Preorder;
11 Dictionary<Vertex, int> Inlabel;
12 Dictionary<Vertex, int> Ascendant;
13 Dictionary<int, Vertex> Head;
14
15 public NCA(Tree t)
16 {
17 this.tree = t;
18 this.Preorder = new Dictionary<Vertex, int>();
19 this.Inlabel = new Dictionary<Vertex, int>();
20 this.Ascendant = new Dictionary<Vertex, int>();
21 this.Head = new Dictionary<int, Vertex>();
22 }
23
24 public void Preprocess()
25 {
26 this.PreorderTraverse(this.tree.Root);
27 this.ComputeAscendants(this.tree.Root, 0);
28 }
29
30 int number = 1;
31
32 public void PreorderTraverse(Vertex vertex)
33 {
34 int low = number;
35 this.Preorder.Add(vertex, number++);
36 foreach (Edge edge in vertex.Successors)
37 {
38 this.PreorderTraverse(edge.To);
39 }
40 int high = number - 1;
41 int lh = (high & -low);
42 int h = this.FloorLog2((uint)lh);
43 int inlabel = ((high >> h) | 1) << h;
44 this.Inlabel.Add(vertex, inlabel);
45 if (!this.Head.ContainsKey(inlabel))
46 this.Head.Add(inlabel, vertex);
47 else
48 this.Head[inlabel] = vertex;
49 }
50
51 void ComputeAscendants(Vertex vertex, int pascendant)
52 {
53 int inlabel = this.Inlabel[vertex];
54 int lsb = inlabel & -inlabel;
55 int ascendant = pascendant | lsb;
56 this.Ascendant.Add(vertex, ascendant);
57 foreach (Edge edge in vertex.Successors)
58 {
59 this.ComputeAscendants(edge.To, ascendant);
60 }
61 }
62
63 public Vertex ComputeNCA(Vertex x, Vertex y)
64 {
65 int inlabelx = this.Inlabel[x];
66 int inlabely = this.Inlabel[y];
67 int xy = (inlabely & -inlabelx);
68 int ch = FloorLog2((uint)xy);
69 int ancestor = ((inlabely >> ch) | 1) << ch;
70 int ax = this.Ascendant[x];
71 int ay = this.Ascendant[y];
72 int k = ax & ay & -(1 << ch);
73 int th = FloorLog2((uint)(k & -k));
74 int j = ((inlabelx >> th) | 1) << th;
75 Vertex xhat = null;
76 Vertex yhat = null;
77 if (j == inlabelx)
78 xhat = x;
79 else
80 {
81 int l = this.FloorLog2((uint)( ax & ((1 << th) - 1)));
82 int anc = ((inlabelx >> l) | 1) << l;
83 xhat = this.Head[anc].Parent;
84 }
85 if (j == inlabely)
86 yhat = y;
87 else
88 {
89 int l = this.FloorLog2((uint)( ay & ((1 << th) - 1)));
90 int anc = ((inlabely >> l) | 1) << l;
91 yhat = this.Head[anc].Parent;
92 }
93 Vertex z = null;
94 if (this.Preorder[xhat] <= this.Preorder[yhat])
95 z = xhat;
96 else
97 z = yhat;
98 return z;
99 }
100
101 int FloorLog2(uint v)
102 {
103 uint x = 0;
104 while ((v = (v >> 1)) != 0)
105 {
106 x++;
107 }
108 return (int)x;
109 }
110 }
111 }
Example
Figure 3. Finding the nearest common ancestor using the Schieber-Vishkin algorithm.
(a)
(b)
(c)
Consider the example tree displayed in Figure 3(a). Compute the nearest common ancestor of node F and node N.
First, label each node in the tree using a preorder numbering (line 26, and line 32 through 49). These results are shown in Table 1. For each node, the method PreorderTraverse also calculates the Inlabel(v) values. The Head(v) mapping is also calculated. These results are shown in Table 2.
Table 1. Shrieber-Vishkin computation for Figure 3
Node v | P(v) | M(v) | I(v) | A(v) |
A | 1 | 18 | 16 | 16 |
B | 2 | 18 | 16 | 16 |
C | 3 | 6 | 4 | 20 |
D | 7 | 12 | 8 | 24 |
E | 13 | 18 | 16 | 16 |
F | 4 | 4 | 4 | 20 |
G | 5 | 5 | 5 | 21 |
H | 6 | 6 | 6 | 22 |
I | 8 | 11 | 8 | 24 |
J | 12 | 12 | 12 | 28 |
K | 14 | 17 | 16 | 16 |
L | 18 | 18 | 18 | 18 |
M | 9 | 9 | 9 | 25 |
N | 10 | 10 | 10 | 26 |
O | 11 | 11 | 11 | 25 |
P | 15 | 15 | 15 | 17 |
Q | 16 | 16 | 16 | 16 |
R | 17 | 17 | 17 | 17 |
Table 2. Shrieber-Vishkin head(x) for Figure 3
Complete binary tree node x | H(x) |
4 | C |
5 | G |
6 | H |
9 | M |
10 | N |
11 | O |
8 | D |
12 | J |
15 | P |
16 | A |
17 | R |
18 | L |
This method uses the function FloorLog2(v), which calculates the expression ⌊ log2(v) ⌋. (Note, there are several other ways to compute the expression that are faster than the one show here). Next, the Ascendant mapping is computed (line 27 and lines 51 through 61).
The function ComputeNCA computes the nearest common ancestor (lines 63 through 99). For nodes F and N, the height of the nearest common ancestor in the complete binary tree is height(F, N) = ⌊ log2(Inlabel(F) & –Inlabel(N)) ⌋ = 4. The true height is first found via k = Ascendant(F) & Ascendant(N) & (1 << height(F, N)) = 20 & 26 & (1 << 4) = 10100 & 11010 & 10000 = 10000 = 16. Then trueheight(F, N) = ⌊ log2(16 & -16) ⌋ = 3. Notice that the height in the general tree is different from the height in the complete binary tree. Next, compute the candidates for the NCA. Let j = ((Inlabel(F) >> trueheight(F, N)) | 1) << trueheight(F, N) = ((4 >> 3) | 1) << 3 = 16. The results of computing and both are node B. So, NcaSV(F, N) = B.
Alstrup-Gavoille-Kaplan-Rauhe solution for the nearest common ancestor
The Alstrup-Gavoille-Kaplan-Rauhe (AGKR) algorithm [9, 10] is another solution to the nearest common ancestor problem, and is similar to the Schieber-Vishkin algorithm. After preprocessing the general tree in O(n) time, the solution can be computed in O(1) time. The main difference between the AGKR algorithm and the Schieber-Vishkin algorithm is that the Schieber-Vishkin algorithm maps the general tree into a complete binary tree, whereas the AGKR algorithm does not. Instead, the AGKR algorithm is based on the computation of a common prefix between the labels of the two nodes in the general tree.
The AGKR algorithm creates labels for each node in the general tree by partitioning the nodes and edges by classify each as either light or heavy, a scheme that was introduced by Harel and Tarjan [7]. A heavy node is a node has the greatest subtree size of all its siblings. For each internal node v, we pick a child w of v, where size(w) = max{size(z) | z ∈children(v)} and classify w as heavy. All other nodes are light nodes. All nodes that have no siblings are light nodes, and the root is defined to be a light node. A heavy path is an edge from a node of any class to a node of heavy class. All other paths are light paths.
Given a tree T = (V, E, r ∈V), the AGKR algorithm defines the following functions:
1) Size: V → ℕ
Function Size(v) returns the size of the entire subtree of a node.
2) IsHeavy: V → bool
Function IsHeavy(v) partitions nodes of the general tree into heavy (denoted by true) or light (denoted by false):
IsHeavy(v) = true if |Children(Parent(v))| > 1 and [ ∀ c ∈Children(Parent(v)), where c ≠ v and Size(c) ≤ Size(v)], else false.
In other words, a node v is function heavy if there is exists a sibling of v, and all siblings of v have a subtree size less than the subtree size for v.
3) Apex: V → V
Function Apex(v) finds the apex of a node (see below).
4) LightLabel: V → ℕ
Function LightLabel(v) returns the light label for a node. The light label of a node is:
LightLabel(v) = Size(v)
5) HeavyLabel: V → ℕ
Function HeavyLabel(v) returns the heavy label for a node. The heavy label of a node is:
HeavyLabel(v) = Size(v) – Size(h), where h Î Children(v) and IsHeavy(h) = true
6) FullLabel: V → ℕ
Function FullLabel(v) maps nodes of the tree into integer labels. Labels for the node are binary strings that are aligned to the leftmost (most significant) bit of an integer. The integer labels are scaled to the appropriate size so that they fit into an appropriate integer unit on a machine (e.g., 32-bit unsigned integers).
7) Mask: V → ℕ
Function Mask(v) maps nodes of the tree into integer masks. The mask is a binary string that corresponds to the label FullLabel(v) and notes portions of the label that are either light or heavy, explained below.
8) Reverse: ℕ → V
Function Reverse(v) maps an integer that is the result of a simple calculation for the nearest common ancestor into a node in the tree. The mapping should always exist, since the two nodes in the nearest common ancestor problem are in the tree.
In the preprocessing step, the AGKR algorithm first classifies nodes and edges of the tree as either light or heavy, a scheme that was introduced by Harel and Tarjan [7]. A heavy node is a node that must have at least one sibling and the node has the greatest subtree size of all its siblings. All other nodes are light nodes. The nearest ancestor a of a node v in which node a is light and node v is heavy is known as an apex node. The sequence of heavy nodes from the apex node is known as a heavy path. The heavy path starts at a light node, called the apex. Using this partitioning, two types of labels are constructed for each node: a heavy label, and a light label. The heavy and light labels are encoded in binary strings by considering the labels in a sequence:
a) For heavy label encoding, the sequence is formed from the heavy labels for nodes in the heavy path. For example, the heavy labels for the heavy path A->B->C are encoded by considering the labels in sequence: HeavyLabel(A), HeavyLabel(B), HeavyLabel(C). There is only one encoding for each heavy label because the heavy node can belong only to one heavy path. The encoding for heavy labels of light nodes is the empty string.
b) For light label encoding, the sequence is formed from the light labels for sibling nodes that are light nodes. For example, supposed the children of node A are B, C, and D. Node C is a heavy node, and nodes B and D are light nodes. The sequence for encoding the light labels is: LightLabel(B), LightLabel(D). The encoding for light labels of heavy nodes is the empty string.
The encoding is calculated as follows:
1) Create a sequence yi, I = 1 to k, from the integer values, either from the heavy labels, or the light children of a node. For example: heavy labels for heavy chain of nodes A->B->D->I->M are 1, 11, 2, 3, 1 (see Figure 4).
2) Compute si = ∑ yj, j=1 .. i, and i=1 .. k-1, s0 = 0. From the example in the previous step, si = 0, 1, 12, 14, 17, 18.
3) Compute fi = ⌊ log2(yi) ⌋, i = 1 .. k. From the example in the previous steps, fi = 0, 1, 8, 2, 2, 1.
4) Compute zi, i = 0 .. k:
zi = (si-1 mod 2fi = 0) ? si-1 : si-1 – (si-1 mod 2fi) + 2fi
From the example in the previous steps, zi = 0, 0, 8, 12, 14, 17.
5) Compute m = max(zi) for all i = 0 .. k. From the example in the previous steps, m = 17.
6) Compute w = ⌊ log2(m) ⌋ + 1. From the example in the previous steps, w = 5.
7) Compute encoding ei = (zi >> fi) encoded with w – fi bits, i = 1 .. k. From the example in the previous steps, the encodings for the labels in step 1 are 00000, 01, 0110, 0111, 10001.
We define functions EncodedHeavyLabel(v) and EncodedLightLabel(v) as the encoded heavy and light labels for node v, respectively.
Computing the NCA.
Each label for a node uniquely identifies all of the edges from the root to the node. The encodings constructed are a combination of the labels down a “heavy” path in the tree and across the “light” edges for siblings of a node. For any node, the encoding is defined as
FullLabel(v) = FullLabel(Parent(Apex(v)) · EncodedLightLabel(Apex(v)) · EncodedHeavyLabel(v)
Given two nodes x and y that are in the tree the NCA is computed by computing the bitwise-AND between the two full labels and noting the location of the first difference from the left end of the labels. The difference between the two labels can occur either within a light substring or within a heavy substring.
If the difference occurs in the light label substrings, then let FullLabel(x) = L0 H0 L1 H1 L2 H2 …Li-1 Hi-1 Li … and FullLabel(y) = L0 H0 L1 H1 L2 H2 …Li-1 Hi-1 L’i …, where Li ¹ L’I. Then FullLabel(NCA(x, y)) = L0 H0 L1 H1 L2 H2 …Li-1 Hi-1.
If the difference occurs in the heavy label substrings, then let FullLabel(x) = L0 H0 L1 H1 L2 H2 …Li-1 Hi-1 Li Hi … and FullLabel(y) = L0 H0 L1 H1 L2 H2 …Li-1 Hi-1 Li H’i …, where Hi ¹ H’I. Then FullLabel(NCA(x, y)) = L0 H0 L1 H1 L2 H2 …Li-1 Hi-1 min(H1, H’1). Note that the last substring of the label for the NCA is the minimum substring via alphabetic ordering.
Mask(v) is used to indicate which bits in the label are part of the heavy or light substrings. The NCA is computed by the expression Reverse(FullLabel(NCA(x, y))).
Implementation in C#
1 using System;
2 using System.Collections.Generic;
3 using System.Linq;
4 using System.Text;
5
6 namespace GeneralTreeLayout
7 {
8 class Alstrup : NCA
9
10 Dictionary<Vertex, int> Number;
11 List<Vertex> heavy;
12 List<Vertex> light;
13 Dictionary<Vertex, Vertex> apex;
14 List<Vertex> UniqueApex;
15 Dictionary<Vertex, int> hlabel;
16 Dictionary<Vertex, int> llabel;
17 Dictionary<Vertex, String> lightLabelEncodingString;
18 Dictionary<Vertex, String> heavyLabelEncodingString;
19 Dictionary<Vertex, String> labelEncodingString;
20 Dictionary<Vertex, uint> labelEncodingBinary;
21 Dictionary<Vertex, uint> kEncodingBinary;
22
23 public Alstrup(Tree t)
24 : base(t)
25 {
26 this.Number = new Dictionary<Vertex,int>();
27 this.heavy = new List<Vertex>();
28 this.light = new List<Vertex>();
29 this.apex = new Dictionary<Vertex, Vertex>();
30 this.UniqueApex = new List<Vertex>();
31 this.hlabel = new Dictionary<Vertex, int>();
32 this.llabel = new Dictionary<Vertex, int>();
33 this.lightLabelEncodingString = new Dictionary<Vertex, string>();
34 this.heavyLabelEncodingString = new Dictionary<Vertex, string>();
35 this.labelEncodingString = new Dictionary<Vertex, string>();
36 this.labelEncodingBinary = new Dictionary<Vertex, uint>();
37 this.kEncodingBinary = new Dictionary<Vertex, uint>();
38 }
39
40 override public void Preprocess()
41 {
42 this.ComputeSize(this.tree.Root);
43 this.light.Add(this.tree.Root);
44 this.Partition(this.tree.Root);
45 this.ComputeApex(this.tree.Root, this.tree.Root);
46 foreach (Vertex v in this.UniqueApex)
47 {
48 this.ComputeHeavyLabels(v);
49 }
50 this.ComputeLightLabels(this.tree.Root);
51 foreach (Vertex v in this.UniqueApex)
52 {
53 this.EncodeHeavyLabels(v);
54 }
55 this.lightLabelEncodingString.Add(this.tree.Root, "");
56 this.EncodeLightLabels(this.tree.Root);
57 this.ComputeFullLabels(this.tree.Root);
58 }
59
60 private int ComputeSize(Vertex v)
61 {
62 // Count the number of children in the whole subtree.
63 int sum = 1;
64 foreach (Edge e in v.Successors)
65 {
66 Vertex c = e.To;
67 sum += this.ComputeSize(c);
68 }
69 this.Number.Add(v, sum);
70 return sum;
71 }
72
73 private void ComputeBinaryLabels(Vertex v, String s)
74 {
75 // Convert s into a binary and create a binary for h/l substring
76 // boundaries.
77 uint bin = 0;
78 uint hl = 0;
79 uint hlbin = 0;
80 int len = 0;
81 for (int i = 0; ; ++i)
82 {
83 if (i == s.Length)
84 break;
85 if (s[i] == '1')
86 {
87 bin = bin << 1;
88 bin |= 1;
89 len++;
90 hlbin = hlbin << 1;
91 hlbin |= hl;
92 }
93 else if (s[i] == '0')
94 {
95 bin = bin << 1;
96 len++;
97 hlbin = hlbin << 1;
98 hlbin |= hl;
99 }
100 else if (s[i] == 'L')
101 {
102 hl = 0;
103 }
104 else if (s[i] == 'H')
105 {
106 hl = 1;
107 }
108 }
109 // Shift binaries all the way to the MSB, and
110 // set up hl to note end of label.
111 int diff = 32 - len;
112 hl = (hl == 0) ? (uint)1 : (uint)0;
113 for (int i = 0; i < diff; ++i)
114 {
115 bin = bin << 1;
116 hlbin = hlbin << 1;
117 hlbin |= hl;
118 }
119 this.labelEncodingBinary.Add(v, bin);
120 this.kEncodingBinary.Add(v, hlbin);
121 }
122
123 private void ComputeFullLabels(Vertex v)
124 {
125 // Compute the labelEncodingString for a node v.
126 Vertex ap = this.apex[v];
127 Vertex parent = ap.Parent;
128 String t = "";
129 if (parent != null)
130 {
131 if (this.labelEncodingString.ContainsKey(parent))
132 t = this.labelEncodingString[parent];
133 }
134 String ll = "";
135 if (this.lightLabelEncodingString.ContainsKey(ap))
136 ll = this.lightLabelEncodingString[ap];
137 String hl = "";
138 if (this.heavyLabelEncodingString.ContainsKey(v))
139 hl = this.heavyLabelEncodingString[v];
140 String total = t + "L" + ll + "H" + hl;
141 this.labelEncodingString.Add(v, total);
142 this.ComputeBinaryLabels(v, total);
143 foreach (Edge e in v.Successors)
144 {
145 this.ComputeFullLabels(e.To);
146 }
147 }
148
149 private void Partition(Vertex v)
150 {
151 // Find child with max size.
152 int maxsize = 0;
153 Vertex max = null;
154 foreach (Edge e in v.Successors)
155 {
156 Vertex to = e.To;
157 int value = this.Number[to];
158 if (value > maxsize)
159 {
160 maxsize = value;
161 max = to;
162 }
163 }
164 if (max != null)
165 {
166 this.heavy.Add(max);
167 }
168 foreach (Edge e in v.Successors)
169 {
170 if (e.To != max)
171 {
172 this.light.Add(e.To);
173 }
174 }
175 foreach (Edge e in v.Successors)
176 {
177 this.Partition(e.To);
178 }
179 }
180
181 private void ComputeApex(Vertex v, Vertex a)
182 {
183 if (this.light.Contains(v))
184 {
185 // A light node is an apex by definition.
186 this.UniqueApex.Add(v);
187 this.apex.Add(v, v);
188 a = v;
189 }
190 else
191 {
192 this.apex.Add(v, a);
193 }
194 foreach (Edge e in v.Successors)
195 {
196 Vertex c = e.To;
197 this.ComputeApex(c, a);
198 }
199 }
200
201 private void ComputeHeavyLabels(Vertex v)
202 {
203 int size = this.Number[v];
204 int h = 0;
205 foreach (Edge e in v.Successors)
206 {
207 if (this.heavy.Contains(e.To))
208 {
209 h = this.Number[e.To];
210 this.ComputeHeavyLabels(e.To);
211 break;
212 }
213 }
214 size -= h;
215 this.hlabel.Add(v, size);
216 }
217
218 private void ComputeLightLabels(Vertex v)
219 {
220 if (this.light.Contains(v))
221 this.llabel.Add(v, this.Number[v]);
222 foreach (Edge e in v.Successors)
223 {
224 this.ComputeLightLabels(e.To);
225 }
226 }
227
228 private void EncodeHeavyLabels(Vertex a)
229 {
230 List<Vertex> hp = this.GetHP(a);
231 List<int> hpHLabel = this.GetHeavyLabels(hp);
232 List<String> b = IntegerAlphabeticEncoding.Encoding(hpHLabel);
233 for (int i = 0; i < hp.Count; ++i)
234 {
235 this.heavyLabelEncodingString.Add(hp[i], b[i]);
236 }
237 }
238
239 private List<Vertex> GetHP(Vertex v)
240 {
241 if (!this.UniqueApex.Contains(v))
242 throw new Exception("Not an apex.");
243 List<Vertex> list = new List<Vertex>();
244 list.Add(v);
245 Vertex next = null;
246 do
247 {
248 next = null;
249 foreach (Edge e in v.Successors)
250 {
251 if (this.heavy.Contains(e.To))
252 {
253 next = e.To;
254 list.Add(next);
255 break;
256 }
257 }
258 v = next;
259 } while (next != null);
260 return list;
261 }
262
263 private List<int> GetHeavyLabels(List<Vertex> list)
264 {
265 // Convert a list of vertices into a list of heavy labels.
266 List<int> newlist = new List<int>();
267 foreach (Vertex v in list)
268 {
269 newlist.Add(this.hlabel[v]);
270 }
271 return newlist;
272 }
273
274 private void EncodeLightLabels(Vertex v)
275 {
276 List<Vertex> lightvertices = new List<Vertex>();
277 foreach (Edge e in v.Successors)
278 {
279 if (this.light.Contains(e.To))
280 {
281 lightvertices.Add(e.To);
282 }
283 }
284 List<int> lightlabels = this.GetLightLabels(lightvertices);
285 List<String> labels = IntegerAlphabeticEncoding.Encoding(lightlabels);
286 this.AssociateLLabels(lightvertices, labels);
287 foreach (Edge e in v.Successors)
288 {
289 this.EncodeLightLabels(e.To);
290 }
291 }
292
293 private List<int> GetLightLabels(List<Vertex> list)
294 {
295 // Convert a list of vertices into a list of light labels.
296 List<int> newlist = new List<int>();
297 foreach (Vertex v in list)
298 {
299 newlist.Add(this.llabel[v]);
300 }
301 return newlist;
302 }
303
304 private void AssociateLLabels(List<Vertex> vertices, List<String> labels)
305 {
306 for (int i = 0; i < vertices.Count; ++i)
307 {
308 this.lightLabelEncodingString.Add(vertices[i], labels[i]);
309 }
310 }
311
312 override public Vertex ComputeNCA(Vertex x, Vertex y)
313 {
314 // NCA label to be found.
315 uint ncalabel = 0;
316 uint ncak = 0;
317
318 // Get labels for each vertex.
319 uint xs = this.labelEncodingBinary[x];
320 uint ys = this.labelEncodingBinary[y];
321
322 // Get heavy/light masks for each vertex.
323 // These are needed to know whether the first difference
324 // in the labels is on a heavy label substring, or a light
325 // label substring.
326 uint xk = this.kEncodingBinary[x];
327 uint yk = this.kEncodingBinary[y];
328
329 // Compute the common prefix and the location
330 // of the first difference from the left.
331 int h = Bithacks.FloorLog2(xs ^ ys);
332 int hk = Bithacks.FloorLog2(xk ^ yk);
333 if (h == 0)
334 h = hk;
335 if (h == 0)
336 return x;
337 uint v = (uint)(1 << h);
338 uint commonprefix = (xs >> h) << h;
339
340 // Compute if the bit that is different is heavy or light
341 // in either of the two labels.
342 // ... or both!
343 bool xheavy = (xk & v) != 0;
344 bool yheavy = (yk & v) != 0;
345
346 // Count number of heavy/light changes in each label.
347 // We need to know the first actual difference.
348 int xcount = 0;
349 {
350 int i = 31;
351 for (; i >= h; --i)
352 {
353 bool hv = (xk & (1 << i)) != 0;
354 if (hv)
355 {
356 xcount++;
357 for (; i >= h; --i)
358 if ((xk & (1 << i)) == 0)
359 {
360 xcount++;
361 break;
362 }
363 }
364 else
365 {
366 xcount++;
367 for (; i >= h; --i)
368 if ((xk & (1 << i)) != 0)
369 {
370 xcount++;
371 break;
372 }
373 }
374 }
375 }
376 int ycount = 0;
377 {
378 int i = 31;
379 for (; i >= h; --i)
380 {
381 bool hv = (yk & (1 << i)) != 0;
382 if (hv)
383 {
384 ycount++;
385 for (; i >= h; --i)
386 if ((yk & (1 << i)) == 0)
387 {
388 ycount++;
389 break;
390 }
391 }
392 else
393 {
394 ycount++;
395 for (; i >= h; --i)
396 if ((yk & (1 << i)) != 0)
397 {
398 ycount++;
399 break;
400 }
401 }
402 }
403 }
404
405 // Determine which is different, the L bits or the H bits.
406 int j;
407 bool xboundary, yboundary;
408 for (j = h + 1; j <= 31; j++)
409 {
410 // Are we on a boundary between the heavy/light labels?
411 uint vm1 = (uint)(1 << j);
412 xboundary = ((xk & vm1) != 0 ? !xheavy : xheavy);
413 yboundary = ((yk & vm1) != 0 ? !yheavy : yheavy);
414 if (xboundary || yboundary)
415 break;
416 }
417
418 // Determine what kind of boundary.
419 uint k;
420 bool isheavy;
421 if (xcount < ycount)
422 {
423 k = xk;
424 isheavy = xheavy;
425 }
426 else
427 {
428 k = yk;
429 isheavy = yheavy;
430 }
431
432 if (!isheavy)
433 {
434 // L bits differ.
435 // back up before L.
436 uint xmask = (uint)-(1 << j);
437 ncalabel = xs & xmask;
438 ncak = xk & xmask;
439 }
440 else
441 {
442 // H bits differ.
443 // First, back up before H.
444 ncalabel = xs & (uint)-(1 << j);
445
446 // move forward picking alphabetic minimum H bits
447 // until next L.
448 uint xsm = xs;
449 uint ysm = ys;
450 uint xmask;
451 uint ymask;
452 {
453 // Find specific masks to get prefix plus heavy label.
454 // Scan ahead for 0-bit.
455 int i;
456 for (i = Bithacks.FloorLog2((uint)v); ; --i)
457 {
458 if ((xk & (1 << i)) == 0)
459 break;
460 }
461 i++; // backup.
462 // get mask of all 1's up to heavy label and apply to the label.
463 xmask = (uint)-(1 << i);
464 xsm &= xmask;
465 }
466 {
467 // Find specific masks to get prefix plus heavy label.
468 // Scan ahead for 0-bit.
469 int i;
470 for (i = Bithacks.FloorLog2((uint)v); ; --i)
471 {
472 if ((yk & (1 << i)) == 0)
473 break;
474 }
475 i++; // backup.
476 // get mask of all 1's up to heavy label and apply to the label.
477 ymask = (uint)-(1 << i);
478 ysm &= ymask;
479 }
480 // Got two labels, pick the one that is alphabetic.
481 if (xsm < ysm)
482 {
483 ncalabel = xsm;
484 ncak = xk & xmask;
485 }
486 else
487 {
488 ncalabel = ysm;
489 ncak = yk & ymask;
490 }
491 }
492
493 Vertex found = null;
494 foreach (KeyValuePair<Vertex, uint> match in this.labelEncodingBinary)
495 {
496 if (match.Value == ncalabel)
497 {
498 if (this.kEncodingBinary[match.Key] == ncak)
499 {
500 found = match.Key;
501 break;
502 }
503 }
504 }
505 return found;
506 }
507 }
508 }
509
Example
Figure 4. Finding the nearest common ancestor using the Alstrup-Gavoille-Kaplan-Rauhe algorithm.
Consider the example tree displayed in Figure 4. Compute the nearest common ancestor of nodes F and N.
In a series of passes over the tree, the algorithm performs preprocessing (lines 40-58): Size(v) at lines 42 and 60-71; IsHeavy(v) at lines 43, 44, and 149-179; Apex(v) at lines 45 and 181-199; HeavyLabel(v) at lines 46-49 and 201-216; LightLabel(v) at lines 50 and 218-226; EncodeHeavyLabel(v) at lines 51-54 and 228-272; EncodeLightLabel(v) at lines 56 and 274-310; and FullLabel(v) at lines 57 and 73-147. The results of the computation are displayed in Table 3.
The NCA for nodes F and N is now computed (lines 312-508). The full labels for the two nodes are found (lines 326 and 327). FullLabel(F) = 580016. FullLabel(N) = 700016. FullLabel(NCA(F, N)) = FullLabel(F) & FullLabel(N) = 580016 & 700016 = 500016. Mask(F) = D80016. Mask(N) = F40016. The most significant bit that is different is bit 13 (bit 0 is the least significant bit, bit 15 is the most significant), found at line 331. This bit corresponds to a bit difference in the heavy label substring of FullLabel(N) but in the light label substring of FullLabel(F) (lines 343-344). The number of heavy/light substring boundary crossings is next counted starting from the left edge of each full label, xcount = 2, ycount = 1 (lines 348-403). The bit that is different is on a heavy/light substring boundary for FullLabel(N), but not for FullLabel(F). All this information leads one to conclude that the bit that is different is in the heavy label substring of FullLabel(N). In this case, the minimum between the two heavy label substrings must be chosen (lines 441-491). The labels are computed and compared (line 481) and the FullLabel(NCA(F, N)) = 400016. Therefore, NCA(F, N) = Reverse(FullLabel(NCA(F, N))) = Reverse(4000) = B (line 500).
Table 3. Alstrup-Gavoille-Kaplan-Rauhe computation for Figure 4.
Node, v | Size(v) | Class(v) | Light label(v) | Encoded Light label(v) | Heavy label(v) | Encoded Heavy label(v) | Label definition | Full Label with light and heavy boundary indicators | Full Label (16-bit value) | Heavy/light regions (16-bit value) |
A | 18 | L | 18 | e | 1 | 00000 | L(A).H(A) | LH00000 | 0000 | F800 |
B | 17 | H | – | 11 | 01 | L(A).H(B) | LH01 | 4000 | C000 | |
C | 4 | L | 4 | 0 | 3 | 0 | L(A).H(B).L(C).H(C) | LH01L0H0 | 4000 | D000 |
D | 6 | H | – | 2 | 0110 | L(A).H(D) | LH0110 | 6000 | F000 | |
E | 6 | L | 6 | 1 | 2 | 00 | L(A).H(B).L(E).H(E) | LH01L1H00 | 6000 | D800 |
F | 1 | H | – | 1 | 11 | L(A).H(B).L(C).H(F) | LH01L0H11 | 5800 | D800 | |
G | 1 | L | 1 | 0 | 1 | 0 | L(A).H(B).L(C).H(C).L(G).H(G) | LH01L0H0L0H0 | 4000 | D400 |
H | 1 | L | 1 | 1 | 1 | 0 | L(A).H(B).L(C).H(C).L(H).H(H) | LH01L0H0L1H0 | 4800 | D400 |
I | 4 | H | – | 3 | 0111 | L(A).H(I) | LH0111 | 7000 | F000 | |
J | 1 | L | 1 | 0 | 1 | 0 | L(A).H(D).L(J).H(J) | LH0110L0H0 | 6000 | F400 |
K | 4 | H | – | 3 | 01 | L(A).H(B).L(E).H(K) | LH01L1H01 | 6800 | D800 | |
L | 1 | L | 1 | 0 | 1 | 0 | L(A).H(B).L(E).H(E).L(L).H(L) | LH01L1H00L0H0 | 6000 | Da00 |
M | 1 | H | – | 1 | 10001 | L(A).H(M) | LH10001 | 8800 | F800 | |
N | 1 | L | 1 | 0 | 1 | 0 | L(A).H(I).L(N).H(N) | LH0111L0H0 | 7000 | F400 |
O | 1 | L | 1 | 1 | 1 | 0 | L(A).H(I).L(O).H(O) | LH0111L1H0 | 7800 | F400 |
P | 1 | H | – | 1 | 101 | L(A).H(B).L(E).H(P) | LH01L1H101 | 7400 | Dc00 | |
Q | 1 | L | 1 | 0 | 1 | 0 | L(A).H(B).L(E).H(K).L(Q).H(Q) | LH01L1H01L0H0 | 6800 | Da00 |
R | 1 | L | 1 | 1 | 1 | 0 | L(A).H(B).L(E).H(K).L(R).H(R) | LH01L1H01L1H0 | 6c00 | Da00 |
References
[1] Aho, A., Hopcroft, J. and Ullman, J. On finding lowest common ancestors in trees. SIAM J. Comput., 51976), 115-132.
[2] Gusfield, D. Algorithms on Stings, Trees, and Sequences: Computer Science and Computational Biology. ACM SIGACT News, 28, 4 1997), 41-60.
[3] Farach, M., Kannan, S. and Warnow, T. A robust model for finding optimal evolutionary trees. Algorithmica, 13, 1 1995), 155-179.
[4] Walker II, J. A Node-positioning Algorithm for General Trees. Software – Practice and Experience, 20, 7 1990), 685-705.
[5] Di Battista, G. and Tamassia, R. On-line planarity testing. SIAM Journal on Computing, 251996), 956-997.
[6] Westbrook, J. Fast incremental planarity testing. Springer, City, 1992.
[7] Harel, D. and Tarjan, R. Fast algorithms for finding nearest common ancestors. SIAM Journal on Computing, 131984), 338.
[8] Knuth, D. Combinatorial Algorithms. City, 2008.
[9] Alstrup, S., Gavoille, C., Kaplan, H. and Rauhe, T. Nearest common ancestors: A survey and a new distributed algorithm. ACM New York, NY, USA, City, 2002.
[10] Alstrup, S., Gavoille, C., Kaplan, H. and Rauhe, T. Nearest common ancestors: A survey and a new algorithm for a distributed environment. Theory of Computing Systems, 37, 3 2004), 441-456.
Ken Domino says:
The Alstrup algorithm described here has one major performance problem. Can anyone find it? Hint: searching…