Tuesday, April 14, 2009

A Wonderful Programming Exercise

Yesterday, while reading the famous and excellent Introduction to Algorithms ("CLRS") book, I bumped on a seemingly innocuous problem on Appendix B (page 1092). The problem statement reads:
Show that by removing a single edge, we can partition the vertices of any n-vertex
binary tree into two sets A and B such that |A| <= 3n/4 and |B| <= 3n/4.
My first reaction was, "wow". If you are like me, you tend to consider binary trees as something very, very basic. Something a computer-science student gets to learn and love by second semester, and then move on to more fancy and complex stuff. Yet, this is the first time I came across to this very peculiar property of binary trees (even in the form of an exercise), which states that "you can always split any binary tree by removing some edge and have both components less or equal to 3/4 of the initial size, and equivalently greater or equal to 1/2 of the initial size". Why nature specially picked 3/4 and 1/4? I don't know.  (You may google it, I did, and didn't find the answer/proof).

Here is a simple image of a binary tree, courtesy of Wikipedia: A binary tree picture

Anyway, on to the point. My problem was that I couldn't even find a binary tree which I could not cut more evenly than 3/4 and 1/4 (because, the exercise implies that there are some binary trees that you cannot reduce the greater split component any less than 3/4). So how am I supposed to build some basic intuition to be guided to a formal proof? I didn't fancy drawing and redrawing lots of trees, trying to find the "right" example. So, being a programmer, that's where I thought I could use some help from the computer in front of me: why not let the computer find the example and show it to me?

So this was my new problem statement: enumerate *all* binary trees of a given size N. How about trying this problem yourself and leaving a comment with your solution? (I would be quite interested in a nice solution in Scala by the way).

Assuming you now go to your favorite editor and see if you can hack this, let me describe here what happened on my side. My first attempt was ugly. Real ugly. I felt bad doing it, even bad about myself as a programmer. "A real programmer would be able to hack through it in few minutes". Well, mine was ugly, and I didn't even finish it, perhaps it couldn't work that way at all. For history, I tried building all trees in a top-down fashion (by the way, some months ago I was curious about "how many binary trees are there with a given size", and did that top-down approach to find out how many, but that was a simpler problem, and I ended up rediscovering the Catalan numbers).

I gave up, did not bother with it for another few hours, and then it hit me: why not build the trees bottom-up? In a dynamic programming fashion. Knowing all trees of size (N-1), one can pretty easily find the trees with size N. 



Enough description, lets have some code:


public class Main {
public static void main(String[] args) {
int count = 0;

for (Node tree : TreeEnumerator.findTrees(5)) {
System.out.println(tree);
count++;
}
System.out.println(count);
}
}

class TreeEnumerator {
public static Collection<Node> findTrees(int n) {
List<Collection<Node>> knownTrees = new ArrayList<Collection<Node>>(n);
knownTrees.add(Collections.<Node>singleton(Nil.instance));
knownTrees.add(Collections.<Node>singleton(new InternalNode(Nil.instance, Nil.instance)));
for (int i = 2; i <= n; i++) {
final int kids = i - 1;
Collection<Node> trees = new LinkedList<Node>();
for (int leftKids = kids; leftKids >= 0; leftKids--) {
int rightKids = kids - leftKids;
Collection<Node> leftTrees = knownTrees.get(leftKids);
Collection<Node> rightTrees = knownTrees.get(rightKids);
for (Node left : leftTrees) {
for (Node right : rightTrees) {
trees.add(new InternalNode(left, right));
}
}
}
knownTrees.add(trees);
}
return knownTrees.get(n);
}
}

interface Node {
}

class Nil implements Node {
static final Nil instance = new Nil();

private Nil() { }

@Override
public String toString() {
return ".";
}
}

class InternalNode implements Node {
final Node left, right;

InternalNode(Node left, Node right) {
this.left = left;
this.right = right;
}

@Override
public String toString() {
return "(" + left + right + ")";
}
}
I was very, very happy with the end result and the easiness that new trees are generated, always reusing some existing lower-rank tree. I suppose one might only appreciate this if he had already tried and realized how easily a "simple" problem, thought just a little bit wrongly, may end up in something so complicated and unmanageable - then one really appreciates simplicity! This algorithm is very efficient (probably optimal), and only creates as many nodes as trees (of size N or less), no more. Also noteworthy is that no tree copying takes place anywhere, only tree sharing. (This is facilitated by the fact that nodes do not ren eference their parents).

All this is nice, but this is not the complete solution. We still have to find which tree of those cannot be broken more evenly than 1/4 and 3/4. An obvious approach is to take every tree, and break it in every place possible, and keep the most even cut of those, and check whether it is 3/4. But then, how would one actually cut the trees? Nulling references in nodes? But that would destroy all other trees sharing those nodes! Node copying? It would seem that we should after all fall back to copying every tree, then try all breaks, so the original trees are kept intact.

Since this gets quite a big post, I'm not going to show the code of my solution, just sketch it. It is just a simple neat trick really. You don't have to actually break the tree! Just define a recursive "int count()" method on Node, and for every tree, do a couple of actions:
  • Perform count() on the root, store it to "treeSize"
  • Perform count() on every other node, store it to "subTreeSize"
Then you automatically have the sizes of the two components "assuming" they were actually disconnected (while they are not; they are still the same tree). "subTreeSize" is the size of the tree component below the point we would cut it. "treeSize - subTreeSize" is the size of everything else, i.e. the second component. Voila!

Hopefully you would find this interesting enough and tried it out before reading this. 

Cheers!

PS: This blog post is a kind offer from the CoCo coffee-bar (wifi-enabled), located on Cowley St., at the beautiful city of Oxford. 

PS2: SyntaxHighlighter just won't work for me. :-( Sorry. I am no CSS expert, and would need something pre-integrated in the blogging software.