# Binary Search Trees

• An ordered dictionary ADT is a dictionary in which the keys have a total order, just like in a heap. You can insert, find, and remove elements, just as with a hash table. But unlike a hash table, you can also quickly find the element with minimum or maximum key, or the entry nearest another entry in the total order. An ordered dictionary does anything a dictionary or binary heap can do and more, albeit more slowly.

• A binary search tree (BST) is a simple way to implement an ordered dictionary.

• In fact, the STL (multi)set and (multi)map use a variation of a BST (a red-black tree) as their internal data structure.

## Representing Binary Trees

• Recall that a binary tree is a rooted tree wherein no node has more than two children. Additionally, every child is either a left child or a right child of its parent, even if it is its parent’s only child. In the most popular binary tree representation, each tree node has three pointers: a parent pointer, a left pointer and a right pointer. Each node also stores a key (its value).

• In a binary search tree, elements are maintained in a (somewhat) sorted order.

• A binary search tree satisfies the binary-search-tree property (also called the binary-search-tree invariant):

Let $x$ be a node in a binary search tree. If $y$ is a node in the left subtree of $x$, then $y.key \leq x.key$. If $y$ is a node in the right subtree of $x$, then $y.key \geq x.key$.

• When a node has only one child, that child is either a left child or a right child, depending on whether its key is smaller or larger than its parent’s key. A key equal to the parent’s key can go into either subtree.

• Below is an example of a binary search tree.

       18
/  \
12    25
/ \    / \
4  15  25  30
/  /  \    /
1  13  17  28
\  \       \
3  14      29

You can verify this is a binary search tree: for instance, the root is 18, its left subtree (rooted at 12) contains numbers from 1 to 17, and its right subtree (rooted at 25) contains numbers from 25 to 30.

• An in-order traversal of a binary search tree visits the nodes in sorted order. In this sense, a search tree maintains a sorted list of its elements.

For example, the in-order traversal of the above tree is

$1, 3, 4, 12, 13, 14, 15, 17, 18, 25, 25, 28, 29, 30$

A pre-order and post-order of a BST do not have any meaning.

## Operations on BSTs

• Operations on a search tree are usually more efficient than the same operations on a sorted linked list.

### Find

• Depending on how we define the data structure, the find method can return the key, the node that stores that key or a boolean value.

• Pseudocode:

find(root, x):
node = root
while node ≠ nil:
if x < node.key:
node = node.left
else if x > node.key
node = node.right
else:
return node
return nil

We can also easily modify this code so that it returns the largest key $\leq x$ or the smallest key $\geq x$ in the tree.

### Minimum and Maximum

• minimum and maximum are procedures that allow us to find the smallest key and the largest key stored in the BST. These can be easily implemented in a BST since the node with the smallest key is the leftmost node in the tree and the node with the largest key is the rightmost node in the tree.

• Depending on how we define the data structure, these methods can return the key or the node that stores that key.

• Pseudocode for minimum:

minimum(root, x):
node = root
while node.left ≠ nil:
node = node.left
return node
• Pseudocode for maximum:

minimum(root, x):
node = root
while node.right ≠ nil:
node = node.right
return node

## Insert

• The insert method inserts a new node into the tree.

• We start by following the same path through the tree as find. When we reaches a null pointer (missing node), replace that node with a new node storing the new key.

• Duplicate keys are allowed. If insert finds a node that already has the same key, we’ll put the new node in the right subtree of the older one. (We could just as easily choose the left subtree; it doesn’t matter.)

• Pseudocode:

insert(tree, newKey):
newNode = Node()
newNode.left = newNode.right = nil
newNode.key = newKey

nodeToCheck = tree.root
parentOfNewNode = nil
while nodeToCheck ≠ nil:
parentOfNewNode = nodeToCheck
if newNode.key < nodeToCheck.key:
nodeToCheck = nodeToCheck.left
else:
nodeToCheck = nodeToCheck.right
newNode.parent = parentOfNewNode
if parentOfNewNode == nil:
// tree was empty
tree.root = newNode
else if newNode.key < parentOfNewNode.key:
parentOfNewNode.left = newNode
else:
parentOfNewNode.right = newNode

## Remove

• The remove method removes a node with the specified key from the tree. If there are multiple nodes with that key, we remove only one node (the first one that we happen to find).

• We start by following the same path through the tree as find. If the key is not in the tree, we are done. Otherwise, let n be the first node with that key.

• If n has no children, we easily detach it from its parent.

• If n has one child, move n's child up to take n's place. n's child becomes the child of n's parent and n's parent becomes the parent of n's child.

• If n has two children, however, we have to be a bit more clever. Let successorNode be the node in n's right subtree with the smallest key. We detach successorNode from its parent and replace replace n with successorNode in the tree. Since successorNode's key is the next smallest key after the key we wanted to remove, the binary-search-tree property still holds.

• In order to move subtrees around within the binary search tree, we’ll define a subroutine transplant, which replaces one subtree as a child of its parent with another subtree:

transplant(tree, oldChild, newChild):
if oldChild.parent == nil:
tree.root = newChild
else if oldChild == oldChild.parent.left:
oldChild.parent.left = newChild
else:
oldChild.parent.right = newChild
if newChild ≠ nil:
newChild.parent = oldChild.parent

We can then write this pseudocode for remove:

remove(tree, key):
nodeToRemove = find(key)
if nodeToRemove ≠ nil:
if nodeToRemove.left == nil:
transplant(tree, nodeToRemove, nodeToRemove.right)
else if nodeToRemove.right == nil:
transplant(tree, nodeToRemove, nodeToRemove.left)
else:
successorNode = minimum(nodeToRemove.right)
if successorNode.parent ≠ nodeToRemove:
// detach successorNode from its parent
transplant(tree, successorNode, successorNode.right)
successorNode.right = nodeToRemove.right
successorNode.right.parent = successorNode
// replace nodeToRemove with successorNode
transplant(tree, nodeToRemove, successorNode)
successorNode.left = nodeToRemove.left
successorNode.left.parent = successorNode

## Successor and Predecessor

• The successor of a node $x$ is the node with the smallest key $< x.key$ if all keys are distinct and $\leq x.key$ if duplicate keys are allowed.

• We break the algorithm for finding the successor into two cases:

1. If the right subtree of $x$ is nonempty, then the successor of $x$ is the leftmost node in the right subtree of $x$.

2. If the right subtree of $x$ is empty and $x$ has a successor $y$, then $y$ is the lowest ancestor of $x$ whose left child is also an ancestor of $x$. To find $y$, we simply go up the tree from $x$ until we encounter a node that is the left child of its parent.

• Pseudocode:

successor(node):
if node.right ≠ nil:
return minimum(x.right)
else:
y = node.parent
while y ≠ nil and node == y.right
node = y
y = y.parent
return y
• The predecessor of a node $x$ is the node with the smallest key $> x.key$ if all keys are distinct and $\geq x.key$ if duplicate keys are allowed.

• We break the algorithm for finding the successor into two cases:

1. If the left subtree of $x$ is nonempty, then the predecessor of $x$ is the rightmost node in the left subtree of $x$..

2. If the left subtree of $x$ is empty and $x$ has a predecessor $y$, then $y$ is the lowest ancestor of $x$ whose right child is also an ancestor of $x$. To find $y$, we simply go up the tree from $x$ until we encounter a node that is the right child of its parent.

• The pseudocode for predecessor is analogous to that of successor.

• As you can see, the structure of a binary search tree allows us to determine the successor and the predecessor of a node without ever comparing keys.

## Complexity

• Running times of find, insert and remove are $O(h)$ where $h$ is the height of the tree.

• In the worst case, the height of the tree is $\Theta(n)$ where $n$ is the number of nodes in the tree.

For example, suppose we inserted a sorted sequence of numbers $1, 2, 3, 4, 5$ into the BST. The resulting tree will look like

1
\
2
\
3
\
4
\
5

Therefore, the worst case running time of find, insert and remove can also be expressed as $\Theta(n)$ where $n$ is the number of elements in the tree.

## Balanced BSTs

• In a perfectly balanced binary tree with height $h$, the number of nodes $n = 2^{h + 1} - 1$. In other words, no node has depth $d > \log_2 n$.

• Therefore, the running times of find, insert and remove are $O(\log n)$ where $n$ is the number of nodes in the tree.

• In a balanced BST, all the leaf nodes are at the same level.

• A balanced BST stays balanced when we insert new elements and remove the existing ones.

• Some of the popular data structures that implement self-balancing BSTs include:

• 2-3 trees

• AVL trees

• Red-black trees

Most STL implementations of the ordered associative containers (sets, multisets, maps and multimaps) use red-black trees.