Binary Search Trees

  • An ordered dictionary ADT is a dictionary in which the keys have a total order, just like in a heap. You can insert, find, and remove elements, just as with a hash table. But unlike a hash table, you can also quickly find the element with minimum or maximum key, or the entry nearest another entry in the total order. An ordered dictionary does anything a dictionary or binary heap can do and more, albeit more slowly.

  • A binary search tree (BST) is a simple way to implement an ordered dictionary.

  • In fact, the STL (multi)set and (multi)map use a variation of a BST (a red-black tree) as their internal data structure.

Representing Binary Trees

  • Recall that a binary tree is a rooted tree wherein no node has more than two children. Additionally, every child is either a left child or a right child of its parent, even if it is its parent’s only child. In the most popular binary tree representation, each tree node has three pointers: a parent pointer, a left pointer and a right pointer. Each node also stores a key (its value).

  • In a binary search tree, elements are maintained in a (somewhat) sorted order.

  • A binary search tree satisfies the binary-search-tree property (also called the binary-search-tree invariant):

    Let xx be a node in a binary search tree. If yy is a node in the left subtree of xx, then y.keyx.keyy.key \leq x.key. If yy is a node in the right subtree of xx, then y.keyx.keyy.key \geq x.key.

  • When a node has only one child, that child is either a left child or a right child, depending on whether its key is smaller or larger than its parent’s key. A key equal to the parent’s key can go into either subtree.

  • Below is an example of a binary search tree.

           18
          /  \
        12    25
       / \    / \
      4  15  25  30
     /  /  \    /
    1  13  17  28
     \  \       \
      3  14      29

    You can verify this is a binary search tree: for instance, the root is 18, its left subtree (rooted at 12) contains numbers from 1 to 17, and its right subtree (rooted at 25) contains numbers from 25 to 30.

  • An in-order traversal of a binary search tree visits the nodes in sorted order. In this sense, a search tree maintains a sorted list of its elements.

    For example, the in-order traversal of the above tree is

    1,3,4,12,13,14,15,17,18,25,25,28,29,301, 3, 4, 12, 13, 14, 15, 17, 18, 25, 25, 28, 29, 30

    A pre-order and post-order of a BST do not have any meaning.

Operations on BSTs

  • Operations on a search tree are usually more efficient than the same operations on a sorted linked list.

Find

  • Depending on how we define the data structure, the find method can return the key, the node that stores that key or a boolean value.

  • Pseudocode:

    find(root, x):
        node = root
        while node ≠ nil:
            if x < node.key:
                node = node.left
            else if x > node.key
                node = node.right
            else:
                return node
        return nil

    We can also easily modify this code so that it returns the largest key x\leq x or the smallest key x\geq x in the tree.

Minimum and Maximum

  • minimum and maximum are procedures that allow us to find the smallest key and the largest key stored in the BST. These can be easily implemented in a BST since the node with the smallest key is the leftmost node in the tree and the node with the largest key is the rightmost node in the tree.

  • Depending on how we define the data structure, these methods can return the key or the node that stores that key.

  • Pseudocode for minimum:

    minimum(root, x):
        node = root
        while node.left ≠ nil:
            node = node.left
        return node
  • Pseudocode for maximum:

    minimum(root, x):
        node = root
        while node.right ≠ nil:
            node = node.right
        return node

Insert

  • The insert method inserts a new node into the tree.

  • We start by following the same path through the tree as find. When we reaches a null pointer (missing node), replace that node with a new node storing the new key.

  • Duplicate keys are allowed. If insert finds a node that already has the same key, we’ll put the new node in the right subtree of the older one. (We could just as easily choose the left subtree; it doesn’t matter.)

  • Pseudocode:

    insert(tree, newKey):
        newNode = Node()
        newNode.left = newNode.right = nil
        newNode.key = newKey
    
        nodeToCheck = tree.root
        parentOfNewNode = nil
        while nodeToCheck ≠ nil:
            parentOfNewNode = nodeToCheck
            if newNode.key < nodeToCheck.key:
                nodeToCheck = nodeToCheck.left
            else:
                nodeToCheck = nodeToCheck.right
        newNode.parent = parentOfNewNode
        if parentOfNewNode == nil:
            // tree was empty
            tree.root = newNode
        else if newNode.key < parentOfNewNode.key:
            parentOfNewNode.left = newNode
        else:
            parentOfNewNode.right = newNode

Remove

  • The remove method removes a node with the specified key from the tree. If there are multiple nodes with that key, we remove only one node (the first one that we happen to find).

  • We start by following the same path through the tree as find. If the key is not in the tree, we are done. Otherwise, let n be the first node with that key.

    • If n has no children, we easily detach it from its parent.

    • If n has one child, move n's child up to take n's place. n's child becomes the child of n's parent and n's parent becomes the parent of n's child.

  • If n has two children, however, we have to be a bit more clever. Let successorNode be the node in n's right subtree with the smallest key. We detach successorNode from its parent and replace replace n with successorNode in the tree. Since successorNode's key is the next smallest key after the key we wanted to remove, the binary-search-tree property still holds.

  • In order to move subtrees around within the binary search tree, we’ll define a subroutine transplant, which replaces one subtree as a child of its parent with another subtree:

    transplant(tree, oldChild, newChild):
        if oldChild.parent == nil:
            tree.root = newChild
        else if oldChild == oldChild.parent.left:
            oldChild.parent.left = newChild
        else:
            oldChild.parent.right = newChild
        if newChild ≠ nil:
            newChild.parent = oldChild.parent

    We can then write this pseudocode for remove:

    remove(tree, key):
        nodeToRemove = find(key)
        if nodeToRemove ≠ nil:
            if nodeToRemove.left == nil:
                transplant(tree, nodeToRemove, nodeToRemove.right)
            else if nodeToRemove.right == nil:
                transplant(tree, nodeToRemove, nodeToRemove.left)
            else:
                successorNode = minimum(nodeToRemove.right)
                if successorNode.parent ≠ nodeToRemove:
                    // detach successorNode from its parent
                    transplant(tree, successorNode, successorNode.right)
                    successorNode.right = nodeToRemove.right
                    successorNode.right.parent = successorNode
                // replace nodeToRemove with successorNode
                transplant(tree, nodeToRemove, successorNode)
                successorNode.left = nodeToRemove.left
                successorNode.left.parent = successorNode

Successor and Predecessor

  • The successor of a node xx is the node with the smallest key <x.key< x.key if all keys are distinct and x.key\leq x.key if duplicate keys are allowed.

    • We break the algorithm for finding the successor into two cases:

      1. If the right subtree of xx is nonempty, then the successor of xx is the leftmost node in the right subtree of xx.

      2. If the right subtree of xx is empty and xx has a successor yy, then yy is the lowest ancestor of xx whose left child is also an ancestor of xx. To find yy, we simply go up the tree from xx until we encounter a node that is the left child of its parent.

    • Pseudocode:

      successor(node):
          if node.right ≠ nil:
              return minimum(x.right)
          else:
              y = node.parent
              while y ≠ nil and node == y.right
                  node = y
                  y = y.parent
              return y
  • The predecessor of a node xx is the node with the smallest key >x.key> x.key if all keys are distinct and x.key\geq x.key if duplicate keys are allowed.

    • We break the algorithm for finding the successor into two cases:

      1. If the left subtree of xx is nonempty, then the predecessor of xx is the rightmost node in the left subtree of xx..

      2. If the left subtree of xx is empty and xx has a predecessor yy, then yy is the lowest ancestor of xx whose right child is also an ancestor of xx. To find yy, we simply go up the tree from xx until we encounter a node that is the right child of its parent.

    • The pseudocode for predecessor is analogous to that of successor.

  • As you can see, the structure of a binary search tree allows us to determine the successor and the predecessor of a node without ever comparing keys.

Complexity

  • Running times of find, insert and remove are O(h)O(h) where hh is the height of the tree.

  • In the worst case, the height of the tree is Θ(n)\Theta(n) where nn is the number of nodes in the tree.

    For example, suppose we inserted a sorted sequence of numbers 1,2,3,4,51, 2, 3, 4, 5 into the BST. The resulting tree will look like

    1
     \
      2
       \
        3
         \
          4
           \
            5

    Therefore, the worst case running time of find, insert and remove can also be expressed as Θ(n)\Theta(n) where nn is the number of elements in the tree.

Balanced BSTs

  • In a perfectly balanced binary tree with height hh, the number of nodes n=2h+11n = 2^{h + 1} - 1. In other words, no node has depth d>log2nd > \log_2 n.

  • Therefore, the running times of find, insert and remove are O(logn)O(\log n) where nn is the number of nodes in the tree.

  • In a balanced BST, all the leaf nodes are at the same level.

  • A balanced BST stays balanced when we insert new elements and remove the existing ones.

  • Some of the popular data structures that implement self-balancing BSTs include:

    • 2-3 trees

    • AVL trees

    • Red-black trees

    Most STL implementations of the ordered associative containers (sets, multisets, maps and multimaps) use red-black trees.

results matching ""

    No results matching ""