Heaps

  • The (binary) heap data structure is an array that represents a nearly complete binary tree.

    Note

    The heap data structure is completely unrelated to the region of the computer’s memory of the same name.

    Note

    A binary tree is a tree where every node has at most two children.

    A complete binary tree is one where each node has 0 or two children and the total number of elements of a tree of height hh is 2h+112^{h + 1} - 1. Notice how every level is completely filled and all leaves are at the same level.

    A nearly complete binary tree is one where every level, except possibly the last, is completely filled. On the last level hh, it must have between 11 and 2h2^h nodes.

Array Representation

  • A binary heap is often represented as an array.

    array.png
    tree indices.png
  • Below are the properties of such representation.

    • The size of the array nn is the number of elements in the heap.

    • The root of the tree is the first element in the array.

    • The parent of the element at index ii is at index i12 \dfrac {i - 1}{2} .

    • The left child of the element at index ii is at index 2i+12i + 1.

    • The right child of the element at index ii is at index 2i+22i + 2.

    • The height of the tree with nn elements is log(n+1) \lceil \log (n + 1) \rceil .

Heap Property

  • There are two kinds of binary heaps: max-heaps and min-heaps. In both kinds, the values in the nodes must satisfy a heap property. For the remainder of this section, we will discuss max-heaps. Min-heaps are analogous.

  • In a max-heap, nodes must satisfy the max-heap property: every node must be greater than or equal to its children.

  • Below are some examples of valid max-heaps: all are nearly complete binary trees where the values in the nodes satisfy the max-heap property.

Heap Operations

maxHeapify

  • maxHeapify(i) corrects a single violation of the heap property in a subtree with root at i. This procedure assumes the left and right children of the element at i are valid max-heaps.

  • To correct a violation at index i, we find largest, the largest of left(i) and right(i). Then we swap the elements at indices i and largest (A[i] and A[largest]) and call maxHeapify(largest) until the violation no longer exists.

  • Here is an example of correcting a violation at index 1 by calling maxHeapify(8).

  • Pseudocode for maxHeapify, where A is the array representing the heap:

    maxHeapify(i):
        l = left(i)
        r = right(i)
        if l < A.size and A[l] > A[i]:
            largest = l
        else:
            largest = i
        if r < A.size and A[r] > A[largest]:
            largest = r
        if largest ≠ i
            swap A[i] and A[largest]
            maxHeapify(largest)
  • Time complexity of maxHeapify is O(logn)O(\log n) where nn is the number of elements in the heap.

    A more precise time complexity of calling maxHeapify(i) is O(h)O(h) where hh is the height of the subtree with root at node i. If the number of elements in the subtree is nin_i, the time complexity is O(logni)O(\log n_i).

buildMaxHeap

  • buildMaxHeap() produces a max-heap from an unordered array.

  • We observe that the leaves of the heap are always valid max-heaps. So to build a max-heap, we need to correct violations on nodes that are not leaves. (If the array that we use to represent our heap is 0-indexed, these would nodes at indices between n22\dfrac{n - 2}{2} and 00 inclusive).

  • Here is an example of building a max heap from an array containing the elements {4, 1, 3, 2, 16, 9, 10, 14, 8, 7.

  • Pseudocode for buildMaxHeap, where A is the array representing the heap:

    buildMaxHeap():
        for i = (A.size - 2) / 2..≥0:
            maxHeapify(i)
  • A simple analysis of time complexity of buildMaxHeap on a heap with nn elements will lead to a complexity of O(nlogn)O(n \log n), since we make n2\dfrac{n}{2} calls to maxHeapify and each call takes O(logn)O(\log n) time.

    However, we can get a tighter bound by doing a more careful analysis. Observe that each call to maxHeapify requires O(h)O(h) time where hh is the height of the subtree at node ii. The number of nodes at height hh is n2h+1\dfrac{n}{2^h + 1}. Therefore, the running time is

    n=0logn(O(n)×n2n+1)=O(nh=0lognh2h)=O(nh=0h2h)=O(n) \begin{aligned} \sum ^{\log n}_{n = 0} \left( O(n) \times \dfrac{n}{2^n + 1} \right) &= O\left( n \sum ^{\log n}_{h=0} \dfrac{h}{2^h} \right) \\ &= O\left( n \sum ^{\infty}_{h=0} \dfrac{h}{2^h} \right) \\ &= O(n) \end{aligned}

    Since h=0h2h=2\sum \limits_{h=0}^{\infty} \dfrac{h}{2^h} = 2.

results matching ""

    No results matching ""