One of the most neglected data structures in the CS repertoire is
the heap. Unfortunately, the jargon is overloaded, so "heap" can mean
two different things - but there is a connection, which I'll explain
in a little while.
In many programming languages, when you can dynamically create
objects in memory, the space from which you allocate the memory to
create them is called the heap. That is not what I'm talking about.
What I am talking about is an extremely simple but powerful structure
that is used for maintaining a collection of values from which you want
to be able to quickly remove the largest element. This may sound
trivial, but it turns out to be an incredibly useful structure. You can use this basic structure to implement a very fast sorting algorithm; to maintain a prioritized set of values/tasks; to manage chunks of memory; etc. (The
prioritized set is the most common case in my experience, but that's probably because I've spent so much of my career working on distributed systems.)
When you're talking about data structures, an important part of really understanding how to choose the best one is understanding exactly what properties a data structure provides: which operations need to be fast; which operations you need to have, but don't care about the speed of; how fast things need to be; how much memory you need to create
one; and so on.
In a heap, what we want is to be able to add an element
to the heap quickly, and to be able to remove the largest element quickly. We're not going to remove arbitrary elements of the heap: we only remove the largest. We can't query the heap - that is, we can't ask "Is X in the heap?" - all we can do is remove the largest value. And while providing these properties, we want to use as little memory as possible.
Now, I haven't yet defined what I mean by "quickly" in the above definition. There's a good reason for that: there are different tradeoffs
that produce different solutions. For now, we'll settle for one
version of "quickly": O(lg n) - that is, in the worst case, the amount of time it takes to either remove the largest value, or to add a new value, is
proportional to the logarithm of the number of values in the heap. We can
provide that kind of performance with a kind of heap called a binary heap.
A binary heap is basically a binary tree with three properties, called
the heap properties of the tree:
- For any node in the tree, the node is larger than its children.
- Every level of the tree except possibly the last is completely full.
- The last (leaf) level of the tree is filled from the leftmost leaf to the last leaf. (In other words, the last level is filled left to right.)
So, for example, this figure is a binary heap. It's got four levels. The first three levels are full; the last level is filled on its left side. The next
insert point would be the next leftmost slot, the left child of 6.
We can write a heap in a textual form using parens: we write each node
as a triple (x, y, z), where x is the value of the node, and y and z are the parenthesized sub-heaps that are the children of x. So the same heap that I showed above can be written in text form as:
(11,(10,(9,(8,,),(2,,)),(5,(3,,),(1,,))),(7,(6,,),(4,,))).
To make it easier to read, we can stretch it across a couple of lines:
(11,
    (10,
        (9, (8,,), (2,,)),
        (5, (3,,), (1,,))),
    (7, (6,,), (4,,)))
For the moment, we'll ignore just how this tree is represented. We'll get
to that later.
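As a quick aside, though, the paren notation maps directly onto a little recursive node structure. Here's a rough sketch in Python, purely for illustration - the names Node and to_paren are just mine, and this is not the representation we'll actually use:

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Node:
        value: int
        left: Optional["Node"] = None
        right: Optional["Node"] = None

    def to_paren(node: Optional[Node]) -> str:
        # An empty child prints as nothing, which is why leaves look like "(8,,)".
        if node is None:
            return ""
        return f"({node.value},{to_paren(node.left)},{to_paren(node.right)})"

Calling to_paren on the root of the example heap reproduces exactly the string shown above.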
It should be obvious how to find the largest element - it's the root of
the binary heap. But once we remove it, then we don't have a heap: there's a hole at the top - so part of the operation of removing the maximum element
has to be re-validating the heap properties.
The way that we do this is called bubble-down. What we do is take the last element in the heap - the rightmost leaf - and insert it as the root. Now there's no hole, but the heap isn't valid. So we look at the two children of the root, and take the larger one, and swap it with the root. Now the root is larger than both of its children, but the child-heap that now has a new root may be invalid. So we check it, and if it's not valid, we repeat
the bubble-down operation on that child-heap.
Let's look at that with an example. Take the heap from the
figure above. If we remove the largest element (11), and then
replace it with the last element of the heap, we get the following
invalid heap:
(1, (10, (9, (8,,), (2,,)), (5, (3,,),)), (7, (6,,), (4,,)))
Now, we look at the two children, 10 and 7. 10 is larger, so we swap
the 1 and the 10, which gives us:
(10, (1, (9, (8,,), (2,,)), (5, (3,,),)), (7, (6,,), (4,,)))
Now, we look at the modified subheap - the left child heap. Its two
children are 9 and 5, both larger than 1. 9 is larger, so we swap:
(10, (9, (1, (8,,), (2,,)), (5, (3,,),)), (7, (6,,), (4,,)))
Now we've modified the left-left child subheap, and it's invalid, so we
again swap for the larger child:
(10, (9, (8, (1,,), (2,,)), (5, (3,,),)), (7, (6,,), (4,,)))
Now our heap is valid again.
So how long did that take? We removed the root, and replaced it with
the last child. That's a couple of steps - not the dominant cost of the operation. Then we did the bubble-down - that's the main cost. So how many
steps are there in a bubble-down? In the worst case - like the example case above - you'll need to do a number of swaps equal to the depth of the tree. The depth of the tree is lg(n), where n is the number of values in the tree. So we do at most lg(n) swaps - so the cost of removing the largest element
is O(lg n).
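To make that concrete, here's a rough sketch of remove-max with bubble-down, in Python. I'm jumping ahead a little by storing the heap in an array, where the children of the node at index i live at indices 2i+1 and 2i+2; we'll talk about representation properly later, so treat this as a sketch rather than the real implementation, and the function names as mine:

    def remove_max(heap):
        """Remove and return the largest value from a max-heap stored in a list."""
        largest = heap[0]
        last = heap.pop()            # take the last leaf
        if heap:
            heap[0] = last           # put it in the hole at the root
            _bubble_down(heap, 0)    # restore the heap property
        return largest

    def _bubble_down(heap, i):
        n = len(heap)
        while True:
            left, right = 2 * i + 1, 2 * i + 2
            largest = i
            if left < n and heap[left] > heap[largest]:
                largest = left
            if right < n and heap[right] > heap[largest]:
                largest = right
            if largest == i:         # both children are smaller: we're done
                return
            heap[i], heap[largest] = heap[largest], heap[i]
            i = largest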
What about adding an element? It's basically the inverse of removing the root; it's a bubble-up. We insert the value into the position
after the last leaf. Then we compare the leaf to its parent. If it's
larger than its parent, we swap the node with its parent. And we repeat going
up the tree.
Let's look at a quick example of insert. Say we wanted to insert 12
into our heap. We'd start by adding 12 as a leaf:
(10, (9, (8, (1,,), (2,,)), (5, (3,,), (12,,))), (7, (6,,), (4,,)))
It's larger than its parent, 5, so we swap:
(10, (9, (8, (1,,), (2,,)), (12, (3,,), (5,,))), (7, (6,,), (4,,)))
And so on - we'll swap two more times, giving us a new heap:
(12, (10, (8, (1,,), (2,,)), (9, (3,,), (5,,))), (7, (6,,), (4,,)))
So again, we're looking at a maximum of lg(n) swaps to
do an insert. So we're O(lg n) for insert.
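And here's a matching sketch of insert with bubble-up, under the same assumed array layout (the parent of index i is at (i - 1) // 2):

    def insert(heap, value):
        """Add a value to a max-heap stored in a list, then bubble it up."""
        heap.append(value)           # the new value goes in the next leaf slot
        i = len(heap) - 1
        while i > 0:
            parent = (i - 1) // 2
            if heap[i] <= heap[parent]:
                break                # parent is larger: the heap property holds
            heap[i], heap[parent] = heap[parent], heap[i]
            i = parent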
I'm running out of writing time, so I'll have to leave a
description of how to represent a heap, and an example
implementation of a heap for this evening. But I'll leave you with one
quick example of how a heap can be used. Suppose I want to sort
a list of integers. We know that the best worst-case time for
sorting is O(n lg n). We can implement a very simple sort by inserting all
of the values into a heap, and then removing them in order. That's n inserts, each with maximum cost lg(n); and n removes, maximum cost lg(n). So it's
O(n lg n) to sort the list - and no O(n²) case like quicksort!
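Here's that sort as a sketch, reusing the insert and remove_max sketches from above (again, just an illustration, not the promised implementation):

    def heap_sort(values):
        """Sort by inserting everything into a max-heap, then removing maxima."""
        heap = []
        for v in values:
            insert(heap, v)                 # n inserts, O(lg n) each
        out = []
        while heap:
            out.append(remove_max(heap))    # n removals, O(lg n) each
        out.reverse()                       # removals come out largest-first
        return out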
Another nice use for a heap is a priority queue. If you invert the comparison (for example so that the integer 1 is larger than the integer 2 for the purpose of the heap) you have one suitable for graph traversal algorithms, for example.
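For instance, Python's standard heapq module is exactly such an inverted (min-first) heap, which is what Dijkstra-style traversal wants - a quick sketch (the task strings are just made-up examples):

    import heapq

    pq = []                                   # heapq maintains a min-heap
    heapq.heappush(pq, (3, "low priority"))
    heapq.heappush(pq, (1, "urgent"))
    heapq.heappush(pq, (2, "normal"))

    while pq:
        priority, task = heapq.heappop(pq)    # smallest priority comes out first
        print(priority, task)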
Great timing on that one. Some of my friends in their Analysis of Algorithms class needed some clearer understanding of heaps and their final hasn't quite happened yet! Our CS program puts good emphasis on data structures and algorithm analysis and it's one thing about the program that I really appreciate.
If one separates the idea of 'priority' from 'item', using a bit of work, a person can use a structure known as a pair heap for handling queries of 'is object X in the heap', as well as 'adjust the priority of object X to Y' or 'what is the priority of X', which is useful in shortest-path and minimum spanning tree algorithms, among others. For the interested, an implementation in Python for the non-C based heapq in Python 2.3 is available on the internet by speaking the proper incantation to Google.
Your heap assumes immutable weights; would you continue to utilize the heap structure if the weights were mutable, or would you choose to utilize a different structure?
Seems that if you kept the heap structure as you've defined it, then for mutable weights you'd have to reheapify on every extract, thus increasing your extraction time and killing your performance.
droz:
Heap-like extraction from a pool of modifiable values is a fundamentally different problem, and really requires a different solution. You can work out ways of making a heap work with modifiable entries, but unless modifications are very rare compared to insert/remove
operations, it's probably not going to give you the kind of performance that you'd like.
-Mark
One thing I haven't read anywhere but is, IMHO, interesting, is the use of a ternary heap. It works just like the binary heap.
With binary heap bubble-down, you need 1 comparison to find the larger node, and one comparison to see if it's larger than the root; with ternary heap, you need 2+1 comparisons.
However, the height is only ln(2)/ln(3)=0.63 of the height of a binary heap, so even with a 1.5 factor of added comparisons, it's faster. You also save 1-0.63 of the moves.
Of course it needs more complicated logic and arithmetic, but still, I wonder why the ternary heap is not used more often...?
milan_va: Do you mean a D-Heap? A binary heap can fairly easily be modified to allow for an arbitrary number (>= 2) of children per node.
There will be a price to pay in an increased number of comparisons though, as each time you bubble down you need to find the largest of the children of the element bubbling down.
milan_va: It looks to me like the tree with three children per node has twice as many comparisons per node, not 1.5 times as many (2 compares for each node vs 1), which would suggest 1.26 times as many comparisons. There might be some wrinkle about finding the largest child I'm missing, though.
On the other hand, you do reduce the number of swaps quite a bit, which could be important if swaps are much more expensive than compares, like a heap stored on disk. Similarly to a B-tree, it could be handy to reduce the number of disk reads/writes at the expense of doing more comparisons in memory.
As an aside, the first fun data structure I ever encountered was the "leftist heap", which is a (very) unbalanced variation on the binary heap designed to be fast to merge two heaps together.
Hank: Yes, D-Heap looks like it's the same thing. I just wonder why the 3-heap is not used more often since it's better than a binary heap in most ways.
Matthew: You need to perform one more comparison after you find the largest child, to see if it's actually larger than the root, so the ratio is only (2+1)/(1+1) not (2)/(1). MarkCC has said he is going to post some (pseudo)code and if he does, you'll see it there.
The cost of the d-heap, unlike the binary heap, is different for insertion and removal. On insertion, you have worst case of log(n), with the base of the logarithm the number of children per node (and I'm guessing average log(n)/2 steps), which is a speedup the bigger the number of children.
For removal, however, you have to do a number of comparisons proportional to the number of children per node, at each of log(n) nodes (plus or minus one in a balanced heap); so for a tree with three children per node, the ratio would be (2 * log_3(n) + 1) / (log_2(n) + 1), which should approach 1.26 as n gets large.
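For concreteness, here's a rough sketch of bubble-down for a d-ary heap in the usual array layout (the children of index i sit at d*i+1 through d*i+d); it shows where the extra comparisons per level come from. The function name and parameter d are just illustrative:

    def bubble_down_dary(heap, i, d=3):
        """Bubble-down for a d-ary max-heap stored in a list.
        Each level costs up to d comparisons (d-1 to find the largest child,
        plus 1 against the parent), but there are only about log_d(n) levels.
        """
        n = len(heap)
        while True:
            first = d * i + 1                  # index of the first child of i
            if first >= n:
                return
            largest = max(range(first, min(first + d, n)), key=lambda j: heap[j])
            if heap[largest] <= heap[i]:
                return
            heap[i], heap[largest] = heap[largest], heap[i]
            i = largest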
milan_va,
I see what you mean. My bad.
I just caught a link from Pharyngula to ERV to here today, let me just say that you can do as many posts of this nature as you like! As a slightly lazy computer science grad student, I could definitely use tips on relatively obscure data structures (I remember learning about heaps as a sophomore, but forgot about them until again learning about that other kind of heap which I think overwrote the pointer to the original heap information in my head).
Not enough people are aware of this classical algorithm for extracting the largest element of a heap: promote the largest child into the empty position at the root, and continue doing this until you reach the bottom of the heap. Then insert the last element of the heap into the space at the bottom.
This will do log n + O(1) comparisons on average and 2 log n comparisons in the worst case. Because the last element was already a leaf, it probably belongs at the bottom of the heap. The algorithm you gave will push it all the way back down to the bottom, doing two comparisons per level and 2 log n comparisons on average.
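Roughly, in the array layout used in the sketches above (and adding the final bubble-up of the last element, which is how I read the algorithm - this is my own sketch, not tested code), it looks like this:

    def remove_max_promote(heap):
        """Remove-max variant: promote the larger child all the way down,
        then drop the old last element into the hole and bubble it up."""
        largest = heap[0]
        last = heap.pop()
        n = len(heap)
        if n == 0:
            return largest
        # Walk the hole down from the root, always promoting the larger child.
        i = 0
        while True:
            left, right = 2 * i + 1, 2 * i + 2
            if left >= n:
                break
            child = left
            if right < n and heap[right] > heap[left]:
                child = right
            heap[i] = heap[child]
            i = child
        # Put the old last element in the hole at the bottom, then bubble it up;
        # since it was already a leaf, this loop is usually short.
        heap[i] = last
        while i > 0:
            parent = (i - 1) // 2
            if heap[i] <= heap[parent]:
                break
            heap[i], heap[parent] = heap[parent], heap[i]
            i = parent
        return largest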
Which algorithm is better depends on your data and your heap. If comparisons are cheap and the data is stored inline, or heap is small, then the 2 log n algorithm tends to be faster. If comparisons are expensive or the heap points to the data then the log n + O(1) algorithm is almost twice as fast.
See my website for some (hopefully not buggy) code.
droz #4: Heaps can handle mutable weights easily. Each mutation costs O(log n), so it's sub-optimal if they mutate a lot. You just have to bubble the node up after you increase its priority or bubble it down after you decrease it. (Note: a lot of references, such as Knuth, call this "sifting up" and "sifting down" rather than "bubbling".)
When you have a mixture of insert/extract and modify-weight operations, the ideal implementation depends on the ratio. For very modify-heavy mixes, the optimal solution is use an unsorted list, which gives O(1) modify-weight and insert but O(n) extract. In the middle are a bunch of heap variants with the same O(log n) asymptotic costs but a variety of constant factors.
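A rough sketch of that update, on the same array layout as the earlier sketches - the bookkeeping you'd need in practice (a map from item to its current index, so you can find the entry whose weight changed) is omitted, and the name change_priority is just mine:

    def change_priority(heap, i, new_value):
        """Change the value at index i in a max-heap, then restore the heap
        property by bubbling up (if it grew) or down (if it shrank)."""
        old = heap[i]
        heap[i] = new_value
        if new_value > old:
            # Bubble up toward the root.
            while i > 0 and heap[i] > heap[(i - 1) // 2]:
                parent = (i - 1) // 2
                heap[i], heap[parent] = heap[parent], heap[i]
                i = parent
        else:
            # Bubble down toward the leaves.
            n = len(heap)
            while True:
                left, right = 2 * i + 1, 2 * i + 2
                largest = i
                if left < n and heap[left] > heap[largest]:
                    largest = left
                if right < n and heap[right] > heap[largest]:
                    largest = right
                if largest == i:
                    return
                heap[i], heap[largest] = heap[largest], heap[i]
                i = largest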
I'm not a CS student, got my Ph.D. in Applied Math instead. But I needed to infer the coarse to fine grid discretization from a collection of points and had to use heaps recently.
I used a 3D kdtree for nearest-neighbor calculations, then created a heap with the smallest nearest neighbor distance at the top. As I pushed the smallest NND point to the end, I recalculated NNDs for all the points that had used that point as their nearest neighbor, and bubbled them down.
O(N log N) all the way through -- it sorts 6.5 million points in 3 minutes. Cute exercise, felt like the CS data structures course I'd never had a chance to take :-).
Heaps are very cool - I've solved some very knotty problems quite elegantly with a good heap library.
But just a quick note: while heapsort provides a more deterministic sort time, quicksort is still, across all cases, more efficient in terms of time and memory.
I like it! I've heard of a bubble sort, which sounds like it uses the same moves.
Charles, quicksort is only faster because no one uses double-storage heapsort.
See here:
http://www.aims.ac.za/~mackay/sorting/sorting.html
Using twice the space makes heapsort faster than quicksort with no n**2 degenerate case!
Nice article! I'd love to see a follow-up dealing with the "Sideways Heaps" that Knuth discusses in his latest "Computer Musing" lecture (video at http://scpd.stanford.edu/scpd/students/Dam_ui/pages/ArchivedVideoList56…)
malpollyon
Extremely interesting, I need to look at that.
Comparison counts are not an adequate measure of real performance. On pretty much all modern architectures, references to memory that is in a fast cache will be at least twice as fast as references outside that cache. Quicksort does a great job of confining references to caches. The classical method of implementing heaps scatters references in a way that defeats caching. Maybe Mark's implementation will be cache-friendly. If not, heapsort is unlikely to outperform Quicksort (or a cache-friendly implementation of merge sort) on most architectures. And merge sort is stable, to boot.
jpl:
The article links to a time-comparison of standard heapsort vs standard quicksort in which heapsort outperforms quicksort.
"It turns out that much of the reason for Intel's large margin of victory are the /Qaxi /Qxi flags used to build the test. The inner loop for heapsort has an opportunity to perform conditional computation rather than an unpredictable conditional branch (a big no no on modern microprocessors) and these flags allow the Intel compiler to exploit this situation while the other compilers were unable to do this. The quicksort algorithm inner loop cannot be predicated, and the Intel compiler did not do it with mergesort (even though it is possible to do it.)"
So I still don't think it's clearcut that quicksort is better.
The link I was referring to was
http://users.aims.ac.za/~mackay/sorting/sorting.html
(from your earlier posting) which says
"I evaluated the performance of Fast Heapsort on random permutations. Performance was measured solely on the basis of the number of binary comparisons required. [Fast Heapsort does require extra bookkeeping, so the CPU comparison will come out differently.]"
The measurements in
http://www.azillionmonkeys.com/qed/sort.html
(cited there) note that
"Data is time in seconds taken to sort 10000 lists of varying size of about 3000 integers each."
3000 integers will almost certainly fit in fast cache,
so the effect I allude to won't be noticed. A more meaningful set of tests would compare performance on lists that keep doubling in size, so they eventually exceed the cache size of whatever machine they are run on. My point was that Quicksort's divide-and-conquer approach eventually reduces partitions to fit in whatever cache is available, whereas the classical heap exhibits very little locality of reference, so a cache size smaller than the data size doesn't help it much.
jpl:
You're right. I should have read more carefully.
There is a posting going around the Internet that uses a binary heap to represent all numbers, along the lines of the Cantor diagonal argument. So the heap looks like this:
        0
       / \
      0   1
     / \ / \
    0  1 0  1
Reading each path of a branch (a depth first traversal) gives us the following numbers
000 = 0
001 = 1
010 = 2
011 = 3
To represent all numbers we need to extend each branch off into infinity. The argument by the Internet posting is that this proves Cantor is wrong and shows that all numbers, even irrational numbers, can be placed in order and counted. The 1st path in the heap is the number 0, the 2nd path in the heap is the number 1, the 3rd path is 2, and the last path is the largest number. Since the heap branches off into infinity, the decimal portion of every number would have a path down this infinite heap -- pi, e, the square root of 2, and all other numbers.
The Internet argument seems to rely on the fact that there "is" a 1st path, 2nd path, etc. However, if each path is infinite then I am not sure that is a valid assumption - although I am not sure that it is not, either (infinity confuses me). In any event, I thought the argument was interesting and figured I would pass it along.
Daithi:
That's a classic example of a misunderstanding of how things work.
The problem with that argument is that you can't ever construct any of the irrational numbers. For that construction to work, the construction of the heap must terminate and allow the values in the heap to be read off. But the construction can never terminate. Even after ω steps - that is, an infinite number of steps - you won't have the heap finished.
So the problem isn't really that it assumes that there is a first path, a second path, etc. It's deeper: it's that you can't ever finish constructing any path to an irrational number.
Actually, I understood that you can't construct an infinite heap or an irrational number within that heap, just as I can't actually write out the decimal expansion of pi. As a matter of fact, I can't actually construct the rational numbers with this heap either. The number 1 is the 2nd branch of the infinite heap, and I can't actually follow an infinite number of branchings. However, I can visualize the concept of an infinite heap.
If I stop constructing the heap at any point then I could read off the numbers I have so far, just as if I were counting to infinity I could stop and read off the numbers I have so far.
Is it your position that an infinite heap is invalid, and thus the Internet argument is also invalid?
I read your post again and I think I understand why this doesn't work. You would have to construct the infinite heap before you could even start counting. Is that correct?
Daithi:
Yes, that's it exactly. For the infinite heap method of enumerating the reals including the irrationals to work, you'd need to finish constructing the heap before outputting anything.
For a classic enumeration of the rationals, you can emit rationals as you generate them. So for any rational that you wanted, you'd see it produced after a finite period of time - but you'd never emit an irrational. For the heap construction of the reals, you'd never see anything. Even if you modified the process to emit rationals as it generated them, you'd wait forever - literally forever, not just "a really long time" - before you'd see a single irrational number.
Thanks, Mark.
By the way, I just figured out that the infinite heap has the exact same characteristics as Cantor's Diagonal proof. That is, we can construct a number that does not exist within this heap.
Let's assume that an infinite heap exists. We could label each branch of the heap S0, S1, S2, ..., Sn. We could also label each node in the heap S0(0), S0(1), S0(2), ..., S1(0), S1(1), ..., Sn(n), Where the number in parentheses is the node or level of the path - starting at the bottom of the heap and counting upwards.
We can construct a number not in the heap by examining S0(0) and making its inverse the first digit of our new number (i.e. if it is a 0 we start our new number with 1 and vice versa), then we examine S1(1) and make its inverse our second digit, and so on. This would make it impossible for us to find our newly constructed number in path S0, S1, S2, or in any of the paths S0 through Sn. Therefore we have a number that is not in our heap - even though the heap was supposed to contain all numbers. This is the same thing that Cantor's Diagonal argument showed.
So it's O(n lg n) to sort the list - and no O(n²) case like quicksort!
O(n²) quicksort is obsoleted by introsort, which falls back to heapsort when quicksort's recursion gets too deep (and typically hands small subsets to insertion sort); it's the algorithm behind std::sort in C++ STL implementations. This "hybrid" algorithm outperforms heapsort and avoids quicksort's worst-case performance (which is unlikely with a properly engineered quicksort on real-life data, but is subject to exploit by DoS (denial of service) attacks). Note that there is also a fast worst-case-linear selection algorithm (find the nth smallest element of a set -- a generalization of the priority problem) built on the same introspective idea. It's important for comp sci students to know about these algorithms, but professionals use templated code libraries such as the STL that have been thoroughly engineered for performance and correctness.
The argument by the Internet posting is that this proves Cantor is wrong and shows that all numbers, even irrational numbers, can be placed in order and counted.
This "argument" is exactly equivalent to "I can imagine all the (binary representations of) the reals in order and countable, therefore that can be done", since this tree is simply a representation of an ordered list with prefixes coalesced.
Don't forget that heaps, and by extension priority queues, are fundamental to many graph algorithms (e.g. PFS, shortest path, etc.)
Agree with ianam. If you don't know about introsort, look it up - it's a great algorithm. It's faster than heapsort and almost as fast as quicksort. This does not make trees obsolete, however; they have their uses. Great article, Mark.