Heaps

Heaps are data structures that efficiently maintain the minimum (or maximum) for a set of data that may dynamically change.

All heaps in this package are derived from AbstractHeap, and provide the following interface:

# Let `h` be a heap, `v` be a value, and `n` be an integer size

length(h)            # returns the number of elements

isempty(h)           # returns whether the heap is empty

push!(h, v)          # add a value to the heap

first(h)             # return the first (top) value of a heap

pop!(h)              # removes the first (top) value, and returns it

extract_all!(h)      # removes all elements and returns sorted array

extract_all_rev!(h)  # removes all elements and returns reverse sorted array

sizehint!(h, n)      # reserve capacity for at least `n` elements

Mutable heaps (values can be changed after being pushed to a heap) are derived from AbstractMutableHeap <: AbstractHeap, and additionally provides the following interface:

# Let `h` be a heap, `i` be a handle, and `v` be a value.

i = push!(h, v)            # adds a value to the heap and and returns a handle to v

update!(h, i, v)           # updates the value of an element (referred to by the handle i)

delete!(h, i)              # deletes the node with handle i from the heap

v, i = top_with_handle(h)  # returns the top value of a heap and its handle

Currently, both min/max versions of binary heap (type BinaryHeap) and mutable binary heap (type MutableBinaryHeap) have been implemented.

Examples of constructing a heap:

h = BinaryMinHeap{Int}()
h = BinaryMaxHeap{Int}()             # create an empty min/max binary heap of integers

h = BinaryMinHeap([1,4,3,2])
h = BinaryMaxHeap([1,4,3,2])         # create a min/max heap from a vector

h = MutableBinaryMinHeap{Int}()
h = MutableBinaryMaxHeap{Int}()      # create an empty mutable min/max heap

h = MutableBinaryMinHeap([1,4,3,2])
h = MutableBinaryMaxHeap([1,4,3,2])  # create a mutable min/max heap from a vector

Using alternate orderings

Heaps can also use alternate orderings apart from the default one defined by Base.isless. This is accomplished by passing an instance of Base.Ordering as the first argument to the constructor. The top of the heap will then be the element that comes first according to this ordering.

The following example uses 2-tuples to track the index of each element in the original array, but sorts only by the data value:

data = collect(enumerate(["foo", "bar", "baz"]))

h1 = BinaryHeap(data) # Standard lexicographic ordering for tuples
first(h1)             # => (1, "foo")

h2 = BinaryHeap(Base.By(last), data) # Order by 2nd element only
first(h2)                            # => (2, "bar")

If the ordering type is a singleton it can be passed as a type parameter to the constructor instead:

BinaryHeap{T, O}()        # => BinaryHeap{T}(O())
MutableBinaryHeap{T, O}() # => MutableBinaryHeap{T}(O())

Min-max heaps

Min-max heaps maintain the minimum and the maximum of a set, allowing both to be retrieved in constant (O(1)) time. The min-max heaps in this package are subtypes of AbstractMinMaxHeap <: AbstractHeap and have the same interface as other heaps with the following additions:

# Let h be a min-max heap, k an integer
minimum(h)     # return the smallest element
maximum(h)     # return the largest element

popmin!(h)     # remove and return the smallest element
popmin!(h, k)  # remove and return the smallest k elements

popmax!(h)     # remove and return the largest element
popmax!(h, k)  # remove and return the largest k elements

popall!(h)     # remove and return all the elements, sorted smallest to largest
popall!(h, o)  # remove and return all the elements according to ordering o

The usual first(h) and pop!(h) are defined to be minimum(h) and popmin!(h), respectively.

This package includes an implementation of a binary min-max heap (BinaryMinMaxHeap).

Atkinson, M.D., Sack, J., Santoro, N., & Strothotte, T. (1986). Min-Max > Heaps and Generalized Priority Queues. Commun. ACM, 29, 996-1000. doi: 10.1145/6617.6621

Examples:

h = BinaryMinMaxHeap{Int}()        # create an empty min-max heap with integer values

h = BinaryMinMaxHeap([1, 2, 3, 4]) # create a min-max heap from a vector

Functions using heaps

Heaps can be used to extract the largest or smallest elements of an array without sorting the entire array first:

data = [0,21,-12,68,-25,14]
nlargest(3, data)  # => [68,21,14]
nsmallest(3, data) # => [-25,-12,0]

Both methods also support the by and lt keywords to customize the sort order, as in Base.sort:

nlargest(3, data, by=x -> x^2)  # => [68,-25,21]
nsmallest(3, data, by=x -> x^2) # => [0,-12,14]

The lower-level DataStructures.nextreme function takes a Base.Ordering instance as the first argument and returns the first n elements according to this ordering:

DataStructures.nextreme(Base.Forward, n, a) # Equivalent to nsmallest(n, a)

Improving performance with Float data

One use case for custom orderings is to achieve faster performance with Float elements with the risk of random ordering if any elements are NaN. The provided DataStructures.FasterForward and DataStructures.FasterReverse orderings are optimized for this purpose and may achieve a 2x performance boost:

h = BinaryHeap{Float64, DataStructures.FasterForward}() # faster min heap
h = BinaryHeap{Float64, DataStructures.FasterReverse}() # faster max heap

h = MutableBinaryHeap{Float64, DataStructures.FasterForward}() # faster mutable min heap
h = MutableBinaryHeap{Float64, DataStructures.FasterReverse}() # faster mutable max heap

DataStructures.nextreme(DataStructures.FasterReverse(), n, a)  # faster nlargest(n, a)
DataStructures.nextreme(DataStructures.FasterForward(), n, a)  # faster nsmallest(n, a)