Sorted Containers

Three sorted containers are provided: SortedDict, SortedMultiDict and SortedSet. SortedDict is similar to the built-in Julia type Dict, with the additional feature that the keys are stored in sorted order and can be efficiently iterated in this order. SortedDict is a subtype of AbstractDict. It is generally slower than Dict because looking up a key requires an O(log n) tree search rather than the expected O(1) hash-table lookup of Dict. SortedDict is a parameterized type with three parameters: the key type K, the value type V, and the ordering type Ord.

SortedSet has only keys; it is an alternative to the built-in Set container and is a subtype of AbstractSet. Internally, SortedSet is implemented as a SortedDict in which the value type is Nothing.

Finally, SortedMultiDict is similar to SortedDict except that each key can be associated with multiple values. The key=>value pairs in a SortedMultiDict are stored in the sorted order of their keys, and pairs with the same key are stored in order of insertion.
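The following is a minimal sketch of the three containers. It assumes DataStructures.jl is installed, and the keys and values are arbitrary illustrations. It shows that iteration follows sorted key order rather than insertion order, and that a SortedMultiDict keeps pairs with equal keys in insertion order.

```julia
using DataStructures

# SortedDict: iteration follows the sorted key order, not the insertion order
sd = SortedDict{String,Int}()
sd["pear"] = 3
sd["apple"] = 1
sd["fig"] = 2
sd_keys = collect(keys(sd))

# SortedSet: keys only; the duplicate 1 is stored once
ss = SortedSet([5, 1, 3, 1])
ss_items = collect(ss)

# SortedMultiDict: a key may repeat; pairs with equal keys keep insertion order
smd = SortedMultiDict{Int,String}()
push!(smd, 2 => "b1")
push!(smd, 1 => "a")
push!(smd, 2 => "b2")
smd_pairs = collect(smd)

println(sd_keys)
println(ss_items)
println(smd_pairs)
```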

The containers are implemented with a 2-3 tree, a kind of balanced search tree described in standard data-structure textbooks. One Vector stores the key/data pairs (the leaves of the tree), while a second Vector holds the tree structure.
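The O(log n) search time follows from the shape of a textbook 2-3 tree. In such a tree every internal node has two or three children, and all n leaves lie at the same depth h. The bound below is the standard textbook argument. It is not a description of this package's exact node layout.

```latex
2^{h} \le n \le 3^{h}
\quad\Longrightarrow\quad
\log_3 n \le h \le \log_2 n
```

A search descends one level at a time and compares the sought key against at most two split keys per node. A lookup therefore costs O(c h) = O(c log n), where c is the cost of one key comparison.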

The containers require two functions to compare keys: a less-than function and an equality function. With the default ordering argument, these are isless(key1,key2), which is true when key1 < key2, and isequal(key1,key2), which is true when key1 and key2 are equal. Note that isequal differs from == in floating-point corner cases: isequal(NaN,NaN) is true, while isequal(-0.0,0.0) is false. More details are provided below.
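The default comparison functions can be tried directly in Base Julia, with no package needed. This sketch also uses `lt` from `Base.Order` to show how an ordering object wraps the less-than test. The assumption is that the ordering argument of the sorted containers is a `Base.Order.Ordering` such as `Forward` or `Reverse`, which matches the `Ord <: Ordering` bound in the constructor signatures.

```julia
using Base.Order: Forward, Reverse, lt

# Default key comparisons: isless for "less than", isequal for "equals"
a = isless(1, 2)            # 1 < 2
b = isequal(2.0, 2)         # equal numeric values of different types

# isequal and == disagree on floating-point corner cases
nan_eq = (NaN == NaN, isequal(NaN, NaN))
zero_eq = (-0.0 == 0.0, isequal(-0.0, 0.0))

# An Ordering wraps the less-than test; Reverse flips its direction
fwd_lt = lt(Forward, 1, 2)
rev_lt = lt(Reverse, 1, 2)

println((a, b, nan_eq, zero_eq, fwd_lt, rev_lt))
```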

Tokens for Sorted Containers

The sorted containers support an object for indexing called a token, defined as a two-entry tuple and aliased as SortedDictToken, SortedMultiDictToken, or SortedSetToken. A token is the address of a single data item in the container and can be dereferenced in O(1) time.

The first entry of a token tuple is the container as a whole, and the second refers to the particular item. The second part is called a semitoken. The type of the semitoken is IntSemiToken.
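The next sketch builds a token by hand and dereferences it. It assumes DataStructures.jl is installed and uses the navigation names `firstindex` (older releases called this `startof`), `deref` and `deref_key`. The data are arbitrary.

```julia
using DataStructures

sd = SortedDict{String,Int}()
for (k, v) in ("b" => 2, "a" => 1, "c" => 3)
    sd[k] = v
end

st = firstindex(sd)        # semitoken of the item with the smallest key
tok = (sd, st)             # a token: (container, semitoken)

same_container = tok[1] === sd
pair = deref(tok)          # the key => value pair stored at this address
key_only = deref_key(tok)  # just the key
value_only = sd[st]        # indexing with a semitoken returns the value

println((same_container, pair, key_only, value_only))
```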

A restriction for the sorted containers is that IntSemiToken cannot be used as the key type. This is because ambiguity would result between the two subscripting calls sc[k] and sc[st] described below. In the rare case that a sorted container with key type IntSemiToken is required, a workaround is to wrap the key inside another immutable structure.

The notion of a token is similar to that of an iterator in the C++ standard containers. Tokens can be explicitly advanced or regressed through the data in sorted order; they are also advanced or regressed implicitly by the iteration constructs defined below.

A token may take two special values: the before-start value and the past-end value. These values act as lower and upper bounds on the actual data. The before-start token can be advanced, while the past-end token can be regressed. A dereferencing operation on either leads to an error.
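The following sketch walks a SortedDict explicitly in both directions, stopping at the past-end and before-start markers. It then compares the result with ordinary iteration. It assumes DataStructures.jl is installed, and `keys_forward` and `keys_backward` are hypothetical helper names. The loops are written inside functions so that reassigning `st` is not ambiguous at global scope. The `status` codes noted in the comment are as documented for the package (1 for a data item, 2 for before-start, 3 for past-end).

```julia
using DataStructures

sd = SortedDict{Int,String}()
for (k, v) in (3 => "c", 1 => "a", 2 => "b")
    sd[k] = v
end

# Hypothetical helper: advance from the first item until the past-end semitoken
function keys_forward(sd)
    out = keytype(sd)[]
    st = firstindex(sd)
    while st != pastendsemitoken(sd)
        push!(out, deref_key((sd, st)))
        st = advance((sd, st))
    end
    return out
end

# Hypothetical helper: regress from the last item until the before-start semitoken
function keys_backward(sd)
    out = keytype(sd)[]
    st = lastindex(sd)
    while st != beforestartsemitoken(sd)
        push!(out, deref_key((sd, st)))
        st = regress((sd, st))
    end
    return out
end

fwd = keys_forward(sd)
bwd = keys_backward(sd)
implicit = collect(keys(sd))     # ordinary iteration visits the same order

# status: 1 = data item, 2 = before-start, 3 = past-end
codes = (status((sd, firstindex(sd))),
         status((sd, beforestartsemitoken(sd))),
         status((sd, pastendsemitoken(sd))))

println((fwd, bwd, implicit, codes))
```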

In the current implementation, semitokens are internally stored as integers. Users should regard these integers as opaque since future versions of the package may change the internal indexing scheme. In certain situations it may be more costly to operate on tokens than semitokens because the first entry of a token (i.e., the container) is not a bits-type. If code profiling indicates that statements using tokens are allocating memory, then it may be advisable to rewrite the application code using semitokens rather than tokens.
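The bits-type difference can be checked directly. The sketch below also stores positions of interest as semitokens only and pairs them with the container again when indexing. It assumes DataStructures.jl is installed, and `semitokens_above` is a hypothetical helper. The sketch only illustrates the type difference. Whether a particular loop actually allocates is something to confirm by profiling, as advised above.

```julia
using DataStructures

sd = SortedDict{Int,Float64}()
for k in 1:5
    sd[k] = k / 2
end

st = firstindex(sd)
semitoken_is_bits = isbitstype(typeof(st))     # a plain immutable value
token_is_bits = isbitstype(typeof((sd, st)))   # also carries the container

# Hypothetical helper: record positions as semitokens, a bits-type element
# vector, and index the container with them later
function semitokens_above(sd, threshold)
    marks = DataStructures.IntSemiToken[]
    st = firstindex(sd)
    while st != pastendsemitoken(sd)
        sd[st] > threshold && push!(marks, st)
        st = advance((sd, st))
    end
    return marks
end

marks = semitokens_above(sd, 1.5)
picked = [sd[m] for m in marks]

println((semitoken_is_bits, token_is_bits, picked))
```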

Complexity of Sorted Containers

In the list of functions below, the running time of the various operations is provided. In these running times, n denotes the number of items in the container, and c denotes the time needed to compare two keys.

Constructors for Sorted Containers

`SortedDict` constructors

DataStructures.SortedDict — Method

SortedDict{K,V,Ord}(o::Ord=Forward) where {K, V, Ord <: Ordering}
SortedDict{K,V,Ord}(o::Ord, kv) where {K, V, Ord <: Ordering}

Construct a SortedDict with key type K, value type V and ordering o from an iterable kv. The iterable should generate either Pair{K,V} or Tuple{K,V}. If kv is omitted, the SortedDict is initially empty. Time: O(cn log n), where n is the length of the iterable.
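The next sketch calls the two constructor forms above with explicit type parameters. It assumes DataStructures.jl is installed and takes the orderings from `Base.Order`. The data are arbitrary. It shows an iterable of Pairs, an iterable of 2-tuples with reverse ordering, and the empty form.

```julia
using DataStructures
using Base.Order: Forward, Reverse

# K = String, V = Int, Ord = typeof(Forward); kv is a vector of Pairs
fwd = SortedDict{String,Int,typeof(Forward)}(Forward, ["pear" => 3, "apple" => 1, "fig" => 2])

# kv may instead yield (key, value) tuples; Reverse sorts keys in descending order
rev = SortedDict{String,Int,typeof(Reverse)}(Reverse, [("pear", 3), ("apple", 1), ("fig", 2)])

# With kv omitted, the container starts empty
empty_sd = SortedDict{String,Int,typeof(Forward)}(Forward)

println(collect(keys(fwd)))
println(collect(keys(rev)))
println(isempty(empty_sd))
```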