Sorted Containers

Three sorted containers are provided: SortedDict, SortedMultiDict and SortedSet.

SortedDict is similar to the built-in Julia type Dict with the additional feature that the keys are stored in sorted order and can be efficiently iterated in this order. SortedDict is a subtype of Associative. It is generally slower than Dict because looking up a key requires an O(log n) tree search rather than the expected O(1) hash-table lookup of Dict. SortedDict is a parameterized type with three parameters: the key type K, the value type V, and the ordering type O.

SortedSet has only keys; it is an alternative to the built-in Set container. Internally, SortedSet is implemented as a SortedDict in which the value type is Void.

Finally, SortedMultiDict is similar to SortedDict except that each key can be associated with multiple values. The key=>value pairs in a SortedMultiDict are stored according to the sorted order for keys, and key=>value pairs with the same key are stored in order of insertion.
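As a minimal sketch of the ordering behavior described above (the keys and values are made up for illustration), a SortedDict iterates in key order regardless of insertion order, and a SortedSet does the same for bare keys:

```julia
using DataStructures

# Insert keys out of order; the container keeps them sorted.
sd = SortedDict{String,Int}()
sd["pear"] = 3
sd["apple"] = 1
sd["fig"] = 2

# Iteration visits key=>value pairs in sorted key order.
sorted_keys = String[]
for (k, v) in sd
    push!(sorted_keys, k)
end

# A SortedSet built from an unsorted collection iterates in sorted order too.
ss = SortedSet([3, 1, 2])
set_items = Int[]
for k in ss
    push!(set_items, k)
end
```

Here sorted_keys ends up as ["apple", "fig", "pear"] and set_items as [1, 2, 3].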

The containers internally use a 2-3 tree, which is a kind of balanced tree and is described in many elementary data structure textbooks.

The containers require two functions to compare keys: a less-than function and an equality function. With the default ordering argument, these are isless(key1,key2) (true when key1 < key2) and isequal(key1,key2) (true when key1 == key2), where key1 and key2 are keys. More details are provided below.
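The effect of the ordering argument can be sketched as follows. This example assumes that Reverse from Base.Order is accepted as a non-default ordering object; the text above names only the default, so treat Reverse here as an illustrative assumption rather than a documented guarantee of this section:

```julia
using DataStructures
using Base.Order: Reverse   # assumption: a standard Ordering object from Base

fwd = SortedDict{Int,String}()          # default ordering, compares with isless
rev = SortedDict{Int,String}(Reverse)   # same data, reversed comparison

for (k, v) in [(2, "two"), (1, "one"), (3, "three")]
    fwd[k] = v
    rev[k] = v
end

fwd_keys = Int[]
for (k, v) in fwd
    push!(fwd_keys, k)
end

rev_keys = Int[]
for (k, v) in rev
    push!(rev_keys, k)
end
```

With these inputs, fwd_keys is [1, 2, 3] and rev_keys is [3, 2, 1].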

Tokens for Sorted Containers

The sorted container objects use a special type for indexing called a token. A token is defined as a two-entry tuple and is aliased as SDToken, SMDToken, and SetToken for SortedDict, SortedMultiDict and SortedSet respectively. A token is the address of a single data item in the container and can be dereferenced in time O(1).

The first entry of a Token tuple is the container as a whole, and the second refers to the particular item. The second part is called a semitoken. The types for a semitoken are SDSemiToken, SMDSemiToken, and SetSemiToken for the three types of containers SortedDict, SortedMultiDict and SortedSet. These types are all aliases of IntSemiToken.

A restriction for the sorted containers is that IntSemiToken or its aliases cannot be used as the key type. This is because the two subscripting calls sc[k] and sc[st] described below would become ambiguous. In the rare scenario that a sorted container whose key type is IntSemiToken is required, a workaround is to wrap the key inside another immutable structure.

In the current version of Julia, it is costly to operate on tuples whose entries are not bits-types because such tuples are allocated on the heap. For example, the first entry of a token is a pointer to a container (a non-bits type), so a new token is allocated on the heap rather than the stack. In order to avoid performance loss, the package uses tokens less frequently than semitokens. For a function taking a token as an argument like deref described below, if it is invoked by explicitly naming the token like this:

tok = (sc,st)   # sc is a sorted container, st is a semitoken
k,v = deref(tok)

then there may be a loss of performance compared to:

k,v = deref((sc,st))

because the former needs an extra heap allocation step for tok.

The notion of token is similar to the concept of iterators used by C++ standard containers. Tokens can be explicitly advanced or regressed through the data in the sorted order; they are implicitly advanced or regressed via iteration loops defined below.

A token may take two special values: the before-start value and the past-end value. These values act as lower and upper bounds on the actual data. The before-start token can be advanced, while the past-end token can be regressed. A dereferencing operation on either leads to an error.
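The following sketch walks a token forward through a small SortedDict until it reaches the past-end value. It assumes the package functions searchsortedfirst, advance, and pastendsemitoken, none of which are introduced in this section, so their use here is illustrative; the data is made up.

```julia
using DataStructures

sd = SortedDict{Int,String}()
sd[10] = "ten"
sd[20] = "twenty"
sd[30] = "thirty"

# Semitoken of the first item whose key is >= 15 (assumed API).
st = searchsortedfirst(sd, 15)

# Build the token inline, as recommended above, and dereference it.
k, v = deref((sd, st))

# Advance to the next item in sorted order.
st2 = advance((sd, st))
k2, v2 = deref((sd, st2))

# Advancing past the last item yields the past-end semitoken,
# which must not be dereferenced.
st3 = advance((sd, st2))
at_end = (st3 == pastendsemitoken(sd))
```

In this run, (k, v) is (20, "twenty"), (k2, v2) is (30, "thirty"), and at_end is true.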

In the current implementation, semitokens are internally stored as integers. However, for the purpose of future compatibility, the user should not extract this internal representation; these integers do not have a documented interpretation in terms of the container.

Constructors for Sorted Containers

`SortedDict` constructors

DataStructures.SortedDict — Method.

SortedDict(o=Forward)

Construct an empty SortedDict with key type K and value type V. If K and V are not specified, the dictionary defaults to a SortedDict{Any,Any}. Keys and values are converted to the given type upon insertion. Ordering o defaults to Forward ordering.

Note that a key type of Any or any other abstract type will lead to slow performance, as the values are stored boxed (i.e., as pointers), and insertion will require a run-time lookup of the appropriate comparison function. It is recommended to always specify a concrete key type, or to use one of the constructors below in which the key type is inferred.
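A brief sketch of the difference between the two forms, using made-up keys and values. The parametrized form fixes concrete key and value types, while the bare form falls back to SortedDict{Any,Any} as described above:

```julia
using DataStructures

# Concrete key and value types: the recommended form.
typed = SortedDict{String,Int}()
typed["a"] = 1
typed["b"] = 2.0          # the value is converted to Int upon insertion

# No type parameters: keys and values are stored as Any.
untyped = SortedDict()
untyped["a"] = 1

typed_kt = keytype(typed)       # String
untyped_kt = keytype(untyped)   # Any
```

Checking keytype in this way is a quick means of confirming that a container was not accidentally created with abstract key types.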