Sorted Containers

Three sorted containers are provided: SortedDict, SortedMultiDict and SortedSet. SortedDict is similar to the built-in Julia type Dict with the additional feature that the keys are stored in sorted order and can be efficiently iterated in this order. SortedDict is a subtype of AbstractDict. It is generally slower than Dict because looking up a key requires an O(log n) tree search rather than an expected O(1) hash-table lookup time of Dict. SortedDict is a parameterized type with three parameters, the key type K, the value type V, and the ordering type O. SortedSet has only keys; it is an alternative to the built-in Set container and is a subtype of AbstractSet. Internally, SortedSet is implemented as a SortedDict in which the value type is Nothing. Finally, SortedMultiDict is similar to SortedDict except that each key can be associated with multiple values. The key=>value pairs in a SortedMultiDict are stored according to the sorted order for keys, and key=>value pairs with the same key are stored in order of insertion.

The containers internally use a 2-3 tree, which is a kind of balanced tree and is described in data structure textbooks. Internally, one Vector is used to store key/data pairs (the leaves of the tree) while a second holds the tree structure.

The containers require two functions to compare keys: a less-than and equals function. With the default ordering argument, the comparison functions are isless(key1,key2) (true when key1 < key2) and isequal(key1,key2) (true when key1 == key2) where key1 and key2 are keys. More details are provided below.

Tokens for Sorted Containers

The sorted containers support an object for indexing called a token defined as a two-entry tuple and aliased as SortedDictToken, SortedMultiDictToken, or SortedSetToken. A token is the address of a single data item in the container and can be dereferenced in time O(1).

The first entry of a token tuple is the container as a whole, and the second refers to the particular item. The second part is called a semitoken. The type of the semitoken is IntSemiToken.

A restriction for the sorted containers is that IntSemiToken cannot used as the key-type. This is because ambiguity would result between the two subscripting calls sc[k] and sc[st] described below. In the rare scenario that a sorted container whose key-type is IntSemiToken is required, a workaround is to wrap the key inside another immutable structure.

The notion of token is similar to the concept of iterators used by C++ standard containers. Tokens can be explicitly advanced or regressed through the data in the sorted order; they are implicitly advanced or regressed via iteration defined below.

A token may take two special values: the before-start value and the past-end value. These values act as lower and upper bounds on the actual data. The before-start token can be advanced, while the past-end token can be regressed. A dereferencing operation on either leads to an error.

In the current implementation, semitokens are internally stored as integers. Users should regard these integers as opaque since future versions of the package may change the internal indexing scheme. In certain situations it may be more costly to operate on tokens than semitokens because the first entry of a token (i.e., the container) is not a bits-type. If code profiling indicates that statements using tokens are allocating memory, then it may be advisable to rewrite the application code using semitokens rather than tokens.

Complexity of Sorted Containers

In the list of functions below, the running time of the various operations is provided. In these running times, n denotes the number of items in the container, and c denotes the time needed to compare two keys.

Constructors for Sorted Containers

SortedDict constructors

DataStructures.SortedDictMethod
SortedDict{K,V,Ord}(o::Ord=Forward) where {K, V, Ord <: Ordering}
SortedDict{K,V,Ord}(o::Ord, kv) where {K, V, Ord <: Ordering}

Construct a SortedDict with key type K and value type V with o ordering from an iterable kv. The iterable should generate either Pair{K,V} or Tuple{K,V}. If omitted, then the SortedDict is initially empty. Time: O(cn log n) where n is the length of the iterable.

source
DataStructures.SortedDictMethod
SortedDict(o::Ord=Forward) where {Ord <: Ordering}
SortedDict{K,V}(o::Ord=Forward) where {K,V,Ord<:Ordering}

Construct an empty SortedDict with key type K and value type V with o ordering (default to forward ordering). If K and V are not specified as in the first form, then they are assumed to both be Any. Time: O(1)

Note that a key type of Any or any other abstract type will lead to slow performance, as the values are stored boxed (i.e., as pointers), and insertion will require a run-time lookup of the appropriate comparison function. It is recommended to always specify a concrete key type, or to use one of the constructors in which the key type is inferred.

source
DataStructures.SortedDictMethod
SortedDict(iter, o::Ord=Forward) where {Ord <: Ordering}
SortedDict(o::Ordering, iter)
SortedDict{K,V}(iter, o::Ordering=Forward) where {K,V}
SortedDict{K,V}(o::Ordering, iter) where {K,V}

Construct a SortedDict from an arbitrary iterable object of key=>value pairs or (key,value) tuples with order object o. The key type and value type are inferred from the given iterable in the first two forms. The first two forms copy the data three times, so it is more efficient to explicitly specify K and V as in the second two forms. Time: O(cn log n)

source
DataStructures.SortedDictMethod
SortedDict(ps::Pair...)
SortedDict(o::Ordering, ps::Pair...)
SortedDict{K,V}(ps::Pair...)
SortedDict{K,V}(o::Ordering, ps::Pair...) where {K,V}

Construct a SortedDict from the given key-value pairs. The key type and value type are inferred from the given key-value pairs in the first two forms. The ordering is assumed to be Forward ordering in the first and third form. The first two forms (where K and V are not specified but inferred) involves copying the data three times and so is less efficient than the second two forms. Time: O(cn log n)

source
DataStructures.SortedDictMethod
SortedDict{K,V}(::Val{true}, iterable) where {K, V}
SortedDict{K,V}(::Val{true}, iterable, ord::Ordering) where {K,V}

Construct a SortedDict from an iterable whose eltype is Tuple{K,V} or Pair{K,V} and that is already in sorted ordered. The first form assumes Forward ordering. No duplicate keys allowed. Time: O(cn).

source

SortedMultiDict constructors

DataStructures.SortedMultiDictMethod
SortedMultiDict{K,V,Ord}(o::Ord=Forward) where {K, V, Ord <: Ordering}
SortedMultiDict{K,V,Ord}(o::Ord, iterable) where {K, V, Ord <: Ordering}

Construct a sorted multidict in which type parameters are explicitly listed; ordering object is explicitly specified. Time: O(cn log n)

source
DataStructures.SortedMultiDictMethod
SortedMultiDict(o::Ord=Forward) where {Ord <: Ordering}
SortedMultiDict{K,V}(o::Ordering=Forward) where {K,V}

Construct an empty SortedMultiDict with key type K and value type V with o ordering (default to Forward ordering). If K and V are not specified as in the first form, then they are assumed to both be Any. Time: O(1).

Note that a key type of Any or any other abstract type will lead to slow performance, as the values are stored boxed (i.e., as pointers), and insertion will require a run-time lookup of the appropriate comparison function. It is recommended to always specify a concrete key type, or to use one of the constructors in which the key type is inferred.

source
DataStructures.SortedMultiDictMethod
SortedMultiDict(ps::Pair...)
SortedMultiDict(o, ps::Pair...)
SortedMultiDict{K,V}(ps::Pair...)
SortedMultiDict{K,V}(o, ps::Pair...)

Construct a SortedMultiDict from the given key-value pairs. The key type and value type are inferred from the given key-value pairs in the first two form. The ordering is assumed to be Forward ordering in the first and third forms. The first two forms involve copying the data three times to infer the types and so are less efficient than the third and fourth form where {K,V} are specified explicitly. Time: O(cn log n)

source
DataStructures.SortedMultiDictMethod
SortedMultiDict(iter, o::Ord=Forward) where {Ord <: Ordering}
SortedMultiDict(o::Ordering, iter)
SortedMultiDict{K,V}(iter, o::Ordering=Forward) where {K, V}
SortedMultiDict{K,V}(o::Ordering, iter) where {K, V}

Construct a SortedMultiDict from an arbitrary iterable object of key=>value pairs or (key,value) tuples with order object o. The key type and value type are inferred from the given iterable in the first two forms. The first two forms copy the data three times, so it is more efficient to explicitly specify K and V as in the second two forms. Time: O(cn log n)

source
DataStructures.SortedMultiDictMethod
SortedMultiDict{K,V}(::Val{true}, iterable) where {K,V}
SortedMultiDict{K,V}(::Val{true}, iterable, ord::Ord) where {K,V,Ord<:Ordering}

Construct a SortedMultiDict from an iterable whose eltype is Tuple{K,V} or Pair{K,V} and that is already in sorted ordered. The first form assumes Forward ordering. Duplicate keys allowed. Time: O(cn).

source

SortedSet constructors

DataStructures.SortedSetMethod
SortedSet{K,Ord}(o::Ord=Forward) where {K, Ord<:Ordering}
SortedSet{K,Ord}(o::Ord, iter) where {K, Ord<:Ordering}

Construct a SortedSet of eltype Kusing from elements produced by iterable iter (e.g., an array) and ordering object o. Running time: O(cn log n) where n is the length of iterable.

source
DataStructures.SortedSetMethod
SortedSet(o::Ord=Forward) where {Ord <: Ordering}
SortedSet{K}(o::Ord=Forward) where {K, Ord<:Ordering}

Construct an empty SortedSet with Forward ordering. The first form assumes element type of Any. Time: O(1).

Note that an element type of Any or any other abstract type will lead to slow performance.

source
DataStructures.SortedSetMethod
SortedSet(o::Ordering, iter)
SortedSet(iter, o::Ordering=Forward)
SortedSet{K}(o::Ordering, iter)
SortedSet{K}(iter, o::Ordering=Forward)

Construct a sorted set from an iterable iter using order o. In the first two forms, the element type is inferred from the iterable, which requires copying the data twice. Therefore, the second two forms (specifying K explicitly) are more efficient. Time: O(cn log n)

source
DataStructures.SortedSetMethod
SortedSet{K}(::Val{true}, iterable) where {K}
SortedSet{K}(::Val{true}, iterable, ord::Ord) where {K, Ord <: Ordering}

Construct a SortedSet from an iterable whose entries have type K and that is already in sorted ordered. No duplicates allowed. The first form assumes Forward ordering. Time: O(cn).

source
Base.getindexMethod
Base.getindex(m::SortedDict, st::IntSemiToken)
Base.getindex(m::SortedMultiDict, st::IntSemiToken)

Retrieve value portion of item from SortedDict or SortedMultiDict m indexed by st, a semitoken. Notation m[st] appearing in an expression is equivalent to deref_value(token::Token) where token=(m,st). It is a BoundsError if the token is invalid. Prepending with @inbounds may elide the correctness check and results in undefined behavior if the token is invalid. Time: O(1)

source
Base.setindex!Method
Base.setindex!(m::SortedDict, newvalue, st::IntSemiToken)
Base.setindex!(m::SortedMultiDict, newvalue, st::IntSemiToken)

Set the value portion of item from SortedDict or SortedMultiDict m indexed by st, a semitoken to newvalue. A BoundsError is thrown if the token is invalid. Prepending with @inbounds may elide the correctness check and results in undefined behavior if the token is invalid. Time: O(1)

source
Base.setindex!Method
Base.setindex!(sd::SortedDict, newvalue, k)

Assign or reassign the value associated with the key k to newvalue. Note that the key is also overwritten; this is not necessarily a no-op since the equivalence in the sort-order does not imply equality. See also push_return_semitoken!(sd::SortedDict, p::Pair). Time: O(c log n)

source
DataStructures.derefMethod
deref(token::Token)
deref((m,st))

Return the data item indexed by the token. If the container is a SortedSet, then this is a key in the set. If the container is a SortedDict or SortedMultiDict, then this is a key=>value pair. It is a BoundsError if the token is invalid or is the before-start or past-end token. Prepending with @inbounds may elide the correctness check and will result in undefined behavior if the token is invalid or points to the before-start or past-end token. The second form creates the token in-place as a tuple of a sorted container m and a semitoken st. Time: O(1)

source
DataStructures.deref_keyMethod
deref_key(token::Token)
deref_key((m,st))

Return the key portion of a data item (a key=>value pair) in a SortedDict or SortedMultiDict indexed by the token. It is a BoundsError if the token is invalid or is the before-start or past-end token. Prepending with @inbounds may elide the correctness check and will result in undefined behavior if the token is invalid or points to the before-start or past-end token. The second form creates the token in-place as a tuple of a container m and a semitoken st. Time: O(1)

source
DataStructures.deref_valueMethod
deref_value(token::Token)
deref_value((m,st))

Returns the value portion of a data item (a key=>value pair) in a SortedDict or SortedMultiDict indexed by the token. It is a BoundsError if the token is invalid or is the before-start or past-end token. Prepending with @inbounds may elide the correctness check and will result in undefined behavior if the token is invalid or points to the before-start or past-end token. The second form creates the token in-place as a tuple of a container m and a semitoken st. Time: O(1)

source
Base.firstindexMethod
Base.firstindex(m::SortedContainer)

Return the semitoken of the first entry of the container m, or the past-end semitoken if the container is empty. This function was called startof (now deprecated) in previous versions of the package. Time: O(log n)

source
Base.lastindexMethod
Base.lastindex(m::SortedContainer)

Return the semitoken of the last entry of the sorted container m, or the before-start semitoken if the container is empty. This function was called endof (now deprecated) in previous versions of the package. Time: O(log n)

source
DataStructures.token_firstindexMethod
token_firstindex(m::SortedContainer)

Return the token of the first entry of the sorted container m, or the past-end token if the container is empty. Time: O(log n)

source
DataStructures.token_lastindexMethod
token_lastindex(m::SortedContainer)

Return the token of the last entry of the sorted container m, or the before-start semitoken if the container is empty. Time: O(log n)

source
Base.firstMethod
Base.first(sc::SortedContainer)

Return the first item (a k=>v pair for SortedDict and SortedMultiDict or an element for SortedSet) in sc according to the sorted order in the container. It is a BoundsError to call this function on an empty container. Equivalent to deref(token_startindex(sc)). Time: O(log n)

source
Base.lastMethod
Base.last(sc::SortedContainer)

Return the last item (a k=>v pair for SortedDict and SortedMultiDict or a key for SortedSet) in sc according to the sorted order in the container. It is a BoundsError to call this function on an empty container. Equivalent to deref(token_lastindex(sc)). Time: O(log n)

source
DataStructures.pastendtokenMethod
pastendtoken(m::SortedContainer)

Return the token of the entry that is one past the end of the sorted container m. Time: O(1)

source
DataStructures.advanceMethod
advance(token::Token)
advance((m,st))

Return the semitoken of the item in a sorted container one after the given token. A BoundsError is thrown if the token is the past-end token. Prepending with @inbounds may elide the correctness check and will result in undefined behavior if the token is invalid or points to the past-end token. The second form creates the token in-place as a tuple of a container m and a semitoken st. Time: O(log n)

source
DataStructures.regressMethod
regress(token::Token)
regress((m,st))

Return the semitoken of the item in a sorted container one before the given token. A BoundsError is thrown if the token is the before-start token. Prepending with @inbounds may elide the correctness check and will result in undefined behavior if the token is invalid or points to the before-start token. The second form creates the token in-place as a tuple of a container m and a semitoken st. Time: O(log n)

source
Base.:+Method
Base.+(t::Token, j::Integer)
Base.-(t::Token, j::Integer)

Return the token that is j positions ahead (if +) or behind (if -) of t. Here, t is a token for a sorted container and j is an integer. If j is negative, then + regresses while - advances. If the operation t+j or t-j reaches the before-start or past-end positions in the container, then the before-start/past-end tokens are returned (and there is no error). Time: O(j+log n), so this function is not optimized for long jumps.

source
Base.Sort.searchsortedfirstMethod
Base.searchsortedfirst(m::SortedContainer, k)

Return the semitoken of the first item in the sorted container m that is greater than or equal to k in the sort order. If there is no such item, then the past-end semitoken is returned. Time: O(c log n)

source
Base.Sort.searchsortedlastMethod
Base.searchsortedlast(m::SortedContainer, k)

Return the semitoken of the last item in the container that is less than or equal to k in sort order. If there is no such item, then the before-start semitoken is returned. Time: O(c log n)

source
DataStructures.searchsortedafterMethod
searchsortedafter(m::SortedContainer, k)

Return the semitoken of the first item in the container that is greater than k in the sort order. If there is no such item, then the past-end semitoken is returned. Time: O(c log n)

source
DataStructures.searchequalrangeMethod
DataStructures.searchequalrange(smd::SortedMultiDict, k)

Return two semitokens that correspond to the first and last items in the SortedMultiDict that have key exactly equal to k. If k is not found, then it returns (pastendsemitoken(smd), beforestartsemitoken(smd)). Time: O(c log n)

source
DataStructures.findkeyMethod
findkey(m::SortedSet, k)

Return the semitoken of the element k in sorted set m. If the element is not present in m, then the past-end semitoken is returned. Time: O(c log n)

source

Inserting & Deleting in Sorted Containers

Base.push!Method
Base.push!(ss::SortedSet, k)

Insert the element k into the sorted set ss. If the k is already present, this overwrites the old value. (This is not necessarily a no-op; see remarks about the customizing the sort order.) See also push_return_semitoken!(ss::SortedSet, k). The return value is ss. Time: O(c log n)

source
Base.push!Method
Base.push!(sd::SortedDict, p::Pair)

Insert key-value pair p, i.e., a k=>v pair, into sd. If the key k is already present, this overwrites the old value. The key is also overwritten (not necessarily a no-op, since sort-order equivalence may differ from equality). The return value is sd. See also push_return_semitoken!(sd::SortedDict, p::Pair). Time: O(c log n)

source
DataStructures.push_return_semitoken!Method
DataStructures.push_return_semitoken!(ss::SortedSet, k)

Insert the element k into the SortedSet sc. If k is already present, this overwrites the old value. (This is not necessarily a no-op; see remarks about the customizing the sort order.) Unlike push!, the return value is a pair whose first entry is boolean and indicates whether the insertion was new (i.e., the key was not previously present) and the second entry is the semitoken of the new entry. This function replaces the deprecated insert!. Time: O(c log n)

source
DataStructures.push_return_semitoken!Method
DataStructures.push_return_semitoken!(sd::SortedDict, p::Pair)

Insert pair p of the form k=>v into sd. If the key is already present in sd, this overwrites the old value. Note that the key is also overwritten, which is not necessarily a no-op because equivalence in the sort order does not necessarily imply equality. Unlike push!, the return value is a 2-tuple whose first entry is boolean and indicates whether the insertion was new (i.e., the key was not previously present) and whose second entry is the semitoken of the new entry. This function replaces the deprecated insert!(sd,k,v). Time: O(c log n)

source
DataStructures.push_return_semitoken!Method
DataStructures.push_return_semitoken!(smd::SortedMultiDict, pr::Pair)

Insert the key-value pair pr, i.e., k=>v, into smd. If k already appears as a key in smd, then k=>v is inserted in the rightmost position after existing items with key k. Unlike push!, the return value is a 2-tuple whose first entry is boolean always equal to true and whose second entry is the semitoken of the new entry. (The reason for returning a bool whose value is always true is for consistency with push_return_semitoken! for SortedDict and SortedSet.) This function replaces the deprecated insert!. Time: O(c log n)

source
Base.delete!Method
Base.delete!(token::Token)

Delete the item indexed by the token from a sorted container. A BoundsError is thrown if the token is invalid. Prepending with @inbounds may elide the correctness check and will result in undefined behavior if the token is invalid. Time: O(log n).

source
Base.delete!Method
Base.delete!(ss::SortedSet, k)

Delete element k from sc. After this operation is complete, a token addressing the deleted item is invalid. Returns sc. if k is not present, this operation is a no-op. Time: O(c log n)

source
Base.delete!Method
Base.delete!(sd::SortedDict, k)

Delete the item whose key is k in sd. After this operation is complete, any token addressing the deleted item is invalid. Returns sc. This is a no-op if k is not present in sd. Time: O(c log n)

source
Base.pop!Method
Base.pop!(ss::SortedSet, k)
Base.pop!(ss::SortedSet, k, default)

Delete the item with key k in ss and return the item that compares equal to k according to the sort order (which is not necessarily k, since equality in the sort-order does not necessarily imply hash-equality). If k is not found, return default, or throw a KeyError if default is not specified. Time: O(c log n)

source
Base.pop!Method
Base.pop!(sd::SortedDict, k)
Base.pop!(sd::SortedDict, k, default)

Delete the item with key k in sd and return the value that was associated with k. If k is not in sd return default, or throw a KeyError if default is not specified. Time: O(c log n)

source
Base.popfirst!Method
Base.popfirst!(ss::SortedSet)

Delete the item with first key in SortedSet ss and returns the key. This function was named pop! in a previous version of the package. A BoundsError results if ss is empty. Time: O(log n)

source
DataStructures.poplast!Method
poplast!(ss::SortedSet)

Delete the item with last key in SortedSet ss and returns the key. A BoundsError results if ss is empty. This function will be renamed Base.pop! in a future version of the package. Time: O(log n)

source

Iteration and Token Manipulation

DataStructures.compareMethod
compare(m::SortedContainer, s::IntSemiToken, t::IntSemiToken)

Determine the relative position according to the sort order of the data items indexed by tokens (m,s) and (m,t). Return:

  • -1if(m,s) precedes (m,t),
  • 0 if s == t
  • 1 if (m,s)succeeds (m,t).

The relative positions are determined from the tree topology without any key comparisons. Time: O(log n)

source
DataStructures.statusMethod
status(token::Token) 
status((m, st))

Determine the status of a token. Return values are:

  • 0 = invalid token
  • 1 = valid and points to live data
  • 2 = before-start token
  • 3 = past-end token

The second form creates the token in-place as a tuple of a sorted container m and a semitoken st. Time: O(1)

source
Base.iterateMethod
Base.iterate(iter::SortedContainerIterable)

with the following helper functions to construct a SortedContainerIterable:

inclusive(m::SortedContainer, st1, st2)
inclusive(m::SortedContainer, (st1, st2))
inclusive_key(m::SortedContainer, key1, key2)
inclusive_key(m::SortedContainer, (key1, key2))
exclusive(m::SortedContainer, st1, st2)
exclusive(m::SortedContainer, (st1, st2))
exclusive_key(m::SortedContainer, key1, key2)
exclusive_key(m::SortedContainer, (key1, key2))
Base.keys(b)
Base.values(b)
Base.pairs(b)
Base.eachindex(b)
tokens(kv)
semitokens(kv)
onlytokens(kv)
onlysemitokens(kv)
Base.Iterators.reverse(m)
(:)(a,b)

Iterate over a sorted container, typically within a for-loop, comprehension, or generator. Here, iter is an iterable object constructed from a sorted container. The possible iterable objects are constructed from the helper functions as follows:

A basic iterable object is either

  • an entire sorted container m,
  • inclusive(m, (st1, st2)) or equivalently inclusive(m, st1, st2),
  • inclusive_key(m, (k1, k2)) or equivalently inclusive_key(m, k1, k2)
  • a:b, where a and b are tokens addressing the same container
  • exclusive(m, (st1, st2)) or equivalently exclusive(m, st1, st2)
  • exclusive_key(m, (k1, k2)) or equivalently exclusive_key(m, k1, k2)

These extract ranges of consecutive items in the containers. In the inclusive and exclusive constructions, constructions, m is a container, st1 and st2 are semitokens. The inclusive range includes both endpoints st1 and st2. The inclusive iteration is empty if compare(m,st1,st2)<0. The exclusive range includes endpoint st1 but not st2. The exclusive iteration is empty if compare(m,st1,st2)<=0. In the exclusive iteration, it is acceptable if st2 is the past-end semitoken.

The range exclusive_key means all data items with keys between k1 up to but excluding items with key k2. For this range to be nonempty, k1<k2 must hold (in the sort order). The range inclusive_key means all data items with keys between k1 and k2 inclusive. For this range to be nonempty, k1<=k2 must hold.

A kv iterable object has the form

  • b, a basic iterable object
  • keys(b) where b is a basic object. Extract keys only (not applicable to SortedSet)
  • values(b) where b is a basic object. Extract values only (not applicable to SortedSet).
  • pairs(b) where b is a basic object. Extracts key=>value pairs (not applicable to SortedSet). This is the same as just specifying b and is provided only for compatibility with Base.pairs.

A tkv object has the form

  • kv, a kv iterable object
  • tokens(kv) where kv is a kv iterable object. Return 2-tuples of the form (t,w), where t is the token of the item and w is a key or value if kv is a keys or values iteration, or (t,k,v) if kv is a pairs iteration.
  • semitokens(kv) where kv is a kv iterable object. Return pairs of the form (st,w), where st is the token of the item and w is a key or value if kv is a keys or values iteration, or (st,k,v) if kv is a pairs iteration.
  • onlytokens(kv) where kv is a kv iterable object. Return only tokens of the data items but not the items themselves. The keys, values, or pairs modifiers described above are ignored.
  • onlysemitokens(kv) where kv is a kv iterable object. Return only semitokens of the data items but not the items themselves. The keys, values, or pairs modifiers described above are ignored.

Finally, a tkv iteration can be reversed by the Iterators.reverse function. The Iterators.reverse function may be nested in an arbitrary position with respect to the other operations described above. Two reverse operations cancel each other out. For example, Iterators.reverse(keys(Iterators.reverse(m))) is the same iteration as keys(m).

For compatibility with Base, there is also an eachindex function: eachindex(b) where the base object b a SortedDict is the same as keys(b) (to be compatible with Dict). On the other hand, eachindex(b) where the base object b is a SortedSet or SortedMultiDict is the same as onlysemitokens(b).

Colon notation a:b is equivalent to onlytokens(inclusive(a[1], a[2], b[2])), in other words, it yields an iteration that provides all the tokens of items in the sort order ranging from token a up to token b. It is required that a[1]===b[1] (i.e., a and b are tokens for the same container). Exclusive iteration using colon notation is obtained via a : b-1.

Examples:

   for (k,v) in sd
       <body>
   end

Here, sd is a SortedDict or SortedMultiDict. The variables (k,v) are set to consecutive key-value pairs. All items in the container are produced in order.

   for k in inclusive(ss, st1, st2)
       <body>
   end

Here, ss is a SortedSet, and st1, and st2 are semitokens indexing ss. The elements of the set between st1 and st2 inclusive are returned.

   for (t,k) in tokens(keys(exclusive_key(sd, key1, key2)))
      <body>
   end

Here, sd is a SortedDict or SortedMultiDict, key1 and key2 are keys indexing sd. In this case, t will be tokens of consecutive items, while k will be the corresponding keys. The returned keys lie between key1 and key2 excluding key2.

   for (t,k) in Iterators.reverse(tokens(keys(exclusive_key(sd, key1, key2))))
      <body>
   end

Same as above, except the iteration is in the reverse order.

Writing on the objects returned by values is not currently supported, e.g., the following map! statement is not implemented even though the analogous statement is available for Dict in Base.

    s = SortedDict(3=>4)
    map!(x -> x*2, values(s))

The workaround is an explicit loop:

    s = SortedDict(3=>4)
    for t in onlysemitokens(s)
        s[t] *= 2
    end

Running time for all iterations: O(c(s + log n)), where s is the number of steps from start to end of the iteration.

source
Base.inMethod
Base.in(k,m::SortedSet)

Return true iff element k is in sorted set m is a sorted set. Unlike the in function for Set, this routine will thrown an error if k is not convertible to eltype(m). Time: O(c log n)

source
Base.inMethod
Base.in(p::Pair, sd::SortedDict)

Return true if p is in sd. Here, p is a key=>value pair. Time: O(c log n + d) where d stands for the time to compare two values.

source
Base.inMethod
Base.in(p::Pair, smd::SortedMultiDict)

Return true if p is in smd. Here, p is a key=>value pair. In the The time is is O(c log n + dl) where d is the time to compare two values and l stands for the number of entries that have the key of the given pair. (So therefore this call is inefficient if the same key addresses a large number of values, and an alternative should be considered.)

source
Base.inMethod
Base.in(x, iter::SortedContainerIterable)

Returns true if x is in iter, where iter refers to any of the iterable objects described under Base.iterate(iter::SortedContainerIterable), and x is of the appropriate type. For all of the iterables except the five listed below, the algorithm used is a linear-time search. For example, the call:

(k=>v) in exclusive(sd, st1, st2)

where sd is a SortedDict, st1 and st2 are semitokens, k is a key, and v is a value, will loop over all entries in the dictionary between the two tokens and a compare for equality using isequal between the indexed item and k=>v.

The five exceptions are:

(k=>v) in sd
(k=>v) in smd
k in ss
k in keys(sd)
k in keys(smd)

Here, sd is a SortedDict, smd is a SortedMultiDict, and ss is a SortedSet.

These five invocations of in use the index structure of the sorted container and test equality based on the order object of the keys rather than isequal. Therefore, these five are all faster than linear-time looping. To force the use of isequal test on the keys rather than the order object (thus slowing the execution from logarithmic to linear time), replace the above five constructs with these:

(k=>v) in collect(sd)
(k=>v) in collect(smd)
k in collect(ss)
k in collect(keys(sd))
k in collect(keys(smd))
source

Misc. Functions

Base.isempty(m::SortedContainer)
Base.empty!(m::SortedContainer)
Base.empty(m::SortedContainer)
Base.length(m::SortedContainer)
Base.eltype(m::SortedContainer)
Base.keytype(m::SortedContainer)
Base.valtype(m::SortedContainer)
Base.eltype(m::SortedContainerIteration)
Base.keytype(m::SortedContainerIteration)
Base.valtype(m::SortedContainerIteration)

These functions from Base are all applicable to sorted containers with the obvious meaning. The eltype, keytype, and valtype functions may be applied either to the object m or its type. Note that keytype and valtype are applicable only to SortedDict and SortedMultiDict, or to pairs iterations over SortedDict or SortedMultiDict. Time: O(1)

DataStructures.ordtypeMethod
ordtype(sc::SortedSet)
ordtype(sc::SortedDict)
ordtype(sc::SortedMultiDict)

Return the order type for a sorted container. This function may also be applied to the type itself. Time: O(1)

source
Base.haskeyMethod
haskey(sc::SortedContainer, k)

Return true iff key k is present in sc. Equivalent to in(k,sc) for a SortedSet, or to in(k,keys(sc)) for a SortedDict or SortedMultiDict. Time: O(c log n)

source
Base.getMethod
Base.get(sd::SortedDict,k,default)
Base.get(default_f::Union{Function,Type}, sd::SortedDict, k)

Return the value associated with key k where sd is a SortedDict, or else returns default if k is not in sd. The second form obtains default as the return argument of the function/type-constructor default_f (with no arguments) when the key is not present. Time: O(c log n)

source
Base.get!Method
Base.get!(sd::SortedDict,k,default)
Base.get!(default_f::Union{Function,Type}, sd::SortedDict, k)

Return the value associated with key k where sd is a SortedDict, or else return default if k is not in sd, and in the latter case, inserts (k,default) into sd. The second form computes a default value by calling the function default_f (with no arguments) or the constructor of type default_f when the key is not present. Time: O(c log n)

source
Base.getkeyMethod
Base.getkey(sd::SortedDict,k,defaultk)

Return the key k where sd is a SortedDict, if k is in sd else it returns defaultk. If the container uses in its ordering an eq method different from isequal (e.g., case-insensitive ASCII strings illustrated below), then the return value is the actual key stored in the SortedDict that is equivalent to k according to the eq method, which might not be equal to k. Similarly, if the user performs an implicit conversion as part of the call (e.g., the container has keys that are floats, but the k argument to getkey is an Int), then the returned key is the actual stored key rather than k. Time: O(c log n)

source
Base.isequalMethod
Base.isequal(ss1::SortedSet{K,Ord}, ss2::SortedSet{K,Ord}) where {K,Ord <: Ordering}
Base.issetequal(ss1::SortedSet{K,Ord}, ss2::SortedSet{K,Ord}) where {K,Ord <: Ordering}

Check if two sorted sets are equal in the sense that they contain the same items. Note that isequal in this sense does not imply correspondence between semitokens for items in sc1 with those for sc2. Time: O(cn) where n is the size of the smaller container. If the two sorted sets have K, different Ord, or different order objects, then a fallback routine isequal(::AbstractSet, ::AbstractSet) is invoked.

source
Base.isequalMethod
Base.isequal(sd1::SortedDict{K,V,Ord}, sd2::SortedDict{K,V,Ord}) where {K, V, Ord <: Ordering}

Check if two SortedDicts are equal in the sense that they contain the same items; the keys are compared using the eq method, while the values are compared with the isequal function. Note that isequal in this sense does not imply correspondence between semitokens for items in sd1 with those for sd2. Time: O(cn). Note that if K, V, Ord, or the order objects of sd1 and sd2 are different, then a fallback routine Base.isequal(::AbstractDict, ::AbstractDict) is invoked. Time: O(cn)

source
Base.isequalMethod
Base.isequal(smd1::SortedMultiDict{K,V,Ord}, smd2::SortedMultiDict{K,V,Ord}) where {K, V, Ord <: Ordering}

Check if two SortedMultiDicts are equal in the sense that they contain the same items in the same order (that is, the same insertion order). They must have the same order object, else they compare unequal. The keys are compared using the eq method, while the values are compared with the isequal function. Note that isequal in this sense does not imply any correspondence between semitokens for items in smd1 with those for smd2. Time: O(cn)

source
DataStructures.packcopyMethod
copy(sc::SortedSet)
copy(sc::SortedDict)
copy(sc::SortedMultiDict)
packcopy(sc::SortedSet)
packcopy(sc::SortedDict)
packcopy(sc::SortedMultiDict)

Return a copy of sc, where sc is a sorted container, in which the data is packed. When deletions take place, the previously allocated memory is not returned. This function can be used to reclaim memory after many deletions. Time: O(cn)

Note that the semitokens valid for the original container are no longer valid for the copy because the indexing structure is rebuilt by these copies. If an exact copy is needed in which semitokens remain valid, use Base.deepcopy.

source
DataStructures.packdeepcopyMethod
packdeepcopy(sc::SortedSet)
packdeepcopy(sc::SortedDict)
packdeepcopy(sc::SorteMultiDict)

Return a packed copy of sc, where sc is a sorted container in which the keys and values are deep-copied. This function can be used to reclaim memory after many deletions. Time: O(cn)

source
Base.mergeMethod
Base.merge(sd::SortedDict{K,V,Ord}, d1::AbstractDict{K,V}...) where {K,V,Ord <: Ordering}

Merge one or more dicts into a single SortedDict and return the new SortedDict. Arguments d1 etc. must have the same key-value type as sd. In the case of keys duplicated among the arguments, the rightmost argument that owns the key gets its value stored. Time: O(cN log N), where N is the total size of all the arguments. If all the arguments are SortedDicts with the same key, value, and order object, then the time is O(cN).

source
Base.merge!Method
Base.merge!(sd::SortedDict{K,V,Ord}, d1::AbstractDict{K,V}...) where {K,V,Ord<:Ordering}

Merge one or more dicts d1, etc. into sd. These must all must have the same key-value types. In the case of keys duplicated among the arguments, the rightmost argument that owns the key gets its value stored. Time: O(cN log N), where N is the total size of all the arguments.

source
Base.mergeMethod
Base.merge(smd::SortedMultiDict, iter...)

Merge smd and one or more iterables and return the resulting new SortedMultiDict. The iterables must have the same key-value type as smd. Items with equal keys are stored with left-to-right ordering. Time: O(cN log N), where N is the total size of all the arguments. If all the arguments are SortedMultiDicts with the same key, value, and order object, then the time is O(cN).

source
Base.merge!Method
Base.merge!(smd::SortedMultiDict, iter...)

Merge one or more iterables iter, etc. into smd. These must all must have the same key-value types. Items with equal keys are stored with left-to-right ordering. Time: O(cN log N), where N is the total size of all the arguments.

source

Set operations

The SortedSet container supports the following set operations. Note that in the case of intersect, symdiff and setdiff, the two SortedSets should have the same key and ordering object. If they have different key or ordering types, no error message is produced; instead, the built-in default versions of these functions (that can be applied to Any iterables and that return arrays) are invoked.

Base.union!Method
Base.union!(ss::SortedSet, iterable...)

Insert each item among the second and following arguments (which must be iterable) into the SortedSet ss. The items must be convertible to the key-type of ss. Time: O(cN log N) where N is the total number of items in the iterable arguments.

source
Base.unionMethod
Base.union(ss::SortedSet, iterable...)

Compute and return the union of a sorted set and one or more iterables. They must have the same keytype. If they are all sorted sets with the same order object, then the required time is O(cn), where n is the total size. If not, then the fallback routine requires time O(cn log n).

source
Base.intersectMethod
Base.intersect(ss::SortedSet, others...)

Intersect SortedSets with other SortedSets or other iterables and return the intersection as a new SortedSet. Time: O(cn), where n is the total number of items in all the arguments if all the arguments are SortedSets of the same type and same order object. Otherwise, the time is O(cn log n)

source
Base.symdiffMethod
Base.symdiff(ss1::SortedSet, iterable)

Compute and return the symmetric difference of ss1 and iterable, i.e., a sorted set containing entries that are in one of ss1 or iterable but not both. Time: O(cn), where n is the total size of the two containers if both are sorted sets with the same key and order objects. Otherwise, the time is O(cn log n)

source
Base.setdiffMethod
Base.setdiff(ss1::SortedSet{K,Ord}, ss2::SortedSet{K,Ord}) where {K, Ord<:Ordering}
Base.setdiff(ss1::SortedSet, others...)

Return the set difference, i.e., a sorted set containing entries in ss1 but not in ss2 or successive arguments. Time for the first form: O(cn) where n is the total size of both sets provided that they are both sorted sets of the same type and order object. The second form computes the set difference between ss1 and all the others, which are all iterables. The second form requires O(cn log n) time.

source
Base.setdiff!Method
Base.setdiff!(ss::SortedSet, iterable..)

Delete items in ss that appear in any of the iterables. The arguments after the first must be iterables each of whose entries must convertible to the key type of m1. Time: O(cm log n), where n is the size of ss and m is the total number of items in iterable.

source
Base.issubsetMethod
Base.issubset(iterable, ss::SortedSet)

Check whether each item of the first argument is an element of ss. The entries must be convertible to the key-type of ss. Time: O(cm log n), where n is the size of ss and m is the number of items in iterable. If both are sorted sets of the same keytype and order object and if m > n / log n, then an algorithm whose running time is O(c(m+n)) is used.

source

Ordering of keys

As mentioned earlier, the default ordering of keys uses isless and isequal functions. If the default ordering is used, it is a requirement of the container that isequal(a,b) is true if and only if !isless(a,b) and !isless(b,a) are both true. This relationship between isequal and isless holds for common built-in types, but it may not hold for all types, especially user-defined types. If it does not hold for a certain type, then a custom ordering argument must be defined as discussed in the next few paragraphs.

The name for the default ordering (i.e., using isless and isequal) is Forward. Note: this is the name of the ordering object; its type is ForwardOrdering. Another possible ordering object is Reverse, which reverses the usual sorted order. This name must be imported import Base.Reverse if it is used.

As an example of a custom ordering, suppose the keys are of type String, and the user wishes to order the keys ignoring case: APPLE, berry and Cherry would appear in that order, and APPLE and aPPlE would be indistinguishable in this ordering.

The simplest approach is to define an ordering object of the form Lt(my_isless), where Lt is a built-in type (see ordering.jl) and my_isless is the user's comparison function. In the above example, the ordering object would be:

Lt((x,y) -> isless(lowercase(x),lowercase(y)))

The ordering object is indicated in the above list of constructors in the o position (see above for constructor syntax).

This approach may suffer from a performance hit because higher performance may be possible if an equality method is also available as well as a less-than method. A more complicated but higher-performance method to implement a custom ordering is as follows. First, the user creates a singleton type that is a subtype of Ordering as follows:

struct CaseInsensitive <: Ordering
end

Next, the user defines a method named lt for less-than in this ordering:

lt(::CaseInsensitive, a, b) = isless(lowercase(a), lowercase(b))

The first argument to lt is an object of the CaseInsensitive type (there is only one such object since it is a singleton type). The container also needs an equal-to function; the default is:

eq(o::Ordering, a, b) = !lt(o, a, b) && !lt(o, b, a)

The user can also customize this function with a more efficient implementation. In the above example, an appropriate customization would be:

eq(::CaseInsensitive, a, b) = isequal(lowercase(a), lowercase(b))

Note: the user-defined eq and lt functions must be compatible in the sense that !lt(o, a, b) && !lt(o, b, a) if and only if eq(o, a, b).

Finally, the user specifies the unique element of CaseInsensitive, namely the object CaseInsensitive(), as the ordering object to the SortedDict, SortedMultiDict or SortedSet constructor.

For the above code to work, the module must make the following declarations, typically near the beginning:

import Base.Ordering
import Base.lt
import DataStructures.eq

Cautionary note on mutable keys

As with ordinary Dicts, keys for the sorted containers can be either mutable or immutable. In the case of mutable keys, it is important that the keys not be mutated once they are in the container else the indexing structure will be corrupted. (The same restriction applies to Dict.) For example, the following sequence of statements leaves sd in a corrupted state:

sd = SortedDict{Vector{Int},Int}()
k = [1,2,3]
sd[k] = 19
sd[[6,4]] = 12
k[1] = 7

Performance of Sorted Containers

The sorted containers are currently not optimized for cache performance.