# Sorted Containers

Three sorted containers are provided: SortedDict, SortedMultiDict and SortedSet. *SortedDict* is similar to the built-in Julia type `Dict`

with the additional feature that the keys are stored in sorted order and can be efficiently iterated in this order. SortedDict is a subtype of AbstractDict. It is generally slower than `Dict`

because looking up a key requires an O(log *n*) tree search rather than an expected O(1) hash-table lookup time of `Dict`

. SortedDict is a parameterized type with three parameters, the key type `K`

, the value type `V`

, and the ordering type `O`

. SortedSet has only keys; it is an alternative to the built-in `Set`

container and is a subtype of AbstractSet. Internally, SortedSet is implemented as a SortedDict in which the value type is `Nothing`

. Finally, SortedMultiDict is similar to SortedDict except that each key can be associated with multiple values. The key=>value pairs in a SortedMultiDict are stored according to the sorted order for keys, and key=>value pairs with the same key are stored in order of insertion.

The containers internally use a 2-3 tree, which is a kind of balanced tree and is described in data structure textbooks. Internally, one `Vector`

is used to store key/data pairs (the leaves of the tree) while a second holds the tree structure.

The containers require two functions to compare keys: a *less-than* and *equals* function. With the default ordering argument, the comparison functions are `isless(key1,key2)`

(true when `key1 < key2`

) and `isequal(key1,key2)`

(true when `key1 == key2`

) where `key1`

and `key2`

are keys. More details are provided below.

## Tokens for Sorted Containers

The sorted containers support an object for indexing called a *token* defined as a two-entry tuple and aliased as `SortedDictToken`

, `SortedMultiDictToken`

, or `SortedSetToken`

. A token is the address of a single data item in the container and can be dereferenced in time O(1).

The first entry of a token tuple is the container as a whole, and the second refers to the particular item. The second part is called a *semitoken*. The type of the semitoken is `IntSemiToken`

.

A restriction for the sorted containers is that `IntSemiToken`

cannot used as the key-type. This is because ambiguity would result between the two subscripting calls `sc[k]`

and `sc[st]`

described below. In the rare scenario that a sorted container whose key-type is `IntSemiToken`

is required, a workaround is to wrap the key inside another immutable structure.

The notion of token is similar to the concept of iterators used by C++ standard containers. Tokens can be explicitly advanced or regressed through the data in the sorted order; they are implicitly advanced or regressed via iteration defined below.

A token may take two special values: the *before-start* value and the *past-end* value. These values act as lower and upper bounds on the actual data. The before-start token can be advanced, while the past-end token can be regressed. A dereferencing operation on either leads to an error.

In the current implementation, semitokens are internally stored as integers. Users should regard these integers as opaque since future versions of the package may change the internal indexing scheme. In certain situations it may be more costly to operate on tokens than semitokens because the first entry of a token (i.e., the container) is not a bits-type. If code profiling indicates that statements using tokens are allocating memory, then it may be advisable to rewrite the application code using semitokens rather than tokens.

## Complexity of Sorted Containers

In the list of functions below, the running time of the various operations is provided. In these running times, *n* denotes the number of items in the container, and *c* denotes the time needed to compare two keys.

## Constructors for Sorted Containers

`SortedDict`

constructors

`DataStructures.SortedDict`

— Method```
SortedDict{K,V,Ord}(o::Ord=Forward) where {K, V, Ord <: Ordering}
SortedDict{K,V,Ord}(o::Ord, kv) where {K, V, Ord <: Ordering}
```

Construct a `SortedDict`

with key type `K`

and value type `V`

with `o`

ordering from an iterable `kv`

. The iterable should generate either `Pair{K,V}`

or `Tuple{K,V}`

. If omitted, then the SortedDict is initially empty. Time: O(*cn* log *n*) where *n* is the length of the iterable.

`DataStructures.SortedDict`

— Method```
SortedDict(o::Ord=Forward) where {Ord <: Ordering}
SortedDict{K,V}(o::Ord=Forward) where {K,V,Ord<:Ordering}
```

Construct an empty `SortedDict`

with key type `K`

and value type `V`

with `o`

ordering (default to forward ordering). If `K`

and `V`

are not specified as in the first form, then they are assumed to both be `Any`

. Time: O(1)

**Note that a key type of Any or any other abstract type will lead to slow performance, as the values are stored boxed (i.e., as pointers), and insertion will require a run-time lookup of the appropriate comparison function. It is recommended to always specify a concrete key type, or to use one of the constructors in which the key type is inferred.**

`DataStructures.SortedDict`

— Method```
SortedDict(iter, o::Ord=Forward) where {Ord <: Ordering}
SortedDict(o::Ordering, iter)
SortedDict{K,V}(iter, o::Ordering=Forward) where {K,V}
SortedDict{K,V}(o::Ordering, iter) where {K,V}
```

Construct a `SortedDict`

from an arbitrary iterable object of `key=>value`

pairs or `(key,value)`

tuples with order object `o`

. The key type and value type are inferred from the given iterable in the first two forms. The first two forms copy the data three times, so it is more efficient to explicitly specify `K`

and `V`

as in the second two forms. Time: O(*cn* log *n*)

`DataStructures.SortedDict`

— Method```
SortedDict(ps::Pair...)
SortedDict(o::Ordering, ps::Pair...)
SortedDict{K,V}(ps::Pair...)
SortedDict{K,V}(o::Ordering, ps::Pair...) where {K,V}
```

Construct a `SortedDict`

from the given key-value pairs. The key type and value type are inferred from the given key-value pairs in the first two forms. The ordering is assumed to be `Forward`

ordering in the first and third form. The first two forms (where `K`

and `V`

are not specified but inferred) involves copying the data three times and so is less efficient than the second two forms. Time: O(*cn* log *n*)

`DataStructures.SortedDict`

— Method```
SortedDict{K,V}(::Val{true}, iterable) where {K, V}
SortedDict{K,V}(::Val{true}, iterable, ord::Ordering) where {K,V}
```

Construct a `SortedDict`

from an iterable whose eltype is Tuple{K,V} or Pair{K,V} and that is already in sorted ordered. The first form assumes Forward ordering. No duplicate keys allowed. Time: O(*cn*).

`SortedMultiDict`

constructors

`DataStructures.SortedMultiDict`

— Method```
SortedMultiDict{K,V,Ord}(o::Ord=Forward) where {K, V, Ord <: Ordering}
SortedMultiDict{K,V,Ord}(o::Ord, iterable) where {K, V, Ord <: Ordering}
```

Construct a sorted multidict in which type parameters are explicitly listed; ordering object is explicitly specified. Time: O(*cn* log *n*)

`DataStructures.SortedMultiDict`

— Method```
SortedMultiDict(o::Ord=Forward) where {Ord <: Ordering}
SortedMultiDict{K,V}(o::Ordering=Forward) where {K,V}
```

Construct an empty `SortedMultiDict`

with key type `K`

and value type `V`

with `o`

ordering (default to `Forward`

ordering). If `K`

and `V`

are not specified as in the first form, then they are assumed to both be `Any`

. Time: O(1).

**Note that a key type of Any or any other abstract type will lead to slow performance, as the values are stored boxed (i.e., as pointers), and insertion will require a run-time lookup of the appropriate comparison function. It is recommended to always specify a concrete key type, or to use one of the constructors in which the key type is inferred.**

`DataStructures.SortedMultiDict`

— Method```
SortedMultiDict(ps::Pair...)
SortedMultiDict(o, ps::Pair...)
SortedMultiDict{K,V}(ps::Pair...)
SortedMultiDict{K,V}(o, ps::Pair...)
```

Construct a `SortedMultiDict`

from the given key-value pairs. The key type and value type are inferred from the given key-value pairs in the first two form. The ordering is assumed to be `Forward`

ordering in the first and third forms. The first two forms involve copying the data three times to infer the types and so are less efficient than the third and fourth form where `{K,V}`

are specified explicitly. Time: O(*cn* log *n*)

`DataStructures.SortedMultiDict`

— Method```
SortedMultiDict(iter, o::Ord=Forward) where {Ord <: Ordering}
SortedMultiDict(o::Ordering, iter)
SortedMultiDict{K,V}(iter, o::Ordering=Forward) where {K, V}
SortedMultiDict{K,V}(o::Ordering, iter) where {K, V}
```

Construct a `SortedMultiDict`

from an arbitrary iterable object of `key=>value`

pairs or (key,value) tuples with order object `o`

. The key type and value type are inferred from the given iterable in the first two forms. The first two forms copy the data three times, so it is more efficient to explicitly specify `K`

and `V`

as in the second two forms. Time: O(*cn* log *n*)

`DataStructures.SortedMultiDict`

— Method```
SortedMultiDict{K,V}(::Val{true}, iterable) where {K,V}
SortedMultiDict{K,V}(::Val{true}, iterable, ord::Ord) where {K,V,Ord<:Ordering}
```

Construct a `SortedMultiDict`

from an iterable whose eltype is Tuple{K,V} or Pair{K,V} and that is already in sorted ordered. The first form assumes Forward ordering. Duplicate keys allowed. Time: O(*cn*).

`SortedSet`

constructors

`DataStructures.SortedSet`

— Method```
SortedSet{K,Ord}(o::Ord=Forward) where {K, Ord<:Ordering}
SortedSet{K,Ord}(o::Ord, iter) where {K, Ord<:Ordering}
```

Construct a SortedSet of eltype `K`

using from elements produced by iterable `iter`

(e.g., an array) and ordering object `o`

. Running time: O(*cn* log *n*) where *n* is the length of iterable.

`DataStructures.SortedSet`

— Method```
SortedSet(o::Ord=Forward) where {Ord <: Ordering}
SortedSet{K}(o::Ord=Forward) where {K, Ord<:Ordering}
```

Construct an empty `SortedSet`

with `Forward`

ordering. The first form assumes element type of `Any`

. Time: O(1).

**Note that an element type of Any or any other abstract type will lead to slow performance.**

`DataStructures.SortedSet`

— Method```
SortedSet(o::Ordering, iter)
SortedSet(iter, o::Ordering=Forward)
SortedSet{K}(o::Ordering, iter)
SortedSet{K}(iter, o::Ordering=Forward)
```

Construct a sorted set from an iterable `iter`

using order o. In the first two forms, the element type is inferred from the iterable, which requires copying the data twice. Therefore, the second two forms (specifying `K`

explicitly) are more efficient. Time: O(*cn* log *n*)

`DataStructures.SortedSet`

— Method```
SortedSet{K}(::Val{true}, iterable) where {K}
SortedSet{K}(::Val{true}, iterable, ord::Ord) where {K, Ord <: Ordering}
```

Construct a `SortedSet`

from an iterable whose entries have type `K`

and that is already in sorted ordered. No duplicates allowed. The first form assumes Forward ordering. Time: O(*cn*).

## Navigating the Containers

`Base.getindex`

— Method`Base.getindex(sd::SortedDict, k)`

Retrieve the value associated with key `k`

in SortedDict `sc`

. Yields a `KeyError`

if `k`

is not found. The following functions do not throw an error if the key is not found: `Base.get(sd::SortedDict,k,v)`

and `findkey(sd::SortedDict, k)`

. Time: O(*c* log *n*)

`Base.getindex`

— Method```
Base.getindex(m::SortedDict, st::IntSemiToken)
Base.getindex(m::SortedMultiDict, st::IntSemiToken)
```

Retrieve value portion of item from SortedDict or SortedMultiDict `m`

indexed by `st`

, a semitoken. Notation `m[st]`

appearing in an expression is equivalent to `deref_value(token::Token)`

where `token=(m,st)`

. It is a `BoundsError`

if the token is invalid. Prepending with `@inbounds`

may elide the correctness check and results in undefined behavior if the token is invalid. Time: O(1)

`Base.setindex!`

— Method```
Base.setindex!(m::SortedDict, newvalue, st::IntSemiToken)
Base.setindex!(m::SortedMultiDict, newvalue, st::IntSemiToken)
```

Set the value portion of item from SortedDict or SortedMultiDict `m`

indexed by `st`

, a semitoken to `newvalue`

. A `BoundsError`

is thrown if the token is invalid. Prepending with `@inbounds`

may elide the correctness check and results in undefined behavior if the token is invalid. Time: O(1)

`Base.setindex!`

— Method`Base.setindex!(sd::SortedDict, newvalue, k)`

Assign or reassign the value associated with the key `k`

to `newvalue`

. Note that the key is also overwritten; this is not necessarily a no-op since the equivalence in the sort-order does not imply equality. See also `push_return_semitoken!(sd::SortedDict, p::Pair)`

. Time: O(*c* log *n*)

`DataStructures.deref`

— Method```
deref(token::Token)
deref((m,st))
```

Return the data item indexed by the token. If the container is a `SortedSet`

, then this is a key in the set. If the container is a `SortedDict`

or `SortedMultiDict`

, then this is a key=>value pair. It is a `BoundsError`

if the token is invalid or is the before-start or past-end token. Prepending with `@inbounds`

may elide the correctness check and will result in undefined behavior if the token is invalid or points to the before-start or past-end token. The second form creates the token in-place as a tuple of a sorted container `m`

and a semitoken `st`

. Time: O(1)

`DataStructures.deref_key`

— Method```
deref_key(token::Token)
deref_key((m,st))
```

Return the key portion of a data item (a key=>value pair) in a `SortedDict`

or `SortedMultiDict`

indexed by the token. It is a `BoundsError`

if the token is invalid or is the before-start or past-end token. Prepending with `@inbounds`

may elide the correctness check and will result in undefined behavior if the token is invalid or points to the before-start or past-end token. The second form creates the token in-place as a tuple of a container `m`

and a semitoken `st`

. Time: O(1)

`DataStructures.deref_value`

— Method```
deref_value(token::Token)
deref_value((m,st))
```

Returns the value portion of a data item (a key=>value pair) in a `SortedDict`

or `SortedMultiDict`

indexed by the token. It is a `BoundsError`

if the token is invalid or is the before-start or past-end token. Prepending with `@inbounds`

may elide the correctness check and will result in undefined behavior if the token is invalid or points to the before-start or past-end token. The second form creates the token in-place as a tuple of a container `m`

and a semitoken `st`

. Time: O(1)

`Base.firstindex`

— Method`Base.firstindex(m::SortedContainer)`

Return the semitoken of the first entry of the container `m`

, or the past-end semitoken if the container is empty. This function was called `startof`

(now deprecated) in previous versions of the package. Time: O(log *n*)

`Base.lastindex`

— Method`Base.lastindex(m::SortedContainer)`

Return the semitoken of the last entry of the sorted container `m`

, or the before-start semitoken if the container is empty. This function was called `endof`

(now deprecated) in previous versions of the package. Time: O(log *n*)

`DataStructures.token_firstindex`

— Method`token_firstindex(m::SortedContainer)`

Return the token of the first entry of the sorted container `m`

, or the past-end token if the container is empty. Time: O(log *n*)

`DataStructures.token_lastindex`

— Method`token_lastindex(m::SortedContainer)`

Return the token of the last entry of the sorted container `m`

, or the before-start semitoken if the container is empty. Time: O(log *n*)

`Base.first`

— Method`Base.first(sc::SortedContainer)`

Return the first item (a `k=>v`

pair for SortedDict and SortedMultiDict or an element for SortedSet) in `sc`

according to the sorted order in the container. It is a `BoundsError`

to call this function on an empty container. Equivalent to `deref(token_startindex(sc))`

. Time: O(log *n*)

`Base.last`

— Method`Base.last(sc::SortedContainer)`

Return the last item (a `k=>v`

pair for SortedDict and SortedMultiDict or a key for SortedSet) in `sc`

according to the sorted order in the container. It is a `BoundsError`

to call this function on an empty container. Equivalent to `deref(token_lastindex(sc))`

. Time: O(log *n*)

`DataStructures.pastendsemitoken`

— Method`pastendsemitoken(m::SortedContainer)`

Return the semitoken of the entry that is one past the end of the sorted container `m`

. Time: O(1)

`DataStructures.beforestartsemitoken`

— Method`beforestartsemitoken(m::SortedContainer)`

Return the semitoken of the entry that is one before the beginning of the sorted container `m`

. Time: O(1)

`DataStructures.pastendtoken`

— Method`pastendtoken(m::SortedContainer)`

Return the token of the entry that is one past the end of the sorted container `m`

. Time: O(1)

`DataStructures.beforestarttoken`

— Method`beforestarttoken(m::SortedContainer)`

Return the token of the entry that is one before the beginning of the sorted container `m`

. Time: O(1)

`DataStructures.advance`

— Method```
advance(token::Token)
advance((m,st))
```

Return the semitoken of the item in a sorted container one after the given token. A `BoundsError`

is thrown if the token is the past-end token. Prepending with `@inbounds`

may elide the correctness check and will result in undefined behavior if the token is invalid or points to the past-end token. The second form creates the token in-place as a tuple of a container `m`

and a semitoken `st`

. Time: O(log *n*)

`DataStructures.regress`

— Method```
regress(token::Token)
regress((m,st))
```

Return the semitoken of the item in a sorted container one before the given token. A `BoundsError`

is thrown if the token is the before-start token. Prepending with `@inbounds`

may elide the correctness check and will result in undefined behavior if the token is invalid or points to the before-start token. The second form creates the token in-place as a tuple of a container `m`

and a semitoken `st`

. Time: O(log *n*)

`Base.:+`

— Method```
Base.+(t::Token, j::Integer)
Base.-(t::Token, j::Integer)
```

Return the token that is `j`

positions ahead (if `+`

) or behind (if `-`

) of `t`

. Here, `t`

is a token for a sorted container and `j`

is an integer. If `j`

is negative, then `+`

regresses while `-`

advances. If the operation `t+j`

or `t-j`

reaches the before-start or past-end positions in the container, then the before-start/past-end tokens are returned (and there is no error). Time: O(*j*+log *n*), so this function is not optimized for long jumps.

`Base.Sort.searchsortedfirst`

— Method`Base.searchsortedfirst(m::SortedContainer, k)`

Return the semitoken of the first item in the sorted container `m`

that is greater than or equal to `k`

in the sort order. If there is no such item, then the past-end semitoken is returned. Time: O(*c* log *n*)

`Base.Sort.searchsortedlast`

— Method`Base.searchsortedlast(m::SortedContainer, k)`

Return the semitoken of the last item in the container that is less than or equal to `k`

in sort order. If there is no such item, then the before-start semitoken is returned. Time: O(*c* log *n*)

`DataStructures.searchsortedafter`

— Method`searchsortedafter(m::SortedContainer, k)`

Return the semitoken of the first item in the container that is greater than `k`

in the sort order. If there is no such item, then the past-end semitoken is returned. Time: O(*c* log *n*)

`DataStructures.searchequalrange`

— Method`DataStructures.searchequalrange(smd::SortedMultiDict, k)`

Return two semitokens that correspond to the first and last items in the SortedMultiDict that have key exactly equal to `k`

. If `k`

is not found, then it returns (pastendsemitoken(smd), beforestartsemitoken(smd)). Time: O(*c* log *n*)

`DataStructures.findkey`

— Method`findkey(m::SortedSet, k)`

Return the semitoken of the element `k`

in sorted set `m`

. If the element is not present in `m`

, then the past-end semitoken is returned. Time: O(*c* log *n*)

`DataStructures.findkey`

— Method`DataStructures.findkey(sd::SortedDict, k)`

Return the semitoken that points to the item whose key is `k`

, or past-end semitoken if `k`

is absent. See also `Base.getindex(sd::SortedDict, k)`

Time: O(*c* log *n*)

## Inserting & Deleting in Sorted Containers

`Base.push!`

— Method`Base.push!(ss::SortedSet, k)`

Insert the element `k`

into the sorted set `ss`

. If the `k`

is already present, this overwrites the old value. (This is not necessarily a no-op; see remarks about the customizing the sort order.) See also `push_return_semitoken!(ss::SortedSet, k)`

. The return value is `ss`

. Time: O(*c* log *n*)

`Base.push!`

— Method`Base.push!(sd::SortedDict, p::Pair)`

Insert key-value pair `p`

, i.e., a `k=>v`

pair, into `sd`

. If the key `k`

is already present, this overwrites the old value. The key is also overwritten (not necessarily a no-op, since sort-order equivalence may differ from equality). The return value is `sd`

. See also `push_return_semitoken!(sd::SortedDict, p::Pair)`

. Time: O(*c* log *n*)

`Base.push!`

— Method`Base.push!(smd::SortedMultiDict, p::Pair)`

Insert the pair `p`

, i.e., a `k=>v`

into `smd`

. If `k`

already appears as a key in `smd`

, then `k=>v`

is inserted in the rightmost position after existing items with key `k`

. Returns the container. See also `push_return_semitoken!(smd::SortedMultiDict, p::Pair)`

. Time: O(*c* log *n*)

`DataStructures.push_return_semitoken!`

— Method`DataStructures.push_return_semitoken!(ss::SortedSet, k)`

Insert the element `k`

into the SortedSet `sc`

. If `k`

is already present, this overwrites the old value. (This is not necessarily a no-op; see remarks about the customizing the sort order.) Unlike `push!`

, the return value is a pair whose first entry is boolean and indicates whether the insertion was new (i.e., the key was not previously present) and the second entry is the semitoken of the new entry. This function replaces the deprecated `insert!`

. Time: O(*c* log *n*)

`DataStructures.push_return_semitoken!`

— Method`DataStructures.push_return_semitoken!(sd::SortedDict, p::Pair)`

Insert pair `p`

of the form `k=>v`

into `sd`

. If the key is already present in `sd`

, this overwrites the old value. Note that the key is also overwritten, which is not necessarily a no-op because equivalence in the sort order does not necessarily imply equality. Unlike `push!`

, the return value is a 2-tuple whose first entry is boolean and indicates whether the insertion was new (i.e., the key was not previously present) and whose second entry is the semitoken of the new entry. This function replaces the deprecated `insert!(sd,k,v)`

. Time: O(*c* log *n*)

`DataStructures.push_return_semitoken!`

— Method`DataStructures.push_return_semitoken!(smd::SortedMultiDict, pr::Pair)`

Insert the key-value pair `pr`

, i.e., `k=>v`

, into `smd`

. If `k`

already appears as a key in `smd`

, then `k=>v`

is inserted in the rightmost position after existing items with key `k`

. Unlike `push!`

, the return value is a 2-tuple whose first entry is boolean always equal to `true`

and whose second entry is the semitoken of the new entry. (The reason for returning a bool whose value is always `true`

is for consistency with `push_return_semitoken!`

for SortedDict and SortedSet.) This function replaces the deprecated `insert!`

. Time: O(*c* log *n*)

`Base.delete!`

— Method`Base.delete!(token::Token)`

Delete the item indexed by the token from a sorted container. A `BoundsError`

is thrown if the token is invalid. Prepending with `@inbounds`

may elide the correctness check and will result in undefined behavior if the token is invalid. Time: O(log *n*).

`Base.delete!`

— Method`Base.delete!(ss::SortedSet, k)`

Delete element `k`

from `sc`

. After this operation is complete, a token addressing the deleted item is invalid. Returns `sc`

. if `k`

is not present, this operation is a no-op. Time: O(*c* log *n*)

`Base.delete!`

— Method`Base.delete!(sd::SortedDict, k)`

Delete the item whose key is `k`

in `sd`

. After this operation is complete, any token addressing the deleted item is invalid. Returns `sc`

. This is a no-op if `k`

is not present in `sd`

. Time: O(*c* log *n*)

`Base.pop!`

— Method```
Base.pop!(ss::SortedSet, k)
Base.pop!(ss::SortedSet, k, default)
```

Delete the item with key `k`

in `ss`

and return the item that compares equal to `k`

according to the sort order (which is not necessarily `k`

, since equality in the sort-order does not necessarily imply hash-equality). If `k`

is not found, return `default`

, or throw a `KeyError`

if `default`

is not specified. Time: O(*c* log *n*)

`Base.pop!`

— Method```
Base.pop!(sd::SortedDict, k)
Base.pop!(sd::SortedDict, k, default)
```

Delete the item with key `k`

in `sd`

and return the value that was associated with `k`

. If `k`

is not in `sd`

return `default`

, or throw a `KeyError`

if `default`

is not specified. Time: O(*c* log *n*)

`Base.popfirst!`

— Method`Base.popfirst!(ss::SortedSet)`

Delete the item with first key in SortedSet `ss`

and returns the key. This function was named `pop!`

in a previous version of the package. A `BoundsError`

results if `ss`

is empty. Time: O(log *n*)

`DataStructures.poplast!`

— Method`poplast!(ss::SortedSet)`

Delete the item with last key in SortedSet `ss`

and returns the key. A `BoundsError`

results if `ss`

is empty. This function will be renamed `Base.pop!`

in a future version of the package. Time: O(log *n*)

## Iteration and Token Manipulation

`DataStructures.compare`

— Method`compare(m::SortedContainer, s::IntSemiToken, t::IntSemiToken)`

Determine the relative position according to the sort order of the data items indexed by tokens `(m,s)`

and `(m,t)`

. Return:

`-1`

if`(m,s)`

precedes`(m,t)`

,`0`

if`s == t`

`1`

if`(m,s)`

succeeds`(m,t)`

.

The relative positions are determined from the tree topology without any key comparisons. Time: O(log *n*)

`DataStructures.status`

— Method```
status(token::Token)
status((m, st))
```

Determine the status of a token. Return values are:

- 0 = invalid token
- 1 = valid and points to live data
- 2 = before-start token
- 3 = past-end token

The second form creates the token in-place as a tuple of a sorted container `m`

and a semitoken `st`

. Time: O(1)

`Base.iterate`

— Method`Base.iterate(iter::SortedContainerIterable)`

with the following helper functions to construct a `SortedContainerIterable`

:

```
inclusive(m::SortedContainer, st1, st2)
inclusive(m::SortedContainer, (st1, st2))
inclusive_key(m::SortedContainer, key1, key2)
inclusive_key(m::SortedContainer, (key1, key2))
exclusive(m::SortedContainer, st1, st2)
exclusive(m::SortedContainer, (st1, st2))
exclusive_key(m::SortedContainer, key1, key2)
exclusive_key(m::SortedContainer, (key1, key2))
Base.keys(b)
Base.values(b)
Base.pairs(b)
Base.eachindex(b)
tokens(kv)
semitokens(kv)
onlytokens(kv)
onlysemitokens(kv)
Base.Iterators.reverse(m)
(:)(a,b)
```

Iterate over a sorted container, typically within a for-loop, comprehension, or generator. Here, `iter`

is an iterable object constructed from a sorted container. The possible iterable objects are constructed from the helper functions as follows:

A *basic* iterable object is either

- an entire sorted container
`m`

, `inclusive(m, (st1, st2))`

or equivalently`inclusive(m, st1, st2)`

,`inclusive_key(m, (k1, k2))`

or equivalently`inclusive_key(m, k1, k2)`

`a:b`

, where`a`

and`b`

are tokens addressing the same container`exclusive(m, (st1, st2))`

or equivalently`exclusive(m, st1, st2)`

`exclusive_key(m, (k1, k2))`

or equivalently`exclusive_key(m, k1, k2)`

These extract ranges of consecutive items in the containers. In the `inclusive`

and `exclusive`

constructions, constructions, `m`

is a container, `st1`

and `st2`

are semitokens. The `inclusive`

range includes both endpoints `st1`

and `st2`

. The inclusive iteration is empty if `compare(m,st1,st2)<0`

. The `exclusive`

range includes endpoint `st1`

but not `st2`

. The exclusive iteration is empty if `compare(m,st1,st2)<=0`

. In the exclusive iteration, it is acceptable if `st2`

is the past-end semitoken.

The range `exclusive_key`

means all data items with keys between `k1`

up to but excluding items with key `k2`

. For this range to be nonempty, `k1<k2`

must hold (in the sort order). The range `inclusive_key`

means all data items with keys between `k1`

and `k2`

inclusive. For this range to be nonempty, `k1<=k2`

must hold.

A *kv* iterable object has the form

`b`

, a basic iterable object`keys(b)`

where`b`

is a basic object. Extract keys only (not applicable to SortedSet)`values(b)`

where`b`

is a basic object. Extract values only (not applicable to SortedSet).`pairs(b)`

where`b`

is a basic object. Extracts key=>value pairs (not applicable to SortedSet). This is the same as just specifying`b`

and is provided only for compatibility with`Base.pairs`

.

A *tkv* object has the form

`kv`

, a kv iterable object`tokens(kv)`

where`kv`

is a kv iterable object. Return 2-tuples of the form`(t,w)`

, where`t`

is the token of the item and`w`

is a key or value if`kv`

is a keys or values iteration, or`(t,k,v)`

if`kv`

is a pairs iteration.`semitokens(kv)`

where`kv`

is a kv iterable object. Return pairs of the form`(st,w)`

, where`st`

is the token of the item and`w`

is a key or value if`kv`

is a keys or values iteration, or`(st,k,v)`

if`kv`

is a pairs iteration.`onlytokens(kv)`

where`kv`

is a kv iterable object. Return only tokens of the data items but not the items themselves. The`keys`

,`values`

, or`pairs`

modifiers described above are ignored.`onlysemitokens(kv)`

where`kv`

is a kv iterable object. Return only semitokens of the data items but not the items themselves. The`keys`

,`values`

, or`pairs`

modifiers described above are ignored.

Finally, a tkv iteration can be reversed by the `Iterators.reverse`

function. The `Iterators.reverse`

function may be nested in an arbitrary position with respect to the other operations described above. Two reverse operations cancel each other out. For example, `Iterators.reverse(keys(Iterators.reverse(m)))`

is the same iteration as `keys(m)`

.

For compatibility with `Base`

, there is also an `eachindex`

function: `eachindex(b)`

where the base object `b`

a SortedDict is the same as `keys(b)`

(to be compatible with Dict). On the other hand, `eachindex(b)`

where the base object `b`

is a SortedSet or SortedMultiDict is the same as `onlysemitokens(b)`

.

Colon notation `a:b`

is equivalent to `onlytokens(inclusive(a[1], a[2], b[2]))`

, in other words, it yields an iteration that provides all the tokens of items in the sort order ranging from token `a`

up to token `b`

. It is required that `a[1]===b[1]`

(i.e., `a`

and `b`

are tokens for the same container). Exclusive iteration using colon notation is obtained via `a : b-1`

.

**Examples:**

```
for (k,v) in sd
<body>
end
```

Here, `sd`

is a `SortedDict`

or `SortedMultiDict`

. The variables `(k,v)`

are set to consecutive key-value pairs. All items in the container are produced in order.

```
for k in inclusive(ss, st1, st2)
<body>
end
```

Here, `ss`

is a `SortedSet`

, and `st1`

, and `st2`

are semitokens indexing `ss`

. The elements of the set between `st1`

and `st2`

inclusive are returned.

```
for (t,k) in tokens(keys(exclusive_key(sd, key1, key2)))
<body>
end
```

Here, `sd`

is a `SortedDict`

or `SortedMultiDict`

, `key1`

and `key2`

are keys indexing `sd`

. In this case, `t`

will be tokens of consecutive items, while `k`

will be the corresponding keys. The returned keys lie between `key1`

and `key2`

excluding `key2`

.

```
for (t,k) in Iterators.reverse(tokens(keys(exclusive_key(sd, key1, key2))))
<body>
end
```

Same as above, except the iteration is in the reverse order.

Writing on the objects returned by `values`

is not currently supported, e.g., the following `map!`

statement is not implemented even though the analogous statement is available for `Dict`

in Base.

```
s = SortedDict(3=>4)
map!(x -> x*2, values(s))
```

The workaround is an explicit loop:

```
s = SortedDict(3=>4)
for t in onlysemitokens(s)
s[t] *= 2
end
```

Running time for all iterations: O(*c*(*s* + log *n*)), where *s* is the number of steps from start to end of the iteration.

`Base.in`

— Method`Base.in(k,m::SortedSet)`

Return `true`

iff element `k`

is in sorted set `m`

is a sorted set. Unlike the `in`

function for `Set`

, this routine will thrown an error if `k`

is not convertible to `eltype(m)`

. Time: O(*c* log *n*)

`Base.in`

— Method`Base.in(p::Pair, sd::SortedDict)`

Return true if `p`

is in `sd`

. Here, `p`

is a key=>value pair. Time: O(*c* log *n* + *d*) where *d* stands for the time to compare two values.

`Base.in`

— Method`Base.in(p::Pair, smd::SortedMultiDict)`

Return true if `p`

is in `smd`

. Here, `p`

is a key=>value pair. In the The time is is O(*c* log *n* + *dl*) where *d* is the time to compare two values and *l* stands for the number of entries that have the key of the given pair. (So therefore this call is inefficient if the same key addresses a large number of values, and an alternative should be considered.)

`Base.in`

— Method`Base.in(x, iter::SortedContainerIterable)`

Returns true if `x`

is in `iter`

, where `iter`

refers to any of the iterable objects described under `Base.iterate(iter::SortedContainerIterable)`

, and `x`

is of the appropriate type. For all of the iterables except the five listed below, the algorithm used is a linear-time search. For example, the call:

`(k=>v) in exclusive(sd, st1, st2)`

where `sd`

is a SortedDict, `st1`

and `st2`

are semitokens, `k`

is a key, and `v`

is a value, will loop over all entries in the dictionary between the two tokens and a compare for equality using `isequal`

between the indexed item and `k=>v`

.

The five exceptions are:

```
(k=>v) in sd
(k=>v) in smd
k in ss
k in keys(sd)
k in keys(smd)
```

Here, `sd`

is a SortedDict, `smd`

is a SortedMultiDict, and `ss`

is a SortedSet.

These five invocations of `in`

use the index structure of the sorted container and test equality based on the order object of the keys rather than `isequal`

. Therefore, these five are all faster than linear-time looping. To force the use of `isequal`

test on the keys rather than the order object (thus slowing the execution from logarithmic to linear time), replace the above five constructs with these:

```
(k=>v) in collect(sd)
(k=>v) in collect(smd)
k in collect(ss)
k in collect(keys(sd))
k in collect(keys(smd))
```

## Misc. Functions

```
Base.isempty(m::SortedContainer)
Base.empty!(m::SortedContainer)
Base.empty(m::SortedContainer)
Base.length(m::SortedContainer)
Base.eltype(m::SortedContainer)
Base.keytype(m::SortedContainer)
Base.valtype(m::SortedContainer)
Base.eltype(m::SortedContainerIteration)
Base.keytype(m::SortedContainerIteration)
Base.valtype(m::SortedContainerIteration)
```

These functions from `Base`

are all applicable to sorted containers with the obvious meaning. The `eltype`

, `keytype`

, and `valtype`

functions may be applied either to the object `m`

or its type. Note that `keytype`

and `valtype`

are applicable only to SortedDict and SortedMultiDict, or to pairs iterations over SortedDict or SortedMultiDict. Time: O(1)

`DataStructures.ordtype`

— Method```
ordtype(sc::SortedSet)
ordtype(sc::SortedDict)
ordtype(sc::SortedMultiDict)
```

Return the order type for a sorted container. This function may also be applied to the type itself. Time: O(1)

`DataStructures.orderobject`

— Method`orderobject(sc::SortedContainer)`

Return the order object used to construct the container. Time: O(1)

`Base.haskey`

— Method`haskey(sc::SortedContainer, k)`

Return `true`

iff key `k`

is present in `sc`

. Equivalent to `in(k,sc)`

for a SortedSet, or to `in(k,keys(sc))`

for a SortedDict or SortedMultiDict. Time: O(*c* log *n*)

`Base.get`

— Method```
Base.get(sd::SortedDict,k,default)
Base.get(default_f::Union{Function,Type}, sd::SortedDict, k)
```

Return the value associated with key `k`

where `sd`

is a SortedDict, or else returns `default`

if `k`

is not in `sd`

. The second form obtains `default`

as the return argument of the function/type-constructor `default_f`

(with no arguments) when the key is not present. Time: O(*c* log *n*)

`Base.get!`

— Method```
Base.get!(sd::SortedDict,k,default)
Base.get!(default_f::Union{Function,Type}, sd::SortedDict, k)
```

Return the value associated with key `k`

where `sd`

is a SortedDict, or else return `default`

if `k`

is not in `sd`

, and in the latter case, inserts `(k,default)`

into `sd`

. The second form computes a default value by calling the function `default_f`

(with no arguments) or the constructor of type `default_f`

when the key is not present. Time: O(*c* log *n*)

`Base.getkey`

— Method`Base.getkey(sd::SortedDict,k,defaultk)`

Return the key `k`

where `sd`

is a SortedDict, if `k`

is in `sd`

else it returns `defaultk`

. If the container uses in its ordering an `eq`

method different from isequal (e.g., case-insensitive ASCII strings illustrated below), then the return value is the actual key stored in the SortedDict that is equivalent to `k`

according to the `eq`

method, which might not be equal to `k`

. Similarly, if the user performs an implicit conversion as part of the call (e.g., the container has keys that are floats, but the `k`

argument to `getkey`

is an Int), then the returned key is the actual stored key rather than `k`

. Time: O(*c* log *n*)

`Base.isequal`

— Method```
Base.isequal(ss1::SortedSet{K,Ord}, ss2::SortedSet{K,Ord}) where {K,Ord <: Ordering}
Base.issetequal(ss1::SortedSet{K,Ord}, ss2::SortedSet{K,Ord}) where {K,Ord <: Ordering}
```

Check if two sorted sets are equal in the sense that they contain the same items. Note that `isequal`

in this sense does not imply correspondence between semitokens for items in `sc1`

with those for `sc2`

. Time: O(*cn*) where n is the size of the smaller container. If the two sorted sets have `K`

, different `Ord`

, or different order objects, then a fallback routine `isequal(::AbstractSet, ::AbstractSet)`

is invoked.

`Base.isequal`

— Method`Base.isequal(sd1::SortedDict{K,V,Ord}, sd2::SortedDict{K,V,Ord}) where {K, V, Ord <: Ordering}`

Check if two SortedDicts are equal in the sense that they contain the same items; the keys are compared using the `eq`

method, while the values are compared with the `isequal`

function. Note that `isequal`

in this sense does not imply correspondence between semitokens for items in `sd1`

with those for `sd2`

. Time: O(*cn*). Note that if `K`

, `V`

, `Ord`

, or the order objects of sd1 and sd2 are different, then a fallback routine `Base.isequal(::AbstractDict, ::AbstractDict)`

is invoked. Time: O(*cn*)

`Base.isequal`

— Method`Base.isequal(smd1::SortedMultiDict{K,V,Ord}, smd2::SortedMultiDict{K,V,Ord}) where {K, V, Ord <: Ordering}`

Check if two SortedMultiDicts are equal in the sense that they contain the same items in the same order (that is, the same insertion order). They must have the same order object, else they compare unequal. The keys are compared using the `eq`

method, while the values are compared with the `isequal`

function. Note that `isequal`

in this sense does not imply any correspondence between semitokens for items in `smd1`

with those for `smd2`

. Time: O(*cn*)

`DataStructures.packcopy`

— Method```
copy(sc::SortedSet)
copy(sc::SortedDict)
copy(sc::SortedMultiDict)
packcopy(sc::SortedSet)
packcopy(sc::SortedDict)
packcopy(sc::SortedMultiDict)
```

Return a copy of `sc`

, where `sc`

is a sorted container, in which the data is packed. When deletions take place, the previously allocated memory is not returned. This function can be used to reclaim memory after many deletions. Time: O(*cn*)

Note that the semitokens valid for the original container are no longer valid for the copy because the indexing structure is rebuilt by these copies. If an exact copy is needed in which semitokens remain valid, use `Base.deepcopy`

.

`DataStructures.packdeepcopy`

— Method```
packdeepcopy(sc::SortedSet)
packdeepcopy(sc::SortedDict)
packdeepcopy(sc::SorteMultiDict)
```

Return a packed copy of `sc`

, where `sc`

is a sorted container in which the keys and values are deep-copied. This function can be used to reclaim memory after many deletions. Time: O(*cn*)

`Base.merge`

— Method`Base.merge(sd::SortedDict{K,V,Ord}, d1::AbstractDict{K,V}...) where {K,V,Ord <: Ordering}`

Merge one or more dicts into a single SortedDict and return the new SortedDict. Arguments `d1`

etc. must have the same key-value type as `sd`

. In the case of keys duplicated among the arguments, the rightmost argument that owns the key gets its value stored. Time: O(*cN* log *N*), where *N* is the total size of all the arguments. If all the arguments are SortedDicts with the same key, value, and order object, then the time is O(*cN*).

`Base.merge!`

— Method`Base.merge!(sd::SortedDict{K,V,Ord}, d1::AbstractDict{K,V}...) where {K,V,Ord<:Ordering}`

Merge one or more dicts `d1`

, etc. into `sd`

. These must all must have the same key-value types. In the case of keys duplicated among the arguments, the rightmost argument that owns the key gets its value stored. Time: O(*cN* log *N*), where *N* is the total size of all the arguments.

`Base.merge`

— Method`Base.merge(smd::SortedMultiDict, iter...)`

Merge `smd`

and one or more iterables and return the resulting new SortedMultiDict. The iterables must have the same key-value type as `smd`

. Items with equal keys are stored with left-to-right ordering. Time: O(*cN* log *N*), where *N* is the total size of all the arguments. If all the arguments are SortedMultiDicts with the same key, value, and order object, then the time is O(*cN*).

`Base.merge!`

— Method`Base.merge!(smd::SortedMultiDict, iter...)`

Merge one or more iterables `iter`

, etc. into `smd`

. These must all must have the same key-value types. Items with equal keys are stored with left-to-right ordering. Time: O(*cN* log *N*), where *N* is the total size of all the arguments.

## Set operations

The SortedSet container supports the following set operations. Note that in the case of `intersect`

, `symdiff`

and `setdiff`

, the two SortedSets should have the same key and ordering object. If they have different key or ordering types, no error message is produced; instead, the built-in default versions of these functions (that can be applied to `Any`

iterables and that return arrays) are invoked.

`Base.union!`

— Method`Base.union!(ss::SortedSet, iterable...)`

Insert each item among the second and following arguments (which must be iterable) into the SortedSet `ss`

. The items must be convertible to the key-type of `ss`

. Time: O(*cN* log *N*) where *N* is the total number of items in the iterable arguments.

`Base.union`

— Method`Base.union(ss::SortedSet, iterable...)`

Compute and return the union of a sorted set and one or more iterables. They must have the same keytype. If they are all sorted sets with the same order object, then the required time is O(*cn*), where *n* is the total size. If not, then the fallback routine requires time O(*cn* log *n*).

`Base.intersect`

— Method`Base.intersect(ss::SortedSet, others...)`

Intersect SortedSets with other SortedSets or other iterables and return the intersection as a new SortedSet. Time: O(*cn*), where *n* is the total number of items in all the arguments if all the arguments are SortedSets of the same type and same order object. Otherwise, the time is O(*cn* log *n*)

`Base.symdiff`

— Method`Base.symdiff(ss1::SortedSet, iterable)`

Compute and return the symmetric difference of `ss1`

and `iterable`

, i.e., a sorted set containing entries that are in one of `ss1`

or `iterable`

but not both. Time: O(*cn*), where *n* is the total size of the two containers if both are sorted sets with the same key and order objects. Otherwise, the time is O(*cn* log *n*)

`Base.setdiff`

— Method```
Base.setdiff(ss1::SortedSet{K,Ord}, ss2::SortedSet{K,Ord}) where {K, Ord<:Ordering}
Base.setdiff(ss1::SortedSet, others...)
```

Return the set difference, i.e., a sorted set containing entries in `ss1`

but not in `ss2`

or successive arguments. Time for the first form: O(*cn*) where *n* is the total size of both sets provided that they are both sorted sets of the same type and order object. The second form computes the set difference between `ss1`

and all the others, which are all iterables. The second form requires O(*cn* log *n*) time.

`Base.setdiff!`

— Method`Base.setdiff!(ss::SortedSet, iterable..)`

Delete items in `ss`

that appear in any of the iterables. The arguments after the first must be iterables each of whose entries must convertible to the key type of m1. Time: O(*cm* log *n*), where *n* is the size of `ss`

and *m* is the total number of items in iterable.

`Base.issubset`

— Method`Base.issubset(iterable, ss::SortedSet)`

Check whether each item of the first argument is an element of `ss`

. The entries must be convertible to the key-type of `ss`

. Time: O(*cm* log *n*), where *n* is the size of `ss`

and *m* is the number of items in `iterable`

. If both are sorted sets of the same keytype and order object and if *m* > *n* / log *n*, then an algorithm whose running time is O(*c*(*m*+*n*)) is used.

## Ordering of keys

As mentioned earlier, the default ordering of keys uses `isless`

and `isequal`

functions. If the default ordering is used, it is a requirement of the container that `isequal(a,b)`

is true if and only if `!isless(a,b)`

and `!isless(b,a)`

are both true. This relationship between `isequal`

and `isless`

holds for common built-in types, but it may not hold for all types, especially user-defined types. If it does not hold for a certain type, then a custom ordering argument must be defined as discussed in the next few paragraphs.

The name for the default ordering (i.e., using `isless`

and `isequal`

) is `Forward`

. Note: this is the name of the ordering object; its type is `ForwardOrdering.`

Another possible ordering object is `Reverse`

, which reverses the usual sorted order. This name must be imported `import Base.Reverse`

if it is used.

As an example of a custom ordering, suppose the keys are of type `String`

, and the user wishes to order the keys ignoring case: *APPLE*, *berry* and *Cherry* would appear in that order, and *APPLE* and *aPPlE* would be indistinguishable in this ordering.

The simplest approach is to define an ordering object of the form `Lt(my_isless)`

, where `Lt`

is a built-in type (see `ordering.jl`

) and `my_isless`

is the user's comparison function. In the above example, the ordering object would be:

`Lt((x,y) -> isless(lowercase(x),lowercase(y)))`

The ordering object is indicated in the above list of constructors in the `o`

position (see above for constructor syntax).

This approach may suffer from a performance hit because higher performance may be possible if an equality method is also available as well as a less-than method. A more complicated but higher-performance method to implement a custom ordering is as follows. First, the user creates a singleton type that is a subtype of `Ordering`

as follows:

```
struct CaseInsensitive <: Ordering
end
```

Next, the user defines a method named `lt`

for less-than in this ordering:

`lt(::CaseInsensitive, a, b) = isless(lowercase(a), lowercase(b))`

The first argument to `lt`

is an object of the `CaseInsensitive`

type (there is only one such object since it is a singleton type). The container also needs an equal-to function; the default is:

`eq(o::Ordering, a, b) = !lt(o, a, b) && !lt(o, b, a)`

The user can also customize this function with a more efficient implementation. In the above example, an appropriate customization would be:

`eq(::CaseInsensitive, a, b) = isequal(lowercase(a), lowercase(b))`

Note: the user-defined `eq`

and `lt`

functions must be compatible in the sense that `!lt(o, a, b) && !lt(o, b, a)`

if and only if `eq(o, a, b)`

.

Finally, the user specifies the unique element of `CaseInsensitive`

, namely the object `CaseInsensitive()`

, as the ordering object to the `SortedDict`

, `SortedMultiDict`

or `SortedSet`

constructor.

For the above code to work, the module must make the following declarations, typically near the beginning:

```
import Base.Ordering
import Base.lt
import DataStructures.eq
```

## Cautionary note on mutable keys

As with ordinary Dicts, keys for the sorted containers can be either mutable or immutable. In the case of mutable keys, it is important that the keys not be mutated once they are in the container else the indexing structure will be corrupted. (The same restriction applies to Dict.) For example, the following sequence of statements leaves `sd`

in a corrupted state:

```
sd = SortedDict{Vector{Int},Int}()
k = [1,2,3]
sd[k] = 19
sd[[6,4]] = 12
k[1] = 7
```

## Performance of Sorted Containers

The sorted containers are currently not optimized for cache performance.