Sorted Containers
Three sorted containers are provided: SortedDict, SortedMultiDict and SortedSet. SortedDict is similar to the built-in Julia type Dict
with the additional feature that the keys are stored in sorted order and can be efficiently iterated in this order. SortedDict is a subtype of AbstractDict. It is generally slower than Dict
because looking up a key requires an O(log n) tree search rather than an expected O(1) hash-table lookup time of Dict
. SortedDict is a parameterized type with three parameters, the key type K
, the value type V
, and the ordering type O
. SortedSet has only keys; it is an alternative to the built-in Set
container and is a subtype of AbstractSet. Internally, SortedSet is implemented as a SortedDict in which the value type is Nothing
. Finally, SortedMultiDict is similar to SortedDict except that each key can be associated with multiple values. The key=>value pairs in a SortedMultiDict are stored according to the sorted order for keys, and key=>value pairs with the same key are stored in order of insertion.
The containers internally use a 2-3 tree, which is a kind of balanced tree and is described in data structure textbooks. Internally, one Vector
is used to store key/data pairs (the leaves of the tree) while a second holds the tree structure.
The containers require two functions to compare keys: a less-than and equals function. With the default ordering argument, the comparison functions are isless(key1,key2)
(true when key1 < key2
) and isequal(key1,key2)
(true when key1 == key2
) where key1
and key2
are keys. More details are provided below.
Tokens for Sorted Containers
The sorted containers support an object for indexing called a token defined as a two-entry tuple and aliased as SortedDictToken
, SortedMultiDictToken
, or SortedSetToken
. A token is the address of a single data item in the container and can be dereferenced in time O(1).
The first entry of a token tuple is the container as a whole, and the second refers to the particular item. The second part is called a semitoken. The type of the semitoken is IntSemiToken
.
A restriction for the sorted containers is that IntSemiToken
cannot used as the key-type. This is because ambiguity would result between the two subscripting calls sc[k]
and sc[st]
described below. In the rare scenario that a sorted container whose key-type is IntSemiToken
is required, a workaround is to wrap the key inside another immutable structure.
The notion of token is similar to the concept of iterators used by C++ standard containers. Tokens can be explicitly advanced or regressed through the data in the sorted order; they are implicitly advanced or regressed via iteration defined below.
A token may take two special values: the before-start value and the past-end value. These values act as lower and upper bounds on the actual data. The before-start token can be advanced, while the past-end token can be regressed. A dereferencing operation on either leads to an error.
In the current implementation, semitokens are internally stored as integers. Users should regard these integers as opaque since future versions of the package may change the internal indexing scheme. In certain situations it may be more costly to operate on tokens than semitokens because the first entry of a token (i.e., the container) is not a bits-type. If code profiling indicates that statements using tokens are allocating memory, then it may be advisable to rewrite the application code using semitokens rather than tokens.
Complexity of Sorted Containers
In the list of functions below, the running time of the various operations is provided. In these running times, n denotes the number of items in the container, and c denotes the time needed to compare two keys.
Constructors for Sorted Containers
SortedDict
constructors
DataStructures.SortedDict
— MethodSortedDict{K,V,Ord}(o::Ord=Forward) where {K, V, Ord <: Ordering}
SortedDict{K,V,Ord}(o::Ord, kv) where {K, V, Ord <: Ordering}
Construct a SortedDict
with key type K
and value type V
with o
ordering from an iterable kv
. The iterable should generate either Pair{K,V}
or Tuple{K,V}
. If omitted, then the SortedDict is initially empty. Time: O(cn log n) where n is the length of the iterable.
DataStructures.SortedDict
— MethodSortedDict(o::Ord=Forward) where {Ord <: Ordering}
SortedDict{K,V}(o::Ord=Forward) where {K,V,Ord<:Ordering}
Construct an empty SortedDict
with key type K
and value type V
with o
ordering (default to forward ordering). If K
and V
are not specified as in the first form, then they are assumed to both be Any
. Time: O(1)
Note that a key type of Any
or any other abstract type will lead to slow performance, as the values are stored boxed (i.e., as pointers), and insertion will require a run-time lookup of the appropriate comparison function. It is recommended to always specify a concrete key type, or to use one of the constructors in which the key type is inferred.
DataStructures.SortedDict
— MethodSortedDict(iter, o::Ord=Forward) where {Ord <: Ordering}
SortedDict(o::Ordering, iter)
SortedDict{K,V}(iter, o::Ordering=Forward) where {K,V}
SortedDict{K,V}(o::Ordering, iter) where {K,V}
Construct a SortedDict
from an arbitrary iterable object of key=>value
pairs or (key,value)
tuples with order object o
. The key type and value type are inferred from the given iterable in the first two forms. The first two forms copy the data three times, so it is more efficient to explicitly specify K
and V
as in the second two forms. Time: O(cn log n)
DataStructures.SortedDict
— MethodSortedDict(ps::Pair...)
SortedDict(o::Ordering, ps::Pair...)
SortedDict{K,V}(ps::Pair...)
SortedDict{K,V}(o::Ordering, ps::Pair...) where {K,V}
Construct a SortedDict
from the given key-value pairs. The key type and value type are inferred from the given key-value pairs in the first two forms. The ordering is assumed to be Forward
ordering in the first and third form. The first two forms (where K
and V
are not specified but inferred) involves copying the data three times and so is less efficient than the second two forms. Time: O(cn log n)
DataStructures.SortedDict
— MethodSortedDict{K,V}(::Val{true}, iterable) where {K, V}
SortedDict{K,V}(::Val{true}, iterable, ord::Ordering) where {K,V}
Construct a SortedDict
from an iterable whose eltype is Tuple{K,V} or Pair{K,V} and that is already in sorted ordered. The first form assumes Forward ordering. No duplicate keys allowed. Time: O(cn).
SortedMultiDict
constructors
DataStructures.SortedMultiDict
— MethodSortedMultiDict{K,V,Ord}(o::Ord=Forward) where {K, V, Ord <: Ordering}
SortedMultiDict{K,V,Ord}(o::Ord, iterable) where {K, V, Ord <: Ordering}
Construct a sorted multidict in which type parameters are explicitly listed; ordering object is explicitly specified. Time: O(cn log n)
DataStructures.SortedMultiDict
— MethodSortedMultiDict(o::Ord=Forward) where {Ord <: Ordering}
SortedMultiDict{K,V}(o::Ordering=Forward) where {K,V}
Construct an empty SortedMultiDict
with key type K
and value type V
with o
ordering (default to Forward
ordering). If K
and V
are not specified as in the first form, then they are assumed to both be Any
. Time: O(1).
Note that a key type of Any
or any other abstract type will lead to slow performance, as the values are stored boxed (i.e., as pointers), and insertion will require a run-time lookup of the appropriate comparison function. It is recommended to always specify a concrete key type, or to use one of the constructors in which the key type is inferred.
DataStructures.SortedMultiDict
— MethodSortedMultiDict(ps::Pair...)
SortedMultiDict(o, ps::Pair...)
SortedMultiDict{K,V}(ps::Pair...)
SortedMultiDict{K,V}(o, ps::Pair...)
Construct a SortedMultiDict
from the given key-value pairs. The key type and value type are inferred from the given key-value pairs in the first two form. The ordering is assumed to be Forward
ordering in the first and third forms. The first two forms involve copying the data three times to infer the types and so are less efficient than the third and fourth form where {K,V}
are specified explicitly. Time: O(cn log n)
DataStructures.SortedMultiDict
— MethodSortedMultiDict(iter, o::Ord=Forward) where {Ord <: Ordering}
SortedMultiDict(o::Ordering, iter)
SortedMultiDict{K,V}(iter, o::Ordering=Forward) where {K, V}
SortedMultiDict{K,V}(o::Ordering, iter) where {K, V}
Construct a SortedMultiDict
from an arbitrary iterable object of key=>value
pairs or (key,value) tuples with order object o
. The key type and value type are inferred from the given iterable in the first two forms. The first two forms copy the data three times, so it is more efficient to explicitly specify K
and V
as in the second two forms. Time: O(cn log n)
DataStructures.SortedMultiDict
— MethodSortedMultiDict{K,V}(::Val{true}, iterable) where {K,V}
SortedMultiDict{K,V}(::Val{true}, iterable, ord::Ord) where {K,V,Ord<:Ordering}
Construct a SortedMultiDict
from an iterable whose eltype is Tuple{K,V} or Pair{K,V} and that is already in sorted ordered. The first form assumes Forward ordering. Duplicate keys allowed. Time: O(cn).
SortedSet
constructors
DataStructures.SortedSet
— MethodSortedSet{K,Ord}(o::Ord=Forward) where {K, Ord<:Ordering}
SortedSet{K,Ord}(o::Ord, iter) where {K, Ord<:Ordering}
Construct a SortedSet of eltype K
using from elements produced by iterable iter
(e.g., an array) and ordering object o
. Running time: O(cn log n) where n is the length of iterable.
DataStructures.SortedSet
— MethodSortedSet(o::Ord=Forward) where {Ord <: Ordering}
SortedSet{K}(o::Ord=Forward) where {K, Ord<:Ordering}
Construct an empty SortedSet
with Forward
ordering. The first form assumes element type of Any
. Time: O(1).
Note that an element type of Any
or any other abstract type will lead to slow performance.
DataStructures.SortedSet
— MethodSortedSet(o::Ordering, iter)
SortedSet(iter, o::Ordering=Forward)
SortedSet{K}(o::Ordering, iter)
SortedSet{K}(iter, o::Ordering=Forward)
Construct a sorted set from an iterable iter
using order o. In the first two forms, the element type is inferred from the iterable, which requires copying the data twice. Therefore, the second two forms (specifying K
explicitly) are more efficient. Time: O(cn log n)
DataStructures.SortedSet
— MethodSortedSet{K}(::Val{true}, iterable) where {K}
SortedSet{K}(::Val{true}, iterable, ord::Ord) where {K, Ord <: Ordering}
Construct a SortedSet
from an iterable whose entries have type K
and that is already in sorted ordered. No duplicates allowed. The first form assumes Forward ordering. Time: O(cn).
Navigating the Containers
Base.getindex
— MethodBase.getindex(sd::SortedDict, k)
Retrieve the value associated with key k
in SortedDict sc
. Yields a KeyError
if k
is not found. The following functions do not throw an error if the key is not found: Base.get(sd::SortedDict,k,v)
and findkey(sd::SortedDict, k)
. Time: O(c log n)
Base.getindex
— MethodBase.getindex(m::SortedDict, st::IntSemiToken)
Base.getindex(m::SortedMultiDict, st::IntSemiToken)
Retrieve value portion of item from SortedDict or SortedMultiDict m
indexed by st
, a semitoken. Notation m[st]
appearing in an expression is equivalent to deref_value(token::Token)
where token=(m,st)
. It is a BoundsError
if the token is invalid. Prepending with @inbounds
may elide the correctness check and results in undefined behavior if the token is invalid. Time: O(1)
Base.setindex!
— MethodBase.setindex!(m::SortedDict, newvalue, st::IntSemiToken)
Base.setindex!(m::SortedMultiDict, newvalue, st::IntSemiToken)
Set the value portion of item from SortedDict or SortedMultiDict m
indexed by st
, a semitoken to newvalue
. A BoundsError
is thrown if the token is invalid. Prepending with @inbounds
may elide the correctness check and results in undefined behavior if the token is invalid. Time: O(1)
Base.setindex!
— MethodBase.setindex!(sd::SortedDict, newvalue, k)
Assign or reassign the value associated with the key k
to newvalue
. Note that the key is also overwritten; this is not necessarily a no-op since the equivalence in the sort-order does not imply equality. See also push_return_semitoken!(sd::SortedDict, p::Pair)
. Time: O(c log n)
DataStructures.deref
— Methodderef(token::Token)
deref((m,st))
Return the data item indexed by the token. If the container is a SortedSet
, then this is a key in the set. If the container is a SortedDict
or SortedMultiDict
, then this is a key=>value pair. It is a BoundsError
if the token is invalid or is the before-start or past-end token. Prepending with @inbounds
may elide the correctness check and will result in undefined behavior if the token is invalid or points to the before-start or past-end token. The second form creates the token in-place as a tuple of a sorted container m
and a semitoken st
. Time: O(1)
DataStructures.deref_key
— Methodderef_key(token::Token)
deref_key((m,st))
Return the key portion of a data item (a key=>value pair) in a SortedDict
or SortedMultiDict
indexed by the token. It is a BoundsError
if the token is invalid or is the before-start or past-end token. Prepending with @inbounds
may elide the correctness check and will result in undefined behavior if the token is invalid or points to the before-start or past-end token. The second form creates the token in-place as a tuple of a container m
and a semitoken st
. Time: O(1)
DataStructures.deref_value
— Methodderef_value(token::Token)
deref_value((m,st))
Returns the value portion of a data item (a key=>value pair) in a SortedDict
or SortedMultiDict
indexed by the token. It is a BoundsError
if the token is invalid or is the before-start or past-end token. Prepending with @inbounds
may elide the correctness check and will result in undefined behavior if the token is invalid or points to the before-start or past-end token. The second form creates the token in-place as a tuple of a container m
and a semitoken st
. Time: O(1)
Base.firstindex
— MethodBase.firstindex(m::SortedContainer)
Return the semitoken of the first entry of the container m
, or the past-end semitoken if the container is empty. This function was called startof
(now deprecated) in previous versions of the package. Time: O(log n)
Base.lastindex
— MethodBase.lastindex(m::SortedContainer)
Return the semitoken of the last entry of the sorted container m
, or the before-start semitoken if the container is empty. This function was called endof
(now deprecated) in previous versions of the package. Time: O(log n)
DataStructures.token_firstindex
— Methodtoken_firstindex(m::SortedContainer)
Return the token of the first entry of the sorted container m
, or the past-end token if the container is empty. Time: O(log n)
DataStructures.token_lastindex
— Methodtoken_lastindex(m::SortedContainer)
Return the token of the last entry of the sorted container m
, or the before-start semitoken if the container is empty. Time: O(log n)
Base.first
— MethodBase.first(sc::SortedContainer)
Return the first item (a k=>v
pair for SortedDict and SortedMultiDict or an element for SortedSet) in sc
according to the sorted order in the container. It is a BoundsError
to call this function on an empty container. Equivalent to deref(token_startindex(sc))
. Time: O(log n)
Base.last
— MethodBase.last(sc::SortedContainer)
Return the last item (a k=>v
pair for SortedDict and SortedMultiDict or a key for SortedSet) in sc
according to the sorted order in the container. It is a BoundsError
to call this function on an empty container. Equivalent to deref(token_lastindex(sc))
. Time: O(log n)
DataStructures.pastendsemitoken
— Methodpastendsemitoken(m::SortedContainer)
Return the semitoken of the entry that is one past the end of the sorted container m
. Time: O(1)
DataStructures.beforestartsemitoken
— Methodbeforestartsemitoken(m::SortedContainer)
Return the semitoken of the entry that is one before the beginning of the sorted container m
. Time: O(1)
DataStructures.pastendtoken
— Methodpastendtoken(m::SortedContainer)
Return the token of the entry that is one past the end of the sorted container m
. Time: O(1)
DataStructures.beforestarttoken
— Methodbeforestarttoken(m::SortedContainer)
Return the token of the entry that is one before the beginning of the sorted container m
. Time: O(1)
DataStructures.advance
— Methodadvance(token::Token)
advance((m,st))
Return the semitoken of the item in a sorted container one after the given token. A BoundsError
is thrown if the token is the past-end token. Prepending with @inbounds
may elide the correctness check and will result in undefined behavior if the token is invalid or points to the past-end token. The second form creates the token in-place as a tuple of a container m
and a semitoken st
. Time: O(log n)
DataStructures.regress
— Methodregress(token::Token)
regress((m,st))
Return the semitoken of the item in a sorted container one before the given token. A BoundsError
is thrown if the token is the before-start token. Prepending with @inbounds
may elide the correctness check and will result in undefined behavior if the token is invalid or points to the before-start token. The second form creates the token in-place as a tuple of a container m
and a semitoken st
. Time: O(log n)
Base.:+
— MethodBase.+(t::Token, j::Integer)
Base.-(t::Token, j::Integer)
Return the token that is j
positions ahead (if +
) or behind (if -
) of t
. Here, t
is a token for a sorted container and j
is an integer. If j
is negative, then +
regresses while -
advances. If the operation t+j
or t-j
reaches the before-start or past-end positions in the container, then the before-start/past-end tokens are returned (and there is no error). Time: O(j+log n), so this function is not optimized for long jumps.
Base.Sort.searchsortedfirst
— MethodBase.searchsortedfirst(m::SortedContainer, k)
Return the semitoken of the first item in the sorted container m
that is greater than or equal to k
in the sort order. If there is no such item, then the past-end semitoken is returned. Time: O(c log n)
Base.Sort.searchsortedlast
— MethodBase.searchsortedlast(m::SortedContainer, k)
Return the semitoken of the last item in the container that is less than or equal to k
in sort order. If there is no such item, then the before-start semitoken is returned. Time: O(c log n)
DataStructures.searchsortedafter
— Methodsearchsortedafter(m::SortedContainer, k)
Return the semitoken of the first item in the container that is greater than k
in the sort order. If there is no such item, then the past-end semitoken is returned. Time: O(c log n)
DataStructures.searchequalrange
— MethodDataStructures.searchequalrange(smd::SortedMultiDict, k)
Return two semitokens that correspond to the first and last items in the SortedMultiDict that have key exactly equal to k
. If k
is not found, then it returns (pastendsemitoken(smd), beforestartsemitoken(smd)). Time: O(c log n)
DataStructures.findkey
— Methodfindkey(m::SortedSet, k)
Return the semitoken of the element k
in sorted set m
. If the element is not present in m
, then the past-end semitoken is returned. Time: O(c log n)
DataStructures.findkey
— MethodDataStructures.findkey(sd::SortedDict, k)
Return the semitoken that points to the item whose key is k
, or past-end semitoken if k
is absent. See also Base.getindex(sd::SortedDict, k)
Time: O(c log n)
Inserting & Deleting in Sorted Containers
Base.push!
— MethodBase.push!(ss::SortedSet, k)
Insert the element k
into the sorted set ss
. If the k
is already present, this overwrites the old value. (This is not necessarily a no-op; see remarks about the customizing the sort order.) See also push_return_semitoken!(ss::SortedSet, k)
. The return value is ss
. Time: O(c log n)
Base.push!
— MethodBase.push!(sd::SortedDict, p::Pair)
Insert key-value pair p
, i.e., a k=>v
pair, into sd
. If the key k
is already present, this overwrites the old value. The key is also overwritten (not necessarily a no-op, since sort-order equivalence may differ from equality). The return value is sd
. See also push_return_semitoken!(sd::SortedDict, p::Pair)
. Time: O(c log n)
Base.push!
— MethodBase.push!(smd::SortedMultiDict, p::Pair)
Insert the pair p
, i.e., a k=>v
into smd
. If k
already appears as a key in smd
, then k=>v
is inserted in the rightmost position after existing items with key k
. Returns the container. See also push_return_semitoken!(smd::SortedMultiDict, p::Pair)
. Time: O(c log n)
DataStructures.push_return_semitoken!
— MethodDataStructures.push_return_semitoken!(ss::SortedSet, k)
Insert the element k
into the SortedSet sc
. If k
is already present, this overwrites the old value. (This is not necessarily a no-op; see remarks about the customizing the sort order.) Unlike push!
, the return value is a pair whose first entry is boolean and indicates whether the insertion was new (i.e., the key was not previously present) and the second entry is the semitoken of the new entry. This function replaces the deprecated insert!
. Time: O(c log n)
DataStructures.push_return_semitoken!
— MethodDataStructures.push_return_semitoken!(sd::SortedDict, p::Pair)
Insert pair p
of the form k=>v
into sd
. If the key is already present in sd
, this overwrites the old value. Note that the key is also overwritten, which is not necessarily a no-op because equivalence in the sort order does not necessarily imply equality. Unlike push!
, the return value is a 2-tuple whose first entry is boolean and indicates whether the insertion was new (i.e., the key was not previously present) and whose second entry is the semitoken of the new entry. This function replaces the deprecated insert!(sd,k,v)
. Time: O(c log n)
DataStructures.push_return_semitoken!
— MethodDataStructures.push_return_semitoken!(smd::SortedMultiDict, pr::Pair)
Insert the key-value pair pr
, i.e., k=>v
, into smd
. If k
already appears as a key in smd
, then k=>v
is inserted in the rightmost position after existing items with key k
. Unlike push!
, the return value is a 2-tuple whose first entry is boolean always equal to true
and whose second entry is the semitoken of the new entry. (The reason for returning a bool whose value is always true
is for consistency with push_return_semitoken!
for SortedDict and SortedSet.) This function replaces the deprecated insert!
. Time: O(c log n)
Base.delete!
— MethodBase.delete!(token::Token)
Delete the item indexed by the token from a sorted container. A BoundsError
is thrown if the token is invalid. Prepending with @inbounds
may elide the correctness check and will result in undefined behavior if the token is invalid. Time: O(log n).
Base.delete!
— MethodBase.delete!(ss::SortedSet, k)
Delete element k
from sc
. After this operation is complete, a token addressing the deleted item is invalid. Returns sc
. if k
is not present, this operation is a no-op. Time: O(c log n)
Base.delete!
— MethodBase.delete!(sd::SortedDict, k)
Delete the item whose key is k
in sd
. After this operation is complete, any token addressing the deleted item is invalid. Returns sc
. This is a no-op if k
is not present in sd
. Time: O(c log n)
Base.pop!
— MethodBase.pop!(ss::SortedSet, k)
Base.pop!(ss::SortedSet, k, default)
Delete the item with key k
in ss
and return the item that compares equal to k
according to the sort order (which is not necessarily k
, since equality in the sort-order does not necessarily imply hash-equality). If k
is not found, return default
, or throw a KeyError
if default
is not specified. Time: O(c log n)
Base.pop!
— MethodBase.pop!(sd::SortedDict, k)
Base.pop!(sd::SortedDict, k, default)
Delete the item with key k
in sd
and return the value that was associated with k
. If k
is not in sd
return default
, or throw a KeyError
if default
is not specified. Time: O(c log n)
Base.popfirst!
— MethodBase.popfirst!(ss::SortedSet)
Delete the item with first key in SortedSet ss
and returns the key. This function was named pop!
in a previous version of the package. A BoundsError
results if ss
is empty. Time: O(log n)
DataStructures.poplast!
— Methodpoplast!(ss::SortedSet)
Delete the item with last key in SortedSet ss
and returns the key. A BoundsError
results if ss
is empty. This function will be renamed Base.pop!
in a future version of the package. Time: O(log n)
Iteration and Token Manipulation
DataStructures.compare
— Methodcompare(m::SortedContainer, s::IntSemiToken, t::IntSemiToken)
Determine the relative position according to the sort order of the data items indexed by tokens (m,s)
and (m,t)
. Return:
-1
if(m,s)
precedes(m,t)
,0
ifs == t
1
if(m,s)
succeeds(m,t)
.
The relative positions are determined from the tree topology without any key comparisons. Time: O(log n)
DataStructures.status
— Methodstatus(token::Token)
status((m, st))
Determine the status of a token. Return values are:
- 0 = invalid token
- 1 = valid and points to live data
- 2 = before-start token
- 3 = past-end token
The second form creates the token in-place as a tuple of a sorted container m
and a semitoken st
. Time: O(1)
Base.iterate
— MethodBase.iterate(iter::SortedContainerIterable)
with the following helper functions to construct a SortedContainerIterable
:
inclusive(m::SortedContainer, st1, st2)
inclusive(m::SortedContainer, (st1, st2))
inclusive_key(m::SortedContainer, key1, key2)
inclusive_key(m::SortedContainer, (key1, key2))
exclusive(m::SortedContainer, st1, st2)
exclusive(m::SortedContainer, (st1, st2))
exclusive_key(m::SortedContainer, key1, key2)
exclusive_key(m::SortedContainer, (key1, key2))
Base.keys(b)
Base.values(b)
Base.pairs(b)
Base.eachindex(b)
tokens(kv)
semitokens(kv)
onlytokens(kv)
onlysemitokens(kv)
Base.Iterators.reverse(m)
(:)(a,b)
Iterate over a sorted container, typically within a for-loop, comprehension, or generator. Here, iter
is an iterable object constructed from a sorted container. The possible iterable objects are constructed from the helper functions as follows:
A basic iterable object is either
- an entire sorted container
m
, inclusive(m, (st1, st2))
or equivalentlyinclusive(m, st1, st2)
,inclusive_key(m, (k1, k2))
or equivalentlyinclusive_key(m, k1, k2)
a:b
, wherea
andb
are tokens addressing the same containerexclusive(m, (st1, st2))
or equivalentlyexclusive(m, st1, st2)
exclusive_key(m, (k1, k2))
or equivalentlyexclusive_key(m, k1, k2)
These extract ranges of consecutive items in the containers. In the inclusive
and exclusive
constructions, constructions, m
is a container, st1
and st2
are semitokens. The inclusive
range includes both endpoints st1
and st2
. The inclusive iteration is empty if compare(m,st1,st2)<0
. The exclusive
range includes endpoint st1
but not st2
. The exclusive iteration is empty if compare(m,st1,st2)<=0
. In the exclusive iteration, it is acceptable if st2
is the past-end semitoken.
The range exclusive_key
means all data items with keys between k1
up to but excluding items with key k2
. For this range to be nonempty, k1<k2
must hold (in the sort order). The range inclusive_key
means all data items with keys between k1
and k2
inclusive. For this range to be nonempty, k1<=k2
must hold.
A kv iterable object has the form
b
, a basic iterable objectkeys(b)
whereb
is a basic object. Extract keys only (not applicable to SortedSet)values(b)
whereb
is a basic object. Extract values only (not applicable to SortedSet).pairs(b)
whereb
is a basic object. Extracts key=>value pairs (not applicable to SortedSet). This is the same as just specifyingb
and is provided only for compatibility withBase.pairs
.
A tkv object has the form
kv
, a kv iterable objecttokens(kv)
wherekv
is a kv iterable object. Return 2-tuples of the form(t,w)
, wheret
is the token of the item andw
is a key or value ifkv
is a keys or values iteration, or(t,k,v)
ifkv
is a pairs iteration.semitokens(kv)
wherekv
is a kv iterable object. Return pairs of the form(st,w)
, wherest
is the token of the item andw
is a key or value ifkv
is a keys or values iteration, or(st,k,v)
ifkv
is a pairs iteration.onlytokens(kv)
wherekv
is a kv iterable object. Return only tokens of the data items but not the items themselves. Thekeys
,values
, orpairs
modifiers described above are ignored.onlysemitokens(kv)
wherekv
is a kv iterable object. Return only semitokens of the data items but not the items themselves. Thekeys
,values
, orpairs
modifiers described above are ignored.
Finally, a tkv iteration can be reversed by the Iterators.reverse
function. The Iterators.reverse
function may be nested in an arbitrary position with respect to the other operations described above. Two reverse operations cancel each other out. For example, Iterators.reverse(keys(Iterators.reverse(m)))
is the same iteration as keys(m)
.
For compatibility with Base
, there is also an eachindex
function: eachindex(b)
where the base object b
a SortedDict is the same as keys(b)
(to be compatible with Dict). On the other hand, eachindex(b)
where the base object b
is a SortedSet or SortedMultiDict is the same as onlysemitokens(b)
.
Colon notation a:b
is equivalent to onlytokens(inclusive(a[1], a[2], b[2]))
, in other words, it yields an iteration that provides all the tokens of items in the sort order ranging from token a
up to token b
. It is required that a[1]===b[1]
(i.e., a
and b
are tokens for the same container). Exclusive iteration using colon notation is obtained via a : b-1
.
Examples:
for (k,v) in sd
<body>
end
Here, sd
is a SortedDict
or SortedMultiDict
. The variables (k,v)
are set to consecutive key-value pairs. All items in the container are produced in order.
for k in inclusive(ss, st1, st2)
<body>
end
Here, ss
is a SortedSet
, and st1
, and st2
are semitokens indexing ss
. The elements of the set between st1
and st2
inclusive are returned.
for (t,k) in tokens(keys(exclusive_key(sd, key1, key2)))
<body>
end
Here, sd
is a SortedDict
or SortedMultiDict
, key1
and key2
are keys indexing sd
. In this case, t
will be tokens of consecutive items, while k
will be the corresponding keys. The returned keys lie between key1
and key2
excluding key2
.
for (t,k) in Iterators.reverse(tokens(keys(exclusive_key(sd, key1, key2))))
<body>
end
Same as above, except the iteration is in the reverse order.
Writing on the objects returned by values
is not currently supported, e.g., the following map!
statement is not implemented even though the analogous statement is available for Dict
in Base.
s = SortedDict(3=>4)
map!(x -> x*2, values(s))
The workaround is an explicit loop:
s = SortedDict(3=>4)
for t in onlysemitokens(s)
s[t] *= 2
end
Running time for all iterations: O(c(s + log n)), where s is the number of steps from start to end of the iteration.
Base.in
— MethodBase.in(k,m::SortedSet)
Return true
iff element k
is in sorted set m
is a sorted set. Unlike the in
function for Set
, this routine will thrown an error if k
is not convertible to eltype(m)
. Time: O(c log n)
Base.in
— MethodBase.in(p::Pair, sd::SortedDict)
Return true if p
is in sd
. Here, p
is a key=>value pair. Time: O(c log n + d) where d stands for the time to compare two values.
Base.in
— MethodBase.in(p::Pair, smd::SortedMultiDict)
Return true if p
is in smd
. Here, p
is a key=>value pair. In the The time is is O(c log n + dl) where d is the time to compare two values and l stands for the number of entries that have the key of the given pair. (So therefore this call is inefficient if the same key addresses a large number of values, and an alternative should be considered.)
Base.in
— MethodBase.in(x, iter::SortedContainerIterable)
Returns true if x
is in iter
, where iter
refers to any of the iterable objects described under Base.iterate(iter::SortedContainerIterable)
, and x
is of the appropriate type. For all of the iterables except the five listed below, the algorithm used is a linear-time search. For example, the call:
(k=>v) in exclusive(sd, st1, st2)
where sd
is a SortedDict, st1
and st2
are semitokens, k
is a key, and v
is a value, will loop over all entries in the dictionary between the two tokens and a compare for equality using isequal
between the indexed item and k=>v
.
The five exceptions are:
(k=>v) in sd
(k=>v) in smd
k in ss
k in keys(sd)
k in keys(smd)
Here, sd
is a SortedDict, smd
is a SortedMultiDict, and ss
is a SortedSet.
These five invocations of in
use the index structure of the sorted container and test equality based on the order object of the keys rather than isequal
. Therefore, these five are all faster than linear-time looping. To force the use of isequal
test on the keys rather than the order object (thus slowing the execution from logarithmic to linear time), replace the above five constructs with these:
(k=>v) in collect(sd)
(k=>v) in collect(smd)
k in collect(ss)
k in collect(keys(sd))
k in collect(keys(smd))
Misc. Functions
Base.isempty(m::SortedContainer)
Base.empty!(m::SortedContainer)
Base.empty(m::SortedContainer)
Base.length(m::SortedContainer)
Base.eltype(m::SortedContainer)
Base.keytype(m::SortedContainer)
Base.valtype(m::SortedContainer)
Base.eltype(m::SortedContainerIteration)
Base.keytype(m::SortedContainerIteration)
Base.valtype(m::SortedContainerIteration)
These functions from Base
are all applicable to sorted containers with the obvious meaning. The eltype
, keytype
, and valtype
functions may be applied either to the object m
or its type. Note that keytype
and valtype
are applicable only to SortedDict and SortedMultiDict, or to pairs iterations over SortedDict or SortedMultiDict. Time: O(1)
DataStructures.ordtype
— Methodordtype(sc::SortedSet)
ordtype(sc::SortedDict)
ordtype(sc::SortedMultiDict)
Return the order type for a sorted container. This function may also be applied to the type itself. Time: O(1)
DataStructures.orderobject
— Methodorderobject(sc::SortedContainer)
Return the order object used to construct the container. Time: O(1)
Base.haskey
— Methodhaskey(sc::SortedContainer, k)
Return true
iff key k
is present in sc
. Equivalent to in(k,sc)
for a SortedSet, or to in(k,keys(sc))
for a SortedDict or SortedMultiDict. Time: O(c log n)
Base.get
— MethodBase.get(sd::SortedDict,k,default)
Base.get(default_f::Union{Function,Type}, sd::SortedDict, k)
Return the value associated with key k
where sd
is a SortedDict, or else returns default
if k
is not in sd
. The second form obtains default
as the return argument of the function/type-constructor default_f
(with no arguments) when the key is not present. Time: O(c log n)
Base.get!
— MethodBase.get!(sd::SortedDict,k,default)
Base.get!(default_f::Union{Function,Type}, sd::SortedDict, k)
Return the value associated with key k
where sd
is a SortedDict, or else return default
if k
is not in sd
, and in the latter case, inserts (k,default)
into sd
. The second form computes a default value by calling the function default_f
(with no arguments) or the constructor of type default_f
when the key is not present. Time: O(c log n)
Base.getkey
— MethodBase.getkey(sd::SortedDict,k,defaultk)
Return the key k
where sd
is a SortedDict, if k
is in sd
else it returns defaultk
. If the container uses in its ordering an eq
method different from isequal (e.g., case-insensitive ASCII strings illustrated below), then the return value is the actual key stored in the SortedDict that is equivalent to k
according to the eq
method, which might not be equal to k
. Similarly, if the user performs an implicit conversion as part of the call (e.g., the container has keys that are floats, but the k
argument to getkey
is an Int), then the returned key is the actual stored key rather than k
. Time: O(c log n)
Base.isequal
— MethodBase.isequal(ss1::SortedSet{K,Ord}, ss2::SortedSet{K,Ord}) where {K,Ord <: Ordering}
Base.issetequal(ss1::SortedSet{K,Ord}, ss2::SortedSet{K,Ord}) where {K,Ord <: Ordering}
Check if two sorted sets are equal in the sense that they contain the same items. Note that isequal
in this sense does not imply correspondence between semitokens for items in sc1
with those for sc2
. Time: O(cn) where n is the size of the smaller container. If the two sorted sets have K
, different Ord
, or different order objects, then a fallback routine isequal(::AbstractSet, ::AbstractSet)
is invoked.
Base.isequal
— MethodBase.isequal(sd1::SortedDict{K,V,Ord}, sd2::SortedDict{K,V,Ord}) where {K, V, Ord <: Ordering}
Check if two SortedDicts are equal in the sense that they contain the same items; the keys are compared using the eq
method, while the values are compared with the isequal
function. Note that isequal
in this sense does not imply correspondence between semitokens for items in sd1
with those for sd2
. Time: O(cn). Note that if K
, V
, Ord
, or the order objects of sd1 and sd2 are different, then a fallback routine Base.isequal(::AbstractDict, ::AbstractDict)
is invoked. Time: O(cn)
Base.isequal
— MethodBase.isequal(smd1::SortedMultiDict{K,V,Ord}, smd2::SortedMultiDict{K,V,Ord}) where {K, V, Ord <: Ordering}
Check if two SortedMultiDicts are equal in the sense that they contain the same items in the same order (that is, the same insertion order). They must have the same order object, else they compare unequal. The keys are compared using the eq
method, while the values are compared with the isequal
function. Note that isequal
in this sense does not imply any correspondence between semitokens for items in smd1
with those for smd2
. Time: O(cn)
DataStructures.packcopy
— Methodcopy(sc::SortedSet)
copy(sc::SortedDict)
copy(sc::SortedMultiDict)
packcopy(sc::SortedSet)
packcopy(sc::SortedDict)
packcopy(sc::SortedMultiDict)
Return a copy of sc
, where sc
is a sorted container, in which the data is packed. When deletions take place, the previously allocated memory is not returned. This function can be used to reclaim memory after many deletions. Time: O(cn)
Note that the semitokens valid for the original container are no longer valid for the copy because the indexing structure is rebuilt by these copies. If an exact copy is needed in which semitokens remain valid, use Base.deepcopy
.
DataStructures.packdeepcopy
— Methodpackdeepcopy(sc::SortedSet)
packdeepcopy(sc::SortedDict)
packdeepcopy(sc::SorteMultiDict)
Return a packed copy of sc
, where sc
is a sorted container in which the keys and values are deep-copied. This function can be used to reclaim memory after many deletions. Time: O(cn)
Base.merge
— MethodBase.merge(sd::SortedDict{K,V,Ord}, d1::AbstractDict{K,V}...) where {K,V,Ord <: Ordering}
Merge one or more dicts into a single SortedDict and return the new SortedDict. Arguments d1
etc. must have the same key-value type as sd
. In the case of keys duplicated among the arguments, the rightmost argument that owns the key gets its value stored. Time: O(cN log N), where N is the total size of all the arguments. If all the arguments are SortedDicts with the same key, value, and order object, then the time is O(cN).
Base.merge!
— MethodBase.merge!(sd::SortedDict{K,V,Ord}, d1::AbstractDict{K,V}...) where {K,V,Ord<:Ordering}
Merge one or more dicts d1
, etc. into sd
. These must all must have the same key-value types. In the case of keys duplicated among the arguments, the rightmost argument that owns the key gets its value stored. Time: O(cN log N), where N is the total size of all the arguments.
Base.merge
— MethodBase.merge(smd::SortedMultiDict, iter...)
Merge smd
and one or more iterables and return the resulting new SortedMultiDict. The iterables must have the same key-value type as smd
. Items with equal keys are stored with left-to-right ordering. Time: O(cN log N), where N is the total size of all the arguments. If all the arguments are SortedMultiDicts with the same key, value, and order object, then the time is O(cN).
Base.merge!
— MethodBase.merge!(smd::SortedMultiDict, iter...)
Merge one or more iterables iter
, etc. into smd
. These must all must have the same key-value types. Items with equal keys are stored with left-to-right ordering. Time: O(cN log N), where N is the total size of all the arguments.
Set operations
The SortedSet container supports the following set operations. Note that in the case of intersect
, symdiff
and setdiff
, the two SortedSets should have the same key and ordering object. If they have different key or ordering types, no error message is produced; instead, the built-in default versions of these functions (that can be applied to Any
iterables and that return arrays) are invoked.
Base.union!
— MethodBase.union!(ss::SortedSet, iterable...)
Insert each item among the second and following arguments (which must be iterable) into the SortedSet ss
. The items must be convertible to the key-type of ss
. Time: O(cN log N) where N is the total number of items in the iterable arguments.
Base.union
— MethodBase.union(ss::SortedSet, iterable...)
Compute and return the union of a sorted set and one or more iterables. They must have the same keytype. If they are all sorted sets with the same order object, then the required time is O(cn), where n is the total size. If not, then the fallback routine requires time O(cn log n).
Base.intersect
— MethodBase.intersect(ss::SortedSet, others...)
Intersect SortedSets with other SortedSets or other iterables and return the intersection as a new SortedSet. Time: O(cn), where n is the total number of items in all the arguments if all the arguments are SortedSets of the same type and same order object. Otherwise, the time is O(cn log n)
Base.symdiff
— MethodBase.symdiff(ss1::SortedSet, iterable)
Compute and return the symmetric difference of ss1
and iterable
, i.e., a sorted set containing entries that are in one of ss1
or iterable
but not both. Time: O(cn), where n is the total size of the two containers if both are sorted sets with the same key and order objects. Otherwise, the time is O(cn log n)
Base.setdiff
— MethodBase.setdiff(ss1::SortedSet{K,Ord}, ss2::SortedSet{K,Ord}) where {K, Ord<:Ordering}
Base.setdiff(ss1::SortedSet, others...)
Return the set difference, i.e., a sorted set containing entries in ss1
but not in ss2
or successive arguments. Time for the first form: O(cn) where n is the total size of both sets provided that they are both sorted sets of the same type and order object. The second form computes the set difference between ss1
and all the others, which are all iterables. The second form requires O(cn log n) time.
Base.setdiff!
— MethodBase.setdiff!(ss::SortedSet, iterable..)
Delete items in ss
that appear in any of the iterables. The arguments after the first must be iterables each of whose entries must convertible to the key type of m1. Time: O(cm log n), where n is the size of ss
and m is the total number of items in iterable.
Base.issubset
— MethodBase.issubset(iterable, ss::SortedSet)
Check whether each item of the first argument is an element of ss
. The entries must be convertible to the key-type of ss
. Time: O(cm log n), where n is the size of ss
and m is the number of items in iterable
. If both are sorted sets of the same keytype and order object and if m > n / log n, then an algorithm whose running time is O(c(m+n)) is used.
Ordering of keys
As mentioned earlier, the default ordering of keys uses isless
and isequal
functions. If the default ordering is used, it is a requirement of the container that isequal(a,b)
is true if and only if !isless(a,b)
and !isless(b,a)
are both true. This relationship between isequal
and isless
holds for common built-in types, but it may not hold for all types, especially user-defined types. If it does not hold for a certain type, then a custom ordering argument must be defined as discussed in the next few paragraphs.
The name for the default ordering (i.e., using isless
and isequal
) is Forward
. Note: this is the name of the ordering object; its type is ForwardOrdering.
Another possible ordering object is Reverse
, which reverses the usual sorted order. This name must be imported import Base.Reverse
if it is used.
As an example of a custom ordering, suppose the keys are of type String
, and the user wishes to order the keys ignoring case: APPLE, berry and Cherry would appear in that order, and APPLE and aPPlE would be indistinguishable in this ordering.
The simplest approach is to define an ordering object of the form Lt(my_isless)
, where Lt
is a built-in type (see ordering.jl
) and my_isless
is the user's comparison function. In the above example, the ordering object would be:
Lt((x,y) -> isless(lowercase(x),lowercase(y)))
The ordering object is indicated in the above list of constructors in the o
position (see above for constructor syntax).
This approach may suffer from a performance hit because higher performance may be possible if an equality method is also available as well as a less-than method. A more complicated but higher-performance method to implement a custom ordering is as follows. First, the user creates a singleton type that is a subtype of Ordering
as follows:
struct CaseInsensitive <: Ordering
end
Next, the user defines a method named lt
for less-than in this ordering:
lt(::CaseInsensitive, a, b) = isless(lowercase(a), lowercase(b))
The first argument to lt
is an object of the CaseInsensitive
type (there is only one such object since it is a singleton type). The container also needs an equal-to function; the default is:
eq(o::Ordering, a, b) = !lt(o, a, b) && !lt(o, b, a)
The user can also customize this function with a more efficient implementation. In the above example, an appropriate customization would be:
eq(::CaseInsensitive, a, b) = isequal(lowercase(a), lowercase(b))
Note: the user-defined eq
and lt
functions must be compatible in the sense that !lt(o, a, b) && !lt(o, b, a)
if and only if eq(o, a, b)
.
Finally, the user specifies the unique element of CaseInsensitive
, namely the object CaseInsensitive()
, as the ordering object to the SortedDict
, SortedMultiDict
or SortedSet
constructor.
For the above code to work, the module must make the following declarations, typically near the beginning:
import Base.Ordering
import Base.lt
import DataStructures.eq
Cautionary note on mutable keys
As with ordinary Dicts, keys for the sorted containers can be either mutable or immutable. In the case of mutable keys, it is important that the keys not be mutated once they are in the container else the indexing structure will be corrupted. (The same restriction applies to Dict.) For example, the following sequence of statements leaves sd
in a corrupted state:
sd = SortedDict{Vector{Int},Int}()
k = [1,2,3]
sd[k] = 19
sd[[6,4]] = 12
k[1] = 7
Performance of Sorted Containers
The sorted containers are currently not optimized for cache performance.