## sortedvec

published on 2019-01-21Vectors are the most widely used data structures in the Rust programming language, and for good reason too: these array types are both simple and perform well at many tasks. Its elements are stored next to each other in memory — we call such containers contiguous — and that makes their representation in memory compact and iteration quick (due to cache locality). Further, it is possible to index items without traversal, providing constant access times.

Although vectors have many advantages, they aren't ideal for every task. One such operation is search. Without additional structure, the best way to find an element in a vector that satisfies a certain condition (also called predicates) is to simply try every single element until you either find one or hit the end of the vector. This is called linear search, because the time it takes to look up an element scales linearly with the number of elements in the collection.

The sortedvec crate for Rust tries to improve these lookup times, while keeping many of the advantages of vectors.
It introduces a way to define contiguous data structures that are fairly simple wrapper type around the standard library's
`Vec<T>`

type, with the added property that the internal data is sorted with respect to some derived key.

Storing the data in a sorted state allows the lookup of values by key using binary search. For all
but the smallest arrays, binary search is much faster than linear search since it only requires a number of comparisons that is logarithmic in the
number of elements in the
structure. For example, in an array of a million values, linear search will on average require 500,000 comparisons. Even in the worst case, binary search will
never require more than 20 comparisons. Though this is good, it's not quite as fast as the specialized `HashMap`

data structure, which achieves a near flat scaling of lookups! Sorted vectors are perfect for situations where fast key based lookups are required, but the memory
and complexity overhead of hashmaps are not warranted.

Rust's own `Vec`

provides methods to look up values using binary search by key already,
but that assumes that you provide a vector that is already sorted with respect to that key. In fact, sortedvec uses these methods internally. Its added value is that
it maintains that invariant, so lookups are guaranteed to be quick and correct. To achieve this, the crate leverages Rust's unique type system. The sortedvec types
provide immutable references to the underlying `Vec`

, so you can still use all the immutable methods you are used to for vectors and slices. However, you won't be able
to get a mutable reference, because that could invalidate the data's order invariant.

Here's an overview of how lookups scale on `HashMap`

, `SortedVec`

and `Vec`

when using string keys:

Note that this does not take into account the upfront costs of constructing the data structures, which is a bit more expensive for sorted vectors compared to regular vectors, since it'll require a sort.

## crate details

The sortedvec crate only exposes a single macro, `sortedvec!`

, which can be used to define a data type like so

sortedvec! { pub struct SortedVec { fn key_deriv(x: &T) -> K { x.as_k() } } }

where `T`

is the type of the value stored, `K`

the type of its derived key and `key_deriv`

the function that constructs a key `K`

from a value reference `&T`

. You
can think of the resulting struct `SortedVec`

as a map similar to `HashMap<K, T>`

. However, unlike the `HashMap`

, the defined struct does not
have any type parameters.

The initial versions of the crate actually defined a single parametric struct `SortedVec<T, F>`

, where `F: for<'t> FnMut(&'t T) -> &'t K`

was the key derivation
function. It worked, but since Rust doesn't have any way to actually write the type of a function, all functions and data structures dealing with it would also have to
be parametric in this function type `F`

: an ergonomics nightmare.

In version 0.4 of sortedvec, defining and using a collection may look something like this:

#[derive(Debug, Default)] struct SomeComplexValue { some_map: HashMap<String, PathBuf>, name: String, prio: u64, } sortedvec! { /// Vec of `SomeComplexValues` that allows quick /// lookup by (name, prio) keys #[derive(Debug)] struct ComplexMap { fn key_deriv(val: &SomeComplexValue) -> (&str, u64) { (val.name.as_str(), val.prio) } } } let mut sv = ComplexMap::default(); sv.insert(SomeComplexValue { some_map: Default::default(), name: "test".to_owned(), prio: 0, }); assert!(sv.find(&("hello", 1)).is_none()); assert!(sv.find(&("test", 0)).is_some()); for val in sv { println!("{:?}", val); }

The crate is at the time of writing still in an experimental and unstable state, but all are encouraged to try it out. If you find any issues, bugs or missing use cases, please let us know on the GitHub page.