Module tantivy::termdict
The term dictionary is one of the key data structures of tantivy. It associates sorted terms with a TermInfo struct that serves as an address into their respective posting lists.
The term dictionary API makes it possible to iterate through a range of keys in a sorted manner.
Implementations
There are currently two implementations of the term dictionary.
Default implementation : fstdict
The default one relies heavily on the fst crate.
It associates each term's &[u8] representation with a u64 that is in fact an address in a buffer. The value is then accessible by deserializing the value at this address.
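The two-step lookup described above (term to address, then address to value) can be sketched roughly as follows. This is only an illustration under stated assumptions: a `BTreeMap` stands in for the fst, and `Record`, `RECORD_LEN`, `write_record`, `read_record` and `lookup` are hypothetical names with a made-up fixed-size layout, not tantivy's actual `TermInfo` serialization.

```rust
use std::collections::BTreeMap;

/// Hypothetical fixed-size value; tantivy's real `TermInfo` has its own layout.
#[derive(Debug, PartialEq, Clone, Copy)]
struct Record {
    doc_freq: u32,
    postings_offset: u64,
}

/// Serialized size of a `Record`: 4 bytes + 8 bytes, little-endian.
const RECORD_LEN: usize = 12;

/// Appends a record to the buffer and returns the address it was written at.
fn write_record(buf: &mut Vec<u8>, r: Record) -> u64 {
    let addr = buf.len() as u64;
    buf.extend_from_slice(&r.doc_freq.to_le_bytes());
    buf.extend_from_slice(&r.postings_offset.to_le_bytes());
    addr
}

/// Deserializes the record stored at `addr`, or returns None if out of bounds.
fn read_record(buf: &[u8], addr: u64) -> Option<Record> {
    let start = addr as usize;
    let bytes = buf.get(start..start + RECORD_LEN)?;
    let mut freq = [0u8; 4];
    freq.copy_from_slice(&bytes[0..4]);
    let mut offset = [0u8; 8];
    offset.copy_from_slice(&bytes[4..12]);
    Some(Record {
        doc_freq: u32::from_le_bytes(freq),
        postings_offset: u64::from_le_bytes(offset),
    })
}

/// Step 1: map the term bytes to a u64 address (a BTreeMap stands in for the fst).
/// Step 2: deserialize the value found at that address.
fn lookup(map: &BTreeMap<Vec<u8>, u64>, buf: &[u8], term: &[u8]) -> Option<Record> {
    let addr = *map.get(term)?;
    read_record(buf, addr)
}
```

The map itself only has to store one integer per term; the values live in a separate buffer.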
Stream implementation : streamdict
The fstdict is a tiny bit slow when streaming all of the terms.
For some use cases (analytics engines), it is preferable to use the streamdict, which offers better streaming performance at the expense of lookup performance.
streamdict can be enabled by adding the streamdict feature when compiling tantivy.
streamdict encodes each term relative to the previous one as follows:
- number of bytes that need to be popped
- number of bytes that need to be added
- sequence of bytes to be added
- value
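The encoding listed above can be illustrated with a small in-memory sketch. The names `DeltaEntry`, `encode` and `decode` are hypothetical, and the entries are kept as structs rather than serialized bytes, so this shows the idea rather than streamdict's actual on-disk format. The "number of bytes to add" is implied here by the length of `suffix`.

```rust
/// One delta-encoded entry: how to turn the previous term into the current one.
#[derive(Debug, PartialEq)]
struct DeltaEntry {
    /// Number of bytes to pop from the end of the previous term.
    pop_len: usize,
    /// Bytes to append afterwards (its length is the "bytes to add" count).
    suffix: Vec<u8>,
    /// Value associated with the term.
    value: u64,
}

/// Length of the longest common prefix of `a` and `b`.
fn common_prefix_len(a: &[u8], b: &[u8]) -> usize {
    a.iter().zip(b.iter()).take_while(|(x, y)| x == y).count()
}

/// Encodes sorted (term, value) pairs relative to their predecessor.
fn encode(sorted: &[(&[u8], u64)]) -> Vec<DeltaEntry> {
    let mut prev: Vec<u8> = Vec::new();
    let mut out = Vec::with_capacity(sorted.len());
    for &(term, value) in sorted {
        let keep = common_prefix_len(&prev, term);
        out.push(DeltaEntry {
            pop_len: prev.len() - keep,
            suffix: term[keep..].to_vec(),
            value,
        });
        prev.clear();
        prev.extend_from_slice(term);
    }
    out
}

/// Replays the entries in order to rebuild the full terms.
fn decode(entries: &[DeltaEntry]) -> Vec<(Vec<u8>, u64)> {
    let mut current: Vec<u8> = Vec::new();
    let mut out = Vec::with_capacity(entries.len());
    for e in entries {
        current.truncate(current.len() - e.pop_len);
        current.extend_from_slice(&e.suffix);
        out.push((current.clone(), e.value));
    }
    out
}
```

Because each entry only makes sense relative to the one before it, decoding has to start from a known term and proceed sequentially. That is why this layout streams well but cannot be searched directly.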
Because such a structure does not allow for lookups, it comes with an fst that indexes 1 out of every 1024 terms in this structure.
A lookup therefore consists of a lookup in the fst followed by streaming through at most 1024 elements in the term stream.
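The sparse-index lookup described above can be sketched as follows. This is a simplified model under stated assumptions: a `BTreeMap` stands in for the fst, the term stream is a plain sorted `Vec` rather than the delta-encoded bytes, and `SparseDict` and `BLOCK` are hypothetical names. `BLOCK` is set to 4 to keep the example small, where the text above uses 1024.

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

/// One term out of every BLOCK is indexed (1024 in the description above).
const BLOCK: usize = 4;

struct SparseDict {
    /// Sparse index: indexed term -> its position in `stream`.
    index: BTreeMap<Vec<u8>, usize>,
    /// All (term, value) pairs, sorted by term.
    stream: Vec<(Vec<u8>, u64)>,
}

impl SparseDict {
    /// Builds the sparse index over an already sorted stream.
    fn new(stream: Vec<(Vec<u8>, u64)>) -> Self {
        let index = stream
            .iter()
            .enumerate()
            .step_by(BLOCK)
            .map(|(i, (term, _))| (term.clone(), i))
            .collect();
        SparseDict { index, stream }
    }

    fn get(&self, term: &[u8]) -> Option<u64> {
        // Step 1: find the greatest indexed term that is <= the target.
        let (_, &start) = self
            .index
            .range::<[u8], _>((Bound::Unbounded, Bound::Included(term)))
            .next_back()?;
        // Step 2: scan forward through at most BLOCK entries of the stream.
        self.stream[start..]
            .iter()
            .take(BLOCK)
            .find(|(t, _)| t.as_slice() == term)
            .map(|(_, value)| *value)
    }
}
```

The block size trades index size against scan length: a larger block means fewer indexed terms but a longer worst-case scan per lookup.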
Structs
TermDictionaryBuilderImpl | |
TermDictionaryImpl |
See |
TermMerger |
Given a list of sorted term streams, returns an iterator over sorted unique terms. |
TermStreamerBuilderImpl | |
TermStreamerImpl |
See |
Traits
TermDictionary |
Dictionary associating sorted |
TermDictionaryBuilder |
Builder for the new term dictionary. |
TermStreamer |
|
TermStreamerBuilder |
|
Type Definitions
TermOrdinal |
Position of the term in the sorted list of terms. |