Trait tantivy::tokenizer::Tokenizer
pub trait Tokenizer<'a>: Sized + Clone {
    type TokenStreamImpl: TokenStream;

    fn token_stream(&self, text: &'a str) -> Self::TokenStreamImpl;

    fn filter<NewFilter>(self, new_filter: NewFilter) -> ChainTokenizer<NewFilter, Self>
    where
        NewFilter: TokenFilter<Self::TokenStreamImpl>,
    { ... }
}
A Tokenizer is in charge of splitting text into a stream of tokens before indexing.
See the module documentation for more detail.
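To illustrate the shape of this trait, here is a minimal, self-contained sketch, not tantivy's actual types: a hypothetical whitespace tokenizer whose token stream yields string slices borrowed from the input (tantivy's real Token carries offsets and a position, which are omitted here).

```rust
// Simplified stand-ins for tantivy's Tokenizer/TokenStream traits.
// A token here is just a &str slice borrowed from the input text.
trait TokenStream<'a> {
    fn next(&mut self) -> Option<&'a str>;
}

trait Tokenizer<'a>: Sized + Clone {
    type TokenStreamImpl: TokenStream<'a>;
    fn token_stream(&self, text: &'a str) -> Self::TokenStreamImpl;
}

// Hypothetical tokenizer that splits on whitespace.
#[derive(Clone)]
struct WhitespaceTokenizer;

struct WhitespaceTokenStream<'a> {
    words: std::str::SplitWhitespace<'a>,
}

impl<'a> TokenStream<'a> for WhitespaceTokenStream<'a> {
    fn next(&mut self) -> Option<&'a str> {
        self.words.next()
    }
}

impl<'a> Tokenizer<'a> for WhitespaceTokenizer {
    type TokenStreamImpl = WhitespaceTokenStream<'a>;
    fn token_stream(&self, text: &'a str) -> Self::TokenStreamImpl {
        WhitespaceTokenStream { words: text.split_whitespace() }
    }
}

fn main() {
    // The tokenizer produces a stream; the caller pulls tokens one by one.
    let mut stream = WhitespaceTokenizer.token_stream("Hello Happy Tax Payer");
    let mut tokens = Vec::new();
    while let Some(tok) = stream.next() {
        tokens.push(tok);
    }
    assert_eq!(tokens, vec!["Hello", "Happy", "Tax", "Payer"]);
}
```

The lifetime parameter on the trait is what lets the token stream borrow from the input text rather than copy it, which is the motivation for the associated-type warning below.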
Warning
This API may change to use associated types.
Associated Types
type TokenStreamImpl: TokenStream
Type of the token stream produced by this tokenizer.
Required Methods
fn token_stream(&self, text: &'a str) -> Self::TokenStreamImpl
Creates a token stream for a given str.
Provided Methods
fn filter<NewFilter>(self, new_filter: NewFilter) -> ChainTokenizer<NewFilter, Self>
where
    NewFilter: TokenFilter<Self::TokenStreamImpl>,
Appends a token filter to the current tokenizer. The method consumes the current tokenizer and returns a new one.
Example
use tantivy::tokenizer::*;

let en_stem = SimpleTokenizer
    .filter(RemoveLongFilter::limit(40))
    .filter(LowerCaser)
    .filter(Stemmer::new());
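The chaining in the example above follows a decorator pattern: each filter wraps the stream produced by the stage before it. A self-contained sketch of the idea, with simplified stand-ins (not tantivy's actual ChainTokenizer, TokenFilter, or filter types):

```rust
// Simplified token stream yielding owned Strings (tantivy's API differs).
trait TokenStream {
    fn next(&mut self) -> Option<String>;
}

// Base stream backed by a vector of tokens.
struct VecTokenStream {
    tokens: std::vec::IntoIter<String>,
}

impl TokenStream for VecTokenStream {
    fn next(&mut self) -> Option<String> {
        self.tokens.next()
    }
}

// A filter wraps an inner stream and transforms each token on the way out.
struct LowerCaser<T: TokenStream> {
    inner: T,
}

impl<T: TokenStream> TokenStream for LowerCaser<T> {
    fn next(&mut self) -> Option<String> {
        self.inner.next().map(|t| t.to_lowercase())
    }
}

// A filter can also drop tokens: here, any token at or above a length limit.
struct RemoveLong<T: TokenStream> {
    inner: T,
    limit: usize,
}

impl<T: TokenStream> TokenStream for RemoveLong<T> {
    fn next(&mut self) -> Option<String> {
        while let Some(t) = self.inner.next() {
            if t.len() < self.limit {
                return Some(t);
            }
        }
        None
    }
}

fn main() {
    let base = VecTokenStream {
        tokens: vec!["Hello".to_string(), "Supercalifragilistic".to_string()].into_iter(),
    };
    // Chain: drop tokens of 10+ chars, then lowercase the survivors.
    let mut chained = LowerCaser { inner: RemoveLong { inner: base, limit: 10 } };
    assert_eq!(chained.next(), Some("hello".to_string()));
    assert_eq!(chained.next(), None);
}
```

Because each wrapper is generic over the stream it contains, the whole chain is a single concrete type built at compile time, with no dynamic dispatch per token.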
Implementors
impl<'a> Tokenizer<'a> for SimpleTokenizer
    type TokenStreamImpl = SimpleTokenStream<'a>;
impl<'a> Tokenizer<'a> for FacetTokenizer
    type TokenStreamImpl = FacetTokenStream<'a>;
impl<'a> Tokenizer<'a> for JapaneseTokenizer
    type TokenStreamImpl = JapaneseTokenizerStream;
impl<'a> Tokenizer<'a> for RawTokenizer
    type TokenStreamImpl = RawTokenStream;