Struct utf8_ranges::Utf8Sequences
An iterator over ranges of matching UTF-8 byte sequences.
The iteration represents an alternation of comprehensive byte sequences that match precisely the set of UTF-8 encoded scalar values.
A byte sequence corresponds to one of the scalar values in the range given if and only if it completely matches exactly one of the sequences of byte ranges produced by this iterator.
Each sequence of byte ranges matches a unique set of bytes. That is, no two sequences will match the same bytes.
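To make the two properties above concrete, here is a minimal sketch that does not use the crate at all. It assumes a hand-written table of byte-range sequences for U+0000 through U+FFFF, taken from the standard well-formed UTF-8 byte ranges; it is not output captured from this iterator, whose ordering and splitting may differ. The names `ByteRange`, `BMP_SEQUENCES`, `seq_matches` and `match_count` are hypothetical and exist only for illustration.

```rust
/// Hypothetical stand-in for one inclusive byte range: (start, end).
type ByteRange = (u8, u8);

/// Hand-written sequences assumed to cover U+0000..=U+FFFF, excluding
/// surrogates. This is an illustration, not the crate's actual output.
const BMP_SEQUENCES: &[&[ByteRange]] = &[
    &[(0x00, 0x7F)],
    &[(0xC2, 0xDF), (0x80, 0xBF)],
    &[(0xE0, 0xE0), (0xA0, 0xBF), (0x80, 0xBF)],
    &[(0xE1, 0xEC), (0x80, 0xBF), (0x80, 0xBF)],
    &[(0xED, 0xED), (0x80, 0x9F), (0x80, 0xBF)],
    &[(0xEE, 0xEF), (0x80, 0xBF), (0x80, 0xBF)],
];

/// True if `bytes` has the same length as `seq` and every byte falls
/// inside the range at its position (a complete match, not a prefix match).
fn seq_matches(seq: &[ByteRange], bytes: &[u8]) -> bool {
    seq.len() == bytes.len()
        && seq.iter().zip(bytes).all(|(&(lo, hi), &b)| lo <= b && b <= hi)
}

/// Counts how many sequences in the table completely match `bytes`.
fn match_count(bytes: &[u8]) -> usize {
    BMP_SEQUENCES
        .iter()
        .filter(|seq| seq_matches(seq, bytes))
        .count()
}

fn main() {
    // 'a' and '☃' are each matched by exactly one sequence.
    assert_eq!(match_count(&[0x61]), 1);
    assert_eq!(match_count(&[0xE2, 0x98, 0x83]), 1);
    // An encoded surrogate and a four-byte sequence match nothing.
    assert_eq!(match_count(&[0xED, 0xA0, 0x80]), 0);
    assert_eq!(match_count(&[0xF0, 0x90, 0x8D, 0x88]), 0);
}
```

A count of exactly one for valid input reflects the "no two sequences match the same bytes" property; a count of zero reflects the "if and only if" property for input outside the range.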
Example
This shows how to match an arbitrary byte sequence against a range of scalar values.
use utf8_ranges::{Utf8Sequences, Utf8Sequence};

fn matches(seqs: &[Utf8Sequence], bytes: &[u8]) -> bool {
    for range in seqs {
        if range.matches(bytes) {
            return true;
        }
    }
    false
}

// Test the basic multilingual plane.
let seqs: Vec<_> = Utf8Sequences::new('\u{0}', '\u{FFFF}').collect();

// UTF-8 encoding of 'a'.
assert!(matches(&seqs, &[0x61]));
// UTF-8 encoding of '☃' (`\u{2603}`).
assert!(matches(&seqs, &[0xE2, 0x98, 0x83]));
// UTF-8 encoding of `\u{10348}` (outside the BMP).
assert!(!matches(&seqs, &[0xF0, 0x90, 0x8D, 0x88]));
// Tries to match against a UTF-8 encoding of a surrogate codepoint,
// which is invalid UTF-8, and therefore fails, despite the fact that
// the corresponding codepoint (0xD800) falls in the range given.
assert!(!matches(&seqs, &[0xED, 0xA0, 0x80]));
// And fails against plain old invalid UTF-8.
assert!(!matches(&seqs, &[0xFF, 0xFF]));
If this example seems circuitous, that's because it is! It's meant to be illustrative. In practice, you could just try to decode your byte sequence and compare it with the scalar value range directly. However, this is not always possible (for example, in a byte-based automaton).
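For comparison, the direct approach mentioned above might look like the following sketch. It uses only the standard library; `in_range` is a hypothetical helper name, and the sketch assumes the input is meant to encode exactly one scalar value.

```rust
use std::str;

/// Returns true if `bytes` is the UTF-8 encoding of exactly one scalar
/// value that lies within `start..=end`. Hypothetical helper, std only.
fn in_range(bytes: &[u8], start: char, end: char) -> bool {
    match str::from_utf8(bytes) {
        Ok(s) => {
            let mut chars = s.chars();
            // Accept only input that decodes to a single scalar value.
            match (chars.next(), chars.next()) {
                (Some(c), None) => start <= c && c <= end,
                _ => false,
            }
        }
        // Invalid UTF-8, including encoded surrogates, is rejected.
        Err(_) => false,
    }
}

fn main() {
    assert!(in_range(&[0x61], '\u{0}', '\u{FFFF}'));
    assert!(in_range(&[0xE2, 0x98, 0x83], '\u{0}', '\u{FFFF}'));
    assert!(!in_range(&[0xF0, 0x90, 0x8D, 0x88], '\u{0}', '\u{FFFF}'));
    assert!(!in_range(&[0xED, 0xA0, 0x80], '\u{0}', '\u{FFFF}'));
    assert!(!in_range(&[0xFF, 0xFF], '\u{0}', '\u{FFFF}'));
}
```

This needs the whole byte sequence up front before it can decode, which is why it does not carry over to settings that consume one byte at a time.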
Methods
impl Utf8Sequences
pub fn new(start: char, end: char) -> Self
Create a new iterator over UTF-8 byte ranges for the scalar value range given.
Trait Implementations
impl Iterator for Utf8Sequences
Auto Trait Implementations
impl Send for Utf8Sequences
impl Sync for Utf8Sequences
Blanket Implementations
impl<I> IntoIterator for I where
I: Iterator,
type Item = <I as Iterator>::Item
The type of the elements being iterated over.
type IntoIter = I
Which kind of iterator are we turning this into?
fn into_iter(self) -> I
impl<T> From<T> for T
impl<T, U> Into<U> for T where
U: From<T>,
impl<T, U> TryFrom<U> for T where
T: From<U>,
type Error = !
The type returned in the event of a conversion error.
fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>
impl<T> Borrow<T> for T where
T: ?Sized,
impl<T> BorrowMut<T> for T where
T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T, U> TryInto<U> for T where
U: TryFrom<T>,
type Error = <U as TryFrom<T>>::Error
The type returned in the event of a conversion error.
fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>
impl<T> Any for T where
T: 'static + ?Sized,