Struct regex_automata::dfa::regex::Builder

pub struct Builder {}

A builder for a regex based on deterministic finite automata.

This builder permits configuring options for the syntax of a pattern, the NFA construction, the DFA construction and finally the regex searching itself. It differs from a general-purpose regex builder in that it permits fine-grained configuration of the construction process. The trade-off is complexity, and the possibility of setting a configuration that does not make sense. For example, there are two different UTF-8 modes:

syntax::Config::utf8 controls whether the pattern itself can contain sub-expressions that match invalid UTF-8.
thompson::Config::utf8 controls how the regex iterators themselves advance the starting position of the next search when a match with zero length is found.

Generally speaking, callers will want to either enable both of these or disable both of these.

Internally, building a regex requires building two DFAs, where one is responsible for finding the end of a match and the other is responsible for finding the start of a match. If you only need to detect whether something matched, or only the end of a match, then you should use a dense::Builder to construct a single DFA, which is cheaper than building two DFAs.

Build methods

This builder has a few “build” methods. In general, they are the result of combining the following parameters:

Building one or many regexes.
Building a regex with dense or sparse DFAs.

The simplest “build” method is Builder::build. It accepts a single pattern and builds a dense DFA using usize for the state identifier representation.

The most general “build” method is Builder::build_many, which permits building a regex that searches for multiple patterns simultaneously while using a specific state identifier representation.

The most flexible “build” method, but hardest to use, is Builder::build_from_dfas. This exposes the fact that a Regex is just a pair of DFAs, and this method allows you to specify those DFAs exactly.

Example

This example shows how to disable UTF-8 mode in the syntax and the regex itself. This is generally what you want for matching on arbitrary bytes.

use regex_automata::{
    dfa::regex::Regex, nfa::thompson, util::syntax, Match,
};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Disable UTF-8 mode in both the syntax and the Thompson NFA config.
    let re = Regex::builder()
        .syntax(syntax::Config::new().utf8(false))
        .thompson(thompson::Config::new().utf8(false))
        .build(r"foo(?-u:[^b])ar.*")?;
    let haystack = b"\xFEfoo\xFFarzz\xE2\x98\xFF\n";
    let expected = Some(Match::must(0, 1..9));
    let got = re.find(haystack);
    assert_eq!(expected, got);
    // Notice that `(?-u:[^b])` matches invalid UTF-8,
    // but the subsequent `.*` does not! Disabling UTF-8
    // on the syntax permits this.
    assert_eq!(b"foo\xFFarzz", &haystack[got.unwrap().range()]);
    Ok(())
}

Implementations§

impl Builder

pub fn new() -> Builder

pub fn build_from_dfas<A: Automaton>(&self, forward: A, reverse: A) -> Regex<A>

Trait Implementations§

impl Clone for Builder

fn clone(&self) -> Builder

fn clone_from(&mut self, source: &Self)

impl Debug for Builder

fn fmt(&self, f: &mut Formatter<'_>) -> Result

impl Default for Builder

fn default() -> Builder

Auto Trait Implementations§

impl RefUnwindSafe for Builder

impl Send for Builder

impl Sync for Builder

impl Unpin for Builder

impl UnwindSafe for Builder

Blanket Implementations§

impl<T> Any for T where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

impl<T> Borrow<T> for T where T: ?Sized,

fn borrow(&self) -> &T

impl<T> BorrowMut<T> for T where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

impl<T> From<T> for T

fn from(t: T) -> T

impl<T, U> Into<U> for T where U: From<T>,

fn into(self) -> U

impl<T> ToOwned for T where T: Clone,

type Owned = T

fn to_owned(&self) -> T

fn clone_into(&self, target: &mut T)

impl<T, U> TryFrom<U> for T where U: Into<T>,

type Error = Infallible

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

impl<T, U> TryInto<U> for T where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>
