Struct regex_automata::dfa::regex::Regex
pub struct Regex<A = DFA<Vec<u32>>> { /* private fields */ }
A regular expression that uses deterministic finite automata for fast searching.
A regular expression is composed of two DFAs, a “forward” DFA and a “reverse” DFA. The forward DFA is responsible for detecting the end of a match while the reverse DFA is responsible for detecting the start of a match. Thus, in order to find the bounds of any given match, a forward search must first be run, followed by a reverse search. A match found by the forward DFA guarantees that the reverse DFA will also find a match.
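The two-pass idea can be sketched with hand-written state machines for the pattern foo[0-9]+. This is a minimal illustration only: the functions forward_end, reverse_start and find_match are hypothetical names, and the hard-coded states are not how this crate represents a DFA.

```rust
/// Forward pass: scan left to right and return the END offset of the
/// leftmost match of `foo[0-9]+`, if any. (Hypothetical helper.)
fn forward_end(haystack: &[u8]) -> Option<usize> {
    // States: 0 = nothing matched, 1 = "f", 2 = "fo", 3 = "foo",
    // 4 = "foo" followed by at least one digit (the match state).
    let mut state = 0u8;
    let mut end = None;
    for (i, &b) in haystack.iter().enumerate() {
        state = match (state, b) {
            (4, b'0'..=b'9') => 4,
            (4, _) => break, // dead state: the match cannot be extended
            (3, b'0'..=b'9') => 4,
            (1, b'o') => 2,
            (2, b'o') => 3,
            (_, b'f') => 1, // unanchored search: restart on any `f`
            _ => 0,
        };
        if state == 4 {
            end = Some(i + 1);
        }
    }
    end
}

/// Reverse pass: scan right to left, anchored at `end`, and return the
/// START offset of the match. It recognizes the reversed pattern
/// `[0-9]+oof`. (Hypothetical helper.)
fn reverse_start(haystack: &[u8], end: usize) -> Option<usize> {
    // States: 0 = start, 1 = digits, 2 = digits + "o", 3 = digits + "oo".
    let mut state = 0u8;
    for i in (0..end).rev() {
        state = match (state, haystack[i]) {
            (0, b'0'..=b'9') | (1, b'0'..=b'9') => 1,
            (1, b'o') => 2,
            (2, b'o') => 3,
            (3, b'f') => return Some(i), // match state
            _ => return None,            // dead state
        };
    }
    None
}

/// Runs the forward search first, then the reverse search.
fn find_match(haystack: &[u8]) -> Option<(usize, usize)> {
    let end = forward_end(haystack)?;
    let start = reverse_start(haystack, end)
        .expect("a forward match guarantees a reverse match");
    Some((start, end))
}
```

Here find_match(b"zzzfoo12345zzz") yields the bounds 3 and 11. The reverse pass only runs after the forward pass has succeeded, which is why it can be anchored at the end offset.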
The type of the DFA used by a Regex corresponds to the A type parameter, which must satisfy the Automaton trait. Typically, A is either a dense::DFA or a sparse::DFA, where dense DFAs use more memory but search faster, while sparse DFAs use less memory but search more slowly.
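The memory and speed trade-off can be pictured with two toy transition rows for a state that only accepts ASCII digits. This is a simplified sketch: the types DenseState and SparseState and the DEAD constant are assumptions made for illustration and do not mirror the actual layouts of dense::DFA or sparse::DFA.

```rust
/// Sentinel for "no transition" in this sketch.
const DEAD: u32 = 0;

/// Dense row: one slot per possible byte value, so a lookup is a single
/// array index, but every state costs 256 entries.
struct DenseState {
    next: [u32; 256],
}

/// Sparse row: only the byte ranges that lead somewhere are stored, so a
/// lookup must search the ranges.
struct SparseState {
    ranges: Vec<(u8, u8, u32)>, // (first byte, last byte inclusive, next state)
}

impl DenseState {
    fn next_state(&self, byte: u8) -> u32 {
        self.next[byte as usize]
    }
    fn table_bytes(&self) -> usize {
        std::mem::size_of_val(&self.next)
    }
}

impl SparseState {
    fn next_state(&self, byte: u8) -> u32 {
        self.ranges
            .iter()
            .find(|&&(lo, hi, _)| lo <= byte && byte <= hi)
            .map_or(DEAD, |&(_, _, target)| target)
    }
    fn table_bytes(&self) -> usize {
        self.ranges.len() * std::mem::size_of::<(u8, u8, u32)>()
    }
}

/// Builds a dense row that moves to `target` on any ASCII digit.
fn digits_dense(target: u32) -> DenseState {
    let mut next = [DEAD; 256];
    for b in b'0'..=b'9' {
        next[b as usize] = target;
    }
    DenseState { next }
}

/// Builds the equivalent sparse row with a single range.
fn digits_sparse(target: u32) -> SparseState {
    SparseState { ranges: vec![(b'0', b'9', target)] }
}
```

Both rows answer every lookup identically. The dense row spends 1024 bytes to make each lookup a single index, while the sparse row stores one range and pays for a scan on each byte.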
Crate features
Note that despite what the documentation auto-generates, the only crate feature needed to use this type is dfa-search. You do not need to enable the alloc feature.
By default, a regex’s automaton type parameter is set to dense::DFA<Vec<u32>> when the alloc feature is enabled. For most in-memory workloads, this is the most convenient type that gives the best search performance. When the alloc feature is disabled, no default type is used.
When should I use this?
Generally speaking, if you can afford the overhead of building a full DFA for your regex, and you don’t need things like capturing groups, then this is a good choice if you’re looking to optimize for matching speed. Note however that its speed may be worse than a general purpose regex engine if you don’t provide a dense::Config::prefilter to the underlying DFA.
Sparse DFAs
Since a Regex is generic over the Automaton trait, it can be used with any kind of DFA. While this crate constructs dense DFAs by default, it is easy enough to build corresponding sparse DFAs, and then build a regex from them:
use regex_automata::dfa::regex::Regex;
// First, build a regex that uses dense DFAs.
let dense_re = Regex::new("foo[0-9]+")?;
// Second, build sparse DFAs from the forward and reverse dense DFAs.
let fwd = dense_re.forward().to_sparse()?;
let rev = dense_re.reverse().to_sparse()?;
// Third, build a new regex from the constituent sparse DFAs.
let sparse_re = Regex::builder().build_from_dfas(fwd, rev);
// A regex that uses sparse DFAs can be used just like with dense DFAs.
assert_eq!(true, sparse_re.is_match(b"foo123"));
Alternatively, one can use a Builder to construct a sparse DFA more succinctly. (Note though that dense DFAs are still constructed first internally, and then converted to sparse DFAs, as in the example above.)
use regex_automata::dfa::regex::Regex;
let sparse_re = Regex::builder().build_sparse(r"foo[0-9]+")?;
// A regex that uses sparse DFAs can be used just like with dense DFAs.
assert!(sparse_re.is_match(b"foo123"));
Fallibility
Most of the search routines defined on this type will panic when the underlying search fails. This might be because the DFA gave up because it saw a quit byte, whether configured explicitly or via heuristic Unicode word boundary support, although neither is enabled by default. Or it might fail because an invalid Input configuration is given, for example, with an unsupported Anchored mode.
If you need to handle these error cases instead of allowing them to trigger a panic, then the lower level Regex::try_search provides a fallible API that never panics.
Example
This example shows how to cause a search to terminate if it sees a \n byte, and handle the error returned. This could be useful if, for example, you wanted to prevent a user supplied pattern from matching across a line boundary.
use regex_automata::{dfa::{self, regex::Regex}, Input, MatchError};
let re = Regex::builder()
.dense(dfa::dense::Config::new().quit(b'\n', true))
.build(r"foo\p{any}+bar")?;
let input = Input::new("foo\nbar");
// Normally this would produce a match, since \p{any} contains '\n'.
// But since we instructed the automaton to enter a quit state if a
// '\n' is observed, this produces a match error instead.
let expected = MatchError::quit(b'\n', 3);
let got = re.try_search(&input).unwrap_err();
assert_eq!(expected, got);
Implementations
impl Regex<DFA<&'static [u32]>>
Convenience routines for regex construction.
pub fn builder() -> Builder
Return a builder for configuring the construction of a Regex.
This is a convenience routine to avoid needing to import the Builder type in common cases.
Example
This example shows how to use the builder to disable UTF-8 mode everywhere.
use regex_automata::{
dfa::regex::Regex, nfa::thompson, util::syntax, Match,
};
let re = Regex::builder()
.syntax(syntax::Config::new().utf8(false))
.thompson(thompson::Config::new().utf8(false))
.build(r"foo(?-u:[^b])ar.*")?;
let haystack = b"\xFEfoo\xFFarzz\xE2\x98\xFF\n";
let expected = Some(Match::must(0, 1..9));
let got = re.find(haystack);
assert_eq!(expected, got);
impl<A: Automaton> Regex<A>
Standard search routines for finding and iterating over matches.
pub fn is_match<'h, I: Into<Input<'h>>>(&self, input: I) -> bool
Returns true if and only if this regex matches the given haystack.
This routine may short circuit if it knows that scanning future input will never lead to a different result. In particular, if the underlying DFA enters a match state or a dead state, then this routine will return true or false, respectively, without inspecting any future input.
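The short-circuit behavior can be sketched with a tiny hand-written automaton that also reports how many bytes it inspected. To keep the dead state easy to reach, this sketch is anchored at the start of the haystack and matches foo followed by one digit; is_match_counting is a hypothetical function and not part of this crate.

```rust
/// Returns whether the haystack starts with `foo[0-9]`, together with the
/// number of bytes inspected before the answer was known.
/// (Hypothetical helper for illustration.)
fn is_match_counting(haystack: &[u8]) -> (bool, usize) {
    // `state` is the number of pattern positions matched so far.
    let mut state = 0u8;
    for (i, &b) in haystack.iter().enumerate() {
        state = match (state, b) {
            (0, b'f') => 1,
            (1, b'o') | (2, b'o') => state + 1,
            // Match state: the answer is `true` no matter what follows.
            (3, b'0'..=b'9') => return (true, i + 1),
            // Dead state: the answer is `false` no matter what follows.
            _ => return (false, i + 1),
        };
    }
    // Ran out of input before reaching a match state.
    (false, haystack.len())
}
```

On b"foo1234567" the function stops after four bytes, and on b"xfoo1" it stops after one, since no later input could change either result.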
Panics
This routine panics if the search could not complete. This can occur in a number of circumstances:
- The configuration of the DFA may permit it to “quit” the search. For example, setting quit bytes or enabling heuristic support for Unicode word boundaries. The default configuration does not enable any option that could result in the DFA quitting.
- When the provided Input configuration is not supported. For example, by providing an unsupported anchor mode.
When a search panics, callers cannot know whether a match exists or not.
Use Regex::try_search if you want to handle these error conditions.
Example
use regex_automata::dfa::regex::Regex;
let re = Regex::new("foo[0-9]+bar")?;
assert_eq!(true, re.is_match("foo12345bar"));
assert_eq!(false, re.is_match("foobar"));
pub fn find<'h, I: Into<Input<'h>>>(&self, input: I) -> Option<Match>
Returns the start and end offset of the leftmost match. If no match exists, then None is returned.
Panics
This routine panics if the search could not complete. This can occur in a number of circumstances:
- The configuration of the DFA may permit it to “quit” the search. For example, setting quit bytes or enabling heuristic support for Unicode word boundaries. The default configuration does not enable any option that could result in the DFA quitting.
- When the provided Input configuration is not supported. For example, by providing an unsupported anchor mode.
When a search panics, callers cannot know whether a match exists or not.
Use Regex::try_search if you want to handle these error conditions.
Example
use regex_automata::{Match, dfa::regex::Regex};
// Greediness is applied appropriately.
let re = Regex::new("foo[0-9]+")?;
assert_eq!(Some(Match::must(0, 3..11)), re.find("zzzfoo12345zzz"));
// Even though a match is found after reading the first byte (`a`),
// the default leftmost-first match semantics demand that we find the
// earliest match that prefers earlier parts of the pattern over later
// parts.
let re = Regex::new("abc|a")?;
assert_eq!(Some(Match::must(0, 0..3)), re.find("abc"));
pub fn find_iter<'r, 'h, I: Into<Input<'h>>>(&'r self, input: I) -> FindMatches<'r, 'h, A>
Returns an iterator over all non-overlapping leftmost matches in the given bytes. If no match exists, then the iterator yields no elements.
This corresponds to the “standard” regex search iterator.
Panics
If the search returns an error during iteration, then iteration panics. See Regex::find for the panic conditions.
Use Regex::try_search with util::iter::Searcher if you want to handle these error conditions.
Example
use regex_automata::{Match, dfa::regex::Regex};
let re = Regex::new("foo[0-9]+")?;
let text = "foo1 foo12 foo123";
let matches: Vec<Match> = re.find_iter(text).collect();
assert_eq!(matches, vec![
Match::must(0, 0..4),
Match::must(0, 5..10),
Match::must(0, 11..17),
]);
impl<A: Automaton> Regex<A>
Lower level fallible search routines that permit controlling where the search starts and ends in a particular sequence.
pub fn try_search(&self, input: &Input<'_>) -> Result<Option<Match>, MatchError>
Returns the start and end offset of the leftmost match. If no match exists, then None is returned.
This is like Regex::find but with two differences:
- It is not generic over Into<Input> and instead accepts a &Input. This permits reusing the same Input for multiple searches without needing to create a new one. This may help with latency.
- It returns an error if the search could not complete, whereas Regex::find will panic.
Errors
This routine errors if the search could not complete. This can occur in the following circumstances:
- The configuration of the DFA may permit it to “quit” the search. For example, setting quit bytes or enabling heuristic support for Unicode word boundaries. The default configuration does not enable any option that could result in the DFA quitting.
- When the provided Input configuration is not supported. For example, by providing an unsupported anchor mode.
When a search returns an error, callers cannot know whether a match exists or not.
impl<A: Automaton> Regex<A>
Non-search APIs for querying information about the regex and setting a prefilter.
pub fn forward(&self) -> &A
Return the underlying DFA responsible for forward matching.
This is useful for accessing the underlying DFA and converting it to some other format or size. See the Builder::build_from_dfas docs for an example of where this might be useful.
pub fn reverse(&self) -> &A
Return the underlying DFA responsible for reverse matching.
This is useful for accessing the underlying DFA and converting it to some other format or size. See the Builder::build_from_dfas docs for an example of where this might be useful.
pub fn pattern_len(&self) -> usize
Returns the total number of patterns matched by this regex.
Example
use regex_automata::dfa::regex::Regex;
let re = Regex::new_many(&[r"[a-z]+", r"[0-9]+", r"\w+"])?;
assert_eq!(3, re.pattern_len());