[Bug 868578] New: Review Request: re2 - C++ fast alternative to backtracking RE engines

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



https://bugzilla.redhat.com/show_bug.cgi?id=868578

            Bug ID: 868578
        QA Contact: extras-qa@xxxxxxxxxxxxxxxxx
          Severity: medium
           Version: rawhide
          Priority: medium
                CC: notting@xxxxxxxxxx,
                    package-review@xxxxxxxxxxxxxxxxxxxxxxx
          Assignee: nobody@xxxxxxxxxxxxxxxxx
           Summary: Review Request: re2 - C++ fast alternative to
                    backtracking RE engines
        Regression: ---
      Story Points: ---
    Classification: Fedora
                OS: Linux
          Reporter: denis.arnaud_fedora@xxxxxxx
              Type: ---
     Documentation: ---
          Hardware: All
        Mount Type: ---
            Status: NEW
         Component: Package Review
           Product: Fedora

Spec URL: http://denisarnaud.fedorapeople.org/re2/re2.spec
SRPM URL: http://denisarnaud.fedorapeople.org/re2/re2-0.0.0-1.fc17.src.rpm
Fedora Account System Username: denisarnaud
Description:
RE2 is a fast, safe, thread-friendly alternative to backtracking
regular expression engines like those used in PCRE, Perl, and
Python. It is a C++ library.

Backtracking engines are typically full of features and convenient
syntactic sugar but can be forced into taking exponential amounts of
time on even small inputs. RE2 uses automata theory to guarantee that
regular expression searches run in time linear in the size of the
input. RE2 implements memory limits, so that searches can be
constrained to a fixed amount of memory. RE2 is engineered to use a
small fixed C++ stack footprint no matter what inputs or regular
expressions it must process; thus RE2 is useful in multithreaded
environments where thread stacks cannot grow arbitrarily large.

On large inputs, RE2 is often much faster than backtracking engines;
its use of automata theory lets it apply optimizations that the others
cannot.

Unlike most automata-based engines, RE2 implements almost all the
common Perl and PCRE features and syntactic sugars. It also finds the
leftmost-first match, the same match that Perl would, and can return
submatch information. The one significant exception is that RE2 drops
support for backreferences¹ and generalized zero-width assertions,
because they cannot be implemented efficiently. The syntax page gives
full details.

For those who want a simpler syntax, RE2 has a POSIX mode that accepts
only the POSIX egrep operators and implements leftmost-longest overall
matching.

To get started writing programs that use RE2, see the C++ API
description. For details about the implementation, see "Regular
Expression Matching in the Wild."

Technical note: there's a difference between submatches and
backreferences. Submatches let you find out what certain
subexpressions matched after the match is over, so that you can find
out, after matching dogcat against (cat|dog)(cat|dog), that \1 is dog
and \2 is cat. Backreferences let you use those subexpressions during
the match, so that (cat|dog)\1 matches catcat and dogdog but not
catdog or dogcat.

RE2 supports submatch extraction, but not backreferences.

If you absolutely need backreferences and generalized assertions, then
RE2 is not for you, but you might be interested in irregexp, Google
Chrome's regular expression engine.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
_______________________________________________
package-review mailing list
package-review@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/package-review



[Index of Archives]     [Fedora Legacy]     [Fedora Desktop]     [Fedora SELinux]     [Yosemite News]     [KDE Users]     [Fedora Tools]