[Bug 978233] New: perl-5.18: Regex \8 and \9 after literals no longer work

bugzilla@xxxxxxxxxx · Wed, 26 Jun 2013 07:01:31 +0000

https://bugzilla.redhat.com/show_bug.cgi?id=978233

            Bug ID: 978233
           Summary: perl-5.18: Regex \8 and \9 after literals no longer
                    work
           Product: Fedora
           Version: rawhide
         Component: perl
          Severity: unspecified
          Priority: unspecified
          Assignee: mmaslano@xxxxxxxxxx
          Reporter: ppisar@xxxxxxxxxx
        QA Contact: extras-qa@xxxxxxxxxxxxxxxxx
                CC: cweyl@xxxxxxxxxxxxxxx, iarnell@xxxxxxxxx,
                    jplesnik@xxxxxxxxxx, kasal@xxxxxx, lkundrak@xxxxx,
                    mmaslano@xxxxxxxxxx,
                    perl-devel@xxxxxxxxxxxxxxxxxxxxxxx, ppisar@xxxxxxxxxx,
                    psabata@xxxxxxxxxx, rc040203@xxxxxxxxxx,
                    tcallawa@xxxxxxxxxx

The \8 and \9 back-references have not worked since
v5.17.0-543-g726ee55, which is a regression. It has been partially fixed by:

commit f1e1b256c5c1773d90e828cca6323c53fa23391b
Author: Yves Orton <demerphq@xxxxxxxxx>
Date:   Tue Jun 25 21:01:27 2013 +0200

    Fix rules for parsing numeric escapes in regexes

    Commit 726ee55d introduced better handling of things like \87 in a
    regex, but as an unfortunate side effect broke latex2html.

    The rules for handling backslashes in regexen are a bit arcane.

    Anything starting with \0 is octal.

    The sequences \1 through \9 are always backrefs.

    Any other sequence is interpreted as a decimal number, and if there
    are that many capture buffers defined in the pattern at that point
    then the sequence is a backreference. If however it is larger
    than the number of buffers the sequence is treated as an octal escape.

    A consequence of this is that \118 could be a backreference to
    the 118th capture buffer, or it could be the string "\11" . "8". In
    other words depending on the context we might even use a different
    number of digits for the escape!
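
As an illustration only, the rules in this commit message can be modelled in a few lines of Python. This is a sketch of the behaviour the message describes, not Perl's actual parser; the function name `classify_escape` and its return tuples are hypothetical, and the three-digit limit on octal escapes is an assumption based on Perl's usual \NNN syntax.

```python
OCTAL_DIGITS = "01234567"


def _octal(digits):
    # Assumption: an octal escape consumes at most three octal digits;
    # whatever is left over is literal text following the escape.
    n = 0
    while n < len(digits) and n < 3 and digits[n] in OCTAL_DIGITS:
        n += 1
    return ("octal", int(digits[:n], 8), digits[n:])


def classify_escape(digits, groups_so_far):
    """Classify the run of digits that follows a backslash in a pattern.

    groups_so_far is the number of capture buffers already defined at
    that point. Returns (kind, value, literal_rest), where kind is
    "octal", "backref" or "error".
    """
    if not digits or any(c not in "0123456789" for c in digits):
        raise ValueError("expected a non-empty run of decimal digits")
    if digits[0] == "0":
        # Anything starting with \0 is octal.
        return _octal(digits)
    if len(digits) == 1:
        # \1 through \9 are always backrefs (a missing group is reported
        # elsewhere, which this sketch does not model).
        return ("backref", int(digits), "")
    if int(digits) <= groups_so_far:
        # Enough capture buffers exist: the whole run is a backref.
        return ("backref", int(digits), "")
    if digits[0] in "89":
        # \87, \97 and similar cannot be octal; the patch makes them
        # fail the same way /\8/ would.
        return ("error", int(digits), "")
    return _octal(digits)
```

Under this model, `classify_escape("118", 118)` is a backreference to buffer 118, while `classify_escape("118", 0)` is the octal escape \11 followed by a literal "8", which is the context dependence the paragraph above points out.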

    This also left an awkward edge case, of multi digit sequences
    starting with 8 or 9 like m/\87/ which would result in us parsing
    as though we had seen /87/ (iow a null byte at the start) or worse
    like /\x{00}87/ which is clearly wrong.

    This patch fixes the cases where the capture buffers are defined,
    and causes things like the \87 or \97 to throw the same error that
    /\8/ would. One might argue we should complain about an illegal
    octal sequence, but this seems more consistent with an error like
    /\9/ and IMO will be less surprising in an error message.

    This patch includes exhaustive tests of patterns of the form
    /(a)\1/, /((a))\2/ etc, so that we don't break this again if we
    change the logic more.
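
The test patterns mentioned above follow a simple shape: N nested capture groups around a literal "a", followed by a backreference to group N. A minimal sketch of generating such patterns is shown here; the helper name `nested_backref_pattern` is hypothetical and this is not the code used in Perl's test suite.

```python
def nested_backref_pattern(depth):
    """Build the body of a pattern such as (a)\\1 or ((a))\\2.

    depth is the number of nested capture groups; the trailing
    backreference points at the innermost (depth-th) group.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    return "(" * depth + "a" + ")" * depth + "\\" + str(depth)


# Bodies for \1 through \10, covering the single-digit cases and the
# first multi-digit case.
patterns = [nested_backref_pattern(n) for n in range(1, 11)]
```

Each generated body would be expected to match the string "aa", since the innermost group captures "a" and the backreference repeats it.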
