On Wed, 1 Feb 2023 at 17:22, D. Ben Knoble <ben.knoble@xxxxxxxxx> wrote:
>
> On Wed, Feb 1, 2023 at 11:09 AM demerphq <demerphq@xxxxxxxxx> wrote:
> > FWIW that looks pretty weird to me, like the escapes in the charclass
> > were interpolated before being fed to the regex engine. Are you sure
> > you tested the right thing?
>
> Quite sure. `git diff --word-diff` fails. This was just a smaller
> example based on the linked C code.
>
> Here's the output of `git diff --word-diff` (verbatim and dumped):
>
> ```
> fatal¬†: invalid regular expression: \|([^\\]*)\||([^][)(}{[
> ])+|[^[:space:]]|[¿-ˇ][Ä-ø]+
> 00000000: 6661 7461 6cc2 a03a 2069 6e76 616c 6964  fatal..: invalid
> 00000010: 2072 6567 756c 6172 2065 7870 7265 7373   regular express
> 00000020: 696f 6e3a 205c 7c28 5b5e 5c5c 5d2a 295c  ion: \|([^\\]*)\
> 00000030: 7c7c 285b 5e5d 5b29 287d 7b5b 2009 5d29  ||([^][)(}{[ .])
> 00000040: 2b7c 5b5e 5b3a 7370 6163 653a 5d5d 7c5b  +|[^[:space:]]|[
> 00000050: c02d ff5d 5b80 2dbf 5d2b 0a              .-.][.-.]+.
> ```

Interesting. The regex engine seems to be interpolating the \xC0 in such
a way that you aren't seeing the real pattern. In the Perl regex engine
I'd call that a bug (it used to do the same thing before we fixed it
years ago[1]).

FWIW, this is a valid regex in Perl, so I don't think the pattern is at
fault; it's something else.

I saw some discussion recently that the Mac regex engine doesn't play
nicely in certain ways, but I don't recollect the details. Sorry I can't
help more. Try searching for EXTENDED, regex and mac; maybe you can find
the mail I mean.

cheers,
yves

[1] I am one of the maintainers of the Perl regex engine.

-- 
perl -Mre=debug -e "/just|another|perl|hacker/"