Re: grep: fix multibyte regex handling under macOS (1819ad327b7a1f19540a819813b70a0e8a7f798f)

On Wed, 1 Feb 2023 at 16:25, D. Ben Knoble <ben.knoble@xxxxxxxxx> wrote:
>
> I recently updated to git 2.39.1 and noticed today that `git diff
> --word-diff` fails for files with `diff=scheme`. I was able to narrow
> the failure down to the inclusion of control characters \xc0, \xff,
> \x80, \xbf by https://github.com/git/git/blob/2fc9e9ca3c7505bc60069f11e7ef09b1aeeee473/userdiff.c#L17
> in the definition of the scheme diff pattern (really, all patterns).
>
> I suspect the commit referenced in the subject, given that it messes
> with regex handling on macOS.
>
> Relevant environment that I can think of:
> ```
> # locale
> LANG="fr_FR.UTF-8"
> LC_COLLATE="fr_FR.UTF-8"
> LC_CTYPE="fr_FR.UTF-8"
> LC_MESSAGES="fr_FR.UTF-8"
> LC_MONETARY="fr_FR.UTF-8"
> LC_NUMERIC="fr_FR.UTF-8"
> LC_TIME="fr_FR.UTF-8"
> LC_ALL="fr_FR.UTF-8"
> ```
>
> I'm on macOS 11.7.
>
> Failure (using Zsh to produce the characters; I think there's a Bash
> equivalent):
> ```
> # git diff --word-diff --word-diff-regex=$'[\xc0-\xff][\x80-\xbf]+'
> fatal : invalid regular expression: [¿-ˇ][Ä-ø]+
> ```

FWIW that looks pretty weird to me, like the escapes in the charclass
were interpolated before being fed to the regex engine. Are you sure
you tested the right thing?
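
To illustrate what I mean, here is a minimal Python sketch. It assumes the shell's $'...' quoting expands the \x escapes into raw bytes before git ever sees the pattern. The helper names are hypothetical, and decoding as Mac OS Roman is only a guess at how those bytes ended up displayed as they are in the error message.

```python
# Bytes a shell would pass to git after expanding $'[\xc0-\xff][\x80-\xbf]+'.
# Assumption: the escapes are expanded by the shell, not by the regex engine.
pattern = b"[\xc0-\xff][\x80-\xbf]+"


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def as_mac_roman(data: bytes) -> str:
    """Decode bytes as Mac OS Roman, a guess at how the message was rendered."""
    return data.decode("mac_roman")


if __name__ == "__main__":
    # Lone bytes such as 0xC0 or 0xFF never form a valid UTF-8 sequence.
    print("valid UTF-8:", is_valid_utf8(pattern))
    print("as Mac OS Roman:", as_mac_roman(pattern))
```

If that guess holds, the regex engine received raw bytes that are not valid UTF-8 rather than the literal backslash escapes, and a multibyte-aware regcomp() in a UTF-8 locale could plausibly reject them.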

Yves


--
perl -Mre=debug -e "/just|another|perl|hacker/"



