Re: [PATCH] userdiff: support regexec(3) with multi-byte support

On Thu, Apr 6, 2023 at 4:19 PM René Scharfe <l.s.r@xxxxxx> wrote:
>
> Since 1819ad327b (grep: fix multibyte regex handling under macOS,
> 2022-08-26) we use the system library for all regular expression
> matching on macOS, not just for git grep.  It supports multi-byte
> strings and rejects invalid multi-byte characters.
>
> This broke all built-in userdiff word regexes in UTF-8 locales because
> they all include such invalid bytes in expressions that are intended to
> match multi-byte characters without explicit support for that from the
> regex engine.
>
> "|[^[:space:]]|[\xc0-\xff][\x80-\xbf]+" is added to all built-in word
> regexes to match a single non-space or multi-byte character.  The \xNN
> characters are invalid if interpreted as UTF-8 because they have their
> high bit set, which indicates they are part of a multi-byte character,
> but they are surrounded by single-byte characters.
>
> Replace that expression with "|[^[:space:]]" if the regex engine
> supports multi-byte matching, as there is no need to have an explicit
> range for multi-byte characters then.  Check for that capability at
> runtime, because it depends on the locale and thus on environment
> variables.  Construct the full replacement expression at build time
> and just switch it in if necessary to avoid string manipulation and
> allocations at runtime.
>
> Additionally the word regex for tex contains the expression
> "[a-zA-Z0-9\x80-\xff]+" with a similarly invalid range.  The best
> replacement with only valid characters that I can come up with is
> "([a-zA-Z0-9]|[^\x01-\x7f])+".  Unlike the original it matches NUL
> characters, though.  Assuming that tex files usually don't contain NUL
> this should be acceptable.
>
> Reported-by: D. Ben Knoble <ben.knoble@xxxxxxxxx>
> Reported-by: Eric Sunshine <sunshine@xxxxxxxxxxxxxx>
> Helped-by: Junio C Hamano <gitster@xxxxxxxxx>
> Signed-off-by: René Scharfe <l.s.r@xxxxxx>

I tested the patch locally on top of ae73b2c8f1 and it solved my
problem. Seems like there's still some further discussion, though.
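
For anyone following along, the runtime check described in the quoted
message could be sketched roughly as below. This is only an illustration
and not the code from the patch: the helper names
regex_matches_multibyte_char() and word_regex_suffix() are made up, and
the test string (the UTF-8 encoding of U+00A3) is an arbitrary choice.
The idea is to compile "[^[:space:]]" with regcomp(3), run it over one
two-byte UTF-8 character, and see whether the match covers both bytes.

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

/*
 * Hypothetical helper, not taken from the patch: returns 1 if regexec(3)
 * treats a two-byte UTF-8 sequence as one character in the locale that is
 * currently in effect, and 0 if it matches byte by byte (or not at all).
 */
static int regex_matches_multibyte_char(void)
{
	static const char pattern[] = "^[^[:space:]]";
	static const char pound_sign[] = "\xc2\xa3"; /* U+00A3 in UTF-8 */
	regex_t re;
	regmatch_t m;
	int rc, ok;

	/* The pattern is a fixed, valid ERE, so failure would be a bug. */
	rc = regcomp(&re, pattern, REG_EXTENDED);
	assert(rc == 0);
	(void)rc;

	/* A multi-byte aware engine matches both bytes as one character. */
	ok = !regexec(&re, pound_sign, 1, &m, 0) &&
	     m.rm_so == 0 &&
	     (size_t)m.rm_eo == strlen(pound_sign);
	regfree(&re);
	return ok;
}

/*
 * Hypothetical helper: both alternatives are string constants, so picking
 * one needs no string manipulation or allocation at runtime.
 */
static const char *word_regex_suffix(void)
{
	return regex_matches_multibyte_char()
		? "|[^[:space:]]"
		: "|[^[:space:]]|[\xc0-\xff][\x80-\xbf]+";
}
```

The result depends on the locale, so a caller would need to have run
setlocale(LC_CTYPE, "") (or similar) beforehand for the environment
variables to have any effect; without that, the check runs in the "C"
locale.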