On Thu, Apr 6, 2023 at 4:19 PM René Scharfe <l.s.r@xxxxxx> wrote:
> Since 1819ad327b (grep: fix multibyte regex handling under macOS,
> 2022-08-26) we use the system library for all regular expression
> matching on macOS, not just for git grep. It supports multi-byte
> strings and rejects invalid multi-byte characters.
>
> This broke all built-in userdiff word regexes in UTF-8 locales because
> they all include such invalid bytes in expressions that are intended to
> match multi-byte characters without explicit support for that from the
> regex engine.
>
> "|[^[:space:]]|[\xc0-\xff][\x80-\xbf]+" is added to all built-in word
> regexes to match a single non-space or multi-byte character. The \xNN
> characters are invalid if interpreted as UTF-8 because they have their
> high bit set, which indicates they are part of a multi-byte character,
> but they are surrounded by single-byte characters.
>
> Replace that expression with "|[^[:space:]]" if the regex engine
> supports multi-byte matching, as there is no need to have an explicit
> range for multi-byte characters then. Check for that capability at
> runtime, because it depends on the locale and thus on environment
> variables. Construct the full replacement expression at build time
> and just switch it in if necessary to avoid string manipulation and
> allocations at runtime.
>
> Reported-by: D. Ben Knoble <ben.knoble@xxxxxxxxx>
> Reported-by: Eric Sunshine <sunshine@xxxxxxxxxxxxxx>
> Helped-by: Junio C Hamano <gitster@xxxxxxxxx>
> Signed-off-by: René Scharfe <l.s.r@xxxxxx>

Thank you, René! This patch resolves the problem I was experiencing[1].
I'm happy to have --color-words working again.

[1]: https://lore.kernel.org/git/CAPig+cSNmws2b7f7aRA2C56kvQYG3w_g+KhYdqhtmf+XhtAMhQ@xxxxxxxxxxxxxx/
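
For anyone following along, the approach described in the commit message
could be sketched roughly as below. This is only an illustration and not
the code from the patch: the names WORD_TAIL_BYTES, WORD_TAIL_MB,
WORD_REGEX, struct word_regex, regex_supports_multi_byte() and
pick_word_regex() are made up here, and the probe string (the two-byte
UTF-8 encoding of U+00A3) is my own choice. Only the two regex tails are
taken from the commit message.

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Tail used when the engine matches byte by byte (quoted in the commit message). */
#define WORD_TAIL_BYTES "|[^[:space:]]|[\xc0-\xff][\x80-\xbf]+"
/* Tail used when the engine itself understands multi-byte characters. */
#define WORD_TAIL_MB    "|[^[:space:]]"

/* Both variants are built by compile-time string literal concatenation,
 * so nothing has to be assembled or allocated at runtime. */
struct word_regex {
	const char *bytes;
	const char *multi_byte;
};
#define WORD_REGEX(base) { base WORD_TAIL_BYTES, base WORD_TAIL_MB }

/* Returns 1 if "[^[:space:]]" consumes a whole two-byte UTF-8 character
 * in the current locale, 0 otherwise. The answer is cached after the
 * first call, so the locale should be set up before calling this. */
static int regex_supports_multi_byte(void)
{
	static int result = -1;

	if (result < 0) {
		regex_t re;
		regmatch_t m;

		if (regcomp(&re, "[^[:space:]]", REG_EXTENDED)) {
			result = 0; /* treat a compile failure as "no support" */
		} else {
			result = !regexec(&re, "\xc2\xa3", 1, &m, 0) &&
				 m.rm_so == 0 && m.rm_eo == 2;
			regfree(&re);
		}
	}
	return result;
}

/* Switch in the prebuilt variant that suits the regex engine. */
static const char *pick_word_regex(const struct word_regex *w)
{
	assert(w != NULL);
	return regex_supports_multi_byte() ? w->multi_byte : w->bytes;
}
```

A caller would be expected to run setlocale(LC_CTYPE, "") first, since
the probe result depends on the locale, and then pass a table entry such
as WORD_REGEX("[a-zA-Z_][a-zA-Z0-9_]*") to pick_word_regex() before
compiling the returned string with regcomp().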