Re: [PATCH] userdiff: support regexec(3) with multi-byte support

Johannes Sixt <j6t@xxxxxxxx> · Fri, 7 Apr 2023 12:56:00 +0200

Am 07.04.23 um 09:49 schrieb René Scharfe:
> Am 07.04.23 um 00:35 schrieb Johannes Sixt:
>> This is not equivalent. The original treated a sequence of non-ASCII
>> characters as a word. The new version treats each individual non-space
>> character (both ASCII and non-ASCII) as a word.
> 
> I assume you mean "The original treated [a single non-space as well as]
> a sequence of non-ASCII characters [making up a single multi-byte
> character] as a word.".  That works as intended by 664d44ee7f (userdiff:
> simplify word-diff safeguard, 2011-01-11).

I misread the original RE. I thought it would lump multiple multi-byte
characters together into one word, but it does not; sorry for that. It
looks like your suggested replacement is behaviorally identical to the
original after all, except perhaps for this one:

> The new one doesn't match multi-byte whitespace anymore.

but I did not find a reference that confirms this difference. I don't
think we need to bend over backwards to keep this compatibility, though.
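
For illustration, here is a minimal Python sketch of the comparison
discussed above. Both patterns are assumptions reconstructed from this
thread, not copied from userdiff.c, and the helper names are made up.
Python's re tries alternatives left to right, while regexec(3) prefers
the longest match, so the multi-byte alternative is listed first to get
the same effect. Python's Unicode-aware \S merely stands in for a
multi-byte-aware [^[:space:]]; what a given libc and locale actually
treat as space may differ.

```python
import re

# Byte-wise stand-in for the original fallback (assumed shape): a UTF-8
# lead byte followed by continuation bytes, or any single non-space byte.
ORIGINAL = re.compile(rb"[\xc0-\xff][\x80-\xbf]+|\S")

# Character-wise stand-in for the replacement (assumed shape): any single
# non-space character. Python's \S on str is Unicode-aware, which is an
# assumption about how a multi-byte regexec(3) might classify whitespace.
PROPOSED = re.compile(r"\S")


def words_original(text):
    """Split UTF-8 encoded text into fallback 'words', matching on bytes."""
    data = text.encode("utf-8")
    return [m.group().decode("utf-8") for m in ORIGINAL.finditer(data)]


def words_proposed(text):
    """Split text into fallback 'words', matching on characters."""
    return PROPOSED.findall(text)


if __name__ == "__main__":
    for sample in ("aé ü€", "üé", "a\u3000b"):
        print(repr(sample), words_original(sample), words_proposed(sample))
```

In this sketch both variants split "üé" into two words, because a lead
byte never falls in the continuation range. They differ only for
"a\u3000b" (IDEOGRAPHIC SPACE): the byte-wise pattern reports the space
as a word of its own, the character-wise one skips it.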

-- Hannes