Re: [PATCH v4 1/2] diff.c: When appropriate, use utf8_strwidth(), part1

Junio C Hamano <gitster@xxxxxxxxx> · Wed, 07 Sep 2022 11:31:45 -0700

Torsten Bögershausen <tboegi@xxxxxx> writes:

>
> OK - the comment can be removed.
>
> I didn't know how to read this comment:
>>...but the former may chomp a single multi-byte letter in the middle,
>> which would need to be corrected as a part of this change.
>
> After diffing into the code some more times, I think that we don't
> chomp a single byte out of an UTF-8 sequence.

When turning a/b/c vs a/B/c into a/{b->B}/c, two steps are involved.
Take common prefix and suffix (in this case 'a' and 'c') and turn
'b' vs 'B' into {b->B} is one step.  The other is what to do when
prefix and suffix are long.  After turning aaaaa/b/c vs aaaaa/B/c
into aaaaa/{b->B}/c, if the result is overly long, how we shorten
the prefix (i.e. aaaaa) and the suffix?

I knew the code that produces {b->B} honored '/' boundary, but I
just did not remember offhand what diff.c::pprint_rename() did in
its latter half, specifically, if it just chomped pfx and sfx as a
sequence of bytes (which would have been wrong) or insisted that the
common sequence search honors '/' boundary (which would be OK, as
byte '/' will not appear in the middle of a single multi-byte UTF-8
"letter").  I think iti s doing the latter, so it should be fine.

Thanks.