Torsten Bögershausen <tboegi@xxxxxx> writes:

> OK - the comment can be removed.
>
> I didn't know how to read this comment:
>>...but the former may chomp a single multi-byte letter in the middle,
>> which would need to be corrected as a part of this change.
>
> After diffing into the code some more times, I think that we don't
> chomp a single byte out of an UTF-8 sequence.

When turning a/b/c vs a/B/c into a/{b->B}/c, two steps are involved.

The first step is to take the common prefix and suffix (in this case
'a' and 'c') and turn 'b' vs 'B' into {b->B}.

The second step is what to do when the prefix and suffix are long.
After turning aaaaa/b/c vs aaaaa/B/c into aaaaa/{b->B}/c, if the
result is overly long, how do we shorten the prefix (i.e. aaaaa) and
the suffix?

I knew the code that produces {b->B} honored the '/' boundary, but I
did not remember offhand what diff.c::pprint_rename() did in its
latter half. Specifically, I did not remember whether it just chomped
pfx and sfx as a sequence of bytes (which would have been wrong) or
insisted that the common sequence search honors the '/' boundary
(which would be OK, as the byte '/' will not appear in the middle of
a single multi-byte UTF-8 "letter").

I think it is doing the latter, so it should be fine.

Thanks.
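
To make the '/' boundary argument concrete, here is a minimal Python
sketch. It is not the C code in diff.c: the function names are
hypothetical, it writes the result in the {old->new} shorthand used
in this message, and it ignores the shortening of overly long
results. It only shows that cutting the common prefix and suffix at
'/' bytes keeps a multi-byte letter intact even when two names share
the first byte of that letter.

```python
SLASH = ord("/")


def common_prefix_len(a: bytes, b: bytes) -> int:
    """Length of the longest common prefix that ends right after a '/'."""
    end = 0
    for i in range(min(len(a), len(b))):
        if a[i] != b[i]:
            break
        if a[i] == SLASH:
            end = i + 1  # only commit to the prefix at a '/' boundary
    return end


def common_suffix_len(a: bytes, b: bytes, pfx_len: int) -> int:
    """Length of the longest common suffix that starts at a '/',
    without reaching back into the prefix already taken."""
    limit = min(len(a), len(b)) - pfx_len
    length = 0
    for i in range(1, limit + 1):
        if a[-i] != b[-i]:
            break
        if a[-i] == SLASH:
            length = i  # only commit to the suffix at a '/' boundary
    return length


def rename_braces(old: bytes, new: bytes) -> bytes:
    """Hypothetical helper: render old vs new as pfx{old->new}sfx."""
    pfx = common_prefix_len(old, new)
    sfx = common_suffix_len(old, new, pfx)
    old_mid = old[pfx:len(old) - sfx]
    new_mid = new[pfx:len(new) - sfx]
    suffix = old[len(old) - sfx:]
    return old[:pfx] + b"{" + old_mid + b"->" + new_mid + b"}" + suffix


if __name__ == "__main__":
    print(rename_braces(b"a/b/c", b"a/B/c").decode("utf-8"))
    # 'ä' is C3 A4 and 'ö' is C3 B6 in UTF-8: they share the byte C3,
    # so a purely byte-wise common prefix would split the letter.
    old = "dir/ä.txt".encode("utf-8")
    new = "dir/ö.txt".encode("utf-8")
    print(rename_braces(old, new).decode("utf-8"))
```

In the second case a purely byte-wise comparison would take "dir/"
plus the shared byte C3 as the common prefix and leave a lone A4 or
B6 behind. Stopping at '/' avoids that, because every byte of a
multi-byte UTF-8 sequence is 0x80 or above and so can never be equal
to '/' (0x2F).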