I believe I have found the exact place where strlen is used incorrectly.

It is in diff.c:show_stats:
https://github.com/git/git/blob/c50926e1f48891e2671e1830dbcd2912a4563450/diff.c#L2623

It should probably be replaced with one of utf8_width, utf8_strnwidth or
utf8_strwidth from utf8.c.

On Wed, 10 Aug 2022 at 12:40, Torsten Bögershausen <tboegi@xxxxxx> wrote:
>
> On Tue, Aug 09, 2022 at 10:55:31PM -0700, Junio C Hamano wrote:
> > Calvin Wan <calvinwan@xxxxxxxxxx> writes:
> >
> > > Hi Alexander,
> > >
> > > Thank you for the report! I attempted to reproduce with the steps you
> > > provided, but was unable to do so. What commands would I have to run
> > > on a clean git repository to reproduce this?
> >
> > Sounds like a symptom observable when the width computed by
> > utf8.c::git_gcwidth(), using the width table imported from
> > unicode.org, and the width the terminal thinks each of the displayed
> > character has, do not match (e.g. seen when ambiguous characters are
> > involved, https://unicode.org/reports/tr11/#Ambiguous).
> >
>
> I am not fully sure about that - I can reproduce it with Latin based
> file names as well:
>
> git log --stat
> [snip]
> Arger.txt | 1 +
> Ärger.txt | 1 +
> 2 files changed, 2 insertions(+)
>
> From this very first experiment I would suspect that we use
> strlen() somewhere rather than utf8.c::git_gcwidth()
>
> More digging needed (but I don't promise anything today)
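
To illustrate the mismatch discussed in this thread, here is a minimal
standalone sketch. It is not git's code: approx_display_width() and
print_padded() are hypothetical helpers, and the width function simply
counts UTF-8 code points, assuming one column per code point (the width
functions in utf8.c also have to deal with wide and zero-width
characters). It only shows why a byte count from strlen() over-counts
"Ärger.txt", where "Ä" is two bytes in UTF-8 but one column on screen.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, NOT git's implementation: approximate the display
 * width of a UTF-8 string by counting code points, i.e. every byte that
 * is not a continuation byte (bit pattern 10xxxxxx).
 * Assumption: each code point occupies exactly one terminal column,
 * which is wrong for wide (e.g. CJK) and zero-width characters.
 */
size_t approx_display_width(const char *s)
{
	size_t width = 0;

	assert(s != NULL);
	for (; *s; s++)
		if (((unsigned char)*s & 0xC0) != 0x80)
			width++;
	return width;
}

/*
 * Hypothetical helper: print " name<padding> | rest", padding the name
 * to `columns` display columns. The padding is derived from the
 * approximate display width rather than from strlen(), so a multi-byte
 * name is not under-padded.
 */
void print_padded(const char *name, size_t columns, const char *rest)
{
	size_t w = approx_display_width(name);
	size_t pad = w < columns ? columns - w : 0;

	printf(" %s%*s | %s\n", name, (int)pad, "", rest);
}
```

With these helpers, strlen() reports 10 for "Ärger.txt" and 9 for
"Arger.txt", while approx_display_width() reports 9 for both, so padding
computed from the byte count leaves the two "|" columns one cell apart.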