Re: [BUG] Unicode filenames handling in `git log --stat`

I believe I have found the exact place where strlen() is used incorrectly.
It is in show_stats() in diff.c:

https://github.com/git/git/blob/c50926e1f48891e2671e1830dbcd2912a4563450/diff.c#L2623

It should probably be replaced with one of utf8_width(), utf8_strnwidth()
or utf8_strwidth() from utf8.c, which measure display columns rather than bytes.
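To make the difference concrete, here is a small standalone sketch in plain C, outside of git. `utf8_codepoints()` and `strlen_overcount()` are hypothetical helpers written only for this illustration; they are not functions from utf8.c. Counting code points is a simplified stand-in for real display width. It is good enough for precomposed Latin names like "Ärger.txt", where "Ä" (U+00C4) takes two bytes in UTF-8 but only one column on screen:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical, simplified stand-in for a display-width helper: counts
 * UTF-8 code points by skipping continuation bytes (0b10xxxxxx).
 * This equals the on-screen width only when every character takes one
 * column, as with precomposed Latin letters such as "Ä" (U+00C4).
 * Unlike git's utf8_strwidth(), it does not count wide CJK characters
 * as two columns or combining marks as zero.
 */
static size_t utf8_codepoints(const char *s)
{
	size_t n = 0;

	assert(s != NULL);
	for (; *s; s++)
		if (((unsigned char)*s & 0xC0) != 0x80)
			n++;
	return n;
}

/*
 * Hypothetical helper: how many columns a strlen()-based layout
 * overestimates for NAME, i.e. one per UTF-8 continuation byte.
 * For a pure ASCII name this is 0.
 */
static size_t strlen_overcount(const char *name)
{
	return strlen(name) - utf8_codepoints(name);
}
```

With these helpers, strlen("\xC3\x84rger.txt") is 10 while utf8_codepoints() returns 9, so strlen_overcount() reports that a byte-based layout is off by one column for that name and by zero for "Arger.txt". That one-column difference is exactly the misalignment in the output quoted below. For wide characters or combining marks, counting code points is wrong too, which is why the real fix should use the width functions in utf8.c.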


On Wed, 10 Aug 2022 at 12:40, Torsten Bögershausen <tboegi@xxxxxx> wrote:
>
> On Tue, Aug 09, 2022 at 10:55:31PM -0700, Junio C Hamano wrote:
> > Calvin Wan <calvinwan@xxxxxxxxxx> writes:
> >
> > > Hi Alexander,
> > >
> > > Thank you for the report! I attempted to reproduce with the steps you
> > > provided, but was unable to do so. What commands would I have to run
> > > on a clean git repository to reproduce this?
> >
> > Sounds like a symptom observable when the width computed by
> > utf8.c::git_gcwidth(), using the width table imported from
> > unicode.org, and the width the terminal thinks each of the displayed
> > characters has, do not match (e.g. seen when ambiguous characters are
> > involved, https://unicode.org/reports/tr11/#Ambiguous).
> >
>
> I am not fully sure about that - I can reproduce it with Latin based
> file names as well:
>
>  git log --stat
> [snip]
>  Arger.txt  | 1 +
>  Ärger.txt | 1 +
>    2 files changed, 2 insertions(+)
>
> From this very first experiment I would suspect that we use
> strlen() somewhere rather than utf8.c::git_gcwidth().
>
> More digging needed (but I don't promise anything today)
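The misaligned output quoted above can be reproduced outside git with a sketch that sizes and pads the name column by byte count. This is a hypothetical illustration: `format_stat_line()` and `byte_name_width()` are made-up names, and the code does not claim to mirror how diff.c actually prints stat lines. It only shows that when the column width is the longest strlen() and padding is applied per byte (as printf's "%-*s" does), "Arger.txt" (9 bytes) gets one more space than "Ärger.txt" (10 bytes), even though both are 9 columns wide on screen:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Illustration only, not git's code: format one --stat-like line,
 * padding NAME to NAME_WIDTH with printf's "%-*s".  The field width of
 * "%-*s" is counted in bytes, so a name containing multibyte UTF-8
 * receives fewer padding spaces than its on-screen width requires.
 */
static int format_stat_line(char *buf, size_t size, const char *name,
			    int name_width)
{
	assert(name != NULL);
	return snprintf(buf, size, " %-*s | 1 +", name_width, name);
}

/* Column width chosen the byte-counting way: the longest strlen(). */
static int byte_name_width(const char *const *names, size_t n)
{
	size_t i, max = 0;

	for (i = 0; i < n; i++)
		if (strlen(names[i]) > max)
			max = strlen(names[i]);
	return (int)max;
}
```

Formatting { "Arger.txt", "\xC3\x84rger.txt" } with byte_name_width() gives a width of 10 and produces " Arger.txt  | 1 +" and " Ärger.txt | 1 +", the same pattern as in the report. Computing both the column width and the padding from display width instead of bytes would line them up.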



