Re: [PATCH v2 3/3] sequencer: reencode to utf-8 before arrange rebase's todo list

On Wed, Nov 06, 2019 at 10:30:34AM +0900, Junio C Hamano wrote:

> > That's normally what we do. The only cases we're covering here are when
> > somebody has explicitly asked that the commit object be stored in
> > another encoding. Presumably they'd also be using a matching
> > i18n.logOutputEncoding in that case, in which case logmsg_reencode()
> > would be a noop. I think the only reasons to do that are:
> >
> >   1. You're stuck on some legacy encoding for your terminal. But in that
> >      case, I think you'd still be better off storing utf-8 and
> >      translating on the fly, since whatever encoding you do store is
> >      baked into your objects for all time (so accept some slowness now,
> >      but eventually move to utf-8).
> >
> >   2. Your preferred language is bigger in utf-8 than in some specific
> >      encoding, and you'd rather save some bytes. I'm not sure how big a
> >      deal this is, given that commit messages don't tend to be that big
> >      in the first place (compared to trees and blobs). And the zlib
> >      deflation on the result might help remove some of the redundancy,
> >      too.
> 
> Perhaps add
> 
>     3. You are dealing with a project that originated on and was
>        migrated from a foreign SCM, and older parts of the history
>        are stored in a non-utf-8 encoding, even though recent
>        history is in utf-8
> 
> to the mix?

I would think you'd want to convert to utf-8 as you do the migration in
that case, since you're writing new hashes anyway. But I think a similar
case would just be an old Git repository, where for some reason you
thought i18n.commitEncoding was a good idea back then (perhaps because
you were in situation (1) then, but now you aren't).

In either case, though, I don't think it's a compelling motivation for
optimization, if only because those old commits will be shown less and
less (and even without modern optimizations like commit-graph, we'd
generally avoid reencoding those old commits unless we're actually going
to _show_ them).
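
For a rough sense of the byte savings mentioned in point (2) of the quoted
list, here is a minimal Python sketch. It is not anything Git does
internally; the helper name encoded_sizes and the sample message are made up
for illustration, and a message this short mostly shows the raw size
difference rather than what zlib does on realistic input.

```python
import zlib


def encoded_sizes(message, encodings=("utf-8", "euc-jp", "shift_jis")):
    """Map each encoding to (raw byte count, zlib-deflated byte count).

    Hypothetical helper for illustration only; Git's own object
    storage is not involved here.
    """
    sizes = {}
    for enc in encodings:
        raw = message.encode(enc)
        sizes[enc] = (len(raw), len(zlib.compress(raw)))
    return sizes


if __name__ == "__main__":
    # Made-up sample commit subject line, 13 Japanese characters plus newline.
    message = "日本語のコミットメッセージ\n"
    for enc, (raw, deflated) in encoded_sizes(message).items():
        print(f"{enc:10} raw={raw:3d} deflated={deflated:3d}")
```

Each of these characters takes three bytes in utf-8 and two in either legacy
encoding, so the raw saving is about a third. The absolute numbers stay tiny
next to trees and blobs.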

> I suspect even the heavy Windows/Mac users in Japan have migrated
> out of legacy (the suspicion comes from an anecdote that is offtopic
> here).

Thanks for the data point. All of this is very far from my personal
experience, so I mostly go on scraps of hearsay I pick up reading this
or that. :)

-Peff


