Jeff King <peff@xxxxxxxx> writes:

> That's normally what we do. The only cases we're covering here are when
> somebody has explicitly asked that the commit object be stored in
> another encoding. Presumably they'd also be using a matching
> i18n.logOutputEncoding in that case, in which case logmsg_reencode()
> would be a noop. I think the only reasons to do that are:
>
>   1. You're stuck on some legacy encoding for your terminal. But in that
>      case, I think you'd still be better off storing utf-8 and
>      translating on the fly, since whatever encoding you do store is
>      baked into your objects for all time (so accept some slowness now,
>      but eventually move to utf-8).
>
>   2. Your preferred language is bigger in utf-8 than in some specific
>      encoding, and you'd rather save some bytes. I'm not sure how big a
>      deal this is, given that commit messages don't tend to be that big
>      in the first place (compared to trees and blobs). And the zlib
>      deflation on the result might help remove some of the redundancy,
>      too.

Perhaps add

    3. You are dealing with a project that originated on, and was
       migrated from, a foreign SCM, and older parts of the history
       are stored in a non-utf-8 encoding, even though recent history
       is in utf-8.

to the mix?

> The two-part user-format thing goes back to 7e77df39bf (pretty: two
> phase conversion for non utf-8 commits, 2013-04-19). It does seem like
> it would be cheaper to convert the format string into the output
> encoding (it would need to be an ascii superset, but that's already the
> case, since we expect to parse "author", etc out of the re-encoded
> commit object). But again, I have trouble caring too much about the
> performance of this case, as I consider it to be mostly legacy at this
> point. But I also don't write in (say) Japanese, so maybe I'm being too
> narrow-minded about whether people really want to avoid utf-8.

I suspect even the heavy Windows/Mac users in Japan have migrated away
from legacy encodings (the suspicion comes from an anecdote that is
offtopic here).
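
For anybody who wants a rough feel for reason 2 in the quoted list, here is a small Python sketch (not Git code). It encodes a made-up Japanese log message in utf-8 and in two legacy encodings and reports the raw and zlib-deflated sizes. The message text and the encoded_sizes() helper are hypothetical, and only the message body is compressed, whereas a real commit object also carries its header lines, so treat the numbers as indicative at best.

```python
import zlib


def encoded_sizes(message, encodings=("utf-8", "euc_jp", "shift_jis")):
    """Return {encoding: (raw_bytes, deflated_bytes)} for one message."""
    sizes = {}
    for name in encodings:
        raw = message.encode(name)
        # zlib.compress() stands in for the deflation applied to stored objects.
        sizes[name] = (len(raw), len(zlib.compress(raw)))
    return sizes


# Hypothetical log message, used only for illustration.
message = "文字コードの変換処理を修正\n\nログ出力時の再エンコードを見直した。\n"

for name, (raw, deflated) in encoded_sizes(message).items():
    print(f"{name:10s} raw={raw:4d} deflated={deflated:4d}")
```

The raw utf-8 size comes out larger because these characters take three bytes each in utf-8 and two in the legacy encodings. How much of that gap survives deflation depends on the message, so it is better to measure real history than to guess.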