Re: [BUG REPORT] File names that contain UTF8 characters are unnecessarily escaped in 'git status .' messages

On Thu, May 27, 2021 at 06:56:28AM +0200, Torsten Bögershausen wrote:

> On Wed, May 26, 2021 at 04:41:38PM -0700, Yuri wrote:
> > On 5/26/21 4:32 PM, Junio C Hamano wrote:
> > > "git config core.quotepath no"?
> >
> >
> > I didn't have the 'core.quotepath' value set. 'git config core.quotepath no'
> > changed the behavior to no quoting.
> >
> > So it looks like the default value of 'core.quotepath' is incorrect: it
> > should be based on terminal capabilities.
> >
> 
> These are two different things.
> If you are in a project where only ASCII names are allowed (for whatever reason),
> you may want `git config core.quotepath no`, regardless of what the terminal can do.
> 
> (Besides that, are there terminals that don't handle UTF-8 these days?)

I don't think core.quotepath is just about UTF-8. It is agnostic to the
encoding of the paths, so it is really a question of whether to just
pass through bytes with the high bit set.
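
To make the "pass through bytes with the high bit set" point concrete, here is a rough Python model of the quoting Git applies to path output. It is a simplified sketch, not Git's code. The names `quote_path` and `quote_high_bit` are made up for illustration, and `quote_high_bit` stands in for core.quotepath. The function works on raw bytes and never decodes them, which matches the encoding-agnostic behavior described above.

```python
# Single-letter escapes for some control bytes plus the two specials;
# every other byte that needs quoting becomes a three-digit octal escape.
_NAMED_ESCAPES = {
    0x07: b"a", 0x08: b"b", 0x09: b"t", 0x0A: b"n",
    0x0B: b"v", 0x0C: b"f", 0x0D: b"r",
    ord('"'): b'"', ord("\\"): b"\\",
}


def _must_quote(byte: int, quote_high_bit: bool) -> bool:
    """Return True if this byte forces the path to be quoted."""
    if byte < 0x20 or byte == 0x7F or byte in (ord('"'), ord("\\")):
        return True
    # Bytes >= 0x80 are only escaped when the quotepath-like flag is on.
    return byte >= 0x80 and quote_high_bit


def quote_path(path: bytes, quote_high_bit: bool = True) -> bytes:
    """Sketch of Git-style C quoting for a raw (undecoded) path."""
    if not any(_must_quote(b, quote_high_bit) for b in path):
        return path  # nothing unusual: emit the bytes verbatim
    out = bytearray(b'"')
    for b in path:
        if not _must_quote(b, quote_high_bit):
            out.append(b)
        elif b in _NAMED_ESCAPES:
            out += b"\\" + _NAMED_ESCAPES[b]
        else:
            out += b"\\%03o" % b  # e.g. 0xC3 -> \303
    out += b'"'
    return bytes(out)
```

With the flag on, the UTF-8 name `café.txt` comes out as `"caf\303\251.txt"`, the escaped form the original report complains about. With it off, the bytes pass through untouched. Control characters, double quotes, and backslashes are quoted either way.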

So I think the more accurate question is: do the paths in your
repositories generally contain bytes that your terminal can interpret
sensibly?  I'd guess the answer is usually yes, even if you are using
latin1 or similar (or else "ls" would show you mojibake, too).

But there's a follow-on, too: do all the other things which consume
quoted path output likewise handle it? Setting core.quotepath will
impact all parts of Git, including plumbing. So a script that parses
diff-tree output, for example, will see a difference.
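
A script that wants to be robust to either setting has to undo the quoting itself. When Git quotes a path, it wraps it in double quotes and uses C-style escapes: `\t`, `\"`, `\\`, and three-digit octal such as `\303` for other bytes. Below is a minimal sketch of reversing that. `unquote_path` is a hypothetical helper, and it assumes well-formed input. Where a command supports it, `-z` sidesteps the issue entirely, because Git then emits pathnames verbatim and NUL-terminated.

```python
# Map the letter after a backslash back to the byte it stands for.
_UNESCAPE = {
    ord("a"): 0x07, ord("b"): 0x08, ord("t"): 0x09, ord("n"): 0x0A,
    ord("v"): 0x0B, ord("f"): 0x0C, ord("r"): 0x0D,
    ord('"'): ord('"'), ord("\\"): ord("\\"),
}


def unquote_path(field: bytes) -> bytes:
    """Undo Git-style C quoting on one path field (sketch, no error checks)."""
    if not (len(field) >= 2 and field.startswith(b'"') and field.endswith(b'"')):
        return field  # unquoted paths are emitted verbatim
    body = field[1:-1]
    out = bytearray()
    i = 0
    while i < len(body):
        c = body[i]
        if c != ord("\\"):
            out.append(c)
            i += 1
            continue
        nxt = body[i + 1]
        if nxt in _UNESCAPE:
            out.append(_UNESCAPE[nxt])
            i += 2
        else:
            # Three octal digits encode one raw byte, e.g. \303 -> 0xC3.
            out.append(int(body[i + 1:i + 4], 8))
            i += 4
    return bytes(out)
```

The result is still bytes, so the script must decide separately how (or whether) to decode them.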

I'd guess that most text-processing tools these days are reasonably
happy with high-bit chars. But if we were to flip the default, we might
see regressions with:

  - very old / obscure systems (I'd guess even old versions of GNU tools
    are good, but who knows what Solaris sed will do)

  - some scripting languages (like perl and ruby) have internal strings
    that are encoding-aware, and so they are picky about reading
    high-bit input from a descriptor, especially if it isn't utf8.
    The fix is usually easy-ish, but may be a surprise for some folks
    (OTOH, I can imagine it fixes bugs in sloppily-written scripts which
    did not anticipate the incoming filenames being quoted ;) ).
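
Python 3 shows the same pickiness as the perl and ruby cases above. The choice of Python is only for illustration, since the thread names perl and ruby. In this small sketch, a latin1-encoded name that Git passes through unquoted fails a strict UTF-8 decode. The standard `errors="surrogateescape"` handler is one way to carry the raw bytes through unchanged. The file name is an assumed example.

```python
raw_name = b"caf\xe9.txt"  # "café.txt" in latin1, passed through as raw bytes

# A strict UTF-8 decode (the default for encoding-aware readers)
# rejects the lone 0xE9 byte.
try:
    raw_name.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# One easy-ish fix: map undecodable bytes to surrogates so the original
# bytes can be recovered exactly when the name is written back out.
lenient = raw_name.decode("utf-8", errors="surrogateescape")
round_trip = lenient.encode("utf-8", errors="surrogateescape")

print(strict_ok)               # False
print(round_trip == raw_name)  # True
```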

As Git is used more and more internationally, I suspect the value of
defaulting core.quotepath=no increases. And as time goes on and people
tend to standardize on utf8-aware tools and environments, the risk of
doing so decreases. So while core.quotepath=yes was a conservative
choice in 2007, it might be time to look at switching.

> Anyway, if you prefer UTF-8 as a default,
> 
> git config --global core.quotepath yes
> 
> is your friend (as it is mine)

Just a nit/clarification for other readers, but I think you have yes/no
flipped here and earlier in your message.

-Peff
