Re: Stupid quoting...

Johannes Schindelin <Johannes.Schindelin@xxxxxx> writes:

> On Wed, 13 Jun 2007, David Kastrup wrote:
>
>> what is the point in quoting file names and their characters in
>> git-diff's output?  And what is the recommended way of undoing the
>> damage?
>
> The recommended way is not using spaces to begin with. I mean, does 
> "David" contain spaces? People seem not to see the problem, and fail to 
> blame Microsoft for all the damage they have done, introducing that 
> stupid, stupid concept of filenames containing spaces, and _enforcing_ it.

Why are you talking about spaces ;-)?

There are a few things to note, but the first thing is that mere
spaces do not trigger quoting.  A tab (HT) does, so do non ASCII
characters.  The second thing is that we do this quoting for
various good reasons, and it is not likely to change.
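
To make the rule concrete, here is a rough Python sketch of which bytes
trigger quoting.  It only approximates the behaviour described above and
is not git's actual implementation; the name quote_path and the escape
table are made up for illustration.

```python
# Approximation of the quoting rule described above, NOT git's real code.
# Control characters, double quote, backslash and bytes outside ASCII
# trigger quoting; a plain space (SP) does not.
_NAMED = {
    0x07: "a", 0x08: "b", 0x09: "t", 0x0A: "n",
    0x0B: "v", 0x0C: "f", 0x0D: "r", 0x22: '"', 0x5C: "\\",
}


def quote_path(raw: bytes) -> str:
    """Return the name as-is if nothing needs quoting, otherwise a
    double-quoted form with backslash escapes (hypothetical helper)."""
    pieces = []
    needs_quotes = False
    for byte in raw:
        if byte in _NAMED:
            pieces.append("\\" + _NAMED[byte])
            needs_quotes = True
        elif byte < 0x20 or byte >= 0x7F:
            # Everything else outside printable ASCII becomes \ooo octal.
            pieces.append("\\%03o" % byte)
            needs_quotes = True
        else:
            pieces.append(chr(byte))
    text = "".join(pieces)
    return '"' + text + '"' if needs_quotes else text


print(quote_path(b"my file.txt"))   # a space alone leaves the name untouched
print(quote_path(b"tab\there.txt"))  # a HT forces the quoted form
```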

As Alex mentions, the safest way for programs to read the output
is to use the -z format.  However, even if you are able to do
so, it may be inconvenient in some languages (mainstream
languages like C and Perl are not among them).  Not quoting SP
is a conscious decision, as SP in filenames is rather common,
more common than non ASCII and much more common than HT.
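
For what it is worth, reading the -z format is short in a language that
handles NUL bytes comfortably.  A minimal sketch in Python, assuming the
output of "git ls-files -z"; the helper names split_nul and
list_tracked_paths are hypothetical:

```python
import subprocess
from typing import List


def split_nul(data: bytes) -> List[bytes]:
    """Split NUL-terminated records, dropping the empty piece that
    follows the final terminator."""
    return [record for record in data.split(b"\0") if record]


def list_tracked_paths(repo_dir: str = ".") -> List[bytes]:
    """Run `git ls-files -z` and return the raw, unquoted path bytes.
    Needs git on PATH and repo_dir inside a work tree."""
    result = subprocess.run(
        ["git", "ls-files", "-z"],
        cwd=repo_dir,
        check=True,
        stdout=subprocess.PIPE,
    )
    return split_nul(result.stdout)


# No unquoting is needed: HT and SP come through as ordinary bytes.
sample = b"plain.txt\0with space.txt\0tab\there.txt\0"
print(split_nul(sample))
```

Paths are kept as bytes on purpose, since nothing guarantees that a
pathname is valid UTF-8.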

The "raw" formats "ls-files -s", "ls-tree" and "diff --raw"
produce are designed to put names at the end, and typically
delimited with a HT, so that "lazy" scripts can use cut (whose
default delimiter is a HT) to pick out pieces from its output.
And plumbing tools reading from the standard input (most
notably, "update-index --stdin") know how to unquote them.  In
practice, not many people use non ASCII in pathnames and expect
them to work sanely for everybody, so loosely written scripts, as
long as they cut at HT to pick out the pathname part, "mostly"
work.  (I think traditional core git scripts are safe; I suspect
some contributed ones shipped with git core may not be.  Cogito
used to be very unsafe, but it was audited and became much safer
before it got discontinued.)
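
A script that wants to be a little less lazy can cut at the first HT and
then undo the quoting itself.  The following Python sketch assumes the
"<mode> <object> <stage> HT <name>" layout of "ls-files -s" and the
usual backslash and three-digit octal escapes; unquote_path and
split_index_line are hypothetical names, and this is not the code the
plumbing tools use.

```python
from typing import Tuple

# Single-letter escapes assumed to appear inside a quoted name.
_ESCAPES = {
    ord("a"): 0x07, ord("b"): 0x08, ord("t"): 0x09, ord("n"): 0x0A,
    ord("v"): 0x0B, ord("f"): 0x0C, ord("r"): 0x0D,
    ord('"'): 0x22, ord("\\"): 0x5C,
}


def unquote_path(field: bytes) -> bytes:
    """Undo the double-quoted form; unquoted names are returned as-is."""
    if len(field) < 2 or field[:1] != b'"' or field[-1:] != b'"':
        return field
    body = field[1:-1]
    out = bytearray()
    i = 0
    while i < len(body):
        byte = body[i]
        if byte != 0x5C:  # not a backslash: copy through
            out.append(byte)
            i += 1
        elif i + 1 < len(body) and body[i + 1] in _ESCAPES:
            out.append(_ESCAPES[body[i + 1]])
            i += 2
        else:
            digits = body[i + 1:i + 4]  # expect exactly three octal digits
            if len(digits) != 3 or any(d not in b"01234567" for d in digits):
                raise ValueError("malformed escape in %r" % field)
            out.append(int(digits, 8))
            i += 4
    return bytes(out)


def split_index_line(line: bytes) -> Tuple[bytes, bytes]:
    """Cut one `ls-files -s` style line at the first HT."""
    meta, _, name = line.rstrip(b"\n").partition(b"\t")
    return meta, unquote_path(name)


line = b'100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0\t"caf\\303\\251.txt"\n'
meta, name = split_index_line(line)
print(meta, name.decode("utf-8"))
```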

The pathname quoting rules in textual output were chosen
primarily to make diff output safer, as one of the most
important workflows git supports is e-mailable patches.

GNU patch treats HT on "+++ name"/"--- name" lines as the end of
the name (and after the HT comes a timestamp), but the timestamp
part is treated as optional, which introduces ambiguities and
confusion.  The issue was discussed some time ago (check the list
archive for the discussion among Linus, me, and Paul Eggert, the
GNU diff and patch maintainer) and the quoting rules we use now are
consistent with what GNU diff and patch plan to use.  The update
on the GNU side may or may not have happened already.

When a patch appears in an e-mail, you need to be aware that
not everybody has the luxury of living in a UTF-8 only world.
Your commit message and cover letter may be in one encoding, the
pathnames that appear in diff headers may be in your filesystem
encoding, and the patch text that appears as the diff payload may
be in another, document specific encoding.  All three could be
different (worse, a patch that touches more than one file can
carry different encodings in the payload part), and mixing
character sets in a single piece of e-mail confuses people's MUAs
and tends to mangle messages.  Quoting non ASCII characters in
pathnames, even when they are perfectly valid and ordinary UTF-8
strings, eliminates one of the above three elements as a
possible source of worries.
