Johannes Schindelin <Johannes.Schindelin@xxxxxx> writes:

> On Wed, 13 Jun 2007, David Kastrup wrote:
>
>> what is the point in quoting file names and their characters in
>> git-diff's output?  And what is the recommended way of undoing the
>> damage?
>
> The recommended way is not using spaces to begin with. I mean, does
> "David" contain spaces? People seem not to see the problem, and fail to
> blame Microsoft for all the damage they have done, introducing that
> stupid, stupid concept of filenames containing spaces, and _enforcing_ it.

Why are you talking about spaces ;-)?

There are a few things to note, but the first thing is that mere
spaces do not trigger quoting.  A tab (HT) does, and so do non-ASCII
characters.

The second thing is that we do this quoting for various good
reasons, and it is not likely to change.

As Alex mentions, the safest way for programs to read pathnames is
to read from the -z format.  However, even if you are able to do
so, it may be inconvenient in some languages (mainstream languages
like C and Perl are not among them).

Not quoting SP is a conscious decision, as SP in filenames is
rather common, more common than non-ASCII and much more common
than HT.

The "raw" formats that "ls-files -s", "ls-tree" and "diff --raw"
produce are designed to put names at the end, typically delimited
with a HT, so that "lazy" scripts can use cut (whose default
delimiter is a HT) to pick out pieces from their output.  And
plumbing tools reading from the standard input (most notably,
"update-index --stdin") know how to unquote them.

In practice, not many people use non-ASCII in pathnames and expect
them to work sanely for everybody, so loosely written scripts, as
long as they cut at HT to pick out the pathname part, "mostly" work.
(I think traditional core git scripts are safe; I suspect some
contributed ones shipped with git core may not be.  Cogito used to
be very unsafe, but it was audited and became much safer before it
got discontinued.)
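To make "undoing the damage" concrete, here is a minimal Python sketch
of the two approaches above.  It is not git's own code.  It assumes the
quoted form is a double-quoted string with C-style backslash escapes,
with any other quoted byte written as a three-digit octal escape, and
that -z output is a sequence of NUL-terminated records.  The helper
names unquote_path and split_z_records are made up for this example.

```python
# Hypothetical helpers for illustration only; not part of git.
# Assumption: named escapes follow C conventions (\t, \n, \", \\, ...).
_ESCAPES = {"a": 7, "b": 8, "f": 12, "n": 10, "r": 13,
            "t": 9, "v": 11, '"': 34, "\\": 92}


def unquote_path(field: str) -> bytes:
    """Return the raw pathname bytes for one pathname field.

    A field that is not wrapped in double quotes is taken verbatim.
    """
    if len(field) < 2 or not (field.startswith('"') and field.endswith('"')):
        return field.encode("utf-8")
    body = field[1:-1]
    out = bytearray()
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            # Ordinary character: copy it through unchanged.
            out += ch.encode("utf-8")
            i += 1
        elif i + 1 < len(body) and body[i + 1] in _ESCAPES:
            # Named escape such as \t or \".
            out.append(_ESCAPES[body[i + 1]])
            i += 2
        else:
            # Otherwise expect exactly three octal digits, e.g. \303.
            digits = body[i + 1:i + 4]
            if len(digits) != 3 or any(d not in "01234567" for d in digits):
                raise ValueError("malformed escape in %r" % field)
            out.append(int(digits, 8))
            i += 4
    return bytes(out)


def split_z_records(data: bytes) -> list:
    """Split -z style output into records; nothing in them is quoted."""
    return [rec for rec in data.split(b"\0") if rec]
```

A script that cuts at HT would pass the last field of each line to
unquote_path; a script that reads -z output does not need to unquote
anything, which is why that format is the safer one to consume.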
The pathname quoting rules in textual output were chosen primarily
to make diff output safer, as one of the most important workflows
git supports is e-mailable patches.

GNU patch treats HT on "+++ name"/"--- name" lines as the end of
the name (and after HT comes the timestamp), but the timestamp part
is treated as optional, which introduces ambiguities and confusion.
The issue was discussed some time ago (check the list archive for
the discussion among Linus, Paul Eggert, who is the GNU diff and
patch maintainer, and me), and the quoting rules we use now are
consistent with what GNU diff and patch plan to use.  The update on
the GNU side may or may not have happened already.

When a patch appears in an e-mail, you need to be aware that not
everybody has the luxury of living in a UTF-8 only world.  Your
commit message and cover letter may be in one encoding, the
pathnames that appear in diff headers may be in your filesystem
encoding, and the patch text that appears as the diff payload may
be in yet another, document specific encoding.  All three could be
different (worse, a patch that touches more than one file can carry
different encodings in the payload part), and mixing character sets
in a single piece of e-mail confuses people's MUAs and tends to
mangle messages.  Quoting non-ASCII characters in pathnames, even
when they are perfectly valid and ordinary UTF-8 strings, eliminates
one of the above three elements as a possible source of worries.
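The effect of these rules on a diff header can be sketched in Python.
This is only an approximation of the behaviour described in this
message (SP is left alone, HT and non-ASCII bytes trigger quoting),
not a copy of git's implementation.  The function name quote_path and
the exact set of named escapes are assumptions made for the example.

```python
# Hypothetical sketch; real git may differ in details.
# Assumption: these bytes get a named C-style escape when quoting.
_NAMED = {7: "a", 8: "b", 9: "t", 10: "n", 11: "v", 12: "f", 13: "r",
          34: '"', 92: "\\"}


def _needs_escape(byte: int) -> bool:
    # Control characters (including HT), DEL, non-ASCII bytes,
    # double quote and backslash; plain SP (0x20) does not qualify.
    return byte < 0x20 or byte >= 0x7F or byte in (0x22, 0x5C)


def quote_path(raw: bytes) -> str:
    """Return a pure-ASCII rendering of raw pathname bytes."""
    if not any(_needs_escape(b) for b in raw):
        return raw.decode("ascii")  # nothing to quote, emit as-is
    parts = ['"']
    for b in raw:
        if b in _NAMED:
            parts.append("\\" + _NAMED[b])
        elif _needs_escape(b):
            parts.append("\\%03o" % b)  # three-digit octal escape
        else:
            parts.append(chr(b))
    parts.append('"')
    return "".join(parts)
```

Whatever the filesystem encoding is, the result contains only ASCII,
so the header line no longer depends on how the mail itself is
encoded, and a HT inside a name cannot be mistaken for the separator
before the timestamp.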