Re: Stupid quoting...

On Tue, Jun 19, 2007 at 23:19:39 -0700, Junio C Hamano wrote:
> Johannes Schindelin <Johannes.Schindelin@xxxxxx> writes:
> 
> >> I don't see our discourse leading anywhere: the points have been made.
> >
> > I would really, really, really like to see a solution. Alas, I cannot 
> > think of one, other than _forcing_ the developers to use ASCII-only 
> > filenames.
> >
> > Note that there is no convention yet in Git to state which encoding your 
> > filenames are supposed to use. And in fact, we already had a fine example 
> > in git.git why this is particularly difficult. MacOSX is too clever to be 
> > true, in that it gladly takes filenames in one encoding, but reads those 
> > filenames out in _another_ encoding. Thus, a "git add <filename>" can well 
> > end up in git-status saying that a file was deleted, and another file 
> > (actually the same, but in a different encoding) is untracked.

I saw the Bazaar folks discussing this MacOSX issue. Basically, in MacOSX
filenames are *unicode* strings (just as they are in Windows, btw). Unicode,
for compatibility reasons, allows expressing many characters in multiple forms
-- composed and decomposed. For example, 'á' can be expressed as '\u00e1'
('\xc3\xa1' in utf-8) or as 'a\u0301' ('a\xcc\x81' in utf-8).
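
To make the two forms concrete, here is a small Python sketch (the variable
names are mine, purely for illustration) that prints the UTF-8 bytes of each
spelling of 'á' and shows that a plain comparison treats them as different
strings:

```python
# Two spellings of the same visible character 'á'.
composed = "\u00e1"      # single code point: LATIN SMALL LETTER A WITH ACUTE
decomposed = "a\u0301"   # 'a' followed by COMBINING ACUTE ACCENT

# The UTF-8 byte sequences differ, matching the escapes quoted above.
print(composed.encode("utf-8"))    # b'\xc3\xa1'
print(decomposed.encode("utf-8"))  # b'a\xcc\x81'

# A code-point (or byte-wise) comparison sees two different strings.
print(composed == decomposed)      # False
```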

MacOSX opts, in accordance with the Unicode standard, to treat such
representations as equal, and it does so by normalizing all filenames to one
form. I don't know whether it uses compatibility normalization, but I believe
it uses the decomposed form (which makes the issue immediately obvious,
because most programs work in composed form).
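
A rough sketch of the effect, in Python. It assumes the filesystem applies
plain canonical decomposition (NFD), which is only my belief as stated above;
the helper same_filename() and the example name are hypothetical. The name a
program asks for and the name read back differ byte for byte, yet compare
equal once both are normalized to the same form:

```python
import unicodedata


def same_filename(a: str, b: str, form: str = "NFD") -> bool:
    """Compare two names after normalizing both to the same Unicode form.

    Hypothetical helper for illustration; not taken from git or bazaar.
    """
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# Name as typed by the user (composed form).
requested = "caf\u00e9.txt"

# Name as a normalizing filesystem might hand it back.
# ASSUMPTION: plain NFD; the real filesystem rules may differ in detail.
stored = unicodedata.normalize("NFD", requested)

# A byte-wise comparison sees two different files ...
print(requested.encode("utf-8") == stored.encode("utf-8"))  # False
# ... while a normalization-aware comparison sees one.
print(same_filename(requested, stored))                     # True
```

A tool that compares the recorded name with the directory listing byte for
byte would report one file as deleted and another as untracked, which is the
symptom described in the quoted text.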

> By the way, the pathname quoting done by "diff" does not even
> attempt to tackle that.  I already explained why in the thread
> so I would not repeat myself.
> 
> Having said that, the absolute minimum that needs to be quoted
> are double-quote (because it is used by quoting as agreed with
> GNU diff/patch maintainer), backslash (used to introduce C-like
> quoting), newline and horizontal tab (makes "patch" confused, as
> it would make it ambiguous where the pathname ends), so I am not
> opposed to a patch that introduces a new mode, probably on by
> default _unless_ we are generating --format=email, that does not
> quote high byte values.  That would solve "My UTF-8 filenames
> are unreadable on my terminal" problem.
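
For illustration, the "absolute minimum" rule described above could be
sketched in Python like this. It is not git's actual quoting code; the name
quote_path_minimal() and the escape table are my own assumptions. Only
double-quote, backslash, newline and horizontal tab are escaped, and a path
containing none of them is passed through untouched, high characters included:

```python
# Characters that must be escaped under the minimal rule (assumed table).
ESCAPES = {
    '"': '\\"',    # double-quote delimits the quoted name
    "\\": "\\\\",  # backslash introduces the C-like escapes
    "\n": "\\n",   # newline would end the pathname early
    "\t": "\\t",   # tab makes the end of the pathname ambiguous
}


def quote_path_minimal(path: str) -> str:
    """Quote a pathname C-style, leaving non-ASCII characters as they are.

    Hypothetical sketch of the proposed mode; not git's implementation.
    """
    if not any(ch in ESCAPES for ch in path):
        return path  # nothing to escape, so no surrounding quotes either
    return '"' + "".join(ESCAPES.get(ch, ch) for ch in path) + '"'


print(quote_path_minimal("\u00e1.txt"))   # unchanged, stays readable
print(quote_path_minimal("a\tb.txt"))     # wrapped in quotes, tab escaped
```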

IMHO it should be the default even for email format. Most projects that use
non-ascii filenames probably have all members using the same locale, and for
such a group it will just work. Also, the file names, content and commit
messages will usually be in the same (though project-specific) encoding, so
if the charset in Content-Type is set to that, people with a different locale
that can represent the same characters will still see the names correctly.
For other people, the MUA will probably print some escape anyway (it will not
screw up the terminal -- it usually knows what it can safely pass to it).

-- 
						 Jan 'Bulb' Hudec <bulb@xxxxxx>
