Re: [PATCH] Update l10n guide

Jiang Xin <worldhello.net@xxxxxxxxx> writes:
> 2012/3/2 Johannes Sixt <j.sixt@xxxxxxxxxxxxx>:

> > It does not convert, but it records which encoding the text has. If you
> > don't specify anything, UTF-8 is assumed, and if your text is actually not
> > UTF-8, the result is necessarily garbage.
> >
> > Then you haven't set i18n.commitEncoding. Try this:
> >
> >   git config i18n.commitEncoding CP936
> 
> I know there are two config variables. i18n.commitEncoding will insert
> an "encoding XX" line into the commit object, while i18n.logOutputEncoding
> will set the default output encoding.

Note that, according to the documentation, 'git commit' issues a warning
if the commit log message given to it does not look like a valid UTF-8
string, unless you explicitly say your project uses a legacy encoding.
Modern git would also warn if you have a NUL ("\0") character in your
commit message, e.g. when using the UCS-2 / UTF-16 encoding.
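
To illustrate the two conditions above, here is a minimal Python sketch.
It is not git's actual check; the helper name looks_like_utf8 and the
sample messages are made up for illustration only.

```python
def looks_like_utf8(raw: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8 (hypothetical helper)."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


message = "中文"

# The same text in UTF-8 and in the legacy CP936 (GBK) encoding.
utf8_bytes = message.encode("utf-8")
cp936_bytes = message.encode("cp936")

# An ASCII-only message encoded as UTF-16 still contains NUL bytes.
utf16_bytes = "fix typo".encode("utf-16-le")

print(looks_like_utf8(utf8_bytes))   # UTF-8 text passes the check
print(looks_like_utf8(cp936_bytes))  # CP936 bytes are not valid UTF-8
print(b"\0" in utf16_bytes)          # UTF-16 output embeds NUL bytes
```

A message stored as CP936 without setting i18n.commitEncoding is exactly
the case that fails the first check.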

> But this implementation seems like a workaround.
> 
> * Tree objects do not have such implementation, so multibyte characters
>   can not be used as filenames.

And there is no place for a pathname encoding in the 'tree' object,
unfortunately.

One proposed solution was to convert filenames from filesystem
encoding to normal-form composed UTF-8 when creating tree objects, but
this would have to be optional.
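
A rough Python sketch of what such a conversion could look like, assuming
the filesystem encoding is known. This is not something git implements in
this form; the function name to_tree_name and its parameters are
hypothetical.

```python
import unicodedata


def to_tree_name(raw_name: bytes, fs_encoding: str) -> bytes:
    """Convert a filename from the filesystem encoding to NFC UTF-8.

    Hypothetical helper: decode with the given filesystem encoding,
    compose the characters (NFC), then encode as UTF-8.
    """
    text = raw_name.decode(fs_encoding)
    composed = unicodedata.normalize("NFC", text)
    return composed.encode("utf-8")


# A decomposed name ("e" followed by a combining acute accent), as some
# filesystems store it, and the same name in Latin-1.
decomposed = "cafe\u0301.txt".encode("utf-8")
latin1 = b"caf\xe9.txt"

# Both end up as the same composed UTF-8 byte sequence.
print(to_tree_name(decomposed, "utf-8"))
print(to_tree_name(latin1, "latin-1"))
```

The conversion is lossy with respect to the original bytes, which is one
reason it would have to be optional.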

In any case, at least for source code, using characters outside US-ASCII
in filenames is really discouraged.  Note also that the sample 'pre-commit'
hook prevents adding files with non-ASCII filenames.
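
As a rough analogue of that kind of check (not the hook's actual shell
code), a Python sketch could reject any newly added path containing bytes
outside printable ASCII. The function names and sample paths below are
hypothetical.

```python
def is_printable_ascii(path: bytes) -> bool:
    """True if every byte is in the printable ASCII range (space to tilde)."""
    return all(0x20 <= byte <= 0x7E for byte in path)


def non_ascii_paths(paths):
    """Return the paths that such a check would refuse to add."""
    return [p for p in paths if not is_printable_ascii(p)]


# Example list of newly added paths, as raw bytes.
added = [b"src/main.c", "docs/中文.txt".encode("utf-8")]

rejected = non_ascii_paths(added)
print(rejected)  # only the path with non-ASCII bytes is listed
```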
 
> * Commit object without "encoding" instruction will be used as it is. So
>   people under the same non-utf8 locale may not notice that they
>   have not set the proper i18n.commitEncoding, until one day they
>   need cross-platform development.

But see above.
 
> * As the raw commit log is always used as a fallback, sometimes it is hard to find
>   commit objects with wrong encoding instructions.
> 
> I think saving commit objects, tree objects, and packed-refs in UTF-8 is
> a better implementation.

Backward compatibility and performance.

-- 
Jakub Narebski
