Re: [RFH] An early draft of v1.5.0 release notes

"Horst H. von Brand" <vonbrand@xxxxxxxxxxxx> writes:

> Jakub Narebski <jnareb@xxxxxxxxx> wrote:
>
> [...]
>
>> Perhaps that is the idea, but that idea is not described in the above
>> new feature announcement. "... to reencode the message to UTF-8
>> when displaying, if needed." would cover it, but perhaps better
>> would be to cover this in more detail: "reencode message to UTF-8
>> if i18n.commitencoding is not set to something other than UTF-8",
>> or "reencode ... to i18n.commitencoding ... if needed".
>
> And what happens to the people who can't/won't display UTF-8? This is
> both a project-wide configuration (how does stuff get saved) and a
> user/local configuration (how to display stuff).

Presumably you would do something like:

	git log | tcs -f utf -t latin1 | less

The point being that the input to tcs will be uniformly UTF-8
even if the committers used a mix of Latin-1 and UTF-8, either
carelessly or deliberately [*1*].
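
For systems without tcs, roughly the same pipeline can be sketched
with iconv.  This is only an illustration: the helper name
utf8_to_latin1 is made up here, and iconv is a stand-in for tcs
rather than anything the pipeline above requires.

```shell
# Hypothetical helper: convert a UTF-8 stream on stdin to Latin-1 on stdout.
# iconv is used in place of tcs; the function name is illustrative only.
utf8_to_latin1() {
    iconv -f UTF-8 -t ISO-8859-1
}

# Intended use (run inside a repository):
#   git log | utf8_to_latin1 | less
```

Note that iconv stops with an error when it meets a character
that has no Latin-1 equivalent, so a pager fed this way may see
truncated output for such logs.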

Maybe i18n.displayencoding set to latin1 is what you are after?
I think it might make sense...

In any case, as Jakub and others pointed out, the description
was neither nice nor clear.  How about this as an update?

* I18n

 - We have always encouraged the commit message to be encoded in
   UTF-8, but users are allowed to use a legacy encoding as
   appropriate for their projects.  This will continue to be the
   case.  However, a non-UTF-8 commit encoding _must_ be
   explicitly set with i18n.commitencoding in the repository
   where a commit is made; otherwise git-commit-tree will
   complain if the log message does not look like a valid UTF-8
   string.

[Side note: in v1.5.0 preview, it only warns about this
 situation; I have a feeling that it might be better to promote
 this to an error and refuse to commit until the user sets
 i18n.commitencoding to the name of the legacy encoding used for
 the project -- this will be a one-time inconvenience but will
 be much better in the long run.]

 - The value of i18n.commitencoding in the originating
   repository is recorded in the commit object in the "encoding"
   header, if it is not UTF-8.  git-log and friends notice this
   and reencode the message to the encoding specified with
   i18n.commitencoding when displaying, if the two differ.
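
To make the two items above a bit more concrete, here is a rough
sketch of both halves.  The helper names looks_like_utf8 and
commit_encoding are hypothetical, and iconv and sed merely stand
in for whatever git-commit-tree and git-log do internally, which
may well differ in detail.

```shell
# Hypothetical helper: succeed if FILE decodes cleanly as UTF-8.
# iconv stands in for git-commit-tree's own check (an assumption).
looks_like_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Hypothetical helper: read a raw commit object on stdin and print the
# value of its "encoding" header, or UTF-8 when the header is absent.
commit_encoding() {
    # Headers end at the first blank line; stop there so that a log
    # message line starting with "encoding " is never mistaken for one.
    enc=$(sed -n -e '/^$/q' -e 's/^encoding //p')
    printf '%s\n' "${enc:-UTF-8}"
}

# Intended use (run inside a repository):
#   git config i18n.commitencoding ISO-8859-1   # declare the legacy encoding once
#   git cat-file commit HEAD | commit_encoding
```

A viewer could then compare the printed name with its own
preferred encoding and convert only when the two differ.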


[Footnote]

*1* For an encoding as simple as Latin-1 I do not think it is an
issue, but we do not want to encode everything to UTF-8 at
commit time, because non-reversible conversion can lose
information.  I do not want to rule out a situation where a
particular commit log entry needs to be in an encoding different
from the project norm, which hopefully is UTF-8, because it
needs to describe something in a character that cannot be
reversibly converted to UTF-8 (maybe the project is about iconv
enhancement, the commit fixes something related to irreversible
conversion and the log message wants to give an example).
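
One way to see whether a given message would survive such a
conversion is to round-trip it and compare the bytes.  The sketch
below assumes iconv is available and uses a made-up helper name,
roundtrips_via_utf8; it is not how git itself decides anything.

```shell
# Hypothetical helper: succeed if FILE, taken to be in ENCODING, comes
# back byte-for-byte identical after converting to UTF-8 and back.
# Usage: roundtrips_via_utf8 ENCODING FILE
roundtrips_via_utf8() {
    iconv -f "$1" -t UTF-8 "$2" | iconv -f UTF-8 -t "$1" | cmp -s - "$2"
}

# Intended use:
#   roundtrips_via_utf8 ISO-8859-1 message.txt && echo "safe to convert"
```

A failure here means the original bytes cannot be recovered from
the UTF-8 form, which is exactly the case where recording the
message in its own encoding is the safer choice.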

