Re: [PATCH] Re: git-mailinfo '-u' argument should be default.

David Woodhouse <dwmw2@xxxxxxxxxxxxx> · Wed, 10 Jan 2007 07:49:04 +0800

On Tue, 2007-01-09 at 10:46 -0800, Junio C Hamano wrote:
> I do not think you would want to make '-n' in the third point
> sound so negative 

No, I really _do_ want that.

> and make people on projects that chose to use
> legacy encoding for whatever reasons feel _dirty_. 

... but not that, because it wasn't aimed at them.

>  If the natural language in project's log is limited and a legacy
> encoding is sufficient, and if all the participants agree on a
> legacy encoding to use...
(...for git's own purely internal storage format).

That's not the use case for the -n option. Their case is what the
i18n.commitencoding configuration option exists for.

Although having said that, I don't actually know _why_ we let them
override the default, since it's _internal_ to git. As long as git
itself is correctly doing the conversion on the way in and out, there's
no reason for them to care whether we use UTF-8, UCS-4, EBCDIC or some
other arbitrary encoding (as long as our encoding can represent anything
they choose to throw at us).
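
To illustrate the point about converting on the way in and out, here is a minimal Python sketch. It is not git's code; `MessageStore` and its methods are hypothetical names made up for this example. It only shows that when conversion happens at both boundaries, the caller gets the same bytes back whatever the internal encoding is, provided that encoding can represent the text.

```python
class MessageStore:
    """Toy store with a private internal encoding (hypothetical, not git)."""

    def __init__(self, internal_encoding):
        self.internal_encoding = internal_encoding
        self._blobs = []

    def write(self, raw, user_encoding):
        # Convert on the way in: user bytes -> text -> internal bytes.
        text = raw.decode(user_encoding)
        self._blobs.append(text.encode(self.internal_encoding))

    def read(self, index, user_encoding):
        # Convert on the way out: internal bytes -> text -> user bytes.
        text = self._blobs[index].decode(self.internal_encoding)
        return text.encode(user_encoding)


message = "Grüße".encode("iso-8859-1")
for internal in ("utf-8", "utf-32-be"):
    store = MessageStore(internal)
    store.write(message, "iso-8859-1")
    # The user sees identical bytes regardless of the internal choice.
    assert store.read(0, "iso-8859-1") == message
```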

>  because tools other than git they need to
> use are more convenient with the legacy encoding rather than
> UTF-8,

That makes about as much sense to me as letting them configure git to
store objects uncompressed "because tools other than git are more
convenient without compression". 

If our choice of _internal_ storage affects their other tools, then
either they're doing something very strange like poking at git objects
directly, or there's a bug in the git tools.

>  there is no need to give a lecture to them saying they
> should switch to UTF-8 and/or what they have been doing is
> sub-par -- it isn't. 

If people, for whatever reason, want git to use a given legacy character
set for its storage format, they just have to set i18n.commitencoding.
Those people aren't being lectured. (Although perhaps they _should_ be;
either they're poking at things which shouldn't concern them, or they
should be _reporting_ bugs instead of just working round them.)

The only people who would want the -n option would be those who _want_
to intentionally throw away the character set encoding, and have one
commit¹ in EBCDIC, a second in UTF-8 and a third in BIG5 with no way of
telling which is which; each of them _labelled_ with the default
encoding for the repository, which is probably UTF-8.
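
A rough Python sketch of that situation, with made-up sample messages: three byte strings in three different encodings, all read under a single UTF-8 label. `cp500` is used here as one example of an EBCDIC code page; the helper name `readable_as` is hypothetical.

```python
def readable_as(raw, label):
    """Return True if the bytes decode cleanly under the given label."""
    try:
        raw.decode(label)
        return True
    except UnicodeDecodeError:
        return False


# Three messages stored as raw bytes, each in a different encoding.
messages = [
    "fix".encode("cp500"),       # an EBCDIC code page
    "fix café".encode("utf-8"),
    "中文".encode("big5"),
]

# All three carry the same repository-wide label.
results = [readable_as(raw, "utf-8") for raw in messages]
```

Only the message that really is UTF-8 decodes under the label. Nothing in the stored bytes says what the other two were meant to be.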

-- 
dwmw2

¹ Actually it's worse than that -- with RFC2047 you can have multiple 
encodings within the same _line_ of text. Evolution at least will do that;
it uses ISO8859-1 for any character it can, and falls back to UTF-8 for
other characters. Even within the same header. Importing with '-n' would
just throw away the charset information and use the raw bytes. Even just 
importing the RFC2047-encoded text as-is would be better than that.
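
A short Python sketch of the mixed-charset case, using the standard library's `email.header.decode_header`. The sample header and both helper names are assumptions made up for illustration, and the second helper only mimics the "throw away the charset, keep the raw bytes" behaviour described above; it is not git-mailinfo's code.

```python
from email.header import decode_header


def decode_with_charsets(header):
    """Decode each RFC 2047 chunk using the charset it declares."""
    out = []
    for chunk, charset in decode_header(header):
        if isinstance(chunk, bytes):
            out.append(chunk.decode(charset or "ascii"))
        else:
            out.append(chunk)
    return "".join(out)


def decode_ignoring_charsets(header, assumed="utf-8"):
    """Drop the declared charsets and read the raw bytes under one label."""
    raw = b"".join(
        chunk if isinstance(chunk, bytes) else chunk.encode("ascii")
        for chunk, _ in decode_header(header)
    )
    return raw.decode(assumed, errors="replace")


# Hypothetical header: ISO-8859-1 for one word, UTF-8 for the next.
header = "=?ISO-8859-1?Q?Andr=E9?= =?UTF-8?B?4oKs?="

good = decode_with_charsets(header)
bad = decode_ignoring_charsets(header)
```

With the charsets honoured, both words decode correctly. With them discarded, the ISO-8859-1 byte is not valid UTF-8 and comes out as a replacement character.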
