On Tue, 2007-01-09 at 10:46 -0800, Junio C Hamano wrote:
> I do not think you would want to make '-n' in the third point
> sound so negative

No, I really _do_ want that.

> and make people on projects that chose to use
> legacy encoding for whatever reasons feel _dirty_.

... but not that, because it wasn't aimed at them.

> If the natural language in project's log is limited and a legacy
> encoding is sufficient, and if all the participants agree on a
> legacy encoding to use...

(...for git's own purely internal storage format).

That's not the use case for the -n option. Their case is what the
i18n.commitencoding configuration option exists for.

Although having said that, I don't actually know _why_ we let them
override the default, since it's _internal_ to git. As long as git
itself is correctly doing the conversion on the way in and out, there's
no reason for them to care whether we use UTF-8, UCS-4, EBCDIC or some
other arbitrary encoding (as long as our encoding can represent anything
they choose to throw at us).

> because tools other than git they need to
> use are more convenient with the legacy encoding rather than
> UTF-8,

That makes about as much sense to me as letting them configure git to
store objects uncompressed "because tools other than git are more
convenient without compression".

If our choice of _internal_ storage affects their other tools, then
either they're doing something very strange like poking at git objects
directly, or there's a bug in the git tools.

> there is no need to give a lecture to them saying they
> should switch to UTF-8 and/or what they have been doing is
> sub-par -- it isn't.

If people, for whatever reason, want git to use a given legacy character
set for its storage format, they just have to set i18n.commitencoding.
Those people aren't being lectured. (Although perhaps they _should_ be;
either they're poking at things which shouldn't concern them, or they
should be _reporting_ bugs instead of just working round them.)

The only people who would want the -n option would be those who _want_
to intentionally throw away the character set encoding, and have one
commit¹ in EBCDIC, a second in UTF-8 and a third in BIG5 with no way of
telling which is which; each of them _labelled_ with the default
encoding for the repository, which is probably UTF-8.

--
dwmw2

¹ Actually it's worse than that -- with RFC2047 you can have multiple
encodings within the same _line_ of text. Evolution at least will do
that; it uses ISO8859-1 for any character it can, and falls back to
UTF-8 for other characters. Even within the same header. Importing with
'-n' would just throw away the charset information and use the raw
bytes. Even just importing the RFC2047-encoded text as-is would be
better than that.
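
To make the footnote concrete, here is a minimal sketch in Python. It uses the standard library's email.header module, not anything git itself does, and the header value and the decode_rfc2047 helper are made up for illustration. It decodes one header line that mixes ISO-8859-1 and UTF-8 encoded-words, then shows what is left once the per-word charset labels are discarded and only the raw bytes are kept.

```python
from email.header import decode_header

# Hypothetical header value: two RFC 2047 encoded-words with different
# charsets on the same line ("caf\xe9" in ISO-8859-1, a euro sign in UTF-8).
raw = "=?ISO-8859-1?Q?caf=E9?= =?UTF-8?B?4oKs?="


def decode_rfc2047(value):
    """Decode each encoded-word with its own declared charset."""
    parts = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            # Unlabelled byte chunks are treated as ASCII (an assumption
            # made for this sketch).
            parts.append(chunk.decode(charset or "ascii", errors="replace"))
        else:
            parts.append(chunk)
    return "".join(parts)


def raw_bytes_only(value):
    """Keep the payload bytes but drop the charset labels."""
    return b"".join(
        chunk if isinstance(chunk, bytes) else chunk.encode("ascii")
        for chunk, _charset in decode_header(value)
    )


decoded = decode_rfc2047(raw)
unlabelled = raw_bytes_only(raw)

try:
    unlabelled.decode("utf-8")
    valid_as_utf8 = True
except UnicodeDecodeError:
    valid_as_utf8 = False

print(decoded)        # text recovered using the per-word charsets
print(unlabelled)     # the same payload with the labels thrown away
print(valid_as_utf8)  # whether those bytes pass as a single UTF-8 string
```

With the labels intact the line decodes cleanly. Without them, the concatenated bytes are not valid in the single encoding the repository would claim for them, and nothing in the bytes says where one charset stops and the next begins.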