On Wed, 7 Dec 2011, Junio C Hamano wrote:
> Jakub Narebski <jnareb@xxxxxxxxx> writes:
>
> > But doing this would change gitweb behavior.  Currently when
> > encountering something (usually a line of output) that is not valid
> > UTF-8, we decode it (to UTF-8) using $fallback_encoding, by default
> > 'latin1'.  Note however that this value is per gitweb installation,
> > not per repository.
>
> I think we added and you acked 00f429a (gitweb: Handle non UTF-8 text
> better, 2007-06-03) for a good reason, and I think the above argues that
> whatever issue the commit tried to address is a non-issue.  Is it really
> true?

I think that UTF-8 is a much more prevalent character encoding in
operating systems, programming languages, APIs, and software
applications than it was 4 years ago.

Also, the solution implemented in said commit was a good start, but it
remains incomplete: $fallback_encoding is per-installation, which is
too coarse a granularity.  (There is the per-repository `gui.encoding`
config variable... but it sets the main encoding, not the fallback
one.  Best would be to use a gitattribute, but currently there is no
way to check an attribute's value at a given revision.)

The proposed

  use open qw(:std :utf8);

and removal of to_utf8 and $fallback_encoding would be a regression
compared to post-00f429a gitweb... but the tradeoff of more robust
UTF-8 handling might be worth it.

Note that to_utf8 handles git command output part by part, not as a
whole.  For UTF-8 vs latin1 (i.e. iso-8859-1) this does not matter,
though, because latin1 text is very unlikely to be recognized as valid
UTF-8 [1], and ASCII characters are unchanged in UTF-8.

[1]: http://en.wikipedia.org/wiki/UTF-8#Advantages

> > ... I guess
> > it could be emulated by defining our own 'utf-8-with-fallback'
> > encoding, or by defining our own PerlIO layer with PerlIO::via.
> > But it would no longer be a simple solution (though still automatic).
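To make the two fallback strategies discussed above concrete, here is a minimal Python sketch. It is an illustration, not gitweb's Perl code. `to_utf8` roughly mirrors the per-chunk behavior described in the thread: try strict UTF-8 first, and if the chunk is not valid UTF-8, decode the whole chunk with a fallback encoding, assumed here to be latin-1. The error-handler name "latin1-fallback" and the function `decode_with_fallback` are hypothetical. They approximate the 'utf-8-with-fallback' idea at the level of individual invalid byte sequences, using Python's `codecs.register_error`, and are not an implementation of a Perl Encode encoding or a PerlIO::via layer.

```python
import codecs

# Hypothetical stand-in for gitweb's per-installation $fallback_encoding.
FALLBACK_ENCODING = "latin-1"


def to_utf8(chunk: bytes, fallback: str = FALLBACK_ENCODING) -> str:
    """Decode one chunk of git output (e.g. one line).

    Strict UTF-8 is tried first; if the chunk is not valid UTF-8, the
    whole chunk is decoded with the fallback encoding instead.
    """
    try:
        return chunk.decode("utf-8")
    except UnicodeDecodeError:
        return chunk.decode(fallback)


def _latin1_fallback(err):
    """Codec error handler: decode only the offending bytes as latin-1."""
    if not isinstance(err, UnicodeDecodeError):
        raise err
    bad = err.object[err.start:err.end]
    # Return the replacement text and the position to resume decoding from.
    return bad.decode("latin-1"), err.end


# Hypothetical handler name; once registered, any decode() call can use it.
codecs.register_error("latin1-fallback", _latin1_fallback)


def decode_with_fallback(data: bytes) -> str:
    """'utf-8-with-fallback' analog: valid UTF-8 sequences are kept,
    invalid byte sequences are decoded as latin-1."""
    return data.decode("utf-8", errors="latin1-fallback")


if __name__ == "__main__":
    # Chunks that are purely UTF-8, purely latin-1, or ASCII: both agree.
    for chunk in (b"caf\xc3\xa9", b"na\xefve", b"plain ascii"):
        assert to_utf8(chunk) == decode_with_fallback(chunk)
        print(to_utf8(chunk))
    # A single chunk mixing both encodings: the two approaches differ.
    mixed = b"caf\xc3\xa9 na\xefve"
    print(to_utf8(mixed))               # whole chunk decoded as latin-1
    print(decode_with_fallback(mixed))  # fallback applied per bad sequence
```

For a chunk that is entirely UTF-8 or entirely latin1, the two functions agree. This matches the point above that part-by-part decoding makes no difference for UTF-8 vs latin1. They diverge only when a single chunk mixes both encodings.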
>
> Between the current "everybody who reads from the input must remember to
> call to_utf8" and "everybody gets to read utf8 automatically for internal
> processing", even though the latter may involve one-time pain to set up
> the framework to do so, the pros-and-cons feels obvious to me.

There is also the matter of performance: ':utf8' and ':encoding(UTF-8)'
are, AFAIK, implemented in C, both the Encode part and the PerlIO part.

-- 
Jakub Narebski
Poland