Re: [PATCH] gitweb: handle non UTF-8 text

Petr Baudis <pasky@xxxxxxx> · Tue, 29 May 2007 01:21:39 +0200

On Mon, May 28, 2007 at 10:47:34PM CEST, Martin Koegler wrote:
> gitweb assumes that everything is in UTF-8. If a text contains invalid
> UTF-8 character sequences, the text must be in a different encoding.
> 
> This patch interprets such a text as latin1.
> 
> Signed-off-by: Martin Koegler <mkoegler@xxxxxxxxxxxxxxxxx>
> ---
> For correct UTF-8, the patch does not change anything.
> 
> If a commit/blob/... is not in UTF-8, it displays the text
> correctly with a very high probability.
> 
> As git itself is not aware of any encoding, I know no better
> possibility to handle non UTF-8 text in gitweb.

I don't think this is a reasonable approach; I actually dispute the high
probability. In western Europe it's natural to assume latin1, but does
the majority of users of non-ASCII characters come from there? Or rather
from central Europe (like me, Petr Baudiš? ;-))? Somewhere else?
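
To make the concern concrete, here is a minimal Python sketch of the
"try UTF-8, else latin1" fallback. It is not the patch's actual code;
the function name patch_style_decode is made up for illustration. Text
stored as ISO-8859-2 (latin2) is invalid UTF-8, so it takes the latin1
branch and comes out wrong:

```python
def patch_style_decode(raw):
    """Hypothetical stand-in for the fallback: UTF-8 first, then latin1."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # latin1 maps every byte to a character, so this never fails,
        # even when the guess about the encoding is wrong.
        return raw.decode("latin-1")


# "š" is the single byte 0xB9 in ISO-8859-2; the same byte is "¹" in latin1.
name_latin2 = "Baudiš".encode("iso-8859-2")
print(patch_style_decode(name_latin2))  # prints "Baudi¹"
```

Because the latin1 branch cannot fail, nothing signals that the text was
decoded with the wrong encoding.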

If we do something like this, we should do it properly and look at the
i18n.commitEncoding configured for the project. (But as the config
lookup may be expensive, we should probably do it only when we need it.)
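
A rough Python sketch of what I mean, under stated assumptions: the
helper names project_encoding and to_text are hypothetical, the lookup
shells out to "git config --get i18n.commitEncoding", and the result is
cached per repository. The config is only consulted once UTF-8 decoding
has already failed:

```python
import subprocess
from functools import lru_cache


@lru_cache(maxsize=None)
def project_encoding(git_dir):
    """Return i18n.commitEncoding for git_dir, or None if it is unset.

    Cached, so the (possibly expensive) lookup runs once per project.
    """
    result = subprocess.run(
        ["git", "--git-dir=" + git_dir, "config", "--get", "i18n.commitEncoding"],
        capture_output=True,
        text=True,
    )
    value = result.stdout.strip()
    return value if result.returncode == 0 and value else None


def to_text(raw, git_dir, lookup=project_encoding):
    """Decode raw bytes, consulting the project config only when needed."""
    try:
        return raw.decode("utf-8")  # common case: no config lookup at all
    except UnicodeDecodeError:
        pass
    encoding = lookup(git_dir)  # lazy: reached only for non-UTF-8 input
    if encoding is not None:
        try:
            return raw.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            pass  # unknown encoding name, or bytes that do not fit it
    # Last resort: keep the valid parts and mark the rest with U+FFFD.
    return raw.decode("utf-8", errors="replace")
```

Whether the last resort should be replacement characters or a latin1
guess is a separate question; the sketch only shows the lookup order.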

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Ever try. Ever fail. No matter. // Try again. Fail again. Fail better.
		-- Samuel Beckett