On Saturday 29 March 2008 09.53.04, Jeff King wrote:
> On Sat, Mar 29, 2008 at 09:44:55AM +0100, Robin Rosenberg wrote:
> > > OK. Do you have an example function that guesses with high probability
> > > whether a string is utf-8? If there are non-ascii characters but we
> > > _don't_ guess utf-8, what should we do?
> >
> > I guess the best bet is to assume the locale. Btw, is the encoding header
> > from the commit (when present) completely lost? (not that it can be
> > trusted anyway).
>
> What do you mean by "assume the locale"? Is there a portable way to say
> "this is the encoding of the locale the user has chosen?" On my system I
> set LANG=en_US, and behind-the-scenes magic chooses utf-8 versus
> iso8859-1.

The environment variables are only part of the story. There is a langinfo
API for this. See I18N::Langinfo(3pm), which knows about those variables
and more.

# perl -e 'require I18N::Langinfo; I18N::Langinfo->import(qw(langinfo CODESET)); $codeset = langinfo(CODESET()); print "My codeset=". $codeset."\n";'
My codeset=ISO-8859-15

-- robin
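
The approach discussed above (treat the text as UTF-8 if it decodes cleanly, otherwise assume the codeset of the user's locale) could look roughly like this. It is a minimal sketch in Python rather than Perl, and it is not code from git: `guess_encoding` and `locale_codeset` are hypothetical names chosen for illustration, and `locale.nl_langinfo` is only available on Unix-like systems.

```python
import locale


def locale_codeset():
    """Return the codeset of the locale the user has chosen (Unix only)."""
    try:
        # Adopt the locale from the environment (LC_ALL, LC_CTYPE, LANG).
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        # Unknown or uninstalled locale name: keep the current locale.
        pass
    return locale.nl_langinfo(locale.CODESET)


def guess_encoding(data, fallback=None):
    """Guess the encoding of a byte string (hypothetical helper).

    Pure ASCII is reported as 'ascii'. Otherwise the bytes are checked
    with a strict UTF-8 decode; if that fails, the caller's fallback or
    the locale codeset is returned.
    """
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")  # strict: raises on any invalid sequence
        return "utf-8"
    except UnicodeDecodeError:
        return fallback or locale_codeset()


if __name__ == "__main__":
    print("My codeset=" + locale_codeset())
    print(guess_encoding("Rosenberg \u00e9".encode("utf-8")))
    print(guess_encoding(b"caf\xe9"))
```

The UTF-8 check is only a heuristic, but a fairly strong one: text in a single-byte encoding such as ISO-8859-15 that contains non-ASCII characters rarely happens to form valid UTF-8 sequences.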