Re: [PATCH 2/2] utf8: accept "latin-1" as ISO-8859-1

Jeff King <peff@xxxxxxxx> · Tue, 27 Sep 2016 01:57:45 -0400

On Mon, Sep 26, 2016 at 06:22:11PM -0700, Junio C Hamano wrote:

> Even though latin-1 is still seen in e-mail headers, some platforms
> only install ISO-8859-1.  "iconv -f ISO-8859-1" succeeds, while
> "iconv -f latin-1" fails on such a system.
> 
> Using the same fallback_encoding() mechanism factored out in the
> previous step, teach ourselves that "ISO-8859-1" has a better chance
> of being accepted than "latin-1".

I was curious if this was the most official or accepted spelling.
Grepping a few hundred thousand messages from my mail archives, it does
seem to be the most common.

> diff --git a/utf8.c b/utf8.c
> index 550e785..0c8e011 100644
> --- a/utf8.c
> +++ b/utf8.c
> @@ -501,6 +501,13 @@ static const char *fallback_encoding(const char *name)
>  	if (is_encoding_utf8(name))
>  		return "UTF-8";
>  
> +	/*
> +	 * Even though latin-1 is still seen in e-mail
> +	 * headers, some platforms only install ISO-8859-1.
> +	 */
> +	if (!strcasecmp(name, "latin-1"))
> +		return "ISO-8859-1";
> +

For the UTF-8 fallbacks, we actually detect their equivalence via
same_encoding() before even hitting iconv. Is it worth doing the same
here?
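
To make the question concrete, here is a rough standalone sketch of
what "doing the same" could look like. It is not the actual utf8.c
code: same_encoding_sketch(), is_utf8_name() and is_latin1_name() are
made-up names, and the alias lists are my assumption about which
spellings we would want to treat as equivalent.

```c
#include <strings.h>	/* strcasecmp() */

/* Hypothetical stand-in for the UTF-8 check; a missing name means UTF-8. */
int is_utf8_name(const char *name)
{
	return !name ||
	       !strcasecmp(name, "utf-8") ||
	       !strcasecmp(name, "utf8");
}

/* Hypothetical: spellings assumed to all mean ISO-8859-1. */
int is_latin1_name(const char *name)
{
	return name &&
	       (!strcasecmp(name, "latin-1") ||
		!strcasecmp(name, "latin1") ||
		!strcasecmp(name, "ISO-8859-1"));
}

/*
 * Return 1 if the two names are known to be the same encoding, so the
 * caller can skip iconv entirely; otherwise fall back to comparing
 * the names case-insensitively.
 */
int same_encoding_sketch(const char *src, const char *dst)
{
	if (is_utf8_name(src) && is_utf8_name(dst))
		return 1;
	if (is_latin1_name(src) && is_latin1_name(dst))
		return 1;
	if (!src || !dst)
		return 0;
	return !strcasecmp(src, dst);
}
```

With something like that, a "latin-1" to "ISO-8859-1" request would
be recognized as a no-op up front, the same way the UTF-8 spellings
are, rather than being retried through the fallback after iconv has
already failed.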

I have to admit that I don't care too deeply about performance for
somebody who wants to convert "latin1" to "ISO-8859-1". If one of your
encodings is not UTF-8, you are probably Doing It Wrong. :)

-Peff