Re: Fix UTF Encoding issue

Benjamin Close <Benjamin.Close@xxxxxxxxxxxxxx> · Tue, 04 Dec 2007 08:16:24 +1030

Jakub Narebski wrote:
On Mon, 3 Dec 2007, Martin Koegler wrote:

On Mon, Dec 03, 2007 at 04:06:48AM -0800, Jakub Narebski wrote:

Ismail Dönmez <ismail@xxxxxxxxxxxxx> writes:

Wrote on Monday 03 December 2007 at 12:14:43:

Benjamin Close <Benjamin.Close@xxxxxxxxxxxxxx> writes:

-	eval { $res = decode_utf8($str, Encode::FB_CROAK); };
-	if (defined $res) {
-		return $res;
-	} else {
-		return decode($fallback_encoding, $str, Encode::FB_DEFAULT);
-	}
+	eval { return ($res = decode_utf8($str, Encode::FB_CROAK)); };
+	return decode($fallback_encoding, $str, Encode::FB_DEFAULT);
 }
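
A note on the `+` lines above, independent of the Encode version: in Perl, a `return` inside `eval { ... }` leaves only the eval block, not the enclosing sub, so the statement after the eval still runs. The sketch below illustrates this with plain strings; `patched_shape` and `captured_shape` are hypothetical names used only for this illustration and are not part of gitweb.

```perl
use strict;
use warnings;

# Hypothetical helper, not from gitweb: mirrors the control flow of the
# patched lines, with strings in place of the decode calls.
sub patched_shape {
    my ($value) = @_;
    # This `return` exits the eval block only, not patched_shape().
    eval { return "from eval: $value"; };
    # Execution always continues here, even when the eval succeeded.
    return "from fallback: $value";
}

# Hypothetical variant: capture the eval's value, then decide explicitly.
sub captured_shape {
    my ($value) = @_;
    my $res = eval { "from eval: $value" };
    return $res if defined $res;
    return "from fallback: $value";
}

print patched_shape("x"), "\n";   # prints "from fallback: x"
print captured_shape("x"), "\n";  # prints "from eval: x"
```

If that reading is right, the patched function would take the fallback path on every call, which may be relevant to the differing results reported below.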

This version is broken on Debian sarge and etch. Feeding it the UTF-8 and the
latin1 encoding of the same character sequence yields different results.

For the record, this was on a Debian sid machine.

#perl --version
This is perl, v5.8.8 built for x86_64-linux-gnu-thread-multi

and the result of not using the original patch was:

<h1>Software error:</h1>
<pre>Cannot decode string with wide characters at /usr/lib/perl/5.8/Encode.pm line 166.
</pre>

I haven't tried the other solutions tested here.

eval { $res = decode_utf8(...); };
if ($@) {
    return decode(...);
}
return $res;

or

eval { $res = decode_utf8(...); };
if (defined $res) {
    return $res;
} else {
    return decode(...);
}

show the same (wrong) behaviour on Debian sarge. They do not always
decode non-UTF-8 characters correctly, e.g.:
#öäü does not work
#äöüä does work

On Debian etch, both versions are working.
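
To reproduce this comparison on a given machine, something like the following could be used. It is only a sketch under assumptions: `to_utf8_sketch` and `same_result` are hypothetical names, latin1 is assumed as the fallback encoding, and `Encode::LEAVE_SRC` is added here (it is not in the quoted variants) so that the fallback always sees the original bytes.

```perl
use strict;
use warnings;
use Encode qw(decode decode_utf8 encode);

# Assumption: latin1 stands in for the configured fallback encoding.
my $fallback_encoding = 'latin1';

# Hypothetical reimplementation of the "if (defined $res)" variant.
sub to_utf8_sketch {
    my ($str) = @_;
    # LEAVE_SRC asks Encode not to modify $str in place during the check.
    my $res = eval {
        decode_utf8($str, Encode::FB_CROAK() | Encode::LEAVE_SRC());
    };
    return $res if defined $res;
    # Strict UTF-8 decoding failed: treat the bytes as the fallback encoding.
    return decode($fallback_encoding, $str, Encode::FB_DEFAULT());
}

# Encode the same text as UTF-8 and as latin1, decode both, and check
# that each round trip gives back the original characters.
sub same_result {
    my ($text) = @_;
    my $from_utf8   = to_utf8_sketch(encode('UTF-8', $text));
    my $from_latin1 = to_utf8_sketch(encode('latin1', $text));
    return $from_utf8 eq $text && $from_latin1 eq $text;
}

# The two sample strings from the report, written as code points.
for my $text ("#\x{F6}\x{E4}\x{FC}", "#\x{E4}\x{F6}\x{FC}\x{E4}") {
    printf "%d chars: %s\n", length($text),
        same_result($text) ? "ok" : "MISMATCH";
}
```

A "MISMATCH" line for either string would point at the installed Encode rather than at the sample data, since both inputs carry the same characters.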

I don't know enough Perl to decide if it is a bug in gitweb's usage
of decode_utf8, if it is a bug in your version of Encode, or if it
is a bug in Encode itself.

Send a copy of this mail to the maintainers of the Encode Perl module.

Ismail, do you know if sid was also broken?