Re: Fix UTF Encoding issue

Ismail Dönmez <ismail@xxxxxxxxxxxxx> · Tue, 4 Dec 2007 10:12:50 +0200

On Tuesday 04 December 2007 10:04:07, Martin Koegler wrote:
> On Tue, Dec 04, 2007 at 08:16:24AM +1030, Benjamin Close wrote:
> > Jakub Narebski wrote:
> > >On Mon, 3 Dec 2007, Martin Koegler wrote:
> > >>On Mon, Dec 03, 2007 at 04:06:48AM -0800, Jakub Narebski wrote:
> > >>>Ismail Dönmez <ismail@xxxxxxxxxxxxx> writes:
> > >>>>Wrote on Monday 03 December 2007 at 12:14:43:
> > >>>>>Benjamin Close <Benjamin.Close@xxxxxxxxxxxxxx> writes:
> > >>>>>>-	eval { $res = decode_utf8($str, Encode::FB_CROAK); };
> > >>>>>>-	if (defined $res) {
> > >>>>>>-		return $res;
> > >>>>>>-	} else {
> > >>>>>>-		return decode($fallback_encoding, $str, Encode::FB_DEFAULT);
> > >>>>>>-	}
> > >>>>>>+	eval { return ($res = decode_utf8($str, Encode::FB_CROAK)); };
> > >>>>>>+	return decode($fallback_encoding, $str, Encode::FB_DEFAULT);
> > >>>>>> }
> > >>
> > >>This version is broken on Debian sarge and etch. Feeding it a UTF-8 and a
> > >>latin1 encoding of the same character sequence yields different results.
> >
> > For the record, this was on a debian sid machine.
> >
> > #perl --version
> > This is perl, v5.8.8 built for x86_64-linux-gnu-thread-multi
> >
> > and the result of not using the original patch was:
> >
> > <h1>Software error:</h1>
> > <pre>Cannot decode string with wide characters at
> > /usr/lib/perl/5.8/Encode.pm line 166.
> > </pre>
> >
> >
> > I haven't tried the other solutions tested here.
>
> Debian etch also has v5.8.8.
>
> My main question is: why is the error not caught?
>
> I'm not a Perl programmer, but in your patch the first line is a
> NOP. The return in eval seems to return only from the eval block, so
> any text is decoded as latin1 by the second statement.
>
> In the original version, decode($fallback_encoding, $str,
> Encode::FB_DEFAULT) can not emit an error, else it would in your
> version too.
>
> In your version, eval is able to suppress the error of
> decode_utf8($str, Encode::FB_CROAK);, but not in the original version.
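
The point about return inside eval can be checked in isolation. A minimal sketch; the sub name is made up for illustration and is not part of gitweb:

```perl
use strict;
use warnings;

# Hypothetical helper, not from gitweb: shows that `return` inside
# eval { } leaves only the eval block, not the enclosing sub.
sub return_inside_eval {
	eval { return "from eval"; };
	# Execution continues here after the eval block finishes.
	return "from sub";
}

print return_inside_eval(), "\n";   # prints "from sub"
```

So in the patched version the eval line cannot hand its result back to the caller, and the fallback decode on the next line always runs.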

I think a better approach would be (not tested):

if( is_utf8($str) ) 
{
	return decode_utf8($str);
}
else {
	# decode() takes the encoding name as its first argument
	return decode($fallback_encoding, $str);
}
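
For comparison, here is a slightly fuller sketch of the "strict UTF-8 first, fallback second" idea. It is a different variant from the snippet above, not the gitweb code: the sub name is hypothetical, latin1 is only assumed as the value of $fallback_encoding, and Encode::LEAVE_SRC is added so that the failed strict attempt does not modify $str before the fallback runs.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Assumption: latin1 stands in for gitweb's $fallback_encoding.
my $fallback_encoding = 'latin1';

# Hypothetical helper name, for illustration only.
sub to_utf8_sketch {
	my ($str) = @_;
	# Already a character string; decoding it again can croak with
	# "Cannot decode string with wide characters".
	return $str if utf8::is_utf8($str);
	# Strict attempt; eval yields undef if decode() croaks.
	my $res = eval {
		decode('UTF-8', $str, Encode::FB_CROAK() | Encode::LEAVE_SRC());
	};
	return $res if defined $res;
	# Not valid UTF-8: treat the bytes as the fallback encoding.
	return decode($fallback_encoding, $str, Encode::FB_DEFAULT());
}

# The same name as UTF-8 bytes and as latin1 bytes.
my $from_utf8   = to_utf8_sketch("D\xc3\xb6nmez");
my $from_latin1 = to_utf8_sketch("D\xf6nmez");
print "same\n" if $from_utf8 eq $from_latin1;
```

With this shape, the result of eval is captured in a variable instead of relying on return inside the block.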

Regards,
ismail

-- 
Never learn by your mistakes, if you do you may never dare to try again.