Re: [PATCH] gitweb: Fix chop_str not to cut in middle of utf8 multibyte chars.

Jakub Narebski <jnareb@xxxxxxxxx> · Sat, 24 May 2008 15:34:23 +0200

On Wed, 21 May 2008, Anders Waldenborg wrote:
> Junio C Hamano wrote:

>> I haven't followed the codepath but what do the callers do to the string
>> returned from chop_str?  Don't they assume the string hasn't been decoded
>> (because the old implementation of chop_str did not do this to_utf8), and
>> emit the result directly to the output because it also assumes the
>> undecoded format is what the outside world wants?  In other words, don't
>> they now need to do different things because returned string has gone
>> through the to_utf8() processing already?
> 
> The to_utf8() (defined in gitweb.perl, not part of Perl itself) is kind
> of sneaky: it checks if the string is already valid utf8. (I guess it
> should be called ensure_utf8().)

Perhaps it should...
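
Something along these lines, as a rough sketch only. The name ensure_utf8
is just the one suggested above, and both the latin1 fallback and the
utf8::is_utf8() shortcut are assumptions made for illustration, not
necessarily what gitweb.perl does:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical fallback for input that is not valid UTF-8 (an assumption).
my $fallback_encoding = 'latin1';

# Return a decoded (character) string; calling it twice is harmless.
sub ensure_utf8 {
    my $str = shift;
    return undef unless defined $str;

    # Already marked as a character string: do not decode it again.
    return $str if utf8::is_utf8($str);

    # Strict decode on a copy, because FB_CROAK may modify its source.
    my $copy = $str;
    my $decoded = eval { decode('UTF-8', $copy, Encode::FB_CROAK) };
    return $decoded if defined $decoded;

    # Not valid UTF-8: treat the bytes as the fallback encoding instead.
    return decode($fallback_encoding, $str);
}
```

With something like this, a caller that passes an already decoded string
gets it back unchanged, which is what makes calling it inside chop_str
safe for callers that did their own decoding.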

> chop_str needs to work on a decoded string, otherwise the character count
> goes all wrong. But maybe it is better to add the to_utf8() to the callsites?

Or do "binmode $fd, ':utf8'".

But yes, I guess converting to Perl internal form on input would be
a good idea.  Gitweb currently does it partially...
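
To illustrate what decoding on input buys for chop_str, here is a small
self-contained sketch. The in-memory filehandle and the sample name are
stand-ins for the output of a git command (my assumption, not gitweb code):

```perl
use strict;
use warnings;

# Stand-in for command output: "Narębski" as raw UTF-8 bytes.
my $raw = "Nar\xc4\x99bski\n";

# Without a layer, the handle yields bytes.
open my $fd, '<', \$raw or die "open: $!";
my $bytes = <$fd>;
close $fd;
chomp $bytes;

# With the :utf8 layer, the same data comes back as characters.
open $fd, '<', \$raw or die "open: $!";
binmode $fd, ':utf8';
my $chars = <$fd>;
close $fd;
chomp $chars;

# length() counts bytes in the first case, characters in the second.
printf "%d bytes, %d characters\n", length($bytes), length($chars);

# Cutting the bytes after 4 units splits the two-byte character;
# cutting the decoded string keeps it whole.
my $bad_cut  = substr($bytes, 0, 4);   # "Nar\xc4", no longer valid UTF-8
my $good_cut = substr($chars, 0, 4);   # "Nar" followed by U+0119
```

This prints "9 bytes, 8 characters".  Note that the :utf8 layer does not
validate its input; ":encoding(UTF-8)" is the stricter choice if that
matters.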

-- 
Jakub Narebski
Poland