Fix chop_str not to cut in middle of utf8 multibyte chars. Without this fix at least author name in short log may cut in middle of a multibyte char. When the result comes to esc_html to_utf8 is called again, which doesn't find valid utf8 and decodes using $fallback_encoding making it even worse. This also have the nice side effect that it actually tries to show the first 10 _characters_, not the number of characters that happened to fit into 10 bytes. Signed-off-by: Anders Waldenborg <anders@xxxxxxx> --- Description updated as per your suggesion, and hopefully without the embarrassing whitespace corruption. gitweb/gitweb.perl | 4 ++++ 1 files changed, 4 insertions(+), 0 deletions(-) diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl index 2facf2d..8308e22 100755 --- a/gitweb/gitweb.perl +++ b/gitweb/gitweb.perl @@ -866,6 +866,10 @@ sub chop_str { my $add_len = shift || 10; my $where = shift || 'right'; # 'left' | 'center' | 'right' + # Make sure perl knows it is utf8 encoded so we don't + # cut in the middle of a utf8 multibyte char. + $str = to_utf8($str); + # allow only $len chars, but don't cut a word if it would fit in $add_len # if it doesn't fit, cut it if it's still longer than the dots we would add # remove chopped character entities entirely -- 1.5.5.1 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html