[PATCH] gitweb: Convert string to internal form before chopping in chop_str

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Fix chop_str not to cut in middle of utf8 multibyte chars. Without
this fix at least author name in short log may cut in middle of a
multibyte char. When the result comes to esc_html to_utf8 is called
again, which doesn't find valid utf8 and decodes using
$fallback_encoding making it even worse.

This also have the nice side effect that it actually tries to show the
first 10 _characters_, not the number of characters that happened to fit
into 10 bytes.

Signed-off-by: Anders Waldenborg <anders@xxxxxxx>
---

Description updated as per your suggesion, and hopefully without the
embarrassing whitespace corruption.

gitweb/gitweb.perl |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 2facf2d..8308e22 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -866,6 +866,10 @@ sub chop_str {
 	my $add_len = shift || 10;
 	my $where = shift || 'right'; # 'left' | 'center' | 'right'
 
+	# Make sure perl knows it is utf8 encoded so we don't
+	# cut in the middle of a utf8 multibyte char.
+	$str = to_utf8($str);
+
 	# allow only $len chars, but don't cut a word if it would fit in $add_len
 	# if it doesn't fit, cut it if it's still longer than the dots we would add
 	# remove chopped character entities entirely
-- 
1.5.5.1

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux