Re: gitweb and unicode special characters

2008/12/12 Jakub Narebski <jnareb@xxxxxxxxx>:
> Jakub Narebski <jnareb@xxxxxxxxx> writes:
>> "Praveen A" <pravi.a@xxxxxxxxx> writes:
>>
>> > Git currently does not handle unicode special characters ZWJ and ZWNJ,
>> > both are heavily used in Malayalam and common in other languages
>> > needing complex text layout like Sinhala and Arabic.
>> >
>> > An example of this is shown in the commit message here
>> > http://git.savannah.gnu.org/gitweb/?p=smc.git;a=commit;h=c3f368c60aabdc380c77608c614d91b0a628590a
>> >
>> > \20014 and \20015 should have been ZWNJ and ZWJ respectively. You just
>> > need to handle them as any other unicode character - especially it is
>> > a commit message and expectation is normal plain text display.
>> >
>> > I hope someone will fix this.
>>
>> Well, I am a bit stumped.  git_commit calls format_log_line_html, which
>> in turn calls esc_html.  esc_html looks like this:
>>
>>   sub esc_html ($;%) {
>>       my $str = shift;
>>       my %opts = @_;
>>
>>   **  $str = to_utf8($str);
>>       $str = $cgi->escapeHTML($str);
>>       if ($opts{'-nbsp'}) {
>>               $str =~ s/ /&nbsp;/g;
>>       }
>>   **  $str =~ s|([[:cntrl:]])|(($1 ne "\t") ? quot_cec($1) : $1)|eg;
>>       return $str;
>>   }
>>
>> The two important lines are marked with '**'.
> [...]
>
>> So it looks like Perl treats \20014 and \20015 (ZWNJ and ZWJ) as
>> belonging to '[:cntrl:]' class. I don't know if it is correct from the
>> point of view of Unicode character classes, therefore if it is a bug
>> in Perl, or just in gitweb.
>
> I checked this, via this simple Perl script:
>
>  #!/usr/bin/perl
>
>  use charnames ":full";
>
>  my $c = ord("\N{ZWNJ}");
>  printf "oct=%o dec=%d hex=%x\n", $c, $c, $c;
>
>  "\N{ZWNJ}" =~ /[[:cntrl:]]/ and print "is [:cntrl:]";
>
> And the answer was:
>
>  oct=20014 dec=8204 hex=200c
>  is [:cntrl:]
>
> 'ZERO WIDTH NON-JOINER' _is_ a control character... We probably should
> use [^[:print:][:space:]] instead of [[:cntrl:]] here.

That looks good. But I'm wondering why we need to filter at all. Is it
a security concern? It is just a description.
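
For reference, the quoted check can be extended to ask which Unicode general category each code point falls in, independently of how a given Perl version defines [[:cntrl:]]. This is only a minimal sketch: describe() is a hypothetical helper written for this note, not part of gitweb, and it assumes the \p{Cc} (Control) and \p{Cf} (Format) properties are the relevant distinction.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of gitweb): report a code point in hex and
# octal, plus whether it is in general category Cc (Control) or Cf (Format).
sub describe {
    my ($cp) = @_;
    my $ch  = chr($cp);
    my $cat = $ch =~ /\p{Cc}/ ? 'Cc'
            : $ch =~ /\p{Cf}/ ? 'Cf'
            :                   'other';
    return sprintf("U+%04X oct=%o category=%s", $cp, $cp, $cat);
}

# ZWNJ, ZWJ, and BEL (an ordinary C0 control) for comparison.
print describe($_), "\n" for (0x200C, 0x200D, 0x0007);
```

By the Unicode general categories, ZWNJ and ZWJ come out as Cf (format characters) and BEL as Cc. A filter that should keep the joiners while still escaping real control characters could therefore be written against \p{Cc} rather than a POSIX class, though whether that is the right choice for gitweb is a separate question.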

>
> [...]
>> P.S. Even that might not help much, as Savannah uses git and gitweb
>> version 1.5.6.5, which is probably version released with some major
>> distribution.  As of now we are at 1.6.0.5...
>
> Which can be seen from the fact that gitweb uses octal escapes,
> instead of hex escapes...

But we can expect it to work someday when Savannah updates their git
version, or we can bug them to upgrade if the fix is in an official git
release.

- Praveen
j4v4m4n
>
> --
> Jakub Narebski
> Poland
> ShadeHawk on #git
>


-- 
പ്രവീണ്‍ അരിമ്പ്രത്തൊടിയില്‍
<GPLv2> I know my rights; I want my phone call!
<DRM> What use is a phone call, if you are unable to speak?
(as seen on /.)
Join The DRM Elimination Crew Now!
http://fci.wikia.com/wiki/Anti-DRM-Campaign

