On Fri, 4 June 2010, Pavan Kumar Sunkara wrote: > Create a Gitweb::Escape module in 'gitweb/lib/Gitweb/Escape.pm' > to store all the quoting/unquoting and escaping subroutines > regarding the gitweb.perl script. <del> I would ask if it wouldn't be simpler to just have Gitweb::Util for all auxiliary / helper subroutines... ...if not for the fact that to avoid circular dependencies you would have, most probably, split this module further into * Gitweb::Escape::CGI * Gitweb::Escape::HTML and perhaps move 'unquote' to Gitweb::Git or Gitweb::Escape::Git. Then Gitweb::Escape could just import both. </del> Errr... sorry. I guess that either Gitweb::Config nor Gitweb::Request needs to include Gitweb::Escape. Note that 'Escape' part of name is not entirely correct, because some of subroutines mentioned below deal with _quoting_ / _unquoting_, not with escaping / unescaping. > > Subroutines moved: > to_utf8 > esc_param > esc_url > esc_html > esc_path > quot_cec > quot_upr > unquote > untabify > > Subroutines yet to move: (Contains not yet packaged subs & vars) > None > > Update gitweb/Makefile to install gitweb modules alongside gitwe > > Signed-off-by: Pavan Kumar Sunkara <pavan.sss1991@xxxxxxxxx> > --- > gitweb/Makefile | 1 + > gitweb/gitweb.perl | 157 +-------------------------------------- > gitweb/lib/Gitweb/Escape.pm | 175 +++++++++++++++++++++++++++++++++++++++++++ > 3 files changed, 177 insertions(+), 156 deletions(-) > create mode 100644 gitweb/lib/Gitweb/Escape.pm It's a pity that git cannot tell us here that it is straight code movement (moving code block from one file to the other)... > diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl > index b30ac48..9e54983 100755 > --- a/gitweb/gitweb.perl > +++ b/gitweb/gitweb.perl > @@ -12,7 +12,6 @@ use warnings; > use CGI qw(:standard :escapeHTML -nosticky); > use CGI::Util qw(unescape); > use CGI::Carp qw(fatalsToBrowser set_message); > -use Encode; > use Fcntl ':mode'; > use File::Find qw(); > use File::Basename qw(basename); > @@ -27,6 +26,7 @@ use lib __DIR__ . "/lib"; > > use Gitweb::Config; > use Gitweb::Request; > +use Gitweb::Escape; > > BEGIN { > CGI->compile() if $ENV{'MOD_PERL'}; > @@ -762,161 +762,6 @@ sub validate_project { > } > } > > -# decode sequences of octets in utf8 into Perl's internal form, > -# which is utf-8 with utf8 flag set if needed. gitweb writes out > -# in utf-8 thanks to "binmode STDOUT, ':utf8'" at beginning > -sub to_utf8 { [cut] > diff --git a/gitweb/lib/Gitweb/Escape.pm b/gitweb/lib/Gitweb/Escape.pm > new file mode 100644 > index 0000000..f6ea209 > --- /dev/null > +++ b/gitweb/lib/Gitweb/Escape.pm > @@ -0,0 +1,175 @@ > +#!/usr/bin/perl > +# > +# Gitweb::Escape -- gitweb's quoting/unquoting, escaping package > +# > +# This program is licensed under the GPLv2 > + > +package Gitweb::Escape; > + > +use strict; > +use warnings; > +use Exporter qw(import); > + > +our @ISA = qw(Exporter); This line is not necessary if you are using 'use Exporter qw(import);' This code: use Exporter; our @ISA = qw(Exporter); is equivalent to (safer) use Exporter; use base qw(Exporter); The section "Exporting without inheriting from Exporter" of Exporter(3pm) manpage says that using use Exporter qw(import); allows to avoid inheriting our package from Exporter. > +our @EXPORT = qw(&to_utf8 &esc_param &esc_url &esc_html &esc_path "_cec "_upr &unquote &untabify); Please try to not introduce such long likes. Perhaps: +our @EXPORT = qw(&to_utf8 &esc_param &esc_url &esc_html &esc_path "_cec "_upr &unquote &untabify); Also ampersands are not needed there, and I think hat they are not used usually, so it would be: +our @EXPORT = qw(to_utf8 esc_param esc_url esc_html esc_path quot_cec quot_upr unquote untabify); But note also comment below about splitting this file further. > + > +use Encode; Perhaps empty line between 'use Exporter;' (using outside Perl modules) and 'use Gitweb::Config;' would help readibility of code, hmmm...? Also, don't you need 'use CGI;' or 'use CGI qw(:escapeHTML);' here, to have access to CGI::escape (to be more exact CGI::Util::escape, but imported by CGI to CGI::escape) and CGI::escapeHTML? > +use Gitweb::Config; I personally would say here explicitely +use Gitweb::Config qw($fallback_encoding); to make it clear that we need only this single one config variable. > +use Gitweb::Request; As it is shown below, this is not necessary. > + > +# decode sequences of octets in utf8 into Perl's internal form, > +# which is utf-8 with utf8 flag set if needed. gitweb writes out > +# in utf-8 thanks to "binmode STDOUT, ':utf8'" at beginning s/at beginning/at beginning of XXX/; > +sub to_utf8 { > + my $str = shift; > + return undef unless defined $str; > + if (utf8::valid($str)) { > + utf8::decode($str); > + return $str; > + } else { > + return decode($fallback_encoding, $str, Encode::FB_DEFAULT); > + } > +} This single one uses $fallback_encoding from Gitweb::Config. > + > +# quote unsafe chars, but keep the slash, even when it's not > +# correct, but quoted slashes look too horrible in bookmarks > +sub esc_param { > + my $str = shift; > + return undef unless defined $str; > + $str =~ s/([^A-Za-z0-9\-_.~()\/:@ ]+)/CGI::escape($1)/eg; > + $str =~ s/ /\+/g; > + return $str; > +} > + > +# quote unsafe chars in whole URL, so some charactrs cannot be quoted > +sub esc_url { > + my $str = shift; > + return undef unless defined $str; > + $str =~ s/([^A-Za-z0-9\-_.~();\/;?:@&=])/sprintf("%%%02X", ord($1))/eg; > + $str =~ s/\+/%2B/g; > + $str =~ s/ /\+/g; > + return $str; > +} Hmmm... why this difference? esc_url() should differ from esc_param() only in that esc_url() escapes less (it should not escape query parameter delimites). It seems like 452e225 (gitweb: fix esc_param, 2009-10-13) by Giuseppe Bilotta missed fixing esc_url(), isn't it? Might be worth fixing upfront (in a separate commit, of course). Could you do this, please? > + > +# replace invalid utf8 character with SUBSTITUTION sequence > +sub esc_html { > + my $str = shift; > + my %opts = @_; > + > + return undef unless defined $str; > + > + $str = to_utf8($str); > + $str = $cgi->escapeHTML($str); We can use 'CGI::escapeHTML;' here, and avoid dependency on Gitweb::Request. This change probably should be done in a separate cleanup patch, preceding this one. > + if ($opts{'-nbsp'}) { > + $str =~ s/ / /g; > + } > + $str =~ s|([[:cntrl:]])|(($1 ne "\t") ? quot_cec($1) : $1)|eg; > + return $str; > +} > + > +# quote control characters and escape filename to HTML > +sub esc_path { > + my $str = shift; > + my %opts = @_; > + > + return undef unless defined $str; > + > + $str = to_utf8($str); > + $str = $cgi->escapeHTML($str); Same here. > + if ($opts{'-nbsp'}) { > + $str =~ s/ / /g; > + } > + $str =~ s|([[:cntrl:]])|quot_cec($1)|eg; > + return $str; > +} Those two subroutines use to_utf8(), which in turn uses $fallback_encoding. > + > +# Make control characters "printable", using character escape codes (CEC) > +sub quot_cec { [...] > +} > + > +# Alternatively use unicode control pictures codepoints, > +# Unicode "printable representation" (PR) > +sub quot_upr { [...] > +} > +# git may return quoted and escaped filenames > +sub unquote { [...] > +} Note that this subroutine is about neither CGI nor HTML, but about unquoting and unescaping output of git commands. Just FYI. > + > +# escape tabs (convert tabs to spaces) > +sub untabify { > + my $line = shift; > + > + while ((my $pos = index($line, "\t")) != -1) { > + if (my $count = (8 - ($pos % 8))) { > + my $spaces = ' ' x $count; > + $line =~ s/\t/$spaces/; > + } > + } > + > + return $line; > +} > + > +1; Good work! -- Jakub Narebski Poland -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html