Re: [RFC/PATCH] gitweb: Create Gitweb::Escape module

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, 4 June 2010, Pavan Kumar Sunkara wrote:

> Create a Gitweb::Escape module in 'gitweb/lib/Gitweb/Escape.pm'
> to store all the quoting/unquoting and escaping subroutines
> regarding the gitweb.perl script.

<del>
I would ask if it wouldn't be simpler to just have Gitweb::Util
for all auxiliary / helper subroutines...

...if not for the fact that to avoid circular dependencies you would
have, most probably, split this module further into
 * Gitweb::Escape::CGI
 * Gitweb::Escape::HTML
and perhaps move 'unquote' to Gitweb::Git or Gitweb::Escape::Git.
Then Gitweb::Escape could just import both.
</del>

Errr... sorry.  I guess that either Gitweb::Config nor Gitweb::Request
needs to include Gitweb::Escape.

Note that 'Escape' part of name is not entirely correct, because some of
subroutines mentioned below deal with _quoting_ / _unquoting_, not with
escaping / unescaping.

> 
> Subroutines moved:
> 	to_utf8
> 	esc_param
> 	esc_url
> 	esc_html
> 	esc_path
> 	quot_cec
> 	quot_upr
> 	unquote
> 	untabify
> 
> Subroutines yet to move: (Contains not yet packaged subs & vars)
> 	None
> 
> Update gitweb/Makefile to install gitweb modules alongside gitwe
> 
> Signed-off-by: Pavan Kumar Sunkara <pavan.sss1991@xxxxxxxxx>
> ---
>  gitweb/Makefile             |    1 +
>  gitweb/gitweb.perl          |  157 +--------------------------------------
>  gitweb/lib/Gitweb/Escape.pm |  175 +++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 177 insertions(+), 156 deletions(-)
>  create mode 100644 gitweb/lib/Gitweb/Escape.pm

It's a pity that git cannot tell us here that it is straight code
movement (moving code block from one file to the other)...

> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index b30ac48..9e54983 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -12,7 +12,6 @@ use warnings;
>  use CGI qw(:standard :escapeHTML -nosticky);
>  use CGI::Util qw(unescape);
>  use CGI::Carp qw(fatalsToBrowser set_message);
> -use Encode;
>  use Fcntl ':mode';
>  use File::Find qw();
>  use File::Basename qw(basename);
> @@ -27,6 +26,7 @@ use lib __DIR__ . "/lib";
>  
>  use Gitweb::Config;
>  use Gitweb::Request;
> +use Gitweb::Escape;
>  
>  BEGIN {
>  	CGI->compile() if $ENV{'MOD_PERL'};
> @@ -762,161 +762,6 @@ sub validate_project {
>  	}
>  }
>  
> -# decode sequences of octets in utf8 into Perl's internal form,
> -# which is utf-8 with utf8 flag set if needed.  gitweb writes out
> -# in utf-8 thanks to "binmode STDOUT, ':utf8'" at beginning
> -sub to_utf8 {

[cut]

> diff --git a/gitweb/lib/Gitweb/Escape.pm b/gitweb/lib/Gitweb/Escape.pm
> new file mode 100644
> index 0000000..f6ea209
> --- /dev/null
> +++ b/gitweb/lib/Gitweb/Escape.pm
> @@ -0,0 +1,175 @@
> +#!/usr/bin/perl
> +#
> +# Gitweb::Escape -- gitweb's quoting/unquoting, escaping package
> +#
> +# This program is licensed under the GPLv2
> +
> +package Gitweb::Escape;
> +
> +use strict;
> +use warnings;
> +use Exporter qw(import);
> +
> +our @ISA = qw(Exporter);

This line is not necessary if you are using 'use Exporter qw(import);'


This code:

  use Exporter;
  our @ISA = qw(Exporter);

is equivalent to (safer)

  use Exporter;
  use base qw(Exporter);

The section "Exporting without inheriting from Exporter" of
Exporter(3pm) manpage says that using

  use Exporter qw(import);

allows to avoid inheriting our package from Exporter.

> +our @EXPORT = qw(&to_utf8 &esc_param &esc_url &esc_html &esc_path &quot_cec &quot_upr &unquote &untabify);

Please try to not introduce such long likes.  Perhaps:

  +our @EXPORT = qw(&to_utf8 &esc_param &esc_url &esc_html &esc_path
                    &quot_cec &quot_upr &unquote &untabify);

Also ampersands are not needed there, and I think hat they are not used
usually, so it would be:

  +our @EXPORT = qw(to_utf8 esc_param esc_url esc_html esc_path
                    quot_cec quot_upr unquote untabify);

But note also comment below about splitting this file further.

> +
> +use Encode;

Perhaps empty line between 'use Exporter;' (using outside Perl modules)
and 'use Gitweb::Config;' would help readibility of code, hmmm...?

Also, don't you need 'use CGI;' or 'use CGI qw(:escapeHTML);' here, to
have access to CGI::escape (to be more exact CGI::Util::escape, but
imported by CGI to CGI::escape) and CGI::escapeHTML?

> +use Gitweb::Config;

I personally would say here explicitely

  +use Gitweb::Config qw($fallback_encoding);

to make it clear that we need only this single one config variable.

> +use Gitweb::Request;

As it is shown below, this is not necessary.

> +
> +# decode sequences of octets in utf8 into Perl's internal form,
> +# which is utf-8 with utf8 flag set if needed.  gitweb writes out
> +# in utf-8 thanks to "binmode STDOUT, ':utf8'" at beginning

s/at beginning/at beginning of XXX/;

> +sub to_utf8 {
> +	my $str = shift;
> +	return undef unless defined $str;
> +	if (utf8::valid($str)) {
> +		utf8::decode($str);
> +		return $str;
> +	} else {
> +		return decode($fallback_encoding, $str, Encode::FB_DEFAULT);
> +	}
> +}

This single one uses $fallback_encoding from Gitweb::Config.

> +
> +# quote unsafe chars, but keep the slash, even when it's not
> +# correct, but quoted slashes look too horrible in bookmarks
> +sub esc_param {
> +	my $str = shift;
> +	return undef unless defined $str;
> +	$str =~ s/([^A-Za-z0-9\-_.~()\/:@ ]+)/CGI::escape($1)/eg;
> +	$str =~ s/ /\+/g;
> +	return $str;
> +}
> +
> +# quote unsafe chars in whole URL, so some charactrs cannot be quoted
> +sub esc_url {
> +	my $str = shift;
> +	return undef unless defined $str;
> +	$str =~ s/([^A-Za-z0-9\-_.~();\/;?:@&=])/sprintf("%%%02X", ord($1))/eg;
> +	$str =~ s/\+/%2B/g;
> +	$str =~ s/ /\+/g;
> +	return $str;
> +}

Hmmm... why this difference?  esc_url() should differ from esc_param()
only in that esc_url() escapes less (it should not escape query
parameter delimites).  It seems like 452e225 (gitweb: fix esc_param,
2009-10-13) by Giuseppe Bilotta missed fixing esc_url(), isn't it?

Might be worth fixing upfront (in a separate commit, of course).  Could
you do this, please?

> +
> +# replace invalid utf8 character with SUBSTITUTION sequence
> +sub esc_html {
> +	my $str = shift;
> +	my %opts = @_;
> +
> +	return undef unless defined $str;
> +
> +	$str = to_utf8($str);
> +	$str = $cgi->escapeHTML($str);

We can use 'CGI::escapeHTML;' here, and avoid dependency on
Gitweb::Request.

This change probably should be done in a separate cleanup patch,
preceding this one.

> +	if ($opts{'-nbsp'}) {
> +		$str =~ s/ /&nbsp;/g;
> +	}
> +	$str =~ s|([[:cntrl:]])|(($1 ne "\t") ? quot_cec($1) : $1)|eg;
> +	return $str;
> +}
> +
> +# quote control characters and escape filename to HTML
> +sub esc_path {
> +	my $str = shift;
> +	my %opts = @_;
> +
> +	return undef unless defined $str;
> +
> +	$str = to_utf8($str);
> +	$str = $cgi->escapeHTML($str);

Same here.

> +	if ($opts{'-nbsp'}) {
> +		$str =~ s/ /&nbsp;/g;
> +	}
> +	$str =~ s|([[:cntrl:]])|quot_cec($1)|eg;
> +	return $str;
> +}

Those two subroutines use to_utf8(), which in turn uses
$fallback_encoding.

> +
> +# Make control characters "printable", using character escape codes (CEC)
> +sub quot_cec {
[...]
> +}
> +
> +# Alternatively use unicode control pictures codepoints,
> +# Unicode "printable representation" (PR)
> +sub quot_upr {
[...]
> +}

> +# git may return quoted and escaped filenames
> +sub unquote {
[...]
> +}

Note that this subroutine is about neither CGI nor HTML, but about
unquoting and unescaping output of git commands.  Just FYI.

> +
> +# escape tabs (convert tabs to spaces)
> +sub untabify {
> +	my $line = shift;
> +
> +	while ((my $pos = index($line, "\t")) != -1) {
> +		if (my $count = (8 - ($pos % 8))) {
> +			my $spaces = ' ' x $count;
> +			$line =~ s/\t/$spaces/;
> +		}
> +	}
> +
> +	return $line;
> +}
> +
> +1;

Good work!
-- 
Jakub Narebski
Poland
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]