On Tue, Oct 7, 2008 at 12:57 PM, Jakub Narebski <jnareb@xxxxxxxxx> wrote: > On Fri, 3 Oct 2008, Giuseppe Bilotta wrote: >> +# we make $path_info global because it's also used later on >> +my $path_info = $ENV{"PATH_INFO"}; >> +if ($path_info) { >> $my_url =~ s,\Q$path_info\E$,,; >> $my_uri =~ s,\Q$path_info\E$,,; >> } > > I think we could have left it as it was, i.e. have $path_info be > a local variable. Unless there is something I don't see. Well, it's being reassigned locally in evaluate_path_info ... since we're looking at it anyway, I thought it made sense to do it once and for all. The var could also potentially have other uses, such as enabling the use of path_info whenever path_info was used (something which is currently not done), so ... >> +# input parameters are stored with the long parameter name as key, so we need >> +# the name -> CGI key mapping here. This will also be used in the href >> +# subroutine to convert parameters to their CGI equivalent. > > Actually href() subroutine would be called many times, while parsing of > input params happens only once; that is why it is longname => shortname > hash, and not the reverse (shortname => longname). Also it does not > result in loss of performance as we parse all input params anyway. > > This explanation might be obvious, or could be put in comment > (increasing comment to code count ;-)) It's actually a good point, I'll put it in (and not for the comment to code ratio 8-D) >> +my @cgi_param_mapping = ( >> +my %cgi_param_mapping = @cgi_param_mapping; >> +my %actions = ( > > Perhaps %allowed_options should also be there, to have everything about > input params in single place... or perhaps not... I had thought about it. It makes sense in a way to put it after cgi_param_mapping. I didn't do it to reduce the code move. >> +# fill %input_params with the CGI parameters. All values except for 'opt' >> +# should be single values, but opt can be an array. We should probably >> +# build an array of parameters that can be multi-valued, but since for the time >> +# being it's only this one, we just single it out >> +while (my ($name, $symbol) = each %cgi_param_mapping) { >> + if ($symbol eq 'opt') { >> + $input_params{$name} = [$cgi->param($symbol)]; > > I would personally use "[ $cgi->param($symbol) ]" instead for better > (IMVHO) visibility, but this is just a matter of taste. Ah yes, much more readable. I'll do it. >> + } else { >> + $input_params{$name} = $cgi->param($symbol); >> + } >> +} > > Nice. I guess that you have checked that you caught all > $cgi->param(...) calls, and there aren't any beside those above... I did a grep so I *think* I didn't miss any. >> +# now read PATH_INFO and update the parameter list for missing parameters >> +sub evaluate_path_info { >> + return if defined $input_params{'project'}; > > I was 'my $path_info = $ENV{"PATH_INFO"};' there, when $path_info > wasn't global variable. Any advantages to having it global? > > (This is just a minor thing, not worth resending patch over, I think). As I mentioned above, just sparing a double assignment for something that is going to be checked anyway both at the beginning and here. And it can also have other uses, potentially. >> + # find which part of PATH_INFO is project >> + my $project = $path_info; > > Hmmm... now $project is local (lexically) here. Yes, itt's only used temporarily here, to see if a proper $project can be defined. It gets redefined outside. It just made sense to name it like this 8-) >> + $project =~ s,/+$,,; >> + while ($project && !check_head_link("$projectroot/$project")) { >> + $project =~ s,/*[^/]*$,,; >> + } >> + # validate project >> + $project = validate_project($project); > > I'm not sure if it is worth worrying over, but I think you repeat > check_head_link() check here. > > [After examining code further]. But I think you do double validation; > once you do it here, and once you do it copying to global variables > such as $action or $project, and double checking check_head_link() > won't be easy to avoid; fortunately it is cheap filesystem-level check > (might be slow only when stat is extremely slow, and is not cached). I know. This is actually the reason why I had interleaved path_info definition and global validation in my previous version of the patch. The big issue here is that path_info evaluation _needs_ (partial) validation. A possible alternative could be to only put validated parameters into %input_params. This would completely separate the validation for cgi and path_info (modulo shared subs). Of course, the check_head_link would still be repeated inside evaluate_path_info, but the other params could skip a double validation. >> + return unless $project; >> + $input_params{'project'} = $project; >> + >> + # do not change any parameters if an action is given using the query string >> + return if $input_params{'action'}; >> + $path_info =~ s,^\Q$project\E/*,,; > > Hmmm... I wonder if it is good idea. It was done in commit 645927c > (gitweb: fix warnings in PATH_INFO code and add export_ok/strict_export) > by Matthias Lederhofer, but I don't see why we do not remove project > from path_info if action is set. But this is belongs probably to > independent code cleanup (if it is needed), so don't worry about that. path_info information is simply discarded if action was already defined, because it was assumed that path_info was only used for default action. it's something that might be retaught with the rest of the path_info enhancements. > Perhaps it would be good to add empty line here before beginning of > 'hash' and 'hash_base:file_name' handling. Can do. >> + my ($refname, $pathname) = split(/:/, $path_info, 2); >> + if (defined $pathname) { >> + # we got "project.git/branch:filename" or "project.git/branch:dir/" >> + # we could use git_get_type(branch:pathname), but it needs $git_dir > > Additionally git_get_type(<extended sha1>) is additional call to git > (additional fork) currently; that might change with gitweb caching code, > which uses permanent connection to 'git cat-file --batch/--batch-check' > for that. I can add that to the comment. >> + $pathname =~ s,^/+,,; >> + if (!$pathname || substr($pathname, -1) eq "/") { >> + $input_params{'action'} = "tree"; >> + $pathname =~ s,/$,,; >> + } else { >> + $input_params{'action'} = "blob_plain"; >> + } >> + $input_params{'hash_base'} ||= validate_refname($refname); >> + $input_params{'file_name'} ||= validate_pathname($pathname); >> + } elsif (defined $refname) { >> + # we got "project.git/branch" >> + $input_params{'action'} = "shortlog"; >> + $input_params{'hash'} ||= validate_refname($refname); >> + } >> +} > > Nice defensive programming with '||=' here for setting %input_params. > > [After examining code further]. But I think you do double validation; > see comment below. [see above] >> +evaluate_path_info(); >> + >> +our $action = $input_params{'action'}; >> if (defined $action) { >> - if ($action =~ m/[^0-9a-zA-Z\.\-_]/) { >> + if (!validate_action($action)) { >> die_error(400, "Invalid action parameter"); >> } >> } > > Hmm... don't you do double validation now? Once in evaluate_path_info, > and once copying to global variables like $action? > > Very nice moving param validation for 'a'/'action' parameter to > validate_action() subroutine. As I mentioned, this could be the solution to the double validation, I'll just validate the cgi params separately from the path_info ones, and only embed validated parameters in the %input_params hash >> +our @extra_options = @{$input_params{'extra_options'}}; >> +# @extra_options is always defined, since it can only be (currently) set from >> +# CGI, and $cgi->param() returns the empty array in array context > > ...if param was not set. Ah, yes. I'll add it. >> if ($params{-replay}) { >> - while (my ($name, $symbol) = each %mapping) { >> + while (my ($name, $symbol) = each %cgi_param_mapping) { >> if (!exists $params{$name}) { >> - # to allow for multivalued params we use arrayref form >> - $params{$name} = [ $cgi->param($symbol) ]; >> + $params{$name} = $input_params{$name}; >> } >> } >> } > > Nice cleanup. Well, one would expect, given that this was the 'trigger' ;-) I'll try to whip up a revised patch ASAP. -- Giuseppe "Oblomov" Bilotta -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html