Re: [PATCHv4] gitweb: refactor input parameters parse/validation

"Giuseppe Bilotta" <giuseppe.bilotta@xxxxxxxxx> · Fri, 3 Oct 2008 09:24:35 +0200

On Fri, Oct 3, 2008 at 3:36 AM, Jakub Narebski <jnareb@xxxxxxxxx> wrote:
> On Thu, 2 Oct 2008, Giuseppe Bilotta wrote:
>
>> Since input parameters can now be obtained both from CGI parameters and
>> PATH_INFO, we would like most of the code to be agnostic about the way
>> parameters were retrieved.
>
> I'd prefer that this cleanup/refactoring patch was _first_ patch in
> the series, as we were able to obtain parameters both from CGI query
> parameters and from PATH_INFO ($project, $hash or $hash_base+$file_name)
> _before_ first patch in this series.  So it correct not only issue
> introduced by first patch (and fixed somewhat there), but what was
> outstanding (but rare because gitweb did not generate such links)
> issue.

It's true that the patch makes sense regardless of the rest of the
path_info patch, indeed.

>> We thus collect all the parameters into the new %input_params hash,
>> removing the need for ad-hoc kludgy code in href().
>
> Alternate solution would be push data from PATH_INFO in query params
> data using (for example)
>
>  $cgi->param('a', $action);
>
> or, naming parameters explicitely
>
>  $cgi->param(-name=>'a', -value=>$action);
>
> This avoids need for additional variable, reuses current code, and
> nicely sidesteps issue whether to use long names for keys ('action',
> 'file_name') or short ones from CGI query ('a', 'f').
>
>
> It would probably has to be at least partially written to check which
> of those two solutions (%input_params or $cgi->param('a', $a)))
> is better.

If we really don't want to care where the parameters came from *at
all*, writing on $cgi->param is likely to be the cleanest solution.

>> As much of the
>> parameters validation code is now shared between CGI and PATH_INFO,
>> although this requires PATH_INFO parsing to be interleaved into the main
>> code instead of being factored out into its own routine.
>
> I'm not sure if it is worth it then to unify parameters validation,
> with such drawback.

Yeah, I don't like it very much either. But it does spare a little on
the validation. OTOH, the amount we spare is not extraordinary, so
it's probably not worth the spaghettization of the code ...

>>  # if we're called with PATH_INFO, we have to strip that
>> -# from the URL to find our real URL
>> -if (my $path_info = $ENV{"PATH_INFO"}) {
>> +# from the URL to find our real URL. PATH_INFO is kept
>> +# as it's used later on for parameter extraction
>> +my $path_info = $ENV{"PATH_INFO"};
>> +if ($path_info) {
>>       $my_url =~ s,\Q$path_info\E$,,;
>>       $my_uri =~ s,\Q$path_info\E$,,;
>>  }
>
> This might be separate patch, if you wanted to increase your commit
> count ;-)  More seriously I think it should be at least briefly
> mentioned in commit message (make $path_info global).

Note however that the path_info evaluation is _destructive_, so after
evaluation is complete we don't have much of it left.

[snip]

> Nice, although I'm not sure if [%@]cgi_param_mapping has to global.
> If we use long parameters names as keys, I think it has to, somewhat.
> See also comment below.

>> +# fill %input_params with the CGI parameters. All values except for 'opt'
>> +# should be single values, but opt can be an array. We should probably
>> +# build an array of parameters that can be multi-valued, but since for the time
>> +# being it's only this one, we just single it out
>> +while (my ($name, $symbol) = each %cgi_param_mapping) {
>> +     if ($symbol eq 'opt') {
>> +             $input_params{$name} = [$cgi->param($symbol)];
>> +     } else {
>> +             $input_params{$name} = $cgi->param($symbol);
>>       }
>>  }
>
> If it was chosen to use short (CGI query) parameter names, the above
> loop could be replaced by simple
>
>  %input_params = $cgi->Vars;
>
> or to be more exact, if we want to have multi-valued parameters stored
> as array ref
>
>  %input_params = map { [ split /\0/ ] if /\0/; } $cgi->Vars;
>
>
> See CGI(3pm):
>
>    When using this [Vars], the thing you must watch out for are multivalued CGI
>    parameters.  Because a hash cannot distinguish between scalar and list con-
>    text, multivalued parameters will be returned as a packed string, separated
>    by the "\0" (null) character.  You must split this packed string in order
>    to get at the individual values.

Ah, an interesting alternative. This would make parameter copying a
one-liner, almost as good as just using $cgi->param for everything :)

>> -# parameters which are pathnames
>> -our $project = $cgi->param('p');
>> +# next, we want to parse PATH_INFO (which was already stored in $path_info at
>> +# the beginning). This is a little hairy because PATH_INFO parsing needs some
>> +# form of parameter validation, so we interleave parsing and validation.
>
> I don't think it is a good idea. In my opinion, for my taste, it would
> be better to separate evaluating path_info from the rest.
>
> We could instead introduce convention that if variable (like $project)
> is set, then it is assumed to be validated; if it is present only in
> the %input_params hash, then it has to be validated.
>
>
> On the other hand this ordering, first by parameter, then by method
> of extraction could be seem quite equally valid.  Nevertheless I think
> that previous flow with separate evaluate_path_info() and what should
> be evaluate_CGI_query() has better encapsulation.
>
>> +#
>> +# The accepted PATH_INFO syntax is $project/$action/$hash or
>> +# $project/$action/$hash_base:$file_name, where $action may be missing (mostly for
>> +# backwards compatibility), so we need to parse and validate the parameters in
>> +# this same order.
>> +
>> +# clear $path_info of any leading /
>> +$path_info =~ s,^/+,,;
>> +
>> +our $project = $input_params{'project'};
>> +if ($path_info && !defined $project) {
>> +     # if $project was not defined by CGI, we try to extract it from
>> +     # $path_info
>> +     $project = $path_info;
>> +     $project =~ s,/+$,,;
>> +     while ($project && !check_head_link("$projectroot/$project")) {
>> +             $project =~ s,/*[^/]*$,,;
>> +     }
>> +     $input_params{'project'} = $project;
>> +} else {
>> +     # otherwise, we suppress $path_info parsing altogether
>> +     $path_info = undef;
>> +}
>> +
>> +# project name validation
>>  if (defined $project) {
>>       if (!validate_pathname($project) ||
>>           !(-d "$projectroot/$project") ||
>
> Note that this code does less checking if $project is in path_info than
> for the case where it is set by CGI query. Perhaps there should be base
> fast check in a loop, and more extensive test later.

Uh ... isn't this exactly what's happening? In the loop we just gobble
until we find a git dir. Validation is then done, and it's the _same_
validation for both cases. Why do you say that path_info $project is
less checked?

>> @@ -408,16 +506,66 @@ if (defined $project) {
>>               undef $project;
>>               die_error(404, "No such project");
>>       }
>> +
>> +     # we purge the $project name from the $path_info, preparing it for
>> +     # subsequent parameters extraction
>> +     $path_info =~ s,^\Q$project\E/*,, if defined $path_info;
>> +} else {
>> +     # we also suppress $path_info parsing if no project was defined
>> +     $path_info = undef;
>> +}
>
> In evaluate_path_info it was simply 'return if...'; here with mentioned
> interleaving it is harder and a bit hacky.

I know.

>>       $params{'project'} = $project unless exists $params{'project'};
>>
>>       if ($params{-replay}) {
>> -             while (my ($name, $symbol) = each %mapping) {
>> -                     if (!exists $params{$name}) {
>> -                             # the parameter we want to recycle may be either part of the
>> -                             # list of CGI parameter, or recovered from PATH_INFO
>> -                             if ($cgi->param($symbol)) {
>> -                                     # to allow for multivalued params we use arrayref form
>> -                                     $params{$name} = [ $cgi->param($symbol) ];
>> -                             } else {
>> -                                     no strict 'refs';
>> -                                     $params{$name} = $$name if $$name;
>> -                             }
>> -                     }
>> +             while (my ($name, $val) = each %input_params) {
>> +                     $params{$name} = $input_params{$name}
>> +                             unless (exists $params{$name});
>
> Very nice, short code.  Should be something like that from
> the very beginning.

Ok, I'll try working up a patch for params merging before any
path_info extensions.

-- 
Giuseppe "Oblomov" Bilotta
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html