Re: [PATCHv4] gitweb: refactor input parameters parse/validation

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, 3 Oct 2008, Giuseppe Bilotta wrote:
> On Fri, Oct 3, 2008 at 3:36 AM, Jakub Narebski <jnareb@xxxxxxxxx> wrote:
>> On Thu, 2 Oct 2008, Giuseppe Bilotta wrote:

>>> We thus collect all the parameters into the new %input_params hash,
>>> removing the need for ad-hoc kludgy code in href().
>>
>> Alternate solution would be push data from PATH_INFO in query params
>> data using (for example)
>>
>>  $cgi->param('a', $action);
>>
>> or, naming parameters explicitely
>>
>>  $cgi->param(-name=>'a', -value=>$action);
>>
>> This avoids need for additional variable, reuses current code, and
>> nicely sidesteps issue whether to use long names for keys ('action',
>> 'file_name') or short ones from CGI query ('a', 'f').
>>
>>
>> It would probably has to be at least partially written to check which
>> of those two solutions (%input_params or $cgi->param('a', $a)))
>> is better.
> 
> If we really don't want to care where the parameters came from *at
> all*, writing on $cgi->param is likely to be the cleanest solution.

I now think that %input_params is a better solution, even if we won't
be implementing -true_replay / -faithful_replay option, where options
gotten in path_info would be passed in path_info, and options passed
in query string would be passed as CGI params (even if those params
could be encoded in path_info).
 
One of the advantages is that having @cgi_param_mapping near the top
provides opportunity to describe short CGI query options.

Another advantage is that with %input_params it is easy to find if
parameter is multivalued, or is meant to be multivalued: just check
ref($input_params{long_name}); well, perhaps not _that_ easy, but...

>>> As much of the
>>> parameters validation code is now shared between CGI and PATH_INFO,
>>> although this requires PATH_INFO parsing to be interleaved into the main
>>> code instead of being factored out into its own routine.
>>
>> I'm not sure if it is worth it then to unify parameters validation,
>> with such drawback.
> 
> Yeah, I don't like it very much either. But it does spare a little on
> the validation. OTOH, the amount we spare is not extraordinary, so
> it's probably not worth the spaghettization of the code ...

Avoiding repetition can be done in different ways, for example we can
have gitweb assume that if value is already in params variable
($project, $action, $hash,...) it is already validated, and if it is
only in %input_params it is not.  Add to that for example putting
common part of project path checks into validate_project and you have
all the advantages (no code repetition, spared validation) without
drawback of harder to maintain spaghetti-like code.

>>> +# fill %input_params with the CGI parameters. All values except for 'opt'
>>> +# should be single values, but opt can be an array. We should probably
>>> +# build an array of parameters that can be multi-valued, but since for the time
>>> +# being it's only this one, we just single it out
>>> +while (my ($name, $symbol) = each %cgi_param_mapping) {
>>> +     if ($symbol eq 'opt') {
>>> +             $input_params{$name} = [$cgi->param($symbol)];
>>> +     } else {
>>> +             $input_params{$name} = $cgi->param($symbol);
>>>       }
>>>  }
>>
>> If it was chosen to use short (CGI query) parameter names, the above
>> loop could be replaced by simple
>>
>>  %input_params = $cgi->Vars;
>>
>> or to be more exact, if we want to have multi-valued parameters stored
>> as array ref
>>
>>  %input_params = map { [ split /\0/ ] if /\0/; } $cgi->Vars;
>>
>>
>> See CGI(3pm):
>>
>>    When using this [Vars], the thing you must watch out for are multivalued CGI
>>    parameters.  Because a hash cannot distinguish between scalar and list con-
>>    text, multivalued parameters will be returned as a packed string, separated
>>    by the "\0" (null) character.  You must split this packed string in order
>>    to get at the individual values.
> 
> Ah, an interesting alternative. This would make parameter copying a
> one-liner, almost as good as just using $cgi->param for everything :)

Hmmmm...
 

[cut]
>> Note that this code does less checking if $project is in path_info than
>> for the case where it is set by CGI query. Perhaps there should be base
>> fast check in a loop, and more extensive test later.
> 
> Uh ... isn't this exactly what's happening? In the loop we just gobble
> until we find a git dir. Validation is then done, and it's the _same_
> validation for both cases. Why do you say that path_info $project is
> less checked?

For project from query string we now have:

  !validate_pathname($project) ||
  !(-d "$projectroot/$project") ||
  !check_head_link("$projectroot/$project") ||
  ($export_ok && !(-e "$projectroot/$project/$export_ok")) ||
  ($strict_export && !project_in_list($project))

For project from path_info we now have:

  while ($project && !check_head_link("$projectroot/$project")) {
	$project =~ s,/*[^/]*$,,;
  }
  ...
  !validate_pathname($project) ||
  ($export_ok && !-e "$projectroot/$project/$export_ok") ||
  ($strict_export && !project_in_list($project))
  
It is almost the same; I have thought that they differ more. Perhaps
some of above code could be refactored into validate_project(), or
project_ok() subroutine.

>>>       $params{'project'} = $project unless exists $params{'project'};
>>>
>>>       if ($params{-replay}) {
>>> -             while (my ($name, $symbol) = each %mapping) {
>>> -                     if (!exists $params{$name}) {
>>> -                             # the parameter we want to recycle may be either part of the
>>> -                             # list of CGI parameter, or recovered from PATH_INFO
>>> -                             if ($cgi->param($symbol)) {
>>> -                                     # to allow for multivalued params we use arrayref form
>>> -                                     $params{$name} = [ $cgi->param($symbol) ];
>>> -                             } else {
>>> -                                     no strict 'refs';
>>> -                                     $params{$name} = $$name if $$name;
>>> -                             }
>>> -                     }
>>> +             while (my ($name, $val) = each %input_params) {
>>> +                     $params{$name} = $input_params{$name}
>>> +                             unless (exists $params{$name});
>>
>> Very nice, short code.  Should be something like that from
>> the very beginning.
> 
> Ok, I'll try working up a patch for params merging before any
> path_info extensions.

Perhaps also put query string handling into evaluate_CGI_query(), or
evaluate_query_string(), similar to evaluate_path_info()?

-- 
Jakub Narebski
Poland
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux