Re: [PATHv2 6/8] gitweb: retrieve snapshot format from PATH_INFO

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Oct 21, 2008 at 6:44 PM, Jakub Narebski <jnareb@xxxxxxxxx> wrote:
> I like the idea behind this patch, to enable to use path_info for as
> much gitweb parameters as possible.  After this patch series the only
> parameters which wouldn't be possible to represent in path_info would
> be:
>  * @extra_options ('opt') multi-valued parameter, used to pass
>   thinks like '--no-merges', which cannot be fit in the "simplified"
>   list-like (as opposed to hash-like query string) path_info URL.
>  * $searchtype ('st') and $searchtext ('s') etc. parameters, which
>   are generated by HTML form, and are naturally generated in query
>   string format.
>  * $page ('pg') parameter, which could theoretically be added as last
>   part of path_info URL, for example $project/next/2/... if not for
>   pesky $project/history/next:/Documentation/2/ where you cannot be
>   sure that having /<number>/ at the end is rare.
>  * $order ('o') parameter, which would be hard to fit in path_info,
>   with its limitation of parameters being specified by position.
>   Or even next to impossible.
>  * 'by_tag'...
>
> But I'd rather have this patch series to be in separate thread...

Yes, a posteriori I think it's better too. I'll resend the 5 path_info
patches with the minor stylistic corrections you suggested, and send
these 3 separately.


> On Sun, 19 Oct 2008, Giuseppe Bilotta wrote:
>
>> We parse requests for $project/snapshot/$head.$sfx as equivalent to
>> $project/snapshot/$head?sf=$sfx, where $sfx is any of the known
>> (although not necessarily supported) snapshot formats (or its default
>> suffix).
>>
>> The filename for the resulting package preserves the requested
>> extensions (so asking for a .tgz gives a .tgz, and asking for a .tar.gz
>> gives a .tar.gz), although for obvious reasons it doesn't preserve the
>> basename (git/snapshot/next.tgz returns a file names git-next.tgz).
>
> That is a bit of difference from sf=<format> in CGI query string, where
> <format> is always a name of a format (for example 'tgz' or 'tbz2'),
> and actual suffix is defined in %known_snapshot_formats (for example
> '.tar.gz' and '.tar.bz2' respectively).  Now you can specify snapshot
> format either either by its name, for example 'tgz' (which is simple
> lookup in hash) which result in proposed filename with '.tgz' suffix,
> or you can specify suffix, for example 'tar.gz' (which requires
> searching through all hash) which result in proposed filename with
> '.tar.gz' suffix.
>
> This is a bit of inconsistency; to be consistent with how we handle
> 'sf' CGI parameter we would translate 'tgz' $sfx into 'tar.gz' in
> snapshot filename.  This would also cover currently purely theoretical
> case when different snapshot formats (for example 'tgz' and 'tgz9')
> would use the same snapshot suffix (extension), but differ for example
> in parameters passed to compressor (for example '-9' or '--best' in
> the 'tgz9' case).
>
> On the other hand one would expect that when URL which looks like
> URL to snapshot ends with '.$sfx', then filename for snapshot would
> also end with '.$sfx'.
>
> This certainly requires some further thoughts.

What I decided was to set gitweb to always produce links with the
suffix (.e.g .tar.gz), but I saw no particular reason not to accept
the shorter version which is (1) commonly used as a suffix as well and
(2) happens to be the actual format key used by gitweb.

A different, possibly cleaner approach, but a more extensive change,
would be to have each format describe a list of suffixes, defaulting
to the first one on creation by identifying all of them. This is more
invasive because all of the uses of {'suffix'} have to be replaced
with {'suffix'}[0], or something like that (maybe we could add a
separate key 'other_suffixes' instead?)


>> This introduces a potential case for ambiguity if a project has a head
>> that ends with a snapshot-like suffix (.zip, .tgz, .tar.gz, etc) and the
>> sf CGI parameter is not present; however, gitweb only produces URLs with
>> the sf parameter, so this is only a potential issue for hand-coded URLs
>> for extremely unusual project.
>
> I think you wanted to say here "_currently_ produces URLs with the 'sf'
> parameter" as the next patch in series changes this.

Ah yes, good point.

>> Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@xxxxxxxxx>
>> ---
>>
>> I had second thoughts on this. Now we always look for the snapshot extension if
>> the sf CGI parameter is missing, even if the project has a head that matches
>> the full pseudo-refname $head.$sfx.
>>
>> The reason for this is that (1) there is no ambiguity for gitweb-generated
>> URLs (2) the only URLs that could fail are hand-made URLs for extremely
>> unusual projects and (3) it allows us to set gitweb up to generate
>> (unambiguous) URLs without the sf CGI parameter.
>
> This is also simpler and cheaper solution.

That, too 8-)

>> This also means that I can add 3 patches to the series, instead of just one:
>> * patch #6 that parses the new format
>> * patch #7 that generates the new URLs
>> * patch #8 for some code refactoring
>
> Now, I haven't yet read the last patch in series, so I don't know if
> it is independent refactoring, making sense even before patches named
> #6 and #7 here, or is it connected with searching for snapshot format
> by suffix it uses.  If the former, it should be done upfront, as it
> shouldn't need discussion, and being easier to be accepted into git.git.
> If the latter, then it should probably be folded (squashed) into #6,
> first patch in the series.

In fact, patch #8 can be written independently of the other too, and
would provide a significant speed benefit for generation of pages with
lots of 'snapshot' links: what it does is just to make the 'supported
formats' array global, preparing it only once instead of re-preparing
it every time a snapshot link is created.

>>  gitweb/gitweb.perl |   34 ++++++++++++++++++++++++++++++++++
>>  1 files changed, 34 insertions(+), 0 deletions(-)
>>
>> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
>> index 99c8c20..e9e9e60 100755
>> --- a/gitweb/gitweb.perl
>> +++ b/gitweb/gitweb.perl
>> @@ -609,6 +609,40 @@ sub evaluate_path_info {
>>                       $input_params{'hash_parent'} ||= $parentrefname;
>>               }
>>       }
>> +
>> +     # for the snapshot action, we allow URLs in the form
>> +     # $project/snapshot/$hash.ext
>> +     # where .ext determines the snapshot and gets removed from the
>> +     # passed $refname to provide the $hash.
>> +     #
>> +     # To be able to tell that $refname includes the format extension, we
>> +     # require the following two conditions to be satisfied:
>> +     # - the hash input parameter MUST have been set from the $refname part
>> +     #   of the URL (i.e. they must be equal)
>
> This means no "$project/.tgz?h=next", isn't it?

Right.

>> +     # - the snapshot format MUST NOT have been defined already
>
> I would add "which means that 'sf' parameter is not set in URL", or
> something like that as the last line of above comment.

Good idea. I'll make it an 'e.g.' to keep the comment valid for future
additional parameter evaluation such as command-line input.

> I like that the code is so well commented, by the way.

Thanks.

>> +     if ($input_params{'action'} eq 'snapshot' && defined $refname &&
>> +             $refname eq $input_params{'hash'} &&
>
> Minor nit.
>
> I would use here (the question of style / better readability):
>
> +       if ($input_params{'action'} eq 'snapshot' &&
> +                defined $refname && $refname eq $input_params{'hash'} &&
>
> to have both conditions about $refname in the same line.

Yes, it'd look much better.

>> +             !defined $input_params{'snapshot_format'}) {
>> +             # We loop over the known snapshot formats, checking for
>> +             # extensions. Allowed extensions are both the defined suffix
>> +             # (which includes the initial dot already) and the snapshot
>> +             # format key itself, with a prepended dot
>> +             while (my ($fmt, %opt) = each %known_snapshot_formats) {
>> +                     my $hash = $refname;
>> +                     my $sfx;
>> +                     $hash =~ s/(\Q$opt{'suffix'}\E|\Q.$fmt\E)$//;
>> +                     next unless $sfx = $1;
>> +                     # a valid suffix was found, so set the snapshot format
>> +                     # and reset the hash parameter
>> +                     $input_params{'snapshot_format'} = $fmt;
>> +                     $input_params{'hash'} = $hash;
>> +                     # we also set the format suffix to the one requested
>> +                     # in the URL: this way a request for e.g. .tgz returns
>> +                     # a .tgz instead of a .tar.gz
>> +                     $known_snapshot_formats{$fmt}{'suffix'} = $sfx;
>> +                     last;
>> +             }
>
> I'm not sure if it worth (see comment at the beginning of this mail)
> adding this code, or just allow $sfx to be snapshot _name_ (key in
> %known_snapshot_formats hash).
>
> Otherwise it would be as simple as checking if $known_snapshot_formats{$sfx}
> exists (assuming that snapshot format names does not contain '.').
>
> If we decide to go more complicated route, then refactoring it in such
> a way that suffixes are also keys to %known_snapshot_formats would be
> preferred... err, sorry, not so simple.  But refactoring this check
> into separate subroutine (as I think last patch in series does) would
> be good idea.

See comments above.

> Also, I'd rather you checked if the $refname part contains '.' for it
> to even consider that it can be suffix.

Ah, good idea.

-- 
Giuseppe "Oblomov" Bilotta
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux