On Tue, Oct 21, 2008 at 6:44 PM, Jakub Narebski <jnareb@xxxxxxxxx> wrote:
> I like the idea behind this patch, to enable to use path_info for as
> much gitweb parameters as possible. After this patch series the only
> parameters which wouldn't be possible to represent in path_info would
> be:
> * @extra_options ('opt') multi-valued parameter, used to pass
>   things like '--no-merges', which cannot be fit in the "simplified"
>   list-like (as opposed to hash-like query string) path_info URL.
> * $searchtype ('st') and $searchtext ('s') etc. parameters, which
>   are generated by HTML form, and are naturally generated in query
>   string format.
> * $page ('pg') parameter, which could theoretically be added as last
>   part of path_info URL, for example $project/next/2/... if not for
>   pesky $project/history/next:/Documentation/2/ where you cannot be
>   sure that having /<number>/ at the end is rare.
> * $order ('o') parameter, which would be hard to fit in path_info,
>   with its limitation of parameters being specified by position.
>   Or even next to impossible.
> * 'by_tag'...
>
> But I'd rather have this patch series to be in separate thread...

Yes, a posteriori I think it's better too. I'll resend the 5 path_info
patches with the minor stylistic corrections you suggested, and send
these 3 separately.

> On Sun, 19 Oct 2008, Giuseppe Bilotta wrote:
>
>> We parse requests for $project/snapshot/$head.$sfx as equivalent to
>> $project/snapshot/$head?sf=$sfx, where $sfx is any of the known
>> (although not necessarily supported) snapshot formats (or its default
>> suffix).
>>
>> The filename for the resulting package preserves the requested
>> extensions (so asking for a .tgz gives a .tgz, and asking for a .tar.gz
>> gives a .tar.gz), although for obvious reasons it doesn't preserve the
>> basename (git/snapshot/next.tgz returns a file named git-next.tgz).
>
> That is a bit of difference from sf=<format> in CGI query string, where
> <format> is always a name of a format (for example 'tgz' or 'tbz2'),
> and actual suffix is defined in %known_snapshot_formats (for example
> '.tar.gz' and '.tar.bz2' respectively). Now you can specify snapshot
> format either by its name, for example 'tgz' (which is simple
> lookup in hash) which results in proposed filename with '.tgz' suffix,
> or you can specify suffix, for example 'tar.gz' (which requires
> searching through all hash) which results in proposed filename with
> '.tar.gz' suffix.
>
> This is a bit of inconsistency; to be consistent with how we handle
> 'sf' CGI parameter we would translate 'tgz' $sfx into 'tar.gz' in
> snapshot filename. This would also cover currently purely theoretical
> case when different snapshot formats (for example 'tgz' and 'tgz9')
> would use the same snapshot suffix (extension), but differ for example
> in parameters passed to compressor (for example '-9' or '--best' in
> the 'tgz9' case).
>
> On the other hand one would expect that when URL which looks like
> URL to snapshot ends with '.$sfx', then filename for snapshot would
> also end with '.$sfx'.
>
> This certainly requires some further thoughts.

What I decided was to have gitweb always produce links with the suffix
(e.g. .tar.gz), but I saw no particular reason not to accept the shorter
version, which (1) is commonly used as a suffix as well and (2) happens
to be the actual format key used by gitweb.

A different, possibly cleaner approach, though a more extensive change,
would be to have each format describe a list of suffixes, using the
first one when generating links and recognizing all of them when
parsing. This is more invasive because all of the uses of {'suffix'}
have to be replaced with {'suffix'}[0], or something like that (maybe
we could add a separate key 'other_suffixes' instead?).
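For illustration, the lookup discussed here (accept either the format
key with a prepended dot, e.g. '.tgz', or the format's suffix, e.g.
'.tar.gz', and only consider refnames that contain a '.') could be
factored into a subroutine along these lines. This is a minimal sketch
under stated assumptions, not gitweb code: the subroutine name
snapshot_format_from_refname and the trimmed-down
%known_snapshot_formats entries are hypothetical, and only the 'suffix'
option is modeled. Each format's options are a hash reference, so they
are read through $opt->{'suffix'}.

```perl
use strict;
use warnings;

# Hypothetical, trimmed-down stand-in for gitweb's %known_snapshot_formats:
# each format key maps to a hash reference of options, here only 'suffix'.
my %known_snapshot_formats = (
	'tgz'  => { 'suffix' => '.tar.gz' },
	'tbz2' => { 'suffix' => '.tar.bz2' },
	'zip'  => { 'suffix' => '.zip' },
);

# Hypothetical helper: split "$head.$sfx" into ($head, $fmt, $sfx), where
# $sfx is either the format's suffix (e.g. ".tar.gz") or "." followed by
# the format key (e.g. ".tgz"). Returns an empty list if nothing matches.
sub snapshot_format_from_refname {
	my $refname = shift;

	# Only a refname containing a '.' can carry an extension at all.
	return unless defined $refname && index($refname, '.') >= 0;

	# Sorted keys give a deterministic order, and unlike an each() loop
	# left early with 'last', they leave no hash iterator half-way.
	foreach my $fmt (sort keys %known_snapshot_formats) {
		my $opt = $known_snapshot_formats{$fmt};
		foreach my $sfx ($opt->{'suffix'}, ".$fmt") {
			# Require a non-empty head in front of the extension.
			if (length($refname) > length($sfx) &&
			    substr($refname, -length($sfx)) eq $sfx) {
				return (substr($refname, 0, -length($sfx)), $fmt, $sfx);
			}
		}
	}
	return;
}

# Usage: show how a few refnames would be interpreted.
for my $refname (qw(next.tar.gz next.tgz v1.0.zip master)) {
	my ($hash, $fmt, $sfx) = snapshot_format_from_refname($refname);
	if (defined $fmt) {
		print "$refname: hash=$hash format=$fmt suffix=$sfx\n";
	} else {
		print "$refname: no snapshot extension\n";
	}
}
```

With this shape both next.tgz and next.tar.gz resolve to the 'tgz'
format, and the requested suffix is returned alongside, so the caller
can decide whether the snapshot filename keeps it; a refname without a
dot, such as master, is never scanned.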

>> This introduces a potential case for ambiguity if a project has a head
>> that ends with a snapshot-like suffix (.zip, .tgz, .tar.gz, etc) and the
>> sf CGI parameter is not present; however, gitweb only produces URLs with
>> the sf parameter, so this is only a potential issue for hand-coded URLs
>> for extremely unusual projects.
>
> I think you wanted to say here "_currently_ produces URLs with the 'sf'
> parameter" as the next patch in series changes this.

Ah yes, good point.

>> Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@xxxxxxxxx>
>> ---
>>
>> I had second thoughts on this. Now we always look for the snapshot extension if
>> the sf CGI parameter is missing, even if the project has a head that matches
>> the full pseudo-refname $head.$sfx.
>>
>> The reason for this is that (1) there is no ambiguity for gitweb-generated
>> URLs, (2) the only URLs that could fail are hand-made URLs for extremely
>> unusual projects and (3) it allows us to set gitweb up to generate
>> (unambiguous) URLs without the sf CGI parameter.
>
> This is also simpler and cheaper solution.

That, too 8-)

>> This also means that I can add 3 patches to the series, instead of just one:
>> * patch #6 that parses the new format
>> * patch #7 that generates the new URLs
>> * patch #8 for some code refactoring
>
> Now, I haven't yet read the last patch in series, so I don't know if
> it is independent refactoring, making sense even before patches named
> #6 and #7 here, or is it connected with searching for snapshot format
> by suffix it uses. If the former, it should be done upfront, as it
> shouldn't need discussion, and being easier to be accepted into git.git.
> If the latter, then it should probably be folded (squashed) into #6,
> first patch in the series.

In fact, patch #8 can be written independently of the other two, and
would provide a significant speed benefit when generating pages with
lots of 'snapshot' links: all it does is make the 'supported formats'
array global, preparing it only once instead of re-preparing it every
time a snapshot link is created.

>> gitweb/gitweb.perl | 34 ++++++++++++++++++++++++++++++++++
>> 1 files changed, 34 insertions(+), 0 deletions(-)
>>
>> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
>> index 99c8c20..e9e9e60 100755
>> --- a/gitweb/gitweb.perl
>> +++ b/gitweb/gitweb.perl
>> @@ -609,6 +609,40 @@ sub evaluate_path_info {
>>  			$input_params{'hash_parent'} ||= $parentrefname;
>>  		}
>>  	}
>> +
>> +	# for the snapshot action, we allow URLs in the form
>> +	# $project/snapshot/$hash.ext
>> +	# where .ext determines the snapshot and gets removed from the
>> +	# passed $refname to provide the $hash.
>> +	#
>> +	# To be able to tell that $refname includes the format extension, we
>> +	# require the following two conditions to be satisfied:
>> +	# - the hash input parameter MUST have been set from the $refname part
>> +	#   of the URL (i.e. they must be equal)
>
> This means no "$project/.tgz?h=next", isn't it?

Right.

>> +	# - the snapshot format MUST NOT have been defined already
>
> I would add "which means that 'sf' parameter is not set in URL", or
> something like that as the last line of above comment.

Good idea. I'll make it an 'e.g.' to keep the comment valid for future
additional parameter evaluation such as command-line input.

> I like that the code is so well commented, by the way.

Thanks.

>> +	if ($input_params{'action'} eq 'snapshot' && defined $refname &&
>> +	    $refname eq $input_params{'hash'} &&
>
> Minor nit.
>
> I would use here (the question of style / better readability):
>
> +	if ($input_params{'action'} eq 'snapshot' &&
> +	    defined $refname && $refname eq $input_params{'hash'} &&
>
> to have both conditions about $refname in the same line.

Yes, it'd look much better.

>> +	    !defined $input_params{'snapshot_format'}) {
>> +		# We loop over the known snapshot formats, checking for
>> +		# extensions. Allowed extensions are both the defined suffix
>> +		# (which includes the initial dot already) and the snapshot
>> +		# format key itself, with a prepended dot
>> +		while (my ($fmt, %opt) = each %known_snapshot_formats) {
>> +			my $hash = $refname;
>> +			my $sfx;
>> +			$hash =~ s/(\Q$opt{'suffix'}\E|\Q.$fmt\E)$//;
>> +			next unless $sfx = $1;
>> +			# a valid suffix was found, so set the snapshot format
>> +			# and reset the hash parameter
>> +			$input_params{'snapshot_format'} = $fmt;
>> +			$input_params{'hash'} = $hash;
>> +			# we also set the format suffix to the one requested
>> +			# in the URL: this way a request for e.g. .tgz returns
>> +			# a .tgz instead of a .tar.gz
>> +			$known_snapshot_formats{$fmt}{'suffix'} = $sfx;
>> +			last;
>> +		}
>
> I'm not sure if it is worth (see comment at the beginning of this mail)
> adding this code, or just allow $sfx to be snapshot _name_ (key in
> %known_snapshot_formats hash).
>
> Otherwise it would be as simple as checking if $known_snapshot_formats{$sfx}
> exists (assuming that snapshot format names do not contain '.').
>
> If we decide to go more complicated route, then refactoring it in such
> a way that suffixes are also keys to %known_snapshot_formats would be
> preferred... err, sorry, not so simple. But refactoring this check
> into separate subroutine (as I think last patch in series does) would
> be good idea.

See comments above.

> Also, I'd rather you checked if the $refname part contains '.' for it
> to even consider that it can be suffix.

Ah, good idea.

-- 
Giuseppe "Oblomov" Bilotta