Re: [PATHv2 6/8] gitweb: retrieve snapshot format from PATH_INFO

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



I like the idea behind this patch, to enable to use path_info for as
much gitweb parameters as possible.  After this patch series the only
parameters which wouldn't be possible to represent in path_info would
be: 
 * @extra_options ('opt') multi-valued parameter, used to pass
   thinks like '--no-merges', which cannot be fit in the "simplified"
   list-like (as opposed to hash-like query string) path_info URL.
 * $searchtype ('st') and $searchtext ('s') etc. parameters, which
   are generated by HTML form, and are naturally generated in query
   string format.
 * $page ('pg') parameter, which could theoretically be added as last
   part of path_info URL, for example $project/next/2/... if not for
   pesky $project/history/next:/Documentation/2/ where you cannot be
   sure that having /<number>/ at the end is rare.
 * $order ('o') parameter, which would be hard to fit in path_info,
   with its limitation of parameters being specified by position.
   Or even next to impossible.
 * 'by_tag'...

But I'd rather have this patch series to be in separate thread...


On Sun, 19 Oct 2008, Giuseppe Bilotta wrote:

> We parse requests for $project/snapshot/$head.$sfx as equivalent to
> $project/snapshot/$head?sf=$sfx, where $sfx is any of the known
> (although not necessarily supported) snapshot formats (or its default
> suffix).
> 
> The filename for the resulting package preserves the requested
> extensions (so asking for a .tgz gives a .tgz, and asking for a .tar.gz
> gives a .tar.gz), although for obvious reasons it doesn't preserve the
> basename (git/snapshot/next.tgz returns a file names git-next.tgz).

That is a bit of difference from sf=<format> in CGI query string, where
<format> is always a name of a format (for example 'tgz' or 'tbz2'), 
and actual suffix is defined in %known_snapshot_formats (for example
'.tar.gz' and '.tar.bz2' respectively).  Now you can specify snapshot
format either either by its name, for example 'tgz' (which is simple
lookup in hash) which result in proposed filename with '.tgz' suffix,
or you can specify suffix, for example 'tar.gz' (which requires
searching through all hash) which result in proposed filename with
'.tar.gz' suffix.

This is a bit of inconsistency; to be consistent with how we handle
'sf' CGI parameter we would translate 'tgz' $sfx into 'tar.gz' in
snapshot filename.  This would also cover currently purely theoretical
case when different snapshot formats (for example 'tgz' and 'tgz9')
would use the same snapshot suffix (extension), but differ for example
in parameters passed to compressor (for example '-9' or '--best' in
the 'tgz9' case).

On the other hand one would expect that when URL which looks like
URL to snapshot ends with '.$sfx', then filename for snapshot would
also end with '.$sfx'.

This certainly requires some further thoughts.

> 
> This introduces a potential case for ambiguity if a project has a head
> that ends with a snapshot-like suffix (.zip, .tgz, .tar.gz, etc) and the
> sf CGI parameter is not present; however, gitweb only produces URLs with
> the sf parameter, so this is only a potential issue for hand-coded URLs
> for extremely unusual project.

I think you wanted to say here "_currently_ produces URLs with the 'sf'
parameter" as the next patch in series changes this.

> 
> Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@xxxxxxxxx>
> ---
> 
> I had second thoughts on this. Now we always look for the snapshot extension if
> the sf CGI parameter is missing, even if the project has a head that matches
> the full pseudo-refname $head.$sfx.
> 
> The reason for this is that (1) there is no ambiguity for gitweb-generated
> URLs (2) the only URLs that could fail are hand-made URLs for extremely
> unusual projects and (3) it allows us to set gitweb up to generate
> (unambiguous) URLs without the sf CGI parameter.

This is also simpler and cheaper solution.

> 
> This also means that I can add 3 patches to the series, instead of just one:
> * patch #6 that parses the new format
> * patch #7 that generates the new URLs
> * patch #8 for some code refactoring

Now, I haven't yet read the last patch in series, so I don't know if
it is independent refactoring, making sense even before patches named
#6 and #7 here, or is it connected with searching for snapshot format
by suffix it uses.  If the former, it should be done upfront, as it
shouldn't need discussion, and being easier to be accepted into git.git.
If the latter, then it should probably be folded (squashed) into #6,
first patch in the series.

> 
>  gitweb/gitweb.perl |   34 ++++++++++++++++++++++++++++++++++
>  1 files changed, 34 insertions(+), 0 deletions(-)
> 
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 99c8c20..e9e9e60 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -609,6 +609,40 @@ sub evaluate_path_info {
>  			$input_params{'hash_parent'} ||= $parentrefname;
>  		}
>  	}
> +
> +	# for the snapshot action, we allow URLs in the form
> +	# $project/snapshot/$hash.ext
> +	# where .ext determines the snapshot and gets removed from the
> +	# passed $refname to provide the $hash.
> +	#
> +	# To be able to tell that $refname includes the format extension, we
> +	# require the following two conditions to be satisfied:
> +	# - the hash input parameter MUST have been set from the $refname part
> +	#   of the URL (i.e. they must be equal)

This means no "$project/.tgz?h=next", isn't it?

> +	# - the snapshot format MUST NOT have been defined already

I would add "which means that 'sf' parameter is not set in URL", or
something like that as the last line of above comment.

I like that the code is so well commented, by the way.

> +	if ($input_params{'action'} eq 'snapshot' && defined $refname &&
> +		$refname eq $input_params{'hash'} &&

Minor nit.

I would use here (the question of style / better readability):

+	if ($input_params{'action'} eq 'snapshot' &&
+		 defined $refname && $refname eq $input_params{'hash'} &&

to have both conditions about $refname in the same line.

> +		!defined $input_params{'snapshot_format'}) {
> +		# We loop over the known snapshot formats, checking for
> +		# extensions. Allowed extensions are both the defined suffix
> +		# (which includes the initial dot already) and the snapshot
> +		# format key itself, with a prepended dot
> +		while (my ($fmt, %opt) = each %known_snapshot_formats) {
> +			my $hash = $refname;
> +			my $sfx;
> +			$hash =~ s/(\Q$opt{'suffix'}\E|\Q.$fmt\E)$//;
> +			next unless $sfx = $1;
> +			# a valid suffix was found, so set the snapshot format
> +			# and reset the hash parameter
> +			$input_params{'snapshot_format'} = $fmt;
> +			$input_params{'hash'} = $hash;
> +			# we also set the format suffix to the one requested
> +			# in the URL: this way a request for e.g. .tgz returns
> +			# a .tgz instead of a .tar.gz
> +			$known_snapshot_formats{$fmt}{'suffix'} = $sfx;
> +			last;
> +		}

I'm not sure if it worth (see comment at the beginning of this mail)
adding this code, or just allow $sfx to be snapshot _name_ (key in
%known_snapshot_formats hash).

Otherwise it would be as simple as checking if $known_snapshot_formats{$sfx}
exists (assuming that snapshot format names does not contain '.').

If we decide to go more complicated route, then refactoring it in such
a way that suffixes are also keys to %known_snapshot_formats would be
preferred... err, sorry, not so simple.  But refactoring this check
into separate subroutine (as I think last patch in series does) would
be good idea.


Also, I'd rather you checked if the $refname part contains '.' for it
to even consider that it can be suffix.

> +	}
>  }
>  evaluate_path_info();
>  
> -- 
> 1.5.6.5
> 
> 

-- 
Jakub Narebski
Poland
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux