Re: [PATCH v5 2/2] gitweb: append short hash ids to snapshot files

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



I am sorry for being late with review of this patch.

On Sat, 26 Sep 2009, Mark Rada wrote:

> Teach gitweb how to produce nicer snapshot names by only using the
> short hash id.

A few questions (which I think should be answered in this commit message).

First, what was original behaviour of 'snapshot' action?  Did gitweb
always convert 'h' (hash) parameter to full SHA-1?

Second, do you preserve 'snapshot' behavior that it generated archive
which unpacks to the directory with the same name as basename of (proposed)
archive name?  I mean here that "repo-2cc6859e.tar.gz" unpacks to
"repo-2cc6859e/" directory.  I guess it does, but this should be
mentioned in the commit message.

>               If clients make requests using a tree-ish that is not a
> partial or full SHA-1 hash, then the short hash will also be appended
> to whatever they asked for.

Here example would be a good idea.  I guess this means that if one is
requesting for snapshot of 'next' or 'v1.6.0' of 'repo.git' project,
one would get 'repo-next-2cc6859.tar.gz' or 'repo-v1.6.0-2cc6859.tar.gz'
as [proposed] snapshot filename.

> 
> This also includes tests cases for t9502-gitweb-standalone-parse-output.

I am not sure if it shouldn't be rather t9502-gitweb-standalone-snapshot
test; see comments below for whys.

> 
> Signed-off-by: Mark Rada <marada@xxxxxxxxxxxx>
> ---
> 
> 
> 	Changes since v4:
> 		- moved git_get_full_hash into this commit
> 		- changed test case format, suggested by Junio
> 		- explicity request at least a length of 7 for short hashes
> 
> 
>  gitweb/gitweb.perl                        |   40 +++++++++++++++--
>  t/t9502-gitweb-standalone-parse-output.sh |   67 +++++++++++++++++++++++++++++
>  2 files changed, 103 insertions(+), 4 deletions(-)
>  create mode 100644 t/t9502-gitweb-standalone-parse-output.sh
> 
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 8d4a2ae..bc132a5 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -1983,14 +1983,39 @@ sub quote_command {
>  
>  # get HEAD ref of given project as hash
>  sub git_get_head_hash {
> +	return git_get_full_hash(shift, 'HEAD');
> +}

Good.  This means that we don't need to change code that do not use
new feature (new subroutines).  It's nice to cater to backward 
compatibility, especially if it is so low cost as this one.

> +
> +sub git_get_full_hash {
>  	my $project = shift;
> +	my $hash = shift;

I think it might be good idea here to default to 'HEAD', i.e.

  +	my $hash = shift || 'HEAD';

>  	my $o_git_dir = $git_dir;
>  	my $retval = undef;
>  	$git_dir = "$projectroot/$project";
> -	if (open my $fd, "-|", git_cmd(), "rev-parse", "--verify", "HEAD") {
> -		my $head = <$fd>;
> +	if (open my $fd, '-|', git_cmd(), 'rev-parse', '--verify', $hash) {
> +		$hash = <$fd>;
>  		close $fd;
> -		if (defined $head && $head =~ /^([0-9a-fA-F]{40})$/) {
> +		if (defined $hash && $hash =~ /^([0-9a-fA-F]{40})$/) {
> +			$retval = $1;
> +		}

I guess that you use "$retval = $1;" instead of just "$retval = $hash;"
because of similarities with git_get_short_hash, isn't it?  Or it is just
following earlier code?

> +	}
> +	if (defined $o_git_dir) {
> +		$git_dir = $o_git_dir;
> +	}
> +	return $retval;
> +}
> +
> +# try and get a shorter hash id
> +sub git_get_short_hash {
> +	my $project = shift;
> +	my $hash = shift;
> +	my $o_git_dir = $git_dir;
> +	my $retval = undef;
> +	$git_dir = "$projectroot/$project";
> +	if (open my $fd, '-|', git_cmd(), 'rev-parse', '--short=7', $hash) {
> +		$hash = <$fd>;
> +		close $fd;
> +		if (defined $hash && $hash =~ /^([0-9a-fA-F]{7,})$/) {
>  			$retval = $1;
>  		}
>  	}

Note that git_get_full_hash (which additionally does verification) and
git_get_short_hash share much of code.  Perhaps it might be worth to
avoid code duplication somehow?  On the other hand it might be not worth
to complicate code by trying to extract common parts here...

> @@ -5203,6 +5228,13 @@ sub git_snapshot {
>  		die_error(400, 'Object is not a tree-ish');
>  	}
>  
> +
> +	my $full_hash = git_get_full_hash($project, $hash);
> +	if ($full_hash =~ /^$hash/) {
> +		$hash = git_get_short_hash($project, $hash);
> +	} else {
> +		$hash .= '-' . git_get_short_hash($project, $hash);
> +	}

I think we might want to avoid calling git_get_full_hash (and extra call
to "git rev-parse" command, which is extra fork) if we know in advance
that  $full_hash =~ /^$hash/  can't be true, i.e. if $hash doesn't match
/^[0-9a-fA-F]+$/.  That would require that we continue to use $hash
and not $full_hash, see comment for the chunk below.

BTW do you think that having better name (nicer name in the case
when $hash is full SHA-1, or name which describes exact version as 
in the case when $hash is branch name or just 'HEAD') is worth
slight extra cost of "git rev-parse --abbrev=7"?

>  	my $name = $project;
>  	$name =~ s,([^/])/*\.git$,$1,;
>  	$name = basename($name);
> @@ -5213,7 +5245,7 @@ sub git_snapshot {
>  	$cmd = quote_command(
>  		git_cmd(), 'archive',
>  		"--format=$known_snapshot_formats{$format}{'format'}",
> -		"--prefix=$name/", $hash);
> +		"--prefix=$name/", $full_hash);

Why this change?

>  	if (exists $known_snapshot_formats{$format}{'compressor'}) {
>  		$cmd .= ' | ' . quote_command(@{$known_snapshot_formats{$format}{'compressor'}});
>  	}
> diff --git a/t/t9502-gitweb-standalone-parse-output.sh b/t/t9502-gitweb-standalone-parse-output.sh
> new file mode 100644
> index 0000000..5f2b1d5
> --- /dev/null
> +++ b/t/t9502-gitweb-standalone-parse-output.sh
> @@ -0,0 +1,67 @@
> +#!/bin/sh
> +#
> +# Copyright (c) 2009 Mark Rada
> +#
> +
> +test_description='gitweb as standalone script (parsing script output).
> +
> +This test runs gitweb (git web interface) as a CGI script from the
> +commandline, and checks that it produces the correct output, either
> +in the HTTP header or the actual script output.'

Currently all tests here are about 'snapshot' action.  They are quite
specific, and they do require some knowledge about chosen archive format.
I think it would be better to put snapshot test into separate test,
i.e. in 't/t9502-gitweb-standalone-snapshot.sh'.

> +
> +
> +. ./gitweb-lib.sh
> +
> +# ----------------------------------------------------------------------
> +# snapshot file name
> +
> +test_commit \
> +	'SnapshotFileTests' \
> +	'i can has snapshot?'

Errr... with filename [cutely] called 'i can has snapshot?' you would
have, I guess, problems with tests on MS Windows, where IIRC '?' is
forbidden in filenames.

Perhaps

  +test_commit \
  +	'Initial commit' \
  +	'foo'

or

  +test_commit \
  +	'SnapshotFileTests' \
  +	'foo' 'i can has snapshot?'


> +

In the test below you use "git rev-parse --verify HEAD" and
"git rev-parse --short HEAD" over and over.  I think it would be better
to calculate them upfront:

  +test_expect_success 'calculate full and short ids' '
  +	FULLID= $(git rev-parse --verify  HEAD) &&
  +	SHORTID=$(git rev-parse --short=7 HEAD)
  +'

> +test_expect_success 'snapshots: give full hash' '

You test here that giving full has works, and that gitweb uses short hash
in file name.  Better name would therefore be something like

  +test_expect_success 'snapshots: give full hash, get short hash' '


> +	ID=`git rev-parse --verify HEAD` &&
> +	gitweb_run "p=.git;a=snapshot;h=$ID;sf=tgz" &&

Here I had to remember that 'tgz' snapshot format is enabled by default.
I think it could be better to explicitly enable it in preparation step.

> +	ID=`git rev-parse --short HEAD` &&
> +	grep ".git-$ID.tar.gz" gitweb.output

Here had to think a bit that gitweb.output consists both of HTTP headers,
and of response body, and you are grepping here in the HTTP headers part.
It would be better solution for gitweb_run to split gitweb.output into
gitweb.headers and gitweb.body (perhaps if requested by setting some
variable, e.g. GITWEB_SPLIT_OUTPUT).

It can be done using the following lines:

	sed    -e '/^\r$/'      <gitweb.output >gitweb.headers
	sed -n -e '0,/^\r$/!p'  <gitweb.output >gitweb.body

	# gitweb.headers is used to parse http headers
	# gitweb.body is response without http headers

But the second one uses GNU sed extension; I don't know how to write
it in more portable way.

Then

> +	grep ".git-$ID.tar.gz" gitweb.output

would be

  +	grep ".git-$ID.tar.gz" gitweb.headers

Note that this would mean that t/t9501-gitweb-standalone-http-status.sh
should also be updated to use gitweb.headers and gitweb.body

> +'
> +test_debug 'cat gitweb.output'

Not a good idea in current state.  gitweb.output contains binary part,
and generally it is not a good idea to output binary files (which can
contain ANSI escape sequences) to terminal.

> +
> +test_expect_success 'snapshots: give short hash' '

  +test_expect_success 'snapshots: give short hash, get short hash' '

> +	ID=`git rev-parse --short HEAD` &&

Gitweb uses '--short=7'.  Shouldn't you use the same option here?

> +	gitweb_run "p=.git;a=snapshot;h=$ID;sf=tgz" &&
> +	grep ".git-$ID.tar.gz" gitweb.output
> +'
> +test_debug 'cat gitweb.output'
> +
> +test_expect_success 'snapshots: give almost full hash' '

  +test_expect_success 'snapshots: give almost full hash, get short hash' '

> +	ID=`git rev-parse --short=30 HEAD` &&
> +	gitweb_run "p=.git;a=snapshot;h=$ID;sf=tgz" &&
> +	ID=`git rev-parse --short HEAD` &&
> +	grep ".git-$ID.tar.gz" gitweb.output
> +'
> +test_debug 'cat gitweb.output'
> +
> +test_expect_success 'snapshots: give HEAD tree-ish' '

  +test_expect_success 'snapshots: give HEAD, get HEAD-<short hash>' '

> +	gitweb_run "p=.git;a=snapshot;h=HEAD;sf=tgz" &&
> +	ID=`git rev-parse --short HEAD` &&
> +	grep ".git-HEAD-$ID.tar.gz" gitweb.output
> +'
> +test_debug 'cat gitweb.output'
> +
> +test_expect_success 'snapshots: give branch name tree-ish' '

  +test_expect_success 'snapshots: give branch name, get <branch name>-<short hash>' '

> +	gitweb_run "p=.git;a=snapshot;h=master;sf=tgz" &&
> +	ID=`git rev-parse --short master` &&
> +	grep ".git-master-$ID.tar.gz" gitweb.output
> +'
> +test_debug 'cat gitweb.output'
> +
> +test_expect_success 'snapshots: give tag tree-ish' '

  +test_expect_success 'snapshots: give tag name, get <tag name>-<short hash>' '

> +	gitweb_run "p=.git;a=snapshot;h=SnapshotFileTests;sf=tgz" &&
> +	ID=`git rev-parse --short SnapshotFileTests` &&
> +	grep ".git-SnapshotFileTests-$ID.tar.gz" gitweb.output
> +'
> +test_debug 'cat gitweb.output'

Note that to avoid ambiguities currently gitweb uses refs/heads/master
and refs/tags/SnapshotFileTests... but dealing with this issue should be
left, I think, for separate commit.

> +
> +
> +test_done
> -- 
> 1.6.4.GIT
> 
> 
> 

-- 
Jakub Narebski
Poland
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]