Re: .git/info/refs

H. Peter Anvin <hpa@xxxxxxxxx> wrote:
> Jakub Narebski wrote:
>> 
>> I don't think it can be easily expanded. .git/info/refs is meant for
>> http-fetch, and it mimics git-ls-remote / git-peek-remote output.
> 
> For heaven's sake, in computer science we can *NEVER* use the same 
> feature for *MORE THAN ONE THING*.  If it doesn't work format-wise 
> that's fine, but "it's only supposed to be used by dumb transports" is 
> ridiculous.

.git/info/refs is for dumb transports, so if we follow the "do not use
the same feature for more than one thing" principle we should not
change its format for gitweb.

.git/info/refs is one of the auxiliary info files that help dumb servers
(servers that do not do on-the-fly pack generation); it lets clients
discover which references the server has.  The second auxiliary
info file is .git/objects/info/packs.  Both are generated by the
git-update-server-info command, usually run from the post-update hook.
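
To make the format concrete: each line of .git/info/refs is an object
name, a TAB, and a ref name (peeled tags show up as extra lines ending
in ^{}).  A minimal sketch of reading the branch names from it without
calling git; the function name list_heads_from_info_refs is made up
for illustration and is not part of git:

```shell
#!/bin/sh
# Hypothetical helper (not part of git): print the branch names recorded
# in an info/refs file, one per line.
# Assumes "<object name> TAB <refname>" lines.
list_heads_from_info_refs () {
	awk -F '\t' '
		# keep only refs/heads/* entries and strip that prefix
		$2 ~ /^refs\/heads\// {
			sub(/^refs\/heads\//, "", $2)
			print $2
		}
	' "$1"
}
```

It would be called as: list_heads_from_info_refs "$GIT_DIR/info/refs"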

Because the .git/info/refs format is the same as git-ls-remote output
(AFAIK smart servers use git-ls-remote or git-peek-remote; dumb
servers use .git/info/refs), we used it, and can use it, as "cached"
"git ls-remote ." / "git peek-remote ." / "git show-ref --dereference"
output.  For bare repositories, where new data arrives only via
'update' (via push or fetch), which always triggers the post-update
hook, and not, for example, via git-commit, which does not invoke the
post-update hook, the information in .git/info/refs is always fresh.

What I propose as a quick solution is to add a new (perhaps local)
git-update-gitweb-info command, to be used in the post-update
(and perhaps post-commit for non-bare repos) hook, whose results
we would use in gitweb.  See the patch at the bottom.

>> BTW. putting the info of git-for-each-ref into .git/info/refs-details
>> would mean that instead of "24175 calls to git" one would need to
>> read 24175 files. Perhaps the whole info needed to generate projects
>> index page should be pre-generated on push (update), instead of per
>> project (per repository) .git/info/refs-details
> 
> No, it should be one file per repository, not one file per ref.  Why? 
> Obviously we don't want 24175 files to be accessed.  However, a push can 
> only affect files for which the repository owner has permission and 
> which resides in the repository filespace, so it should stay inside that 
> space.

Gitweb _never_ did one call to git _per ref_, but always one call to git
_per repository_!  Old gitweb always used the HEAD ref to get "Last Change"
info and used one call to git-rev-list (if I remember correctly); new
gitweb checks all refs to get "Last Change" info but uses _one_ call to
git-for-each-ref.  Because we did not want to hurt gitweb performance,
we held off changing "Last Change" to check all refs rather than only
HEAD until git-for-each-ref was available, so that it still takes one
call to a git command.  Historically it was the first use of
git-for-each-ref in gitweb.
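
For reference, a rough sketch of that single call and of pulling the
timestamp out of its output, in shell rather than Perl.  The function
names are invented for illustration; the git options are the same
ones gitweb passes in git_get_last_activity:

```shell
#!/bin/sh
# Sketch only; function names are made up for illustration.

# Read "%(committer)" lines ("Name <email> <epoch> <tz>") on stdin and
# print just the epoch seconds.
committer_epoch () {
	sed -n 's/^.* \([0-9][0-9]*\) [-+][0-9][0-9][0-9][0-9]$/\1/p'
}

# One call to git per repository: take the newest committer line among
# all heads and print its epoch.  $1 is the path to the git directory.
repo_last_change () {
	git --git-dir="$1" for-each-ref \
		--format='%(committer)' --sort=-committerdate --count=1 \
		refs/heads | committer_epoch
}
```

The cost is one process per repository however many heads there are,
since git-for-each-ref does the sorting and the --count=1 cut itself.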

Sidenote: I planned to add a new %feature to gitweb to allow choosing
whether to use all refs for "Last Change" info, the HEAD ref, or some
given ref (for example "master").  But that would perhaps wait for a
.git/config parser in Perl.

> On kernel.org, this would reduce the load from 24175 calls to git to 
> reading 250 files.  Although the latter is still expensive (and will 
> probably need post-generation caching) the files should be small and 
> cacheable by the kernel, and the resulting I/O load should be quite small.

Oh, so there are around 250 projects, and around 24175 references
in total across those projects on kernel.org?  I thought there were
24175 _projects_ (repositories)...

Currently, it is 250 calls to git, reading 24175 files (unless refs
are packed, then it would be reading 250 files) to get refs (heads)
info, and reading around 2*250 files (packs + index) to get last
change info.  Not "24175 calls to git".

> Anyway, as far as git-update-server-info is concerned, I'm *very* 
> concerned that there be a single command that updates all the cached 
> information across the repository.  Telling everyone to update their 
> hooks every time we want to add cached information is silly.  Right now, 
> git-update-server-info is the command to update cached information, and 
> for usability reasons there should be a single entry point.

git-update-server-info is to "update auxiliary info file to help dumb
servers". I propose to use a (new) git-update-gitweb-info to help gitweb.
One command for one feature.  Unfortunately, this would mean adding an
"exec git-update-gitweb-info" line (if it does not exist) to existing
projects' post-update hooks; for new projects I think it would be enough
to modify the post-update template (templates/hooks--post-update or
/usr/share/git-core/templates/hooks/post-update).
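
A minimal sketch of how existing hooks could be updated, assuming the
hook is a plain shell script.  The helper name add_gitweb_info_to_hook
is hypothetical.  One thing to watch: anything after an "exec" line
in a hook never runs, so the sketch also turns an existing
"exec git-update-server-info" into a plain call:

```shell
#!/bin/sh
# Hypothetical helper: add "exec git-update-gitweb-info" to an existing
# post-update hook, at most once.  $1 is the path to the hook file.
add_gitweb_info_to_hook () {
	hook=$1
	line='exec git-update-gitweb-info'
	test -f "$hook" || return 1
	# Nothing to do if the line is already there.
	grep -qxF "$line" "$hook" && return 0
	tmp="$hook.tmp.$$"
	# Drop "exec" from the earlier command so the shell goes on to
	# the appended line, then append the new command.
	sed 's/^exec \(git-update-server-info\)$/\1/' "$hook" >"$tmp" &&
	printf '%s\n' "$line" >>"$tmp" &&
	mv "$tmp" "$hook" &&
	chmod +x "$hook"
}
```

Running it twice on the same hook leaves a single copy of the line.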


Below is a patch showing how it can be done.  It does not include the
Makefile changes needed to install git-update-gitweb-info.  NOT TESTED!

BTW, the final version of git-update-gitweb-info should probably be a
built-in command, like git-update-server-info, not a script.


diff --git a/git-update-gitweb-info.sh b/git-update-gitweb-info.sh
new file mode 100755
index 0000000..5bb44df
--- /dev/null
+++ b/git-update-gitweb-info.sh
@@ -0,0 +1,7 @@
+#!/bin/sh
+
+. git-sh-setup
+test -w "$GIT_DIR/info/last-changed" &&
+git-for-each-ref \
+	--format='%(committer)' --sort=-committerdate --count=1 refs/heads \
+	> "$GIT_DIR/info/last-changed"
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 88af2e6..e7874a6 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -1150,12 +1150,16 @@ sub git_get_last_activity {
 	my ($path) = @_;
 	my $fd;
 
-	$git_dir = "$projectroot/$path";
-	open($fd, "-|", git_cmd(), 'for-each-ref',
-	     '--format=%(committer)',
-	     '--sort=-committerdate',
-	     '--count=1',
-	     'refs/heads') or return;
+	if (-r "$projectroot/$path/info/last-changed") {
+		open($fd, '<', "$projectroot/$path/info/last-changed") or return;
+	} else {
+		$git_dir = "$projectroot/$path";
+		open($fd, "-|", git_cmd(), 'for-each-ref',
+		     '--format=%(committer)',
+		     '--sort=-committerdate',
+		     '--count=1',
+		     'refs/heads') or return;
+	}
 	my $most_recent = <$fd>;
 	close $fd or return;
 	if ($most_recent =~ / (\d+) [-+][01]\d\d\d$/) {
diff --git a/templates/hooks--post-update b/templates/hooks--post-update
old mode 100644
new mode 100755
index bcba893..b119224
--- a/templates/hooks--post-update
+++ b/templates/hooks--post-update
@@ -6,3 +6,4 @@
 # To enable this hook, make this file executable by "chmod +x post-update".
 
-exec git-update-server-info
+git-update-server-info
+exec git-update-gitweb-info

-- 
Jakub Narebski
Poland