On Mon, 16 Apr 2012, Kacper Kornet wrote: > On Mon, Apr 16, 2012 at 10:06:49PM +0200, Jakub Narebski wrote: >> On Mon, 16 Apr 2012, Kacper Kornet wrote: >>> On Sat, Apr 14, 2012 at 03:16:01PM +0200, Jakub Narebski wrote: >>>> That saves I/O, but not fork. >> >> Actually if you look at the footer of projects list page with 'timed' >> feature enable you see that for N projects on list, gitweb uses 2*N+1 >> git commands. The "+1" part is from "git version", the "2*N" are from >> git-for-each-ref to get last activity (and verify repository as a >> side-effect)... > > It is how I started to think about the problem. With my additional patch > to remove the owner I am able to reduce the number of git invocations to > 1. That is a very good thing. Especially that adding caching to gitweb is long in coming (to core gitweb at least), and that not always one is able to turn on caching. [...] >>> My tests show that forks are also a bottleneck in my setup. >> >> What do you mean by "my tests" here? Is it benchmark (perhaps just using >> 'timed' feature) with and without custom change removing fork(s)? Or did >> you used profiler (e.g. wondefull Devel::NYTProf) for that? > > Nothing fancy. I look at the footnote produced by "timed" feature. And > I see a difference between version with the following patch: > > diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl > index 18cdf96..4a13807 100755 > --- a/gitweb/gitweb.perl > +++ b/gitweb/gitweb.perl > @@ -3156,6 +3156,18 @@ sub git_get_project_owner { > return $owner; > } > > +sub git_repo_exist { Perhaps a better name would be validate_headref() or check_head(), called from subroutine named either is_git_directory() as in setup.c, or verify_repo()... > + my ($path) = @_; BTW this could be written as + my $path = shift; though it is largely a matter of taste. > + my $fd; > + > + $git_dir = "$projectroot/$path"; > + open($fd, "<", "$git_dir/HEAD") or return; It can be written as + open my $fd, "<", "$projectroot/$path/HEAD" + or return; > + my $line = <$fd>; > + close $fd or return; Shouldn't we chomp($line)? > + return 1 if (defined $fd && substr($line, 0, 10) eq 'ref:refs/' > + || $line=~m/^[0-9a-z]{40}$/); > + return 0; I don't think we need to check that $fd is defined; if it isn't, we would return earlier I think. There can be space between "ref:" and fully qualified branch name, and in fact git puts such space: $ cat .git/HEAD ref: refs/heads/gitweb/web Also, you can return boolean value. So the above would reduce to: + return ($line =~ m!^ref:\s*refs/! || + $line =~ m!^[0-9a-z]{40}$!); > +} > + > sub git_get_last_activity { > my ($path) = @_; > my $fd; > @@ -5330,6 +5342,7 @@ sub fill_project_list_info { > my $show_ctags = gitweb_check_feature('ctags'); > PROJECT: > foreach my $pr (@$projlist) { > + next PROJECT unless git_repo_exist($pr->{'path'}); I understand that it is proof of concept patch, but I think that in real patch iy would be better to update check_export_ok() instead of this addition. > if (project_info_needs_filling($pr, $filter_set->('age', 'age_string'))) { > my (@activity) = git_get_last_activity($pr->{'path'}); > unless (@activity) { > > > and the one in which git_repo_exist() uses invocation to /bin/true: > > diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl > index 18cdf96..4bcc66f 100755 > --- a/gitweb/gitweb.perl > +++ b/gitweb/gitweb.perl > @@ -3156,6 +3156,13 @@ sub git_get_project_owner { > return $owner; > } > > +sub git_repo_exist { > + my ($path) = @_; > + > + $git_dir = "$projectroot/$path"; > + return not system('/bin/true'); > +} > + What were the differences in timing? >>> On the other >>> hand I think that I can trust that my projects.list contains only valid >>> repositories. So I would prefer to have a don't verify option. Or there >>> is a possibility to write perl function with the same functionality as >>> is_git_directory() from setup.c and use it to verify if the directory is a >>> valid git repo. >> >> Well, we can add those checks to check_export_ok()... well to function >> it would call. >> >> is_git_repository from setup.c checks that "/objects" and "/refs" >> have executable permissions, and that "/HEAD" is valid via validate_headref >> which does slightly more than (now slightly misnamed) check_head_link() >> from gitweb... >> >> ...or that DB_ENVIRONMENT i.e. GIT_OBJECT_DIRECTORY environment variable >> is set, and path that it points to has executable permissions. I don't >> think we have to worry about this for gitweb. > >> I'll try to come up with a patch to gitweb that improves repository >> verification, and perhaps a patch that uses the fact that "git config" >> succeeded to verify repository. > > As you see it is more or less what I have already written for my tests. > I only don't check if /objects and /refs are directories. If you want I > can send proper patch submission for this function I don't know how strict we should be, but "/objects" and "/refs" are just one stat more. Anyway, if you plan on resending this patch series, then "gitweb: Improve repository verification" should be be first, I think. -- Jakub Narebski Poland -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html