Re: gitweb index performance (Re: [PATCH] gitweb: support the rel=vcs-* microformat)

Joey Hess wrote:
Giuseppe Bilotta wrote:
There is a small overhead in including the microformat on project list
and forks list pages, but getting the project descriptions for those pages
already incurs a similar overhead, and the ability to get every repo url
in one place seems worthwhile.
I agree with this, although people with very large project lists may
differ ... do we have timings on these?

AFAICS, when displaying the project list, gitweb reads each project's
description file, falling back to reading its config file if there is no
description file.
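
The lookup order described above can be sketched roughly as follows. This is an illustration in Python, not gitweb's actual Perl code; the function names and the injectable `config_lookup` parameter are hypothetical, and the `gitweb.description` config key is an assumption about which setting the fallback reads.

```python
import subprocess
from pathlib import Path


def description_from_config(git_dir):
    """Fallback: ask git for a description stored in the repo's config file.

    Assumes the key is 'gitweb.description'; forks one 'git config' process.
    """
    result = subprocess.run(
        ["git", "config", "--file", str(Path(git_dir) / "config"),
         "--get", "gitweb.description"],
        capture_output=True, text=True,
    )
    return result.stdout.strip() or None


def project_description(git_dir, config_lookup=description_from_config):
    """Prefer the 'description' file; use the config only if the file is missing."""
    try:
        return (Path(git_dir) / "description").read_text(
            encoding="utf-8", errors="replace").strip()
    except FileNotFoundError:
        return config_lookup(git_dir)
```

Either way this costs at least one file open per project, which is the per-repository overhead being discussed.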

If performance was a problem here, the thing to do would be to add
project descriptions to the $project_list file, and use those in
preference to the description files. If a large site has done that,
they've not sent in the patch. :-)
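
A minimal sketch of that idea, in Python rather than gitweb's Perl. It assumes the existing list format of one project per line with a URL-encoded path and an optional owner; the third, description field is hypothetical and only illustrates the suggestion, as is the `parse_project_list` name.

```python
from urllib.parse import unquote_plus


def parse_project_list(lines):
    """Parse 'path [owner [description]]' lines into a dict keyed by path.

    Fields are whitespace-separated and URL-encoded ('+' or %20 for spaces).
    The description field is a hypothetical extension, not an existing format.
    """
    projects = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        fields = [unquote_plus(f) for f in line.split(None, 2)]
        projects[fields[0]] = {
            "owner": fields[1] if len(fields) > 1 else None,
            "description": fields[2] if len(fields) > 2 else None,
        }
    return projects
```

A project whose entry carries a description would then need no per-repository file read; entries without one could still fall back to the description file.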

No, because all the large sites have pain points and issues elsewhere in the app. Most of the large sites (I can at least speak for Kernel.org) have built full caching layers into gitweb itself to deal with the problem. This means that we don't have to worry about nickel-and-dime performance improvements that are specific to one section, but can do a very broad sweep and get dramatically better performance across all of gitweb. Those patches have all made it back out onto the mailing list, but for a number of different reasons none have been accepted into the mainline branch.
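
To illustrate the general idea only (this is not the Kernel.org code or either of the posted caching patches), a time-based cache can be sketched in a few lines of Python. The `TTLCache` class and its method names are hypothetical.

```python
import time


class TTLCache:
    """Keep computed values for ttl_seconds, then recompute on next request."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock      # injectable so the expiry logic can be tested
        self._entries = {}      # key -> (time stored, value)

    def get_or_compute(self, key, compute):
        now = self.clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]       # still fresh: skip the expensive work
        value = compute()       # e.g. render a whole page
        self._entries[key] = (now, value)
        return value
```

Keyed on the request (say, the project list page), one expensive render would then serve every visitor until the entry expires, regardless of which part of the page was slow.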

With my patch, it will read each cloneurl file too. The best way to
optimise that for large sites seems to be to add an option that would
ignore the cloneurl files and config file and always use
@git_base_url_list.

I checked the only large site I have access to (git.debian.org) and they
use a $project_list file, but I see no other performance tuning. That's
a 2 GHz machine; it takes gitweb 28 (!) seconds to generate the nearly 1
MB index web page for 1671 repositories:

Look at either Lea's or my caching engine; either will help dramatically on something of that size.

/srv/git.debian.org/http/cgi-bin/gitweb.cgi  3.04s user 9.24s system 43% cpu 28.515 total

Notice that most of the time is spent by child processes. For each
repository, gitweb runs git-for-each-ref to determine the time of the
last commit.

If that is removed (say if there were a way to get the info w/o
forking), performance improves nicely:

./gitweb.cgi > /dev/null  1.29s user 1.08s system 69% cpu 3.389 total

Making it not read description files for each project, as I suggest above,
is the next best optimisation:

./gitweb.cgi > /dev/null  1.08s user 0.05s system 96% cpu 1.170 total
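
For comparison, the three timings quoted above can be reduced to a CPU share and a speedup over the first run. This is only arithmetic on the figures already given; the run labels and the `summarize` helper are hypothetical names.

```python
# (user seconds, system seconds, wall-clock seconds), in the order quoted above
RUNS = {
    "as deployed": (3.04, 9.24, 28.515),
    "without git-for-each-ref": (1.29, 1.08, 3.389),
    "without description reads": (1.08, 0.05, 1.170),
}


def summarize(runs, baseline="as deployed"):
    """Return CPU share (user+system over wall clock) and speedup vs. baseline."""
    base_total = runs[baseline][2]
    return {
        name: {
            "cpu_share": (user + system) / total,
            "speedup": base_total / total,
        }
        for name, (user, system, total) in runs.items()
    }


if __name__ == "__main__":
    for name, stats in summarize(RUNS).items():
        print(f"{name}: {stats['cpu_share']:.0%} cpu, {stats['speedup']:.1f}x")
```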

So, I think it makes sense to optimise gitweb and offer knobs for performance
tuning at the expense of the flexibility of description and cloneurl files.
But, git-for-each-ref is swamping everything else.

The problem is that the knobs are going to be very fine-grained; you really are better off looking at one of the caching engines that are available now. Performance options are hard because it's difficult to convey the complex tradeoffs to anyone, so keeping knobs like that to a minimum is really a necessity.

- John 'Warthog9' Hawley