Re: Another bench on gitweb (also on gitweb caching)

Bruno Cesar Ribas <ribas@xxxxxxxxxxxx> writes:

> I made another SIMPLE bench on gitweb. Testing time on git-for-each-ref.
> 
> Using my 1000 projects I ran:
> 8<----------------
> #!/bin/bash
> PEGAR_ref() { 
>     PROJ=projeto$1.git; 
>     cd $PROJ; 
>     printf "\tlastref = $(git-for-each-ref --sort=-committerdate --count=1\
>             --format='%(committer)')\n" >> config; 
>     cd -; 
> }
> cd $HOME/scm
> for((i=1;i<=1000;i++)){ PEGAR_ref $i & }
> 8<----------------

Could you please not mix English and your native language
(Portuguese?) in the examples you post? Mixing two languages in one
identifier name (unless it is "ref" in Brazilian Portuguese too) is
especially bad form... TIA.

Besides, what I'm more interested in is the script used to generate
those 1000 projects...
 
> And at the "git_get_last_activity" instead of running git-for-each-ref i
> asked to get gitweb.lastref
> 
> Here are the results:
> "dd" means: dd if=/dev/zero of=$HOME/dd/$i bs=1M count=400000
> 
> Running 2 dd to generate disk IO.  Here comes the results:
> NO projects_list  projects_list
> 7m56s55           6m11s95        cached last change, using gitweb.lastref
> 16m30s69          15m10s74       default gitweb, using FS's owner
> 16m07s40          15m24s34       patched to get gitweb.owner
> 
> Now results for a 1000projects on an idle machine. (No dd running to
> generate IO)
> NO projects_list  projects_list
> 0m26s79           0m38s70       cached last change, using gitweb.lastref
> 1m19s08           1m09s55       default gitweb, using FS's owner
> 1m17s58           1m09s55       patched to get gitweb.owner

Are those the results of running gitweb as a standalone script, or of
your script running git-for-each-ref?

Besides, I'd rather see results of running ApacheBench. On Linux it
usually comes with the installed Apache, and it is invoked as 'ab'.
Instead of adding artificial load, your tests could use concurrent
requests, and more than one request to get a better average.
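
Something along these lines is what I have in mind. It is only a
sketch: the URL, the request counts and the bench_gitweb name are
assumptions made up for illustration, so adjust them to wherever your
gitweb is actually served. The only 'ab' options used are -n (total
number of requests) and -c (number of concurrent requests).

```shell
#!/bin/bash
# Hypothetical defaults; point GITWEB_URL at your own gitweb installation.
GITWEB_URL=${GITWEB_URL:-http://localhost/cgi-bin/gitweb.cgi}
# Name of the ApacheBench binary; overridable for testing.
AB=${AB:-ab}

# bench_gitweb [requests] [concurrency]
# Fetch the projects list page repeatedly and let ab report the averages.
bench_gitweb() {
    local requests=${1:-100} concurrency=${2:-10}
    # -n: total number of requests; -c: requests performed at the same time
    "$AB" -n "$requests" -c "$concurrency" "$GITWEB_URL"
}
```

Calling "bench_gitweb 100 10" would then issue 100 requests, 10 at a
time, and ab prints the mean time per request at the end of the run.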
 
> I found out those VERY interesting, so instead of trying to think a
> new way to store gitweb config, we should think a way to cache those
> information.

Below are my thoughts about caching information for gitweb:

First, the basis of every optimisation is finding the bottlenecks.
I think it was posted here some time ago that the pages generating
the most load are the projects list and the feeds.

Kernel.org even runs a modified version of gitweb, with some caching
support; Cgit (a git web interface in C) also has caching support.


Because gitweb shows relative times on the projects list page and on
the project summary page, it is unfortunately not easy to simply cache
the HTML output: one would have to either give up relative times, or
rewrite times from relative to absolute, either on the server (in
gitweb) or on the client (in JavaScript). So perhaps it would be
better to cache the information that is costly to obtain, such as the
last-changed time for each project.

Or we can, for example, assume (i.e. do this only if the appropriate
gitweb feature is set) that projects are bare repositories that are
pushed to, and that git-update-server-info is run on repository update
(for example for the HTTP transport), and stat $GIT_DIR/info/refs
and/or $GIT_DIR/objects/info/packs instead of running
git-for-each-ref. Of course the column would then be called something
like "Last Update" instead of "Last Change".
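
A rough sketch of the stat idea, in shell rather than in gitweb's
Perl. The last_update name is made up for illustration, and the code
assumes GNU stat (BSD stat would need -f %m instead of -c %Y). It
prints the modification time, in seconds since the epoch, of the first
of the two files that exists.

```shell
#!/bin/bash
# last_update <git_dir>
# Print the mtime of $GIT_DIR/info/refs, falling back to
# $GIT_DIR/objects/info/packs; both are rewritten by git-update-server-info.
# Assumption: GNU coreutils stat (-c %Y prints mtime as epoch seconds).
last_update() {
    local git_dir=$1 f
    for f in "$git_dir/info/refs" "$git_dir/objects/info/packs"; do
        if [ -e "$f" ]; then
            stat -c %Y "$f"
            return 0
        fi
    done
    # Neither file exists: update-server-info was probably never run here.
    return 1
}
```

This costs one or two stat calls per project and never spawns git,
which is the whole point when listing 1000 projects.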

The "Last Update" information is especially easy because it can be
invalidated / updated externally, by the update / post-receive hook,
outside gitweb. So gitweb doesn't need to implement any cache
invalidation mechanism for it.
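
For illustration, a hook could refresh the cached value itself. The
sketch below assumes the value is kept under the gitweb.lastref key
from your test; the record_lastref name is made up, and the current
"git for-each-ref" spelling is used instead of the dashed
git-for-each-ref.

```shell
#!/bin/bash
# record_lastref <git_dir>
# Store the committer line of the most recently committed ref in the
# repository config, under gitweb.lastref (key name taken from the
# benchmark above; the function name is hypothetical).
record_lastref() {
    local git_dir=$1 last
    last=$(git --git-dir="$git_dir" for-each-ref --sort=-committerdate \
               --count=1 --format='%(committer)') || return
    # An empty repository has no refs; leave the config untouched then.
    [ -n "$last" ] || return 0
    git --git-dir="$git_dir" config gitweb.lastref "$last"
}
```

In hooks/post-receive one would end the script with something like
record_lastref "${GIT_DIR:-.}", so that every push refreshes the
value and gitweb only has to read it.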

We can store the lastref / lastchange information in the repository
config, for example under a "gitweb.lastref" key. We can store it in a
gitweb-wide config, for example in a $projectroot/gitwebconfig file,
under a "gitweb.<project>.lastref" key. Or we can store it as a hash
initializer in some sourced Perl file, read from gitweb_config.perl
(this I think can be done even now without touching gitweb code at
all); we can use Data::Dumper to save such information.
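
The gitweb-wide variant might look roughly like this from the shell.
The store_lastref and read_lastref names are hypothetical; the only
thing relied upon is that "git config --file" treats the middle part
of "gitweb.<project>.lastref" as a subsection name.

```shell
#!/bin/bash
# store_lastref <config_file> <project> <value>
# Write gitweb.<project>.lastref into a single shared config file,
# e.g. $projectroot/gitwebconfig. The project name becomes a subsection:
#   [gitweb "projeto1.git"]
#       lastref = ...
store_lastref() {
    local config_file=$1 project=$2 value=$3
    git config --file "$config_file" "gitweb.$project.lastref" "$value"
}

# read_lastref <config_file> <project>
# Print the stored value; fails if the key is not set.
read_lastref() {
    local config_file=$1 project=$2
    git config --file "$config_file" --get "gitweb.$project.lastref"
}
```

With a single file, the projects list page needs to read only one
config file instead of opening 1000 repositories, at the price of the
hooks of all projects writing to the same file.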

The possibilities are many.

-- 
Jakub Narebski
Poland
ShadeHawk on #git