Re: [RFC] Possible optimization for gitweb

[Please send replies Cc: git mailing list]

Robert Fitzsimons wrote:

> While looking at the gitweb source yesterday, I noticed a number of
> similar expensive workflows used by a number of actions (summary,
> shortlog, log, rss, atom, and history).
> 
> The current workflows are:
>       get ~100 sha1's using rev-list
>       foreach sha1
>               get/parse 1 commit using rev-list
>               output commit
> 
> The new workflows I'm proposing would be:
>       get/parse ~100 commits using rev-list
>       foreach commit
>               output commit

I have tried this approach too. Take a look at

  http://repo.or.cz/w/git/jnareb-git.git?a=log;h=Attic/gitweb/parse_rev_list

or at the discussion that started with
  Message-Id: <200609061504.40725.jnareb@xxxxxxxxx>
  http://mid.gmane.org/200609061504.40725.jnareb@xxxxxxxxx

> The following simplified commands give an idea of the git-only overhead
> between these two workflows.
> 
> time \
> for r in `git-rev-list --max-count=100 HEAD --` ; \
> do git-rev-list --header --parents --max-count=1 $r -- ; \
> done > /dev/null
> 
> real    0m0.490s
> user    0m0.224s
> sys     0m0.228s
> 
> time \
> git-rev-list --header --parents --max-count=100 HEAD -- > /dev/null
> 
> real    0m0.058s
> user    0m0.008s
> sys     0m0.004s
> 
> There would seem to be a benefit from making the proposed change to
> these workflows, when run on my machine against a clone of Linus's tree.

The problem is that this works only for the "log" and "shortlog" views,
not for the "history" view, and currently they share the same
infrastructure. When there is a path limiter (be it a file or a
directory), the history is simplified, and the parents are _rewritten_
according to the simplified history. Whether this happens depends on a
strange combination of --header, --parents and --full-history. The
details should be somewhere in the archives.

And we don't want to use the parents from the commit object, because
there might be grafts, or the repository might be a shallow clone.
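
To make the distinction concrete, here is a minimal Python sketch (not
gitweb code, which is Perl) of parsing the batched output of
"git-rev-list --header --parents". It assumes that records are
NUL-separated, that the first line of each record holds the commit id
followed by the parents as rev-list reports them (which is where any
rewriting would show up), and that message lines are indented by four
spaces. The Commit class, parse_rev_list and the sample ids c1..c3 are
made up for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Commit:
    sha: str
    parents: List[str]         # from the first line, as rev-list reports them
    header_parents: List[str]  # from "parent" lines stored in the commit object
    author: str
    subject: str


def parse_rev_list(output: str) -> List[Commit]:
    """Parse NUL-separated records of `git rev-list --header --parents`."""
    commits = []
    for record in output.split("\0"):
        if not record.strip():
            continue  # empty piece after the final NUL
        head, _, body = record.partition("\n\n")
        lines = head.split("\n")
        # First line: commit id, then zero or more parent ids.
        sha, *parents = lines[0].split()
        header_parents = []
        author = ""
        for line in lines[1:]:
            key, _, value = line.partition(" ")
            if key == "parent":
                header_parents.append(value)
            elif key == "author":
                author = value
        # Assumption: message lines carry a four-space indent; strip it.
        message = [l[4:] if l.startswith("    ") else l for l in body.split("\n")]
        subject = message[0] if message else ""
        commits.append(Commit(sha, parents, header_parents, author, subject))
    return commits


# Hand-written sample, not real git output: the first line of c3 says its
# parent is c1, while the commit object itself still records c2.
SAMPLE = (
    "c3 c1\n"
    "tree t3\n"
    "parent c2\n"
    "author A U Thor <author@example.com> 1166000000 +0000\n"
    "committer A U Thor <author@example.com> 1166000000 +0000\n"
    "\n"
    "    Third commit\n"
    "\0"
    "c1\n"
    "tree t1\n"
    "author A U Thor <author@example.com> 1165000000 +0000\n"
    "committer A U Thor <author@example.com> 1165000000 +0000\n"
    "\n"
    "    First commit\n"
    "\0"
)

for commit in parse_rev_list(SAMPLE):
    print(commit.sha, commit.parents, commit.header_parents, commit.subject)
```

In the sample, the first line of c3 names c1 as its parent while the
stored "parent" header says c2, which is the kind of difference that
path limiting or grafts could produce. A parser that read only the
"parent" headers would miss it.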

On the other hand, we don't really need parents for log, shortlog and
history...

> One issue with this change is that, gitweb is page orientated.  Page 0
> shows the first 100 items from a given hash, page 1 uses the same given
> hash but shows items 100 to 199, etc.  Using 'git-rev-list --header
> --parents' and then throwing away most of the result is very wasteful.
> 
> So I'm suggesting we add a new option to git-rev-list which will only
> start showing results once it has iterated past a given number of items.
> Using a caret or tilde doesn't seem to return the same result.
> 
> I've attached a discussion patch which adds a new option --start-count
> to git-rev-list and changed the summary and showlog actions of gitweb to
> use this new option.

Very nice idea.
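
As a rough sketch of how a page number could map onto such an option,
the following Python helper builds the rev-list command line for one
page. It assumes the option is spelled --skip=N (the patch calls it
--start-count) and that it drops the first N commits before output
starts. The function name rev_list_page_args and its parameters are
hypothetical.

```python
from typing import List


def rev_list_page_args(commit: str, page: int, per_page: int = 100) -> List[str]:
    """Build a rev-list command line for one page; page 0 is the newest."""
    if page < 0 or per_page <= 0:
        raise ValueError("page must be >= 0 and per_page must be > 0")
    args = ["git", "rev-list", "--header", "--parents",
            f"--max-count={per_page}"]
    skip = page * per_page
    if skip:
        # Assumed spelling of the proposed option (--start-count in the patch).
        args.append(f"--skip={skip}")
    args += [commit, "--"]
    return args


# Page 2 with 100 items per page should skip the first 200 commits.
print(" ".join(rev_list_page_args("HEAD", 2)))
```

Without such an option, page 2 would have to ask for 300 commits and
throw the first 200 away, which is the waste described above.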
 
> I'm sure there are many improvements to this patch, comments?

Perhaps this patch should be split in two? (Usually either the second
mail is a reply to the first, or both are replies to an introductory
letter, which usually carries a table of contents and a diffstat for the
series.)

[...]

Documentation (of the --start-count / --skip option), please?


P.S. Thanks for the patches.

P.P.S. Do you have any comments on the latest "[RFC] gitweb wishlist and
TODO list" series?
-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git

