Re: [PATCHv6 10/10] gitweb: group remote heads by remote

On Mon, Nov 8, 2010, Giuseppe Bilotta wrote:
> On Mon, Nov 8, 2010 at 12:05 PM, Jakub Narebski <jnareb@xxxxxxxxx> wrote:
>> On Mon, 8 Nov 2010, Giuseppe Bilotta wrote:
>>> On Thu, Nov 4, 2010 at 11:41 AM, Jakub Narebski <jnareb@xxxxxxxxx> wrote:
>>>> On Wed, 3 Nov 2010, Giuseppe Bilotta wrote:
[...]
>>>> BTW. would next version of this series include patch to git-instaweb
>>>> enabling 'remote_heads' feature for it (gitweb_conf function)?
>>>
>>> I will look into that.
>>
>> It can be as simple as
>>
>> diff --git i/git-instaweb.sh w/git-instaweb.sh
>> index e6f6ecd..50f65b1 100755
>> --- i/git-instaweb.sh
>> +++ w/git-instaweb.sh
>> @@ -580,6 +580,8 @@ gitweb_conf() {
>>  our \$projectroot = "$(dirname "$fqgitdir")";
>>  our \$git_temp = "$fqgitdir/gitweb/tmp";
>>  our \$projects_list = \$projectroot;
>> +
>> +$feature{'remote_heads'}{'default'} = [1]
>>  EOF
>>  }
> 
> Thanks.

I forgot about the trailing semicolon, and the dollar sign needs a
backslash so that the unquoted here-document does not expand it at
shell run time.  It should be:

    +\$feature{'remote_heads'}{'default'} = [1];
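
For context, the corrected hunk would look like this: gitweb_conf
writes its output through an unquoted here-document, so the dollar
sign has to be escaped the same way as in the \$projectroot lines
above it:

```
 our \$projects_list = \$projectroot;
+
+\$feature{'remote_heads'}{'default'} = [1];
 EOF
```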

>>> Either solution is fine, but it would require grabbing all the remote
>>> heads. The real issue here is, I think, understanding the purpose of
>>> limiting in gitweb. Is it to reduce runtime? Is it to reduce clutter
>>> on the screen? In the first case, the limiting should be done as early
>>> as possible (i.e. during the git call that retrieves the data); in the
>>> latter case, is it _really_ needed at all?
[...]

>> Regarding gitweb performance, it is quite important to pass the limit
>> to git-log / git-rev-list (also needed for the 'summary' view);
>> passing the limit to the git command really matters there.
>>
>> git_get_heads_list passes '--count='.($limit+1) to git-for-each-ref,
>> but I don't think that it improves performance in any measurable way.
>> Similarly with saving memory: the amount is negligible.  So if we can
>> do better at the cost of running git_get_heads_list without a limit,
>> I say go for it.
> 
> The gain in performance is, I believe, related to the number of heads
> and the number of remotes to be enumerated. 11 remotes with a total of
> 58 remote branches (the case you mentioned, for example) might not show
> much of a difference between pre- and post-filtering, but something
> bigger might start to feel the effect.

Actually, I would guess it depends on what git-for-each-ref does
internally.  Most likely it reads in all refs anyway; the limit only
matters if the format contains field names that require accessing the
object, e.g. '%(subject)', which git_get_heads_list requests but
git_heads_body doesn't use.
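
This can be checked from the command line (the ref paths and the count
value here are just for illustration): a plain '%(refname)' format
needs only the refs themselves, while adding '%(subject)' forces
git-for-each-ref to open each commit object, which is where --count
could in principle save some work.

```shell
# Cheap: reads refs only, no object access required.
git for-each-ref --count=17 --format='%(refname)' refs/heads/ refs/remotes/

# Costlier: '%(subject)' requires parsing each commit object.
git for-each-ref --count=17 --format='%(refname) %(subject)' refs/heads/ refs/remotes/
```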
 
> I think the strongest point in favour of post-filtering is that the
> feature is intended for use mostly for local repositories anyway.

True.
 
>> Note that the costly part of git_get_heads_list is forking the git
>> command, so it makes absolutely no sense to run git_get_heads_list
>> once per remote instead of doing the per-remote limiting in Perl.
>> The former would affect performance badly, I would guess.
> 
> That is indeed the reason why I chose to go the single call way, even
> though it meant having the limit end up being used somewhat
> incorrectly.

I think that a single call plus post-filtering would be a reasonable
compromise: good performance (a single fork) and correct results.
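
A rough sketch of that compromise outside gitweb (the real code would
be Perl in gitweb; this just illustrates a single git-for-each-ref
fork followed by per-remote limiting in the consumer, with a made-up
limit of 3):

```shell
# One fork: enumerate all remote-tracking refs in a single call, then
# group by remote (third path component of refs/remotes/<remote>/<branch>)
# and keep at most 3 refs per remote -- the limiting happens post-filter.
git for-each-ref --format='%(refname)' refs/remotes/ |
awk -F/ '{ if (++count[$3] <= 3) print }'
```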

-- 
Jakub Narebski
Poland
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html