Re: [PATCH 6/6] Retain caches of submodule refs

Michael Haggerty <mhagger@xxxxxxxxxxxx> · Wed, 24 Aug 2011 10:17:24 +0200

On 08/13/2011 02:54 PM, Heiko Voigt wrote:
> On Sat, Aug 13, 2011 at 12:36:29AM +0200, Michael Haggerty wrote:
>> diff --git a/refs.c b/refs.c
>> index 8d1055d..f02cf94 100644
>> --- a/refs.c
>> +++ b/refs.c
> 
> ...
> 
>> @@ -205,23 +208,28 @@ struct cached_refs *create_cached_refs(const char *submodule)
>>   */
>>  static struct cached_refs *get_cached_refs(const char *submodule)
>>  {
>> -	if (! submodule) {
>> -		if (!cached_refs)
>> -			cached_refs = create_cached_refs(submodule);
>> -		return cached_refs;
>> -	} else {
>> -		if (!submodule_refs)
>> -			submodule_refs = create_cached_refs(submodule);
>> -		else
>> -			/* For now, don't reuse the refs cache for submodules. */
>> -			clear_cached_refs(submodule_refs);
>> -		return submodule_refs;
>> +	struct cached_refs *refs = cached_refs;
>> +	if (! submodule)
>> +		submodule = "";
> 
> Maybe instead of searching for the main refs store a pointer to them
> locally so you can immediately return here. That will keep the
> performance when requesting the main refs the same.

I am assuming that the number of submodules will be small compared to
the number of refs in a typical submodule.  Therefore, given that the
refs are stored in a big linked list and that the only thing that can
realistically be done with the list is to iterate through it, the cost
of finding the reference cache for a specific (sub-)module should be
negligible compared to the cost of the iteration over refs in the cache.
 Treating the main module like the other submodules makes the code
simpler, so I would prefer to leave it this way.

If iteration over refs ever becomes a bottleneck, then optimization of
the storage of refs within a (sub-)module would be a bigger win than
special-casing the main module.  And that is what I would like to work
towards.

> If I see it correctly you are always prepending to the linked list 

This is true.

>                                                                    and
> in case many submodules get cached this could slow down the iteration
> over the refs of the main repository.

Is this a realistic concern?  Remember that the extra search over
submodules only occurs once for each scan through the list of references.

I wrote an additional patch that moves the least-recently accessed
module to the front of the list.  But I doubt that the savings justify
the 10-odd extra lines of code, so I kept it to myself.  Please note
that this approach gives pessimal performance (2x slower) if the
submodules are iterated over repeatedly.  If you would like me to add
this patch to the patch series, please let me know.

Long-term, it would be better to implement a "struct submodule" and use
it to hold submodule-specific data like the ref cache.  Then users could
hold on to the "struct submodule *" and pass it, rather than the
submodule name, to the functions that need it.  But this goes beyond the
scope of what I want to change now, especially since I have no
experience even working with submodules.

Summary: I hope I have convinced you that the extra overhead is
negligible and does not justify additional code.  But if you insist on
more emphasis on performance (or obviously if you have numbers to back
up your concerns) then I would be willing to put more work into it.

Michael

-- 
Michael Haggerty
mhagger@xxxxxxxxxxxx
http://softwareswirl.blogspot.com/
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html