On 08/13/2011 02:54 PM, Heiko Voigt wrote: > On Sat, Aug 13, 2011 at 12:36:29AM +0200, Michael Haggerty wrote: >> diff --git a/refs.c b/refs.c >> index 8d1055d..f02cf94 100644 >> --- a/refs.c >> +++ b/refs.c > > ... > >> @@ -205,23 +208,28 @@ struct cached_refs *create_cached_refs(const char *submodule) >> */ >> static struct cached_refs *get_cached_refs(const char *submodule) >> { >> - if (! submodule) { >> - if (!cached_refs) >> - cached_refs = create_cached_refs(submodule); >> - return cached_refs; >> - } else { >> - if (!submodule_refs) >> - submodule_refs = create_cached_refs(submodule); >> - else >> - /* For now, don't reuse the refs cache for submodules. */ >> - clear_cached_refs(submodule_refs); >> - return submodule_refs; >> + struct cached_refs *refs = cached_refs; >> + if (! submodule) >> + submodule = ""; > > Maybe instead of searching for the main refs store a pointer to them > locally so you can immediately return here. That will keep the > performance when requesting the main refs the same. I am assuming that the number of submodules will be small compared to the number of refs in a typical submodule. Therefore, given that the refs are stored in a big linked list and that the only thing that can realistically be done with the list is to iterate through it, the cost of finding the reference cache for a specific (sub-)module should be negligible compared to the cost of the iteration over refs in the cache. Treating the main module like the other submodules makes the code simpler, so I would prefer to leave it this way. If iteration over refs ever becomes a bottleneck, then optimization of the storage of refs within a (sub-)module would be a bigger win than special-casing the main module. And that is what I would like to work towards. > If I see it correctly you are always prepending to the linked list This is true. > and > in case many submodules get cached this could slow down the iteration > over the refs of the main repository. Is this a realistic concern? Remember that the extra search over submodules only occurs once for each scan through the list of references. I wrote an additional patch that moves the least-recently accessed module to the front of the list. But I doubt that the savings justify the 10-odd extra lines of code, so I kept it to myself. Please note that this approach gives pessimal performance (2x slower) if the submodules are iterated over repeatedly. If you would like me to add this patch to the patch series, please let me know. Long-term, it would be better to implement a "struct submodule" and use it to hold submodule-specific data like the ref cache. Then users could hold on to the "struct submodule *" and pass it, rather than the submodule name, to the functions that need it. But this goes beyond the scope of what I want to change now, especially since I have no experience even working with submodules. Summary: I hope I have convinced you that the extra overhead is negligible and does not justify additional code. But if you insist on more emphasis on performance (or obviously if you have numbers to back up your concerns) then I would be willing to put more work into it. Michael -- Michael Haggerty mhagger@xxxxxxxxxxxx http://softwareswirl.blogspot.com/ -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html