On 05/22/2012 07:35 PM, Junio C Hamano wrote:
> The current code reads the whole thing in upon first use of _any_ element in the file, just like the index codepath does for the index file. But the calling pattern to the refs machinery is fairly well isolated and all happens in refs.c file. Especially thanks to the recent work by Michael Haggerty, for "I am about to create a new branch 'frotz'; do I have 'refs/heads/frotz' or anything that begins with 'refs/heads/frotz/'?" kind of callers, it is reasonably easy to design a better structured packed-refs file format to allow us to read only a subtree portion of refs/ hierarchy, and plug that logic into the lazy ref population code. Such a "design a better packed-refs format for scalability to 400k refs" is a very well isolated project that has high chance of succeeding without breaking things. No code outside refs.c assumes that there is a flat array of refs that records what was read from the packed-refs file and can walk linearly over it, unlike the in-core index.
Even with the current file format, it would not be so difficult to bisect the file, synchronizing on record boundaries by looking for the next NL character. Because of the way the file is sorted, it would also be reasonably efficient to read whole subtrees in one slurp (e.g., for for_each_ref() with a prefix argument). Nontrivial modifications would of course not be possible without a rewrite.
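To make the bisection idea concrete, here is a minimal sketch in Python (not git's actual C code in refs.c). It assumes a simplified packed-refs buffer that holds only "<sha1> <refname>" records terminated by NL, sorted bytewise by refname, with no header line and no peeled "^" lines; the function names are hypothetical.

```python
from typing import List, Optional, Tuple


def _lower_bound(data: bytes, key: bytes) -> int:
    """Return the offset of the first record whose refname is >= key.

    lo is always the start of a record; hi is an exclusive upper bound
    on the offsets where a candidate record may start.
    """
    lo, hi = 0, len(data)
    while lo < hi:
        mid = (lo + hi) // 2
        if mid == lo:
            start = lo
        else:
            # Synchronize on a record boundary: look for the next NL
            # such that the record following it still starts before hi.
            nl = data.find(b"\n", mid - 1, hi - 1)
            if nl == -1:
                hi = mid  # no record starts in [mid, hi)
                continue
            start = nl + 1
        end = data.find(b"\n", start)
        if end == -1:
            end = len(data)
        name = data[start:end].split(b" ", 1)[1]
        if name < key:
            lo = end + 1  # the next record starts right after this one
        else:
            hi = start
    return min(lo, len(data))


def _records_from(data: bytes, offset: int):
    """Yield (sha1, refname) pairs starting at a record boundary."""
    while offset < len(data):
        end = data.find(b"\n", offset)
        if end == -1:
            end = len(data)
        sha1, name = data[offset:end].split(b" ", 1)
        yield sha1, name
        offset = end + 1


def lookup_ref(data: bytes, refname: bytes) -> Optional[bytes]:
    """Look up a single ref without parsing the whole buffer."""
    for sha1, name in _records_from(data, _lower_bound(data, refname)):
        return sha1 if name == refname else None
    return None


def read_subtree(data: bytes, prefix: bytes) -> List[Tuple[bytes, bytes]]:
    """Read every ref under prefix in one contiguous pass."""
    result = []
    for sha1, name in _records_from(data, _lower_bound(data, prefix)):
        if not name.startswith(prefix):
            break
        result.append((name, sha1))
    return result
```

Because the records are sorted, everything under a prefix such as refs/heads/frotz/ forms one contiguous run, so read_subtree() needs only one bisection followed by a sequential scan. A real implementation would also have to skip the header and peeled lines when synchronizing.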
There would need to be some intelligence built in: after enough single-reference accesses in a row, the refs module should probably take it upon itself to read the whole packed-refs file to speed up further lookups.
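A rough sketch of that heuristic, again in Python rather than git's C. The class name, the two callbacks, and the threshold of 16 are all made up for illustration; nothing here reflects an existing refs.c interface.

```python
from typing import Callable, Dict, Optional


class LazyPackedRefs:
    """Serve single-ref lookups cheaply, then fall back to a full read."""

    def __init__(
        self,
        lookup_one: Callable[[bytes], Optional[bytes]],
        load_all: Callable[[], Dict[bytes, bytes]],
        threshold: int = 16,  # hypothetical tuning knob
    ) -> None:
        self._lookup_one = lookup_one
        self._load_all = load_all
        self._threshold = threshold
        self._singles = 0  # single-ref lookups served so far
        self._all: Optional[Dict[bytes, bytes]] = None

    def get(self, refname: bytes) -> Optional[bytes]:
        # Once the whole file is in memory, always answer from it.
        if self._all is not None:
            return self._all.get(refname)
        self._singles += 1
        if self._singles > self._threshold:
            # Too many individual lookups: read everything once.
            self._all = self._load_all()
            return self._all.get(refname)
        return self._lookup_one(refname)
```

Here lookup_one would be something like a bisection over the file and load_all the existing read-everything path. The right threshold would have to be found by measurement.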
> If you do "for_each_ref()" for everything (e.g. sending 'have' during the object transfer, or repacking the whole repository), you would end up needing to read the whole thing for obvious reasons.
Yes. ISTM that any hope to avoid O(number of refs) problems when exchanging commits must involve using more intelligence about how references are related to each other topologically to improve the negotiation about what needs to be transferred.
Michael

-- 
Michael Haggerty
mhagger@xxxxxxxxxxxx
http://softwareswirl.blogspot.com/