On 07/15/2011 11:19 AM, Michael Haggerty wrote:
> On 07/14/2011 11:24 AM, Michael Haggerty wrote:
>> On 07/14/2011 09:16 AM, Michael Haggerty wrote:
>>> I have noticed that "git filter-branch" gets pathologically slow when it
>>> operates on a repository that has many references in a complicated
>>> directory hierarchy.  The time seems to go like O(N^3), where N is the
>>> number of references being rewritten.
> [...]
> Many possible improvements come to mind, in increasing order of
> intrusiveness and generality:
> [...]
> 5. Organize the loose refs cache in memory as a tree, and only populate
> the parts of it that are accessed.  This should also speed up iteration
> through a subtree by avoiding a linear search through all loose references.

FYI: I am working on (5), namely storing a linked list of loose refs for
each directory and only populating those directories that are accessed.
The directories themselves will be held in a tree/trie (AFAICT the
distinction is primarily whether each node holds its whole key or only
the part of the key relative to its parent, which is an implementation
detail).

As a bonus, the caches for submodules will be handled correctly (they
are currently never used).

It might be another week or so before I have patches ready.

Michael

-- 
Michael Haggerty
mhagger@xxxxxxxxxxxx
http://softwareswirl.blogspot.com/
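
To illustrate the idea in (5), here is a rough sketch of a lazily
populated per-directory loose-ref cache. It is not the planned patch
and not git's code (which is C): the names RefDir, read_dir, lookup,
find_dir and iterate are all hypothetical, a dict stands in for the
per-directory linked list, and each node stores its whole key (the
"tree" rather than "trie" variant). The read_dir callable is an
assumed stand-in for reading a single directory under $GIT_DIR.

```python
from typing import Callable, Dict, Iterator, List, Optional, Tuple

# Hypothetical reader: given a directory key such as "refs/heads/", return
# (names of subdirectories, {ref name: value}) for that one directory only.
ReadDir = Callable[[str], Tuple[List[str], Dict[str, str]]]


class RefDir:
    """One directory node; its contents are read on first access only."""

    def __init__(self, path: str, read_dir: ReadDir) -> None:
        self.path = path                      # whole key, e.g. "refs/heads/"
        self._read_dir = read_dir
        self._subdirs: Optional[Dict[str, "RefDir"]] = None
        self._refs: Optional[Dict[str, str]] = None

    def _populate(self) -> None:
        """Read this directory once; children are created but left unread."""
        if self._subdirs is not None:
            return
        names, refs = self._read_dir(self.path)
        self._subdirs = {
            name: RefDir(self.path + name + "/", self._read_dir)
            for name in names
        }
        self._refs = dict(refs)

    def find_dir(self, dirname: str) -> Optional["RefDir"]:
        """Return the node for a relative directory such as "heads/topic"."""
        node: Optional["RefDir"] = self
        for part in (p for p in dirname.split("/") if p):
            node._populate()
            node = node._subdirs.get(part)
            if node is None:
                return None
        return node

    def lookup(self, refname: str) -> Optional[str]:
        """Resolve a ref relative to this directory.

        Only the directories on the path to the ref are populated.
        """
        self._populate()
        head, sep, rest = refname.partition("/")
        if not sep:
            return self._refs.get(head)
        child = self._subdirs.get(head)
        return child.lookup(rest) if child is not None else None

    def iterate(self) -> Iterator[Tuple[str, str]]:
        """Yield (full ref name, value) for this subtree only."""
        self._populate()
        for name, value in sorted(self._refs.items()):
            yield self.path + name, value
        for name in sorted(self._subdirs):
            yield from self._subdirs[name].iterate()
```

With something like this, RefDir("refs/", reader).lookup("heads/master")
would read only refs/ and refs/heads/, and iterating the node returned
by find_dir("tags") would never touch refs/heads/. The iteration order
here (refs of a directory first, then its subdirectories) is a
simplification and is not claimed to match git's ordering.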