On Wed, Aug 03, 2011 at 03:33:39PM +0200, Michael Haggerty wrote:

> On 07/15/2011 11:19 AM, Michael Haggerty wrote:
> > On 07/14/2011 11:24 AM, Michael Haggerty wrote:
> >> On 07/14/2011 09:16 AM, Michael Haggerty wrote:
> >>> I have noticed that "git filter-branch" gets pathologically slow when it
> >>> operates on a repository that has many references in a complicated
> >>> directory hierarchy. The time seems to go like O(N^3), where N is the
> >>> number of references being rewritten.
> > [...]
> > Many possible improvements come to mind, in increasing order of
> > intrusiveness and generality:
> > [...]
> > 5. Organize the loose refs cache in memory as a tree, and only populate
> > the parts of it that are accessed. This should also speed up iteration
> > through a subtree by avoiding a linear search through all loose references.
>
> FYI: I am working on (5), namely storing a linked list of loose refs for
> each directory and only populating those directories that are accessed.
> The directories themselves will be held in a tree/trie (AFAICT the
> distinction is primarily whether each node holds its whole key or only
> the part of the key relative to its parent, which is an implementation
> detail). As a bonus, the caches for submodules will be handled
> correctly (they are currently never used).
>
> It might be another week or so before I have patches ready.

Great. That is exactly the solution I was going to pursue, as well, but
I didn't actually start on it yet. I look forward to seeing your
patches.

-Peff
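For readers following the thread, here is a rough sketch of the idea in (5): a tree of ref directories in which each directory is read only when something first asks for it. It is an illustration in Python, not the patches under discussion and not git's actual C code. The names RefDir, fake_lister, FAKE_REFS and listed are all made up for this example. Each node keeps its full prefix rather than only the part relative to its parent, and it stores its refs in a dict where the proposal uses a linked list.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, List, Optional, Tuple

# Hypothetical callback: given a directory prefix such as "refs/heads/",
# return (names of immediate subdirectories, {ref name: value}).
# In a real implementation this would read one directory from disk.
Lister = Callable[[str], Tuple[List[str], Dict[str, str]]]


@dataclass
class RefDir:
    name: str                 # full prefix, e.g. "refs/heads/"
    lister: Lister
    loaded: bool = False      # has this directory been read yet?
    subdirs: Dict[str, "RefDir"] = field(default_factory=dict)
    refs: Dict[str, str] = field(default_factory=dict)

    def _load(self) -> None:
        """Read this directory once; subdirectories stay unread."""
        if self.loaded:
            return
        dirs, refs = self.lister(self.name)
        for d in dirs:
            self.subdirs[d] = RefDir(self.name + d + "/", self.lister)
        self.refs = dict(refs)
        self.loaded = True

    def lookup(self, refname: str) -> Optional[str]:
        """Resolve a full ref name, loading only directories on its path."""
        if not refname.startswith(self.name):
            return None
        self._load()
        head, sep, _ = refname[len(self.name):].partition("/")
        if sep:  # more path components remain: descend one level
            child = self.subdirs.get(head)
            return child.lookup(refname) if child else None
        return self.refs.get(head)

    def iterate(self) -> Iterator[Tuple[str, str]]:
        """Yield (full name, value) for every ref under this directory."""
        self._load()
        entries = [(n, None) for n in self.refs]
        entries += [(n + "/", d) for n, d in self.subdirs.items()]
        for name, sub in sorted(entries, key=lambda e: e[0]):
            if sub is None:
                yield self.name + name, self.refs[name]
            else:
                yield from sub.iterate()


# In-memory stand-in for the loose refs on disk (made-up values).
FAKE_REFS = {
    "refs/heads/master": "1111",
    "refs/heads/topic/a": "2222",
    "refs/tags/v1.0": "3333",
}
listed: List[str] = []  # records which directories were actually read


def fake_lister(prefix: str) -> Tuple[List[str], Dict[str, str]]:
    listed.append(prefix)
    dirs, refs = set(), {}
    for full, value in FAKE_REFS.items():
        if full.startswith(prefix):
            head, sep, _ = full[len(prefix):].partition("/")
            if sep:
                dirs.add(head)
            else:
                refs[head] = value
    return sorted(dirs), refs
```

With a root of RefDir("refs/", fake_lister), looking up refs/heads/master reads only refs/ and refs/heads/. Calling iterate() on the refs/heads/ node walks just that subtree instead of scanning every loose ref, which is the speedup the proposal is after.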