On Monday 07 December 2009, Shawn O. Pearce wrote: > Johan Herland <johan@xxxxxxxxxxx> wrote: > > +static uintmax_t do_change_note_fanout( > > + struct tree_entry *orig_root, struct tree_entry *root, > > + char *hex_sha1, unsigned int hex_sha1_len, > > + char *fullpath, unsigned int fullpath_len, > > + unsigned char fanout) > > I think this function winds up processing all notes twice. Yuck. > > tree_content_set() adds a new tree entry to the end of the current > tree. So when converting "1a9029b006484e8b9aca06ff261beb2324bb9916" > into "1a" (to go from fanout 0 to fanout 1) we'll place 1a at the > end of orig_root, and this function will walk 1a/ recursively, > examining 1a9029b006484e8b9aca06ff261beb2324bb9916 all over again. Yep, you're right. Still, we only do the tree_content_remove()/set() once per note, so although performance is probably not abysmal, we are still clearly suboptimal. Also, keep in mind that change_note_fanout() is only called when the number of notes crosses a power of 256. Thus for typical notes trees (which are assumed to mostly accumulate notes over their lifetime), change_note_fanout() will be called zero, one or two times (depending on the final number of notes). > If we're here, isn't it likely that *all* notes are in the wrong > path in the tree, and we need to move them all to a new location? > If that's true then should we instead just build an entirely new > tree and swap the root when we are done? Hmm. Not always. In your earlier scenario where we add 2,000,000 notes in a single commit, the current code would need to rewrite 255 of them from fanout 0 to fanout 2, and 65,535 of them from fanout 1 to fanout 2. But the vast majority (1,934,465) would not require rewriting (having been added at the correct fanout initially). However, if we build a new tree (by which I assume you mean tree_content_remove() from the old tree and tree_content_set() to the new tree for every single note (and non-note)), we end up processing all 2,000,000 entries. > As we empty out a tree the object will be recycled into a pool of > trees which can be reused at a later point. It might actually make > sense to build the new tree under a different root. We won't scan > entries we've moved, and the memory difference should be fairly > small as tree_content_remove() will make a subtree available for > reuse as soon as its empty. So we're only dealing with a handful > of additional tree objects as we do the conversion. I'm not sure I get the details here. How can we avoid doing the _remove()/_set() from/to the old/new tree for every tree_entry? In other words, how do we avoid removing and re-setting the 2,000,000 notes in the above example? Thanks for the review! ...Johan -- Johan Herland, <johan@xxxxxxxxxxx> www.herland.net -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html