Re: [RFC/PATCHv10 01/11] fast-import: Proper notes tree manipulation

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Monday 07 December 2009, Shawn O. Pearce wrote:
> Johan Herland <johan@xxxxxxxxxxx> wrote:
> > +static uintmax_t do_change_note_fanout(
> > +		struct tree_entry *orig_root, struct tree_entry *root,
> > +		char *hex_sha1, unsigned int hex_sha1_len,
> > +		char *fullpath, unsigned int fullpath_len,
> > +		unsigned char fanout)
> 
> I think this function winds up processing all notes twice.  Yuck.
> 
> tree_content_set() adds a new tree entry to the end of the current
> tree.  So when converting "1a9029b006484e8b9aca06ff261beb2324bb9916"
> into "1a" (to go from fanout 0 to fanout 1) we'll place 1a at the
> end of orig_root, and this function will walk 1a/ recursively,
> examining 1a9029b006484e8b9aca06ff261beb2324bb9916 all over again.

Yep, you're right. Still, we only do the tree_content_remove()/set() once 
per note, so although performance is probably not abysmal, we are still 
clearly suboptimal.

Also, keep in mind that change_note_fanout() is only called when the number 
of notes crosses a power of 256. Thus for typical notes trees (which are 
assumed to mostly accumulate notes over their lifetime), 
change_note_fanout() will be called zero, one or two times (depending on the 
final number of notes).

> If we're here, isn't it likely that *all* notes are in the wrong
> path in the tree, and we need to move them all to a new location?
> If that's true then should we instead just build an entirely new
> tree and swap the root when we are done?

Hmm. Not always. In your earlier scenario where we add 2,000,000 notes in a 
single commit, the current code would need to rewrite 255 of them from 
fanout 0 to fanout 2, and 65,535 of them from fanout 1 to fanout 2. But the 
vast majority (1,934,465) would not require rewriting (having been added at 
the correct fanout initially). However, if we build a new tree (by which I 
assume you mean tree_content_remove() from the old tree and 
tree_content_set() to the new tree for every single note (and non-note)), we 
end up processing all 2,000,000 entries.

> As we empty out a tree the object will be recycled into a pool of
> trees which can be reused at a later point.  It might actually make
> sense to build the new tree under a different root.  We won't scan
> entries we've moved, and the memory difference should be fairly
> small as tree_content_remove() will make a subtree available for
> reuse as soon as its empty.  So we're only dealing with a handful
> of additional tree objects as we do the conversion.

I'm not sure I get the details here. How can we avoid doing the 
_remove()/_set() from/to the old/new tree for every tree_entry? In other 
words, how do we avoid removing and re-setting the 2,000,000 notes in the 
above example?


Thanks for the review!

...Johan

-- 
Johan Herland, <johan@xxxxxxxxxxx>
www.herland.net
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]