Re: "git notes show" is orders of magnitude slower than doing it manually with ls-tree and cat-file

On Tue, Nov 25, 2014 at 08:24:49PM -0500, Jeff King wrote:
> On Tue, Nov 25, 2014 at 08:00:51PM -0500, Jeff King wrote:
> 
> > On Wed, Nov 26, 2014 at 09:42:42AM +0900, Mike Hommey wrote:
> > 
> > > I have a note tree with a bit more than 200k notes.
> > >
> > > $ time git notes --ref foo show $sha1 > /dev/null
> > > real    0m0.147s
> > > user    0m0.136s
> > > sys     0m0.008s
> > > 
> > > That's a lot of time, especially when you have a script that does that
> > > on a fair amount of sha1s.
> > 
> > IIRC, the notes code populates an in-memory data structure, which gives
> > faster per-commit lookup at the cost of some setup time. Obviously for a
> > single lookup, that's going to be a bad tradeoff (but it does make sense
> > for "git log --notes"). I don't know offhand how difficult it would be
> > to tune the data structure differently (or avoid it altogether) if we
> > know ahead of time we are only going to do a small number of lookups.
> > But Johan (cc'd) might.
> 
> One other question: how were your notes created?
> 
> I tried to replicate your setup by creating one note per commit in
> linux.git (over 400k notes total). I did it with one big mktree,
> creating a single top-level notes tree. Doing a single "git notes show"
> lookup on the tree was something like 800ms.
> 
> However, this is not what trees created by git-notes look like. It
> shards the object sha1s into subtrees (1a/2b/{36}), and I think does so
> dynamically in a way that keeps each individual tree size low. The
> in-memory data structure then only "faults in" tree objects as they are
> needed. So a single lookup should only hit a small part of the total
> tree.
> 
> Doing a single "git notes edit HEAD" in my case caused the notes code to
> write the result using its sharding algorithm. Subsequent "git notes
> show" invocations were only 14ms.
> 
> Did you use something besides git-notes to create the tree? From your
> examples, it looks like you were accounting for the sharding during
> lookup, so maybe this is leading in the wrong direction (but if so, I
> could not reproduce your times at all even with a much larger case).

So... this is interesting. I happen to have recreated the notes tree
"manually", and now each git notes show takes under 10ms.
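
For context, the ls-tree/cat-file style lookup from the subject line
amounts to probing the possible sharded paths for a sha1. A rough
sketch follows; the helper names note_paths and show_note are made up
for illustration, and it assumes at most two levels of fanout (a
larger notes tree may nest deeper).

```shell
# Print the candidate paths of a note for commit $1 at fanout 0, 1 and 2:
#   <40 hex>, 1a/<38 hex>, 1a/2b/<36 hex>
note_paths() {
    rest=$1
    prefix=
    for depth in 0 1 2; do
        printf '%s%s\n' "$prefix" "$rest"
        # Move the next two hex digits from the file name into the directory part.
        prefix="$prefix$(printf '%s' "$rest" | cut -c1-2)/"
        rest=$(printf '%s' "$rest" | cut -c3-)
    done
}

# Print the note blob for commit $2 from notes ref $1, trying each depth.
# Hypothetical helper; only "git cat-file blob <ref>:<path>" is real git.
show_note() {
    for p in $(note_paths "$2"); do
        git cat-file blob "$1:$p" 2>/dev/null && return 0
    done
    return 1
}
```

Something like "show_note refs/notes/foo $sha1" would then stand in for
"git notes --ref foo show $sha1" in a script.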

Now, looking at the notes tree reflog, I see that at some point, some
notes were added at the top-level of the tree, without being nested,
which is strange.
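
A quick way to spot this kind of mixed tree is to classify the
top-level entries of the notes tree by the shape of their names. This
is only a sketch: classify_notes_entries is a made-up name, and it
assumes 40-hex sha1 note names and 2-hex fanout directories.

```shell
# Read "git ls-tree <notes-ref>" output on stdin, one entry per line:
#   <mode> SP <type> SP <object> TAB <name>
# and count top-level names that look like un-nested notes (40 hex
# digits), fanout directories (2 hex digits), or anything else.
classify_notes_entries() {
    awk -F'\t' '
        $2 !~ /[^0-9a-f]/ && length($2) == 40 { flat++; next }
        $2 !~ /[^0-9a-f]/ && length($2) == 2  { fanout++; next }
        { other++ }
        END { printf "flat=%d fanout=%d other=%d\n", flat, fanout, other }
    '
}
```

Run as "git ls-tree refs/notes/foo | classify_notes_entries"; a tree
written entirely by git-notes at this size would be expected to show
no flat entries next to the fanout directories.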

And it looks like it's related to how I've been adding them, through
git-fast-import. I was using notemodify commands, and was using a
filemodify command to load the previous notes tree instead of using the
from command, because I don't care about keeping the notes history.
So fast-import was actually filling the notes tree as if it were
starting over with only the new notes added with notemodify (which,
when there were many of them, it nested with one level of
indirection).

I'm not sure this is a case worth fixing in fast-import. I can easily
work around it.

Mike