On Thu, Apr 06, 2017 at 03:48:07PM -0700, Stefan Beller wrote:

> On Thu, Apr 6, 2017 at 1:37 PM, <git@xxxxxxxxxxxxxxxxx> wrote:
> > From: Jeff Hostetler <jeffhost@xxxxxxxxxxxxx>
> >
> > Teach traverse_trees_recursive() to not do redundant ODB
> > lookups when both directories refer to the same OID.
>
> And the reason for this is that omitting the second lookup
> saves time, i.e. a lookup in the ODB of a sufficiently large
> repo is slow.
>
> My kneejerk line of thinking:
> * yes, it sounds good to reduce the number of ODB accesses.
> * But if we consider ODB lookups to be slow and we perform
>   a structured access, how about a cache in front of the ODB?
> * We already have that! (sort of..) 9a414486d9 (lookup_object:
>   prioritize recently found objects, 2013-05-01)
> * Instead of improving the caching, maybe change the
>   size of the problem: We could keep the objects of different types
>   in different hash-tables.

I don't think this is using lookup_object() at all, though. It goes
straight to fill_tree_descriptor(), which will read the object contents
from disk. So our time is going to:

  1. Finding the object in the odb (ideally a binary search in a single
     pack index, but less optimal when there are many packs).

  2. Reconstructing the object. This means zlib-inflating from disk, but
     it may also mean delta reconstruction.

I think there _are_ some caches at play here, though, when you look up
the same tree back to back. The pack-mru means that we'll always look in
the correct pack first. And in theory the delta base cache means that
we'll already have the whole thing reconstructed in memory (though I
have often been confused by exactly when we put items into that cache,
so I might be wrong).

So in theory, this is not all that different from the "just allocate and
copy the bytes" optimization that's happening here (though I'm not
surprised that doing it at a higher level can produce some speedup).
I think the more interesting optimization is "just use the same buffer
without bothering to copy", which is hard for the low-level code to do
(since it doesn't know what lifetime the caller is expecting).

> object.c has its own hash table, I presume for historical and
> performance reasons, this would be split up to multiple hash
> tables.

So I don't think lookup_object() is really relevant here. But I'm also
not sure that multiple hash tables would really buy us much. In theory
hash tables are O(1), so multiple smaller tables doesn't help (and might
hurt, since now we have four O(1) lookups to do). Of course that's a
minor fiction because of collisions, but we keep the load factor on the
object.c table pretty low (and that's why things like quadratic probing
and cuckoo hashing never showed great improvement).

-Peff