Re: [PATCH 8/9] sha1-file: use loose object cache for quick existence check

René Scharfe <l.s.r@xxxxxx> · Wed, 5 Dec 2018 07:02:17 +0100

Am 05.12.2018 um 05:46 schrieb Jeff King:
> On Tue, Dec 04, 2018 at 10:45:13PM +0100, René Scharfe wrote:
> 
>>> The comment in the context there is warning callers to remember to load
>>> the cache first. Now that we have individual caches, might it make sense
>>> to change the interface a bit, and make these members private. I.e.,
>>> something like:
>>>
>>>   struct oid_array *odb_loose_cache(struct object_directory *odb,
>>>                                     int subdir_nr) 
>>>   {
>>> 	if (!loose_objects_subdir_seen[subdir_nr])
>>> 		odb_load_loose_cache(odb, subdir_nr); /* or just inline it here */
>>>
>>> 	return &odb->loose_objects_cache[subdir_nr];
>>>   }
>>
>> Sure.  And it should take an object_id pointer instead of a subdir_nr --
>> less duplication, nicer interface (patch below).
> 
> I had considered that initially, but it's a little less flexible if a
> caller doesn't actually have an oid. Though both of the proposed callers
> do, so it's probably OK to worry about that if and when it ever comes up
> (the most plausible thing in my mind is doing some abbrev-like analysis
> without looking to abbreviate a _particular_ oid).

Right, let's focus on concrete requirements of current callers.  YAGNI..
:)

>> And quick_has_loose() should be converted to object_id as well -- adding
>> a function that takes a SHA-1 is a regression. :D
> 
> I actually wrote it that way initially, but doing the hashcpy() in the
> caller is a bit more awkward. My thought was to punt on that until the
> rest of the surrounding code starts handling oids.

The level of awkwardness is the same to me, but sha1_loose_object_info()
is much longer already, so it might feel worse to add it there.  This
function is easily converted to struct object_id, though, as its single
caller can pass one on -- this makes the copy unnecessary.

> This patch looks sane. How do you want to handle it? I'd assumed your
> earlier one would be for applying on top, but this one doesn't have a
> commit message. Did you want me to squash down the individual hunks?

I'm waiting for the first one (object-store: use one oid_array per
subdirectory for loose cache) to be accepted, as it aims to solve a
user-visible performance regression, i.e. that's where the meat is.
I'm particularly interested in performance numbers from Ævar for it.

I can send the last one properly later, and add patches for converting
quick_has_loose() to struct object_id.  Those are just cosmetic..

René