Re: [PATCH 4/3] midx: inline nth_midxed_pack_entry()

René Scharfe <l.s.r@xxxxxx> · Sun, 12 Sep 2021 01:39:52 +0200

Am 11.09.21 um 23:20 schrieb Jeff King:
> On Sat, Sep 11, 2021 at 10:31:34PM +0200, René Scharfe wrote:
>
>> Am 11.09.21 um 19:07 schrieb Jeff King:
>>> On Sat, Sep 11, 2021 at 06:08:42PM +0200, René Scharfe wrote:
>>>
>>>> @@ -304,8 +307,7 @@ static int nth_midxed_pack_entry(struct repository *r,
>>>>  	if (!is_pack_valid(p))
>>>>  		return 0;
>>>>
>>>> -	nth_midxed_object_oid(&oid, m, pos);
>>>> -	if (oidset_contains(&p->bad_objects, &oid))
>>>> +	if (oidset_contains(&p->bad_objects, oid))
>>>>  		return 0;
>>>
>>> So we get to avoid the nth_midxed_object_oid() copy entirely. Very nice.
>>>
>>> Compared to the code before your series, we still have an extra function
>>> call to oidset_contains(), which will (in the common case) notice we
>>> have no entries and immediately return. But I think that's getting into
>>> pointless micro-optimization.
>>
>> Right.  I measure a 0.5% slowdown for git multi-pack-index verify.  An
>> inline oidset_size call avoids it.  That's easy enough to add, so let's
>> have it!
>
> I don't mind that, but I wonder if we can have our cake and eat it, too.
>
> oidset_contains() is short, too, and could be inlined. Or if we're
> worried about the size of the embedded kh_get_oid_set() getting inlined,
> we could do something like:
>
>   static inline int oidset_contains(const struct oidset *set, const
> 				    struct object_id *oid)
>   {
> 	if (!oidset_size(set))
> 		return 0;
> 	return oidset_contains_func(set, oid);
>   }
>
> That saves callers from having to deal with it, at the expense of a
> slightly complicated oidset implementation.
>
> I guess it's an extra integer comparison for callers that _do_ expect to
> have a non-empty set. So maybe it is better left to the caller to
> decide whether to optimize in this way.
>
> (A totally inline oidset_contains() avoids the extra check, but possibly
> at the cost of larger code size).

I wondered the same.

Inlining oidset_contains() would follow the spirit of khash.  It adds
16KB to my build (ca. 688 bytes per caller).  Hmm.

I expected the hybrid approach with an inlined emptiness check and a
shared actual contains function to be as fast as the original code, due
to caching.  I actually saw the 0.1% slowdown of git multi-pack-index
verify when I added a fake bad object at the end of prepare_midx_pack()
to simulate a non-empty oidset.  Hmm!

Both are probably defensible, but for this series I took the more
targeted approach to limit the impact.

René