Junio C Hamano <gitster@xxxxxxxxx> writes:

> Also, if oid_array is used to produce a de-duplicated list of object
> names in the current code, it is very likely that oid_array is
> sorted (perhaps the objects are fed in sorted order), and the
> callers depend on the order of the objects they find in the array.
> Throwing a sorted list of object names at oidset and then iterating
> over what is in the oidset would likely destroy the original
> ordering.  I do not offhand know if the callers are broken by such
> a change (either correctness-wise or performance-wise).

Since I had a bit of downtime waiting for CI, I took a look.  It seems
that the list of objects collected in the oidset/oid_array is fed to
"git fetch --stdin" for lazy fetching, so I would be surprised if the
order of the objects mattered (see the sketch at the end of this
message).  So I would stop worrying about correctness or performance
regressions due to the ordering change.

>> But most importantly, is it worth it?  What's the rationale for the
>> change?  Less CPU/memory use?  Getting e.g. "hyperfine" or
>> "/usr/bin/time -v" output for those (if so) would be valuable.

But I still agree with this.  How much duplication would a typical
lazy-fetch request involve?  Would the duplicate "want" lines make the
protocol exchange noticeably larger?

Having said all that, the change makes the code smaller, and possibly
easier to follow.  The primary reason why
promisor-remote.c::remove_fetched_oids() shrinks so much is that we
used to scan the array and copy the surviving entries into a new
array, while the new code just iterates over the oidset and removes
the ones that were fetched.

I am assuming that mucking with the oidset's contents while you are
iterating over it is safe, but if that is not the case, then the
advantage of the smaller code disappears.
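For what it is worth, oidset is implemented on top of khash, and as
far as I can tell kh_del() only marks the bucket as deleted without
rehashing, so removing entries mid-iteration ought to be safe; it is
insertion that can trigger a rehash and invalidate the iterator.

To make the before/after comparison concrete, here is a rough sketch
of the two shapes of the code.  This is not the actual
promisor-remote.c code, and was_fetched() is a made-up predicate
standing in for the real check:

	/* Before: copy the surviving entries into a new oid_array. */
	static void filter_array(struct oid_array *oids)
	{
		struct oid_array surviving = OID_ARRAY_INIT;
		size_t i;

		for (i = 0; i < oids->nr; i++)
			if (!was_fetched(&oids->oid[i]))
				oid_array_append(&surviving, &oids->oid[i]);
		oid_array_clear(oids);
		*oids = surviving;
	}

	/* After: drop the fetched entries from the oidset in place. */
	static void filter_set(struct oidset *oids)
	{
		struct oidset_iter iter;
		struct object_id *oid;

		oidset_iter_init(oids, &iter);
		while ((oid = oidset_iter_next(&iter)))
			if (was_fetched(oid))
				oidset_remove(oids, oid);
	}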
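And as to the ordering point above: the collected objects end up
being written one hexadecimal object name per line to the standard
input of the "git fetch --stdin" child, roughly like the fragment
below (the set-up of the child process is omitted, and fh is assumed
to be the pipe to the child's standard input), so nothing downstream
depends on the order in which we iterate:

	struct oidset_iter iter;
	struct object_id *oid;

	oidset_iter_init(&oids, &iter);
	while ((oid = oidset_iter_next(&iter)))
		fprintf(fh, "%s\n", oid_to_hex(oid));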