Re: [PATCH] ref-filter: format iteratively with lexicographic refname sorting

karthik nayak <karthik.188@xxxxxxxxx> · Mon, 21 Oct 2024 07:36:55 -0500

Patrick Steinhardt <ps@xxxxxx> writes:

> On Wed, Oct 16, 2024 at 10:48:28PM -0400, Jeff King wrote:
>> On Wed, Oct 16, 2024 at 06:11:47PM -0400, Taylor Blau wrote:
>>
>> > On Wed, Oct 16, 2024 at 08:00:30AM +0200, Patrick Steinhardt wrote:
>> > > But there is one exception here where we _can_ get away with sorting
>> > > refs while streaming: ref backends sort references returned by their
>> > > iterators in lexicographic order. So if the following conditions are all
>> > > true we can do iterative streaming:
>> > >
>> > >   - The caller uses at most a single name pattern. Otherwise we'd have
>> > >     to sort results from multiple invocations of the iterator.
>> > >
>> > >   - There must be at most a single sorting specification, as otherwise
>> > >     we're not using plain lexicographic ordering.
>> > >
>> > >   - The sorting specification must use the "refname".
>> > >
>> > >   - The sorting specification must not be using any flags, like
>> > >     case-insensitive sorting.
>> >
>> > Perhaps a niche case, but what about ancient packed-refs files that were
>> > written before the 'sorted' capability was introduced?
>>
>> We should be OK there. In that case we actually read in and sort the
>> packed-refs entries ourselves. We have to, since we do an in-order merge
>> with the loose refs while iterating.
>>
>> I do think this optimization is worth doing, and not a problem with our
>> current backends. The biggest worries would be:
>>
>>   1. Some new ref backend that doesn't return sorted results. I find
>>      this unlikely, and anyway it's easily caught by having coverage in
>>      the test suite (which I assume we already have, but I didn't look).
>
> My assumption is that a ref iterator that _isn't_ sorted would lead to
> undesirable behaviour. I'd be surprised if git-show-ref(1) started to
> show refs in a random order. So we have essentially baked the
> requirement that ref iterators return refs in lexicographic order into
> our codebase already.
>
>>   2. Some new flag combination that requires disabling the optimization,
>>      and which must be dealt with in the code. This seems unlikely to me
>>      but not impossible. I think enabling the optimization is worth it,
>>      though.
>
> And this part is an issue with or without my patch. The logic we have
> in the ref-filter API is quite fragile, and everybody who wants to add
> some new flags must remember to update `can_do_iterative_format()`. I'm
> not really a huge fan of that, but on the other hand the subsystem does
> not change all that frequently anyway.
>
>> > >     to sort results from multiple invocations of the iterator.
>>
>> I think this part is erring on the cautious side, as we can often
>> collapse these into a single iteration due to the ref-prefix work. It
>> may be OK to keep using the slower code here if multiple patterns aren't
>> commonly used, but I'd suspect that:
>>
>>   git for-each-ref --sort=refname refs/heads refs/tags
>>
>> could benefit.
>
> Mh. So we do end up using `refs_for_each_fullref_in_prefixes()`, which
> may or may not end up collapsing the prefixes. We do sort and dedup the
> prefixes via `find_longest_prefixes()`, so we don't have to worry about
> e.g. `git for-each-ref refs/tags refs/heads refs/tags`.
>

Tangent: This sent me down a rabbit hole, I wonder if we can do better
with naming, `find_longest_prefixes` calls `find_longest_prefixes_1`,
The `_1` doesn't help at all with explaining what the function does.

[snip]
Attachment:
signature.asc

Description: PGP signature