Patrick Steinhardt <ps@xxxxxx> writes: > On Wed, Oct 16, 2024 at 10:48:28PM -0400, Jeff King wrote: >> On Wed, Oct 16, 2024 at 06:11:47PM -0400, Taylor Blau wrote: >> >> > On Wed, Oct 16, 2024 at 08:00:30AM +0200, Patrick Steinhardt wrote: >> > > But there is one exception here where we _can_ get away with sorting >> > > refs while streaming: ref backends sort references returned by their >> > > iterators in lexicographic order. So if the following conditions are all >> > > true we can do iterative streaming: >> > > >> > > - The caller uses at most a single name pattern. Otherwise we'd have >> > > to sort results from multiple invocations of the iterator. >> > > >> > > - There must be at most a single sorting specification, as otherwise >> > > we're not using plain lexicographic ordering. >> > > >> > > - The sorting specification must use the "refname". >> > > >> > > - The sorting specification must not be using any flags, like >> > > case-insensitive sorting. >> > >> > Perhaps a niche case, but what about ancient packed-refs files that were >> > written before the 'sorted' capability was introduced? >> >> We should be OK there. In that case we actually read in and sort the >> packed-refs entries ourselves. We have to, since we do an in-order merge >> with the loose refs while iterating. >> >> I do think this optimization is worth doing, and not a problem with our >> current backends. The biggest worries would be: >> >> 1. Some new ref backend that doesn't return sorted results. I find >> this unlikely, and anyway it's easily caught by having coverage in >> the test suite (which I assume we already have, but I didn't look). > > My assumption is that a ref iterator that _isn't_ sorted would lead to > undesirable behaviour. I'd be surprised if git-show-ref(1) started to > show refs in a random order. So we have essentially baked the > requirement that ref iterators return refs in lexicographic order into > our codebase already. > >> 2. Some new flag combination that requires disabling the optimization, >> and which must be dealt with in the code. This seems unlikely to me >> but not impossible. I think enabling the optimization is worth it, >> though. > > And this part is an issue with or without my patch. The logic we have > in the ref-filter API is quite fragile, and everybody who wants to add > some new flags must remember to update `can_do_iterative_format()`. I'm > not really a huge fan of that, but on the other hand the subsystem does > not change all that frequently anyway. > >> > > to sort results from multiple invocations of the iterator. >> >> I think this part is erring on the cautious side, as we can often >> collapse these into a single iteration due to the ref-prefix work. It >> may be OK to keep using the slower code here if multiple patterns aren't >> commonly used, but I'd suspect that: >> >> git for-each-ref --sort=refname refs/heads refs/tags >> >> could benefit. > > Mh. So we do end up using `refs_for_each_fullref_in_prefixes()`, which > may or may not end up collapsing the prefixes. We do sort and dedup the > prefixes via `find_longest_prefixes()`, so we don't have to worry about > e.g. `git for-each-ref refs/tags refs/heads refs/tags`. > Tangent: This sent me down a rabbit hole, I wonder if we can do better with naming, `find_longest_prefixes` calls `find_longest_prefixes_1`, The `_1` doesn't help at all with explaining what the function does. [snip]
Attachment:
signature.asc
Description: PGP signature