On Tue, Jan 19, 2021 at 06:09:42PM -0500, Jeff King wrote: > On Tue, Jan 19, 2021 at 01:19:17PM -0500, Taylor Blau wrote: > > > To attempt to reduce the difference between the number of refs > > traversed, and the number of refs sent, only traverse references which > > are in the longest common prefix of the given prefixes. This is very > > reminiscent of the approach taken in b31e2680c4 (ref-filter.c: find > > disjoint pattern prefixes, 2019-06-26) which does an analogous thing for > > multi-patterned 'git for-each-ref' invocations. > > > > The only difference here is that we are operating on ref prefixes, which > > do not necessarily point to a single reference. That is just fine, since > > all we care about is finding the longest common prefix among prefixes > > which can be thought of as refspecs for our purposes here. > > This second paragraph confused me. Aren't the inputs to for-each-ref > also prefixes? > > I guess they require an explicit '*', but fundamentally it's the same > concept (and certainly they are not just single references). Yeah, that is the point that I was trying to make. But re-reading this patch after knowing that it confused you, I think the clearest way to make that point is to drop that second paragraph entirely. > > Similarly, for_each_fullref_in_prefixes may return more results than the > > caller asked for (since the longest common prefix might match something > > that a longer prefix in the same set wouldn't match) but > > ls-refs.c:send_ref() discards such results. > > Based on my other poking, I'm not entirely sure that we can return too > many results. But I do think it's worth keeping the caller more careful. It can return more results, but I don't think that my writing in b31e2680c4 is particularly clear. Here's an example, though. Say I ask for `git for-each-refs 'refs/tags/a/*' 'refs/tags/a/b/c'`. The LCP of that is definitely "refs/tags/a", which might traverse other stuff like "refs/tags/a/b/d", which wouldn't get matched by either. > > The code introduced in b31e2680c4 is resilient to stop early (and > > return a shorter prefix) when it encounters a metacharacter (as > > mentioned in that patch, there is some opportunity to improve this, but > > nobody has done it). > > I agree that is how b31e2680c4 works, but we don't care about that > behavior here, since we have strict prefixes. Isn't the argument we need > to make the other way? I.e., that stopping early on a metacharacter will > not hurt us. Because at worst we return too many results (hey, there's a > case!) and because we would not expect metacharacters in the prefixes > anyway, as they are illegal in refnames. Yeah, thinking on it more I agree that's the argument we should be making here. I updated the patch to reflect it. > > There are two remaining small items in this patch: > > > > - If no prefixes were provided, then implicitly add the empty string > > (which will match all references). > > I wonder if for_each_fullref_in_prefixes() should do that, since that is > exactly what the other caller does, too. OTOH, maybe it is better to > make the callers be more explicit. In which case should it maybe BUG() > on an empty set of prefixes, rather than letting the caller assume some > behavior? Hmm. I don't feel strongly either way, but I think that the BUG is probably the most sensible option. > > - Since we are manually munging the prefixes, make sure that we > > initialize it ourselves (previously this wasn't necessary since the > > first strvec_push would do so). > > I think this was an existing bug (that we were just lucky it was > impossible to trigger, because nobody looked for the NULL sentinel), and > would make more sense as a separate fix. Right. I'll make sure to pull this one out into a separate patch and credit Jacob with authorship. > -Peff Thanks, Taylor