Hi Jacob, On Tue, Jan 19, 2021 at 03:42:51PM +0100, Jacob Vosmaer wrote: > The previous implementation of ls-refs would perform exactly one ref > walk, matching each ref against the prefixes (if any) provided by the > user. This can be expensive if there are a lot of refs and the user > only cares about a small subset of them. > > In this patch we analyze the prefixes provided by the user and build a > minimal set of disjoint prefixes that contains all of them. We then do > a ref walk for each of these minimal prefixes. This reminds me of b31e2680c4 (ref-filter.c: find disjoint pattern prefixes, 2019-06-26), where we solved a very similar problem for 'git for-each-ref'. The difference here is that we are operating on a set of prefixes, not a set of refs. But, I think that we could get pretty far by treating the prefixes as refs so that we can call ref-filter.c:find_longest_prefixes(). For its purposes, it doesn't really care about whether or not the arguments actually are references. It simply returns the longest common prefix among all of its arguments (delimited by '/' characters). > It is tempting to have just one strvec for the prefixes and use it > both for matching and for iterating. But every time I tried that, it > made things more complicated. I settled on leaving the existing ref > matching (using &data.prefixes) alone, and I added a new layer around > it for the ref walk optimization (using &iter_prefixes). I think the implementation in b31e2680c4 deals with this nicely: it takes a pointer to a strvec and dumps prefixes in there. > This commit also fixes a bug in ls-refs.c that was not triggered > before: we were using a strvec set to zero, which is not how you are > supposed to initialize a strvec. We now call strvec_init after zeroing. Good. > Signed-off-by: Jacob Vosmaer <jacob@xxxxxxxxxx> > --- > ls-refs.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++++++++++- > 1 file changed, 62 insertions(+), 1 deletion(-) > > diff --git a/ls-refs.c b/ls-refs.c > index a1e0b473e4..6d5f0c769a 100644 > --- a/ls-refs.c > +++ b/ls-refs.c > @@ -84,12 +84,44 @@ static int ls_refs_config(const char *var, const char *value, void *data) > return parse_hide_refs_config(var, value, "uploadpack"); > } > > +static int cmp_prefix(const void *a_, const void *b_){ > + const char *a = *(const char **)a_; > + const char *b = *(const char **)b_; > + return strcmp(a, b); > +} > + > +static void deduplicate_prefixes(struct strvec *prefixes) { > + int i; > + > + QSORT(prefixes->v, prefixes->nr, cmp_prefix); > + > + for (i = 1; i < prefixes->nr;) { > + const char *p = prefixes->v[i]; > + > + /* > + * If p is "refs/foobar" and its predecessor is "refs/foo" then we should > + * drop p, both to avoid sending duplicate refs to the user, and to avoid > + * doing unnecessary work. > + */ > + if (starts_with(p, prefixes->v[i - 1])) { > + MOVE_ARRAY(&prefixes->v[i], &prefixes->v[i + 1], prefixes->nr - (i + 1)); > + prefixes->v[prefixes->nr - 1] = p; > + strvec_pop(prefixes); > + } else { > + i++; > + } > + } > +} > + Indeed, this and the below code are very reminiscent of b31e2680c4. So, I wonder if it's possible to use the existing implementation rather than implement what is roughly the same thing twice. Below is a completely untested patch to try and reuse the code from b31e2680c4. (It compiles, but that's the extent of my guarantees about it ;-).) It's all smashed into one huge patch, so if you're happy with the direction I'll take care of cleaning it up. The new function in ref-filter.h really belongs in refs.h, but I left the implementation in ref-filter.c to avoid creating more noise in the diff. Let me know what you think. Thanks, Taylor --- >8 --- Subject: [PATCH] ls-refs: iterate longest common refs prefix Signed-off-by: Taylor Blau <me@xxxxxxxxxxxx> --- ls-refs.c | 4 +++- ref-filter.c | 46 ++++++++++++++++++++++++++++++++-------------- ref-filter.h | 10 ++++++++++ 3 files changed, 45 insertions(+), 15 deletions(-) diff --git a/ls-refs.c b/ls-refs.c index a1e0b473e4..6a3e11d45c 100644 --- a/ls-refs.c +++ b/ls-refs.c @@ -6,6 +6,7 @@ #include "ls-refs.h" #include "pkt-line.h" #include "config.h" +#include "ref-filter.h" /* * Check if one of the prefixes is a prefix of the ref. @@ -109,7 +110,8 @@ int ls_refs(struct repository *r, struct strvec *keys, die(_("expected flush after ls-refs arguments")); head_ref_namespaced(send_ref, &data); - for_each_namespaced_ref(send_ref, &data); + for_each_fullref_in_prefixes(get_git_namespace(), data.prefixes.v, + send_ref, &data, 0); packet_flush(1); strvec_clear(&data.prefixes); return 0; diff --git a/ref-filter.c b/ref-filter.c index aa260bfd09..c34bf34d06 100644 --- a/ref-filter.c +++ b/ref-filter.c @@ -1987,6 +1987,36 @@ static void find_longest_prefixes(struct string_list *out, strbuf_release(&prefix); } +int for_each_fullref_in_prefixes(const char *namespace, + const char **patterns, + each_ref_fn cb, + void *cb_data, + int broken) +{ + struct string_list prefixes = STRING_LIST_INIT_DUP; + struct string_list_item *prefix; + struct strbuf buf = STRBUF_INIT; + int ret = 0, namespace_len; + + find_longest_prefixes(&prefixes, patterns); + + if (namespace) + strbuf_addstr(&buf, namespace); + namespace_len = buf.len; + + for_each_string_list_item(prefix, &prefixes) { + strbuf_addf(&buf, prefix->string); + ret = for_each_fullref_in(buf.buf, cb, cb_data, broken); + if (ret) + break; + strbuf_setlen(&buf, namespace_len); + } + + string_list_clear(&prefixes, 0); + strbuf_release(&buf); + return ret; +} + /* * This is the same as for_each_fullref_in(), but it tries to iterate * only over the patterns we'll care about. Note that it _doesn't_ do a full @@ -1997,10 +2027,6 @@ static int for_each_fullref_in_pattern(struct ref_filter *filter, void *cb_data, int broken) { - struct string_list prefixes = STRING_LIST_INIT_DUP; - struct string_list_item *prefix; - int ret; - if (!filter->match_as_path) { /* * in this case, the patterns are applied after @@ -2024,16 +2050,8 @@ static int for_each_fullref_in_pattern(struct ref_filter *filter, return for_each_fullref_in("", cb, cb_data, broken); } - find_longest_prefixes(&prefixes, filter->name_patterns); - - for_each_string_list_item(prefix, &prefixes) { - ret = for_each_fullref_in(prefix->string, cb, cb_data, broken); - if (ret) - break; - } - - string_list_clear(&prefixes, 0); - return ret; + return for_each_fullref_in_prefixes(NULL, filter->name_patterns, + cb, cb_data, broken); } /* diff --git a/ref-filter.h b/ref-filter.h index feaef4a8fd..f666a0fb49 100644 --- a/ref-filter.h +++ b/ref-filter.h @@ -146,4 +146,14 @@ struct ref_array_item *ref_array_push(struct ref_array *array, const char *refname, const struct object_id *oid); +/** + * iterate all refs which descend from the longest common prefix among + * "patterns". + */ +int for_each_fullref_in_prefixes(const char *namespace, + const char **patterns, + each_ref_fn cb, + void *cb_data, + int broken); + #endif /* REF_FILTER_H */ -- 2.30.0.138.g6d7191ea01