Re: [PATCH 1/1] ls-refs.c: minimize number of refs visited

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Jacob,

On Tue, Jan 19, 2021 at 03:42:51PM +0100, Jacob Vosmaer wrote:
> The previous implementation of ls-refs would perform exactly one ref
> walk, matching each ref against the prefixes (if any) provided by the
> user. This can be expensive if there are a lot of refs and the user
> only cares about a small subset of them.
>
> In this patch we analyze the prefixes provided by the user and build a
> minimal set of disjoint prefixes that contains all of them. We then do
> a ref walk for each of these minimal prefixes.

This reminds me of b31e2680c4 (ref-filter.c: find disjoint pattern
prefixes, 2019-06-26), where we solved a very similar problem for 'git
for-each-ref'. The difference here is that we are operating on a set of
prefixes, not a set of refs.

But, I think that we could get pretty far by treating the prefixes as
refs so that we can call ref-filter.c:find_longest_prefixes(). For its
purposes, it doesn't really care about whether or not the arguments
actually are references. It simply returns the longest common prefix
among all of its arguments (delimited by '/' characters).

> It is tempting to have just one strvec for the prefixes and use it
> both for matching and for iterating. But every time I tried that, it
> made things more complicated. I settled on leaving the existing ref
> matching (using &data.prefixes) alone, and I added a new layer around
> it for the ref walk optimization (using &iter_prefixes).

I think the implementation in b31e2680c4 deals with this nicely: it
takes a pointer to a strvec and dumps prefixes in there.

> This commit also fixes a bug in ls-refs.c that was not triggered
> before: we were using a strvec set to zero, which is not how you are
> supposed to initialize a strvec. We now call strvec_init after zeroing.

Good.

> Signed-off-by: Jacob Vosmaer <jacob@xxxxxxxxxx>
> ---
>  ls-refs.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 62 insertions(+), 1 deletion(-)
>
> diff --git a/ls-refs.c b/ls-refs.c
> index a1e0b473e4..6d5f0c769a 100644
> --- a/ls-refs.c
> +++ b/ls-refs.c
> @@ -84,12 +84,44 @@ static int ls_refs_config(const char *var, const char *value, void *data)
>  	return parse_hide_refs_config(var, value, "uploadpack");
>  }
>
> +static int cmp_prefix(const void *a_, const void *b_){
> +	const char *a = *(const char **)a_;
> +	const char *b = *(const char **)b_;
> +	return strcmp(a, b);
> +}
> +
> +static void deduplicate_prefixes(struct strvec *prefixes) {
> +	int i;
> +
> +	QSORT(prefixes->v, prefixes->nr, cmp_prefix);
> +
> +	for (i = 1; i < prefixes->nr;) {
> +		const char *p = prefixes->v[i];
> +
> +		/*
> +		 * If p is "refs/foobar" and its predecessor is "refs/foo" then we should
> +		 * drop p, both to avoid sending duplicate refs to the user, and to avoid
> +		 * doing unnecessary work.
> +		 */
> +		if (starts_with(p, prefixes->v[i - 1])) {
> +			MOVE_ARRAY(&prefixes->v[i], &prefixes->v[i + 1], prefixes->nr - (i + 1));
> +			prefixes->v[prefixes->nr - 1] = p;
> +			strvec_pop(prefixes);
> +		} else {
> +			i++;
> +		}
> +	}
> +}
> +

Indeed, this and the below code are very reminiscent of b31e2680c4. So,
I wonder if it's possible to use the existing implementation rather than
implement what is roughly the same thing twice.

Below is a completely untested patch to try and reuse the code from
b31e2680c4. (It compiles, but that's the extent of my guarantees about
it ;-).) It's all smashed into one huge patch, so if you're happy with
the direction I'll take care of cleaning it up. The new function in
ref-filter.h really belongs in refs.h, but I left the implementation in
ref-filter.c to avoid creating more noise in the diff.

Let me know what you think.

Thanks,
Taylor

--- >8 ---

Subject: [PATCH] ls-refs: iterate longest common refs prefix

Signed-off-by: Taylor Blau <me@xxxxxxxxxxxx>
---
 ls-refs.c    |  4 +++-
 ref-filter.c | 46 ++++++++++++++++++++++++++++++++--------------
 ref-filter.h | 10 ++++++++++
 3 files changed, 45 insertions(+), 15 deletions(-)

diff --git a/ls-refs.c b/ls-refs.c
index a1e0b473e4..6a3e11d45c 100644
--- a/ls-refs.c
+++ b/ls-refs.c
@@ -6,6 +6,7 @@
 #include "ls-refs.h"
 #include "pkt-line.h"
 #include "config.h"
+#include "ref-filter.h"

 /*
  * Check if one of the prefixes is a prefix of the ref.
@@ -109,7 +110,8 @@ int ls_refs(struct repository *r, struct strvec *keys,
 		die(_("expected flush after ls-refs arguments"));

 	head_ref_namespaced(send_ref, &data);
-	for_each_namespaced_ref(send_ref, &data);
+	for_each_fullref_in_prefixes(get_git_namespace(), data.prefixes.v,
+				     send_ref, &data, 0);
 	packet_flush(1);
 	strvec_clear(&data.prefixes);
 	return 0;
diff --git a/ref-filter.c b/ref-filter.c
index aa260bfd09..c34bf34d06 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -1987,6 +1987,36 @@ static void find_longest_prefixes(struct string_list *out,
 	strbuf_release(&prefix);
 }

+int for_each_fullref_in_prefixes(const char *namespace,
+				 const char **patterns,
+				 each_ref_fn cb,
+				 void *cb_data,
+				 int broken)
+{
+	struct string_list prefixes = STRING_LIST_INIT_DUP;
+	struct string_list_item *prefix;
+	struct strbuf buf = STRBUF_INIT;
+	int ret = 0, namespace_len;
+
+	find_longest_prefixes(&prefixes, patterns);
+
+	if (namespace)
+		strbuf_addstr(&buf, namespace);
+	namespace_len = buf.len;
+
+	for_each_string_list_item(prefix, &prefixes) {
+		strbuf_addf(&buf, prefix->string);
+		ret = for_each_fullref_in(buf.buf, cb, cb_data, broken);
+		if (ret)
+			break;
+		strbuf_setlen(&buf, namespace_len);
+	}
+
+	string_list_clear(&prefixes, 0);
+	strbuf_release(&buf);
+	return ret;
+}
+
 /*
  * This is the same as for_each_fullref_in(), but it tries to iterate
  * only over the patterns we'll care about. Note that it _doesn't_ do a full
@@ -1997,10 +2027,6 @@ static int for_each_fullref_in_pattern(struct ref_filter *filter,
 				       void *cb_data,
 				       int broken)
 {
-	struct string_list prefixes = STRING_LIST_INIT_DUP;
-	struct string_list_item *prefix;
-	int ret;
-
 	if (!filter->match_as_path) {
 		/*
 		 * in this case, the patterns are applied after
@@ -2024,16 +2050,8 @@ static int for_each_fullref_in_pattern(struct ref_filter *filter,
 		return for_each_fullref_in("", cb, cb_data, broken);
 	}

-	find_longest_prefixes(&prefixes, filter->name_patterns);
-
-	for_each_string_list_item(prefix, &prefixes) {
-		ret = for_each_fullref_in(prefix->string, cb, cb_data, broken);
-		if (ret)
-			break;
-	}
-
-	string_list_clear(&prefixes, 0);
-	return ret;
+	return for_each_fullref_in_prefixes(NULL, filter->name_patterns,
+					    cb, cb_data, broken);
 }

 /*
diff --git a/ref-filter.h b/ref-filter.h
index feaef4a8fd..f666a0fb49 100644
--- a/ref-filter.h
+++ b/ref-filter.h
@@ -146,4 +146,14 @@ struct ref_array_item *ref_array_push(struct ref_array *array,
 				      const char *refname,
 				      const struct object_id *oid);

+/**
+ * iterate all refs which descend from the longest common prefix among
+ * "patterns".
+ */
+int for_each_fullref_in_prefixes(const char *namespace,
+				 const char **patterns,
+				 each_ref_fn cb,
+				 void *cb_data,
+				 int broken);
+
 #endif /*  REF_FILTER_H  */
--
2.30.0.138.g6d7191ea01




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux