[RFH] limiting ref advertisements

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



I'm looking into the oft-discussed idea of reducing the size of ref
advertisements by having the client say "these are the refs I'm
interested in". Let's set aside the protocol complexities for a
moment and imagine we magically have some way to communicate a set of
patterns to the server.

What should those patterns look like?

I had hoped that we could keep most of the pattern logic on the
client-side. Otherwise we risk incompatibilities between how the client
and server interpret a pattern. I had also hoped we could do some kind
of prefix-matching, which would let the server look only at the
interesting bits of the ref tree (so if you don't care about
refs/changes, and the server has some ref storage that is hierarchical,
they can literally get away without opening that sub-tree).

The patch at the end of this email is what I came up with in that
direction. It obviously won't compile without the twenty other patches
implementing transport->advertise_prefixes, but it gives you a sense of
what I'm talking about.

Unfortunately it doesn't work in all cases, because refspec sources may
be unqualified. If I ask for:

  git fetch $remote master:foo

then we have to actually dwim-resolve "master" from the complete list of
refs we get from the remote.  It could be "refs/heads/master",
"refs/tags/master", etc. Worse, it could be "refs/master". In that case,
at least, I think we are OK because we avoid advertising refs directly
below "refs/" in the first place. But if you have a slash, like:

  git fetch $remote jk/foo

then that _could_ be "refs/jk/foo". Likewise, we cannot even optimize
the common case of a fully-qualified ref, like "refs/heads/foo". If it
exists, we obviously want to use that. But if it doesn't, then it
could be refs/something-else/refs/heads/foo. That's unlikely, but it
_does_ work now, and optimizing the advertisement would break it.

So it seems like left-anchoring the refspecs can never be fully correct.
We can communicate "master" to the server, who can then look at every
ref it would advertise and ask "could this be called master"? But it
will be setting in stone the set of "could this be" patterns. Granted,
those haven't changed much over the history of git, but it seems awfully
fragile.

In an ideal world the client and server would negotiate to come to some
agreement on the patterns being used. But as we are bolting this onto
the existing protocol, I was really trying to do it without introducing
an extra capabilities phase or extra round-trips. I.e., something like
David Turner's "stick the refspec in the HTTP query parameters" trick,
but working everywhere[1].

Clever ideas?

-Peff

[1] I do have working patches to pass these "early capabilities"
    everywhere, but they're still somewhat rough. I got it to the point
    where I could flip the default to "on" to see what breaks. That's
    not something we'd want to do for real, but is good for running the
    test suite to uncover issues like this one.

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 7c10d70092..3a2585ffd7 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -302,6 +302,33 @@ static void find_non_local_tags(struct transport *transport,
 	string_list_clear(&remote_refs, 0);
 }
 
+static void add_advertise_prefixes(struct transport *transport,
+				   const struct refspec *refs, int nr)
+{
+	struct argv_array *list = &transport->advertise_prefixes;
+	int i;
+
+	for (i = 0; i < nr; i++) {
+		const struct refspec *rs = &refs[i];
+		size_t len;
+
+		if (!rs->pattern)
+			argv_array_push(list, rs->src);
+		else if (strip_suffix(rs->src, "/*", &len))
+			argv_array_pushf(list, "%.*s", (int)len, rs->src);
+		else {
+			/*
+			 * This refspec is too complex for us to communicate;
+			 * not only do we skip it, but we must avoid
+			 * communicating any prefixes, since we need to see
+			 * all refs.
+			 */
+			transport->ignore_advertise_prefixes = 1;
+			return;
+		}
+	}
+}
+
 static struct ref *get_ref_map(struct transport *transport,
 			       struct refspec *refspecs, int refspec_count,
 			       int tags, int *autotags)
@@ -314,12 +341,18 @@ static struct ref *get_ref_map(struct transport *transport,
 	/* opportunistically-updated references: */
 	struct ref *orefs = NULL, **oref_tail = &orefs;
 
-	const struct ref *remote_refs = transport_get_remote_refs(transport);
+	const struct ref *remote_refs;
+
+	if (tags == TAGS_SET || (tags == TAGS_DEFAULT && *autotags))
+		add_advertise_prefixes(transport, tag_refspec, 1);
 
 	if (refspec_count) {
 		struct refspec *fetch_refspec;
 		int fetch_refspec_nr;
 
+		add_advertise_prefixes(transport, refspecs, refspec_count);
+		remote_refs = transport_get_remote_refs(transport);
+
 		for (i = 0; i < refspec_count; i++) {
 			get_fetch_map(remote_refs, &refspecs[i], &tail, 0);
 			if (refspecs[i].dst && refspecs[i].dst[0])
@@ -373,6 +406,17 @@ static struct ref *get_ref_map(struct transport *transport,
 		    (remote->fetch_refspec_nr ||
 		     /* Note: has_merge implies non-NULL branch->remote_name */
 		     (has_merge && !strcmp(branch->remote_name, remote->name)))) {
+
+			add_advertise_prefixes(transport, remote->fetch,
+					       remote->fetch_refspec_nr);
+			if (has_merge && !strcmp(branch->remote_name, remote->name)) {
+				int i;
+				for (i = 0; i < branch->merge_nr; i++)
+					add_advertise_prefixes(transport, branch->merge[i], 1);
+			}
+
+			remote_refs = transport_get_remote_refs(transport);
+
 			for (i = 0; i < remote->fetch_refspec_nr; i++) {
 				get_fetch_map(remote_refs, &remote->fetch[i], &tail, 0);
 				if (remote->fetch[i].dst &&
@@ -393,6 +437,8 @@ static struct ref *get_ref_map(struct transport *transport,
 			    !strcmp(branch->remote_name, remote->name))
 				add_merge_config(&ref_map, remote_refs, branch, &tail);
 		} else {
+			argv_array_push(&transport->advertise_prefixes, "HEAD");
+			remote_refs = transport_get_remote_refs(transport);
 			ref_map = get_remote_ref(remote_refs, "HEAD");
 			if (!ref_map)
 				die(_("Couldn't find remote ref HEAD"));



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]