Re: [PATCH 5/6] fetch: teach independent negotiation (no packfile)

Jonathan Tan <jonathantanmy@xxxxxxxxxx> · Mon, 3 May 2021 14:52:39 -0700

> On 4/8/21 9:10 PM, Jonathan Tan wrote:
> > Currently, the packfile negotiation step within a Git fetch cannot be
> > done independent of sending the packfile, even though there is at least
> > one application wherein this is useful. Therefore, make it possible for
> > this negotiation step to be done independently. A subsequent commit will
> > use this for one such application - push negotiation.
> ...
> > --- a/Documentation/technical/protocol-v2.txt
> > +++ b/Documentation/technical/protocol-v2.txt
> > @@ -346,6 +346,14 @@ explained below.
> >  	client should download from all given URIs. Currently, the
> >  	protocols supported are "http" and "https".
> >  
> > +If the 'wait-for-done' feature is advertised, the following argument
> > +can be included in the client's request.
> > +
> > +    wait-for-done
> > +	Indicates to the server that it should never send "ready", but
> > +	should wait for the client to say "done" before sending the
> > +	packfile.
> > +
> 
> Ok, this is a good way to handle the compatibility case. I'll check
> in the next patch how this feature is checked before sending a pack-file
> in 'git push'.

Thanks.

> > @@ -202,6 +203,8 @@ static struct option builtin_fetch_options[] = {
> >  			TRANSPORT_FAMILY_IPV6),
> >  	OPT_STRING_LIST(0, "negotiation-tip", &negotiation_tip, N_("revision"),
> >  			N_("report that we have only objects reachable from this object")),
> > +	OPT_BOOL(0, "negotiate-only", &negotiate_only,
> > +		 N_("do not fetch a packfile; instead, print ancestors of negotiation tips")),
> 
> It seems a little strange for this step, which is mostly used for tests,
> to be so visible. I think it would be too much work to create a test
> helper with an equivalent implementation, so maybe describe this option
> as only for tests in Documentation/git-fetch.txt.

As you mention in your review of the subsequent patch [1], this is
because we're calling "fetch" as a subprocess from "push", so it needs
to accept command-line arguments specifying that we're not interested in
the packfile.

[1] https://lore.kernel.org/git/c2780570-81ec-bede-2f4e-75748b788bbb@xxxxxxxxx/

> > -	if (remote) {
> > +	if (negotiate_only) {
> > +		struct oidset acked_commits = OIDSET_INIT;
> > +		struct oidset_iter iter;
> > +		const struct object_id *oid;
> > +
> > +		if (!remote)
> > +			die(_("Must supply remote when using --negotiate-only"));
> 
> Please use lowercase letters to start your error messages.

Thanks - done.

> > +static int add_to_object_array(const struct object_id *oid, void *data)
> > +{
> > +	struct object_array *a = data;
> > +
> > +	add_object_array(parse_object(the_repository, oid), "", a);
> > +	return 0;
> > +}
> 
> Why parse_object() over lookup_object()?

Switched to lookup_object(). I don't think it matters here because we'll
need to parse all commits when doing negotiation anyway (to find their
parents), but I changed it to avoid confusion.

> > +void negotiate_using_fetch(const struct oid_array *negotiation_tips,
> > +			   const struct string_list *server_options,
> > +			   int stateless_rpc,
> > +			   int fd[],
> > +			   struct oidset *acked_commits)
> 
> It might be worth a careful doc comment to say that we _must_
> have already verified the wait-for-done capability.

OK - written in the .h file.

[snip min_generation explanation]

> And finally here you can use min_generation. Otherwise, you will suffer
> performance problems when there is a deep history but this returns false.
> 
> As long as min_generation is the minimum generation of a commit with the
> COMMON flag, you will be correct to use that limit.

Thanks for the precise guidance of how to use generation numbers to
speed things up. I've implemented your suggestions.

> 
> > +			last_iteration = 1;
> > +	}
> > +	strbuf_release(&req_buf);
> 
> Don't forget to clear the COMMON flag before returning.

OK - done. (I don't think it matters here, but I guess it's clearer if
we do as you suggest. The loop will take some time to run, but it's
small compared to the network time needed anyway.)

> > +/*
> > + * Execute the --negotiate-only mode of "git fetch", adding all known common
> > + * commits to acked_commits.
> > + */> +void negotiate_using_fetch(const struct oid_array *negotiation_tips,
> > +			   const struct string_list *server_options,
> > +			   int stateless_rpc,
> > +			   int fd[],
> > +			   struct oidset *acked_commits);
> 
> Here is a good place to specifically mention the wait-for-done capability,
> or else this method will die() when seeing the "ready" from the server.

Acknowledged.

> > diff --git a/transport-helper.c b/transport-helper.c
> > index 4cd76366fa..4be035edb8 100644
> > --- a/transport-helper.c
> > +++ b/transport-helper.c
> > @@ -684,6 +684,16 @@ static int fetch(struct transport *transport,
> >  		return transport->vtable->fetch(transport, nr_heads, to_fetch);
> >  	}
> >  
> > +	/*
> > +	 * If we reach here, then the server, the client, and/or the transport
> > +	 * helper does not support protocol v2. --negotiate-only requires
> > +	 * protocol v2.
> > +	 */
> > +	if (data->transport_options.acked_commits) {
> > +		warning(_("--negotiate-only requires protocol v2"));
> > +		return -1;
> > +	}
> > +
> 
> This method continues to do a lot that doesn't seem specific to
> --negotiate-only. The warning message seems incorrect to me, but
> is also seems like this would break several cases when using
> protocol v0. It is equally possible that I'm misunderstanding
> what is going on here.

It's specific to --negotiate-only in that acked_commits is set if and
only if we use --negotiate-only. What cases are you think of that break
when using protocol v0?

> > +	if (data->options.acked_commits) {
> > +		if (data->version < protocol_v2) {
> > +			warning(_("--negotiate-only requires protocol v2"));
> > +			ret = -1;
> > +		} else if (!server_supports_feature("fetch", "wait-for-done", 0)) {
> > +			warning(_("server does not support wait-for-done"));
> > +			ret = -1;
> > +		} else {
> > +			negotiate_using_fetch(data->options.negotiation_tips,
> > +					      transport->server_options,
> > +					      transport->stateless_rpc,
> > +					      data->fd,
> > +					      data->options.acked_commits);
> > +			ret = 0;
> > +		}
> > +		goto cleanup;
> > +	}
> 
> These prereqs look right. This makes it look rather strange that
> negotiate_using_fetch() returns void (and has several die() calls inside)
> and could instead return an 'int' that is consumed here (and use
> error() calls instead of die()).

When negotiate_using_fetch() is used, we are already past the capability
advertisement step, so we need to believe here that the wait-for-done
prerequisite has been met. (Admittedly, said prerequisite wasn't
documented, but I've documented it following your suggestion above.) So
I think it would be too late to return an "int" after several
negotiation steps have occurred and the packfile downloaded.

> > +
> > +	/*
> > +	 * If set, whenever transport_fetch_refs() is called, add known common
> > +	 * commits to this oidset instead of fetching any packfiles.
> > +	 */
> > +	struct oidset *acked_commits;
> 
> nit: s/If set/If allocated/

Changed.

> > @@ -1668,10 +1673,13 @@ int upload_pack_v2(struct repository *r, struct strvec *keys,
> >  		case FETCH_PROCESS_ARGS:
> >  			process_args(request, &data);
> >  
> > -			if (!data.want_obj.nr) {
> > +			if (!data.want_obj.nr && !data.wait_for_done) {
> >  				/*
> > -				 * Request didn't contain any 'want' lines,
> > -				 * guess they didn't want anything.
> > +				 * Request didn't contain any 'want' lines (and
> > +				 * the request does not contain
> > +				 * "wait-for-done", in which it is reasonable
> > +				 * to just send 'have's without 'want's); guess
> > +				 * they didn't want anything.
> >  				 */
> >  				state = FETCH_DONE;
> >  			} else if (data.haves.nr) {
> > @@ -1723,7 +1731,7 @@ int upload_pack_advertise(struct repository *r,
> >  		int allow_sideband_all_value;
> >  		char *str = NULL;
> >  
> > -		strbuf_addstr(value, "shallow");
> > +		strbuf_addstr(value, "shallow wait-for-done");
> 
> It might be nice to have this wait-for-done capability configurable.
> I'd likely want this on by default, but in case there is an issue it
> is easier to change config values than ship new binaries.

I felt that we didn't need this knob on the server side as this doesn't
expose any more information than a regular fetch, but I'm open to adding
it if neeed be.

> This is a pretty big patch. It might have been nice to split the
> server-side stuff out as its own patch with a follow-up for the client
> code and the tests.
> 
> Thanks,
> -Stolee

Personally I prefer that such changes (server and client) be together,
so that I can see on both sides how things work. There is (in theory)
already a clear divide between both sides, even within a patch, since
the server and client parts are usually in different parts. But if the
majority prefers that such changes be separate, I'm OK with doing that
too.

Also, thanks for your review.