Re: [PATCH 1.5/8] connect: die when a capability line comes after a ref

Brandon Williams <bmwill@xxxxxxxxxx> · Wed, 20 Sep 2017 13:06:27 -0700

On 09/20, Jeff King wrote:
> On Wed, Sep 20, 2017 at 11:48:32AM -0700, Brandon Williams wrote:
> 
> > Commit eb398797c (connect: advertized capability is not a ref,
> > 2016-09-09) taught 'get_remote_heads()' to recognize that the
> > 'capabilities^{}' line isn't a ref but required that the
> > 'capabilities^{}' line came during the first response from the server.
> > A future patch will introduce a version string sent by the server during
> > its first response which can then cause a client to unnecessarily die if
> > a 'capabilities^{}' line sent as the first ref.
> > 
> > Teach 'get_remote_heads()' to instead die if a 'capabilities^{}' line is
> > sent after a ref.
> 
> Hmm. I think I understand why you'd want this loosening. But why are we
> sending a version line to a client that we don't know is speaking v2?
> IOW, shouldn't we be reporting the version to the client in the normal
> capabilities when we don't know for sure that they can handle the new
> field? Otherwise we're breaking existing clients.

The client requested the version, this is the servers response.  So
older clients shouldn't be broken because they wouldn't be requesting
the newer versions.

> 
> Or is this only for v2 clients, and we've changed the protocol but
> get_remote_heads() just needs to be updated, too?

A client which didn't request protocol v1 (I'm calling the current
protocol v0, and v1 is just v0 with the initial response from the server
containing a version string) should not receive a version string in the
initial response.  The problem is that when introducing the version
string to protocol version 1, I didn't want to have to do a huge
refactoring of ALL of the current transport code so I stuck the version
check in get_remote_heads() since v1 is exactly the same as v0, except
for the first line from the server.

When we introduce v2, I'm sure we'll have to do more refactoring to
separate out the logic for the different versions.
> 
> > diff --git a/connect.c b/connect.c
> > index df56c0cbf..af5096ec6 100644
> > --- a/connect.c
> > +++ b/connect.c
> > @@ -124,10 +124,11 @@ struct ref **get_remote_heads(int in, char *src_buf, size_t src_len,
> >  	 * response does not necessarily mean an ACL problem, though.
> >  	 */
> >  	int saw_response;
> > +	int seen_ref;
> >  	int got_dummy_ref_with_capabilities_declaration = 0;
> > 
> >  	*list = NULL;
> > -	for (saw_response = 0; ; saw_response = 1) {
> > +	for (saw_response = 0, seen_ref = 0; ; saw_response = 1) {
> 
> If we're not going to update it in the right-hand side of the for-loop,
> should we perhaps not be initializing it in the left-hand side? I.e.,
> can we just do:
> 
>   seen_ref = 0;
> 
> above the loop, like we initialize "list"?
> 
> (For that matter, could we just be checking whether *list is NULL?)

True, that would probably be the better way to do this.

> 
> > @@ -165,6 +166,8 @@ struct ref **get_remote_heads(int in, char *src_buf, size_t src_len,
> > 
> >  		name_len = strlen(name);
> >  		if (len != name_len + GIT_SHA1_HEXSZ + 1) {
> > +			if (seen_ref)
> > +				; /* NEEDSWORK: Error out for multiple capabilities lines? */
> >  			free(server_capabilities);
> >  			server_capabilities = xstrdup(name + name_len + 1);
> >  		}
> 
> Interesting question. Probably it would be fine to. Coincidentally I ran
> across a similar case. It seems that upload-pack will read multiple
> capabilities lines back from the client. I.e., if it gets:
> 
>   want 1234abcd... foo
>   want 5678abcd... bar
> 
> then it will turn on both the "foo" and "bar" capabilities. I'm pretty
> sure this is unintended, and is somewhat counter to the way that clients
> handle multiple lines (which is to forget the old line and respect only
> the new one, as shown in the quoted hunk).
> 
> I wonder if we should be outlawing extra capabilities in both
> directions. I don't _think_ we've ever relied on that working, and I
> don't have much sympathy for any 3rd-party implementation that does
> (though I doubt that any exists).
> 
> That tangent aside, I do this hunk is kind of orthogonal to the point of
> your patch. We're talking about potential _tightening_ here, whereas the
> point of your patch is loosening. And it's not clear to me what we want
> to tighten:
> 
>   - should capabilities come as part of the first response, even if we
>     have no refs? In which case we really want "if (saw_response)" here.
> 
>   - should they came as part of the first ref (or pseudo-ref), in which
>     case "if (seen_ref)" is the right thing.
> 
>   - should we loosen it to complaining when there are multiple
>     capabilities sent. In which case "if (server_capabilities)" is the
>     right thing.
> 
> I'm not sure which we'd want, but it really seems like a separate topic
> that should be explored on top.

I wasn't sure either, which is why I added the comment to prod
discussion.  I agree that is is orthogonal to this series so I'll most
likely drop it, as it doesn't help with the protocol transition
discussion.

-- 
Brandon Williams