Re: [PATCH v3] connect: in ref advertisement, shallows are last

Brandon Williams <bmwill@xxxxxxxxxx> · Fri, 22 Sep 2017 09:45:04 -0700

On 09/22, Junio C Hamano wrote:
> Junio C Hamano <gitster@xxxxxxxxx> writes:
> 
> > Jonathan Tan <jonathantanmy@xxxxxxxxxx> writes:
> >
> >> Currently, get_remote_heads() parses the ref advertisement in one loop,
> >> allowing refs and shallow lines to intersperse, despite this not being
> >> allowed by the specification. Refactor get_remote_heads() to use two
> >> loops instead, enforcing that refs come first, and then shallows.
> >>
> >> This also makes it easier to teach get_remote_heads() to interpret other
> >> lines in the ref advertisement, which will be done in a subsequent
> >> patch.
> >
> > Sounds sensible.  This still replaces the earlier 1.5?
> 
> Well, it does, but it also invalidates how the new "pick the version
> offered and used" feature is integrated to this callchain.  I guess
> we'd need a new "we are now expecting the version info" state in a
> patch to replace "connect: teach client to recognize v1 server
> response".

Yeah given we go with this patch, which is probably a better cleanup
than what I attempted, then I would need to change how a client
recognizes a v1 server.  That would probably be easily done by adding a
new state.

I do think that once a v2 protocol rolls around we'll probably have to
do even more refactoring because I don't think we'll want to keep all
the version checking logic in get_remote_heads() for different protocol
versions which may not be interested in a servers ref advertisement, but
that'll be for another time.

> 
> >> +static int process_ref(int *state, int len, struct ref ***list,
> >> +		       unsigned int flags, struct oid_array *extra_have)
> >> +{
> >> +	struct object_id old_oid;
> >> +	char *name;
> >> +	int name_len;
> >> +
> >> +	if (len < GIT_SHA1_HEXSZ + 2 ||
> >> +	    get_oid_hex(packet_buffer, &old_oid) ||
> >> +	    packet_buffer[GIT_SHA1_HEXSZ] != ' ') {
> >> +		*state = EXPECTING_SHALLOW;
> >> +		return 0;
> >> +	}
> >> +
> >> +	name = packet_buffer + GIT_SHA1_HEXSZ + 1;
> >> +	name_len = strlen(name);
> >> +	if (*state == EXPECTING_REF_WITH_CAPABILITIES &&
> >> +	    len != name_len + GIT_SHA1_HEXSZ + 1) {
> >> +		free(server_capabilities);
> 
> Is this free() still needed?  After hitting this block, you'd set
> *state to EXPECTING_REF before you return, so nobody would set
> server_capabilities by hitting this block twice, and an attempt to
> do so will hit the die("unexpected cap") below, no?
> 
> Or it may be a signal that this patch tightens it too much and
> breaks older or third-party implementations of the other side that
> can emit more than one refs with capability advertisement?
> 
> >> +		server_capabilities = xstrdup(name + name_len + 1);
> >> +	} else if (*state == EXPECTING_REF) {
> >> +		if (len != name_len + GIT_SHA1_HEXSZ + 1)
> >> +			die("unexpected capabilities after ref name");
> >> +	}
> >> +	...
> >> +	}
> >> +	*state = EXPECTING_REF;
> >> +	return 1;
> >> +}
> 
> >> @@ -123,76 +208,26 @@ struct ref **get_remote_heads(int in, char *src_buf, size_t src_len,
> >>  	 * willing to talk to us.  A hang-up before seeing any
> >>  	 * response does not necessarily mean an ACL problem, though.
> >>  	 */
> >> -	int saw_response;
> >> -	int got_dummy_ref_with_capabilities_declaration = 0;
> >> +	int responded = 0;
> >> +	int len;
> >> +	int state = EXPECTING_REF_WITH_CAPABILITIES;
> >>  
> >>  	*list = NULL;
> 
> >> +	while ((len = read_remote_ref(in, &src_buf, &src_len, &responded))) {
> >> +		switch (state) {
> >> +		case EXPECTING_REF_WITH_CAPABILITIES:
> >> +		case EXPECTING_REF:
> >> +			if (process_ref(&state, len, &list, flags, extra_have))
> >> +				break;
> >> +			/* fallthrough */
> 
> OK.  This fallthrough is because expecting-ref is really expecting
> ref or shallow and once we see a shallow, we no longer expect ref
> and expect only shallow.  So from that point of view, an assignment
> to set state to EXPECTING_SHALLOW could happen here, not inside
> process_ref.  I mention this because in general, passing state
> around and let it be updated in helper functions would make the
> state transition harder to follow, not easier, even though
> refactoring the processing needed in different stages into helper
> functions like this patch does ought to make it easier to see by
> shrinking the outer loop (i.e. this one) that controls the whole
> process.
> 
> I think if we split process_ref() further into two, then we no
> longer need to pass &state to that function?  We start this loop
> with "expecting the dummy ref (or other)" state, have a new
> process_dummy_ref() function check if we got "capabilities^{}" thing
> and do its thing if that is the case (otherwise we fall through to
> the call to process_ref(), just like the above code falls through to
> call process_shallow() when it realizes what it got is not a ref),
> and after the first call to process_dummy_ref() we'd be in the
> "expecting ref (or other)" state---and the state transition can
> happen in this caller, not in process_dummy_ref() or process_ref().
> 
> Inside process_dummy_ref() and process_ref(), there would be a call
> to the same helper that notices and extracts the server capability
> and stores it (or barfs against the second line that advertises the
> capability, by noticing that server_capabilities is not NULL).
> 
> Wouldn't that make the presentation of the state machine cleaner?

I mentioned this when looking at v2 of this patch, that it would
probably be cleaner to remove passing the state variable around the
place and updating it inside a helper function.  It would just make the
logic simpler to follow if 'state' is updated directly instead of
indirectly.

-- 
Brandon Williams