Re: [PATCH v3 34/35] remote-curl: implement stateless-connect command

Brandon Williams <bmwill@xxxxxxxxxx> · Wed, 28 Feb 2018 12:21:42 -0800

On 02/27, Jonathan Nieder wrote:
> Hi,
> 
> Brandon Williams wrote:
> 
> > Teach remote-curl the 'stateless-connect' command which is used to
> > establish a stateless connection with servers which support protocol
> > version 2.  This allows remote-curl to act as a proxy, allowing the git
> > client to communicate natively with a remote end, simply using
> > remote-curl as a pass through to convert requests to http.
> 
> Cool!  I better look at the spec for that first.
> 
> *looks at the previous patch*
> 
> Oh, there is no documented spec. :/  I'll muddle through this instead,
> then.

I'll make sure to add one :)

> 
> [...]
> > --- a/remote-curl.c
> > +++ b/remote-curl.c
> > @@ -188,7 +188,10 @@ static struct ref *parse_git_refs(struct discovery *heads, int for_push)
> >  	heads->version = discover_version(&reader);
> >  	switch (heads->version) {
> >  	case protocol_v2:
> > -		die("support for protocol v2 not implemented yet");
> > +		/*
> > +		 * Do nothing.  Client should run 'stateless-connect' and
> > +		 * request the refs themselves.
> > +		 */
> >  		break;
> 
> This is the 'list' command, right?  Since we expect the client to run
> 'stateless-connect' instead, can we make it error out?

Yes and no.  Remote-curl will run this when trying to establish a
stateless-connection.  If the response is v2 then this is a capability
list and not refs.  So the capabilities will be dumped to the client and
they will be able to request the refs themselves at a later point.  The
comment here is just misleading, so i'll make sure to fix it.

> 
> [...]
> > @@ -1082,6 +1085,184 @@ static void parse_push(struct strbuf *buf)
> >  	free(specs);
> >  }
> >  
> > +struct proxy_state {
> > +	char *service_name;
> > +	char *service_url;
> > +	struct curl_slist *headers;
> > +	struct strbuf request_buffer;
> > +	int in;
> > +	int out;
> > +	struct packet_reader reader;
> > +	size_t pos;
> > +	int seen_flush;
> > +};
> 
> Can this have a comment describing what it is/does?  It's not obvious
> to me at first glance.
> 
> It doesn't have to go in a lot of detail since this is an internal
> implementation detail, but something saying e.g. that this represents
> a connection to an HTTP server (that's an example; I'm not saying
> that's what it represents :)) would help.

Always making new code have higher standards than the existing code ;)
Haha, I'll add a simple comment explaining it.

> 
> > +
> > +static void proxy_state_init(struct proxy_state *p, const char *service_name,
> > +			     enum protocol_version version)
> [...]
> > +static void proxy_state_clear(struct proxy_state *p)
> 
> Looks sensible.
> 
> [...]
> > +static size_t proxy_in(char *buffer, size_t eltsize,
> > +		       size_t nmemb, void *userdata)
> 
> Can this have a comment describing the interface?  (E.g. does it read
> a single pkt_line?  How is the caller expected to use it?  Does it
> satisfy the interface of some callback?)

I'll add a comment that its used as a READFUNCTION callback for curl and
that it tries to copy over a packet-line at a time.

> 
> libcurl's example https://curl.haxx.se/libcurl/c/ftpupload.html just
> calls this read_callback.  Such a name plus a pointer to
> CURLOPT_READFUNCTION should do the trick; bonus points if the comment 
> says what our implementation of the callback does.
> 
> Is this about having peek ability?

No its just that Curl only requests a set about of data at a time so you
need to be able to buffer the data that can't be read yet.

> 
> > +	struct proxy_state *p = userdata;
> > +	size_t avail = p->request_buffer.len - p->pos;
> > +
> > +	if (!avail) {
> > +		if (p->seen_flush) {
> > +			p->seen_flush = 0;
> > +			return 0;
> > +		}
> > +
> > +		strbuf_reset(&p->request_buffer);
> > +		switch (packet_reader_read(&p->reader)) {
> > +		case PACKET_READ_EOF:
> > +			die("unexpected EOF when reading from parent process");
> > +		case PACKET_READ_NORMAL:
> > +			packet_buf_write_len(&p->request_buffer, p->reader.line,
> > +					     p->reader.pktlen);
> > +			break;
> > +		case PACKET_READ_DELIM:
> > +			packet_buf_delim(&p->request_buffer);
> > +			break;
> > +		case PACKET_READ_FLUSH:
> > +			packet_buf_flush(&p->request_buffer);
> > +			p->seen_flush = 1;
> > +			break;
> > +		}
> > +		p->pos = 0;
> > +		avail = p->request_buffer.len;
> > +	}
> > +
> > +	if (max < avail)
> > +		avail = max;
> > +	memcpy(buffer, p->request_buffer.buf + p->pos, avail);
> > +	p->pos += avail;
> > +	return avail;
> 
> This is a number of bytes, but CURLOPT_READFUNCTION expects a number
> of items, fread-style.  That is:
> 
> 	if (avail < eltsize)
> 		... handle somehow, maybe by reading in more? ...
> 
> 	avail_memb = avail / eltsize;
> 	memcpy(buffer,
> 	       p->request_buffer.buf + p->pos,
> 	       st_mult(avail_memb, eltsize));
> 	p->pos += st_mult(avail_memb, eltsize);
> 	return avail_memb;
> 
> But https://curl.haxx.se/libcurl/c/CURLOPT_READFUNCTION.html says
> 
> 	Your function must then return the actual number of bytes that
> 	it stored in that memory area.
> 
> Does this mean eltsize is always 1?  This is super confusing...
> 
> ... ok, a quick grep for fread_func in libcurl reveals that eltsize is
> indeed always 1.  Can we add an assertion so we notice if that
> changes?
> 
> 	if (eltsize != 1)
> 		BUG("curl read callback called with size = %zu != 1", eltsize);
> 	max = nmemb;

Yeah i can go ahead and do this.  Just note that the v1 path uses logic
identical to this so it would be a problem there.

> 
> [...]
> > +static size_t proxy_out(char *buffer, size_t eltsize,
> > +			size_t nmemb, void *userdata)
> > +{
> > +	size_t size = eltsize * nmemb;
> > +	struct proxy_state *p = userdata;
> > +
> > +	write_or_die(p->out, buffer, size);
> > +	return size;
> > +}
> 
> Nice.  Same questions about st_mult or just asserting on eltsize apply
> here, too.
> 
> [...]
> > +static int proxy_post(struct proxy_state *p)
> 
> What does this function do?  Can it get a brief comment?

Will do.

> 
> > +{
> > +	struct active_request_slot *slot;
> > +	int err;
> > +
> > +	slot = get_active_slot();
> > +
> > +	curl_easy_setopt(slot->curl, CURLOPT_NOBODY, 0);
> > +	curl_easy_setopt(slot->curl, CURLOPT_POST, 1);
> > +	curl_easy_setopt(slot->curl, CURLOPT_URL, p->service_url);
> > +	curl_easy_setopt(slot->curl, CURLOPT_HTTPHEADER, p->headers);
> > +
> > +	/* Setup function to read request from client */
> > +	curl_easy_setopt(slot->curl, CURLOPT_READFUNCTION, proxy_in);
> > +	curl_easy_setopt(slot->curl, CURLOPT_READDATA, p);
> > +
> > +	/* Setup function to write server response to client */
> > +	curl_easy_setopt(slot->curl, CURLOPT_WRITEFUNCTION, proxy_out);
> > +	curl_easy_setopt(slot->curl, CURLOPT_WRITEDATA, p);
> > +
> > +	err = run_slot(slot, NULL);
> > +
> > +	if (err != HTTP_OK)
> > +		err = -1;
> > +
> > +	return err;
> 
> HTTP_OK is 0 but kind of obscures that.  How about something like the
> following?
> 
> 	if (run_slot(slot, NULL))
> 		return -1;
> 	return 0;
> 
> or
> 
> 	if (run_slot(slot, NULL) != HTTP_OK)
> 		return -1;
> 	return 0;
> 
> That way it's clearer that this always returns 0 or -1.

Sounds good.

> 
> [...]
> > +static int stateless_connect(const char *service_name)
> > +{
> > +	struct discovery *discover;
> > +	struct proxy_state p;
> > +
> > +	/*
> > +	 * Run the info/refs request and see if the server supports protocol
> > +	 * v2.  If and only if the server supports v2 can we successfully
> > +	 * establish a stateless connection, otherwise we need to tell the
> > +	 * client to fallback to using other transport helper functions to
> > +	 * complete their request.
> > +	 */
> > +	discover = discover_refs(service_name, 0);
> > +	if (discover->version != protocol_v2) {
> > +		printf("fallback\n");
> > +		fflush(stdout);
> > +		return -1;
> 
> Interesting.  I wonder if we can make remote-curl less smart and drive
> this more from the caller.
> 
> E.g. if the caller could do a single stateless request, they could do:
> 
> 	option git-protocol version=2
> 	stateless-request GET info/refs?service=git-upload-pack
> 	[pkt-lines, ending with a flush-pkt]
> 
> The git-protocol option in this hypothetical example is the value to
> be passed in the Git-Protocol header.
> 
> Then based on the response, the caller could decide to keep using
> stateless-request for further requests or fall back to "fetch".
> 
> That way, if we implement some protocol v3, the remote helper would
> not have to be changed at all to handle it: the caller would instead
> make the new v3-format request and remote-curl would be able to oblige
> without knowing why they're doing it.
> 
> What do you think?

I do see the draw for wanting this.  I think a change like this requires
a lot more refactoring, simply because with the current setup the
fetch/ls-refs logic doesn't care that its talking through a
remote-helper where if we went down that route it would need to be aware
of that.

-- 
Brandon Williams