Re: [PATCH 5/6] remote-curl: error on incomplete packet

On Wed, May 13, 2020 at 02:04:57PM -0400, Denton Liu wrote:

> Currently, remote-curl acts as a proxy and blindly forwards packets
> between an HTTP server and fetch-pack. In the case of a stateless RPC
> connection where the connection is terminated with a partially written
> packet, remote-curl will blindly send the partially written packet
> before waiting on more input from fetch-pack. Meanwhile, fetch-pack will
> read the partial packet and continue reading, expecting more input. This
> results in a deadlock between the two processes.
> 
> Instead of blindly forwarding packets, inspect each packet to ensure
> that it is a full packet, erroring out if a partial packet is sent.

Hmm. Right now there's no assumption in rpc_in that we're writing
pktlines. Will this always be the case?

I think the answer is probably yes. If there's a case where it isn't, it
might be something like v0 git-over-http against a server which doesn't
have the sideband capability.

> diff --git a/remote-curl.c b/remote-curl.c
> index da3e07184a..8b740354e5 100644
> --- a/remote-curl.c
> +++ b/remote-curl.c
> @@ -682,6 +682,8 @@ static curlioerr rpc_ioctl(CURL *handle, int cmd, void *clientp)
>  struct rpc_in_data {
>  	struct rpc_state *rpc;
>  	struct active_request_slot *slot;
> +	struct strbuf len_buf;
> +	int remaining;

This remaining is "remaining in the current packet", I assume? An "int"
should be OK for that.

Using a strbuf feels a bit like overkill, because we have to remember to
free it (which I think doesn't actually happen in your patch). Could we
just use a "char len_buf[4]"? (You'd need an extra int to count how many
bytes you've received there.)
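
That might look something like this (a rough sketch, not tested; any
field name beyond the patch's is made up):

  struct rpc_in_data {
          struct rpc_state *rpc;
          struct active_request_slot *slot;
          char len_buf[4];   /* pkt-line length header bytes seen so far */
          int len_filled;    /* how many of those 4 bytes we have */
          int remaining;     /* payload bytes left in the current packet */
  };

There's nothing to free, and "did we get a whole header" is just
"len_filled == 4".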

> @@ -702,7 +705,42 @@ static size_t rpc_in(char *ptr, size_t eltsize,
>  		return size;
>  	if (size)
>  		data->rpc->any_written = 1;
> -	write_or_die(data->rpc->in, ptr, size);
> +
> +	while (unwritten) {
> +		if (!data->remaining) {
> +			int digits_remaining = 4 - data->len_buf.len;
> +			if (digits_remaining > unwritten)
> +				digits_remaining = unwritten;
> +			strbuf_add(&data->len_buf, ptr, digits_remaining);
> +			ptr += digits_remaining;
> +			unwritten -= digits_remaining;

So we actually save up the 4 bytes, not sending them at all until we get
them. I wonder if this might be easier to follow if our count was simply
advisory. I.e., continue to write data as we get it, but keep a small
state machine tracking pktlines (which could even be done in its own
separate struct/function pair).

I dunno. It might be about the same level of confusing, but it makes it
easy to keep the logic separate from rpc_in, and it lets us put all of
the policy bits at the end, after we know we've received all of the
data (in post_rpc):

  if (data->len_buf.len < 4)
          die("incomplete packet header");
  if (data->remaining)
          die("incomplete packet");
  /* imagine we kept the actual pktlen value, instead of just counting
   * down remaining */
  if (data->pktlen)
          die("did not end in flush");
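
For concreteness, such a struct/function pair might look something like
this (a rough, untested sketch; the names are made up, and it leans on
packet_length() and die() from our usual headers):

  struct pktline_tracker {
          char len_buf[4];  /* header bytes of the packet being read */
          int len_filled;   /* how many of those 4 bytes we have */
          int remaining;    /* payload bytes left in the current packet */
          int pktlen;       /* raw length from the most recent header */
  };

  static void pktline_track(struct pktline_tracker *pt,
                            const char *buf, size_t len)
  {
          while (len) {
                  if (pt->len_filled < 4) {
                          /* accumulate the 4-byte length header */
                          size_t n = 4 - pt->len_filled;
                          if (n > len)
                                  n = len;
                          memcpy(pt->len_buf + pt->len_filled, buf, n);
                          pt->len_filled += n;
                          buf += n;
                          len -= n;
                          if (pt->len_filled < 4)
                                  continue;
                          pt->pktlen = packet_length(pt->len_buf);
                          if (pt->pktlen < 0)
                                  die("remote-curl: bad line length");
                          pt->remaining = pt->pktlen > 4 ?
                                          pt->pktlen - 4 : 0;
                  } else {
                          /* consume payload bytes */
                          size_t n = pt->remaining;
                          if (n > len)
                                  n = len;
                          pt->remaining -= n;
                          buf += n;
                          len -= n;
                  }
                  if (pt->len_filled == 4 && !pt->remaining)
                          pt->len_filled = 0; /* at a packet boundary */
          }
  }

rpc_in() would call pktline_track() on each chunk it forwards, and the
end-of-response checks above would just look at len_filled, remaining,
and pktlen.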

Notably, I'm not sure if your code will complain if the connection dies
while reading the 4-byte header (remaining would still be 0). That won't
leave the caller trying to read the packet (we never sent them the
header), but they may still be waiting for a response.

> +			if (data->len_buf.len == 4) {
> +				data->remaining = packet_length(data->len_buf.buf);
> +				if (data->remaining < 0) {
> +					die(_("remote-curl: bad line length character: %.4s"), data->len_buf.buf);
> +				} else if (data->remaining <= 1) {
> +					data->remaining = 0;

This treats 0001 delimiters the same as a 0000 flush. Expecting 0 more
bytes is the right thing, but would we later want to differentiate in
post_rpc()? I.e., is it ever correct for the server data to end with a
delim?

> +				} else if (data->remaining < 4) {
> +					die(_("remote-curl: bad line length %d"), data->remaining);

We don't use an 0002 or 0003 packet for anything yet, but this would
need to be updated if we ever did. I wonder if we should treat them also
as zero-length packets and quietly pass through, which is likely the
right thing to do. OTOH, this should complain loudly enough that
somebody would presumably notice as soon as they tried to use those
packets.
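
If we did go the quiet route, the branches above would collapse to
something like this (just a sketch of that alternative):

  data->remaining = packet_length(data->len_buf.buf);
  if (data->remaining < 0)
          die(_("remote-curl: bad line length character: %.4s"),
              data->len_buf.buf);
  else if (data->remaining < 4)
          /* 0000-0003: flush, delim, or reserved; no payload follows */
          data->remaining = 0;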

Regarding testing, I think the ideal thing would be a proxy layer we
could insert that simply eats all data after N bytes. That would be easy
to do if we could use --upload-pack='git-upload-pack | head -c 500' or
something. But we need it to happen between curl and the server side of
Git. Possibly we could insert something between apache and
git-http-backend (similar to how we handle broken-smart-http.sh).
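
The data-eating part itself could be a trivial filter; something like
this hypothetical helper (not part of git, just to make the idea
concrete) could be wrapped around git-http-backend's output:

  /* truncate-filter.c: forward at most LIMIT bytes of stdin to stdout,
   * then keep reading but discard everything else, so the writer never
   * sees an error. */
  #include <stdio.h>
  #include <stdlib.h>

  int main(int argc, char **argv)
  {
          long limit = argc > 1 ? strtol(argv[1], NULL, 10) : 500;
          char buf[4096];
          size_t n;

          while ((n = fread(buf, 1, sizeof(buf), stdin)) > 0) {
                  if (limit > 0) {
                          size_t out = n < (size_t)limit ? n : (size_t)limit;
                          fwrite(buf, 1, out, stdout);
                          fflush(stdout);
                          limit -= out;
                  }
                  /* bytes past the limit are silently eaten */
          }
          return 0;
  }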

-Peff


