Re: [PATCH 4/4] remote-curl: read in the push report even if we fail to finish sending data

Carlos Martín Nieto <cmn@xxxxxxx> · Tue, 23 Jul 2024 17:07:31 +0200

On Thu, 2024-06-13 at 05:55 -0400, Jeff King wrote:
> On Wed, Jun 12, 2024 at 01:50:28PM +0200, Carlos Martín Nieto wrote:
> 
> > If we just consume send-pack's output and don't send anything to
> > remote-helper, it will not update any of its structures and will report
> > "Everything up-to-date" next to the error message.
> 
> OK, consuming the output at the helper level makes some sense to me.
> But...
> 
> > diff --git a/remote-curl.c b/remote-curl.c
> > index 0b6d7815fdd..9e45e14afec 100644
> > --- a/remote-curl.c
> > +++ b/remote-curl.c
> > @@ -1114,15 +1114,25 @@ static int rpc_service(struct rpc_state *rpc, struct discovery *heads,
> >  
> >  	close(client.in);
> >  	client.in = -1;
> > -	if (!err) {
> > -		strbuf_read(rpc_result, client.out, 0);
> > -	} else {
> > -		char buf[4096];
> > -		for (;;)
> > -			if (xread(client.out, buf, sizeof(buf)) <= 0)
> > -				break;
> > +
> > +	/*
> > +	 * If we encountered an error, we might still get a report. Consume the
> > +	 * rest of the packfile and an extra flush and then we can copy
> > +	 * over the report the same way as in the success case.
> > +	 */
> > +	if (err) {
> > +		int n;
> > +		do {
> > +			n = packet_read(rpc->out, rpc->buf, rpc->alloc, 0);
> > +		} while (n > 0);
> > +
> > +		/* Read the final flush separating the payload from the report */
> > +		packet_read(rpc->out, rpc->buf, rpc->alloc, 0);
> >  	}
> 
> Isn't this existing code already trying to read everything? I think
> rpc->out and client.out are synonyms.

The existing code will read to EOF if we saw an error, ignoring
anything including any reports.

> 
> So now instead of reading to EOF, we are reading some set number of
> packets. This function is used for both fetches and pushes, isn't it? Is
> the expected number of packets the same for both? What about
> stateless-connect mode?
> 
> > +	/* Copy the report of successes/failures */
> > +	strbuf_read(rpc_result, client.out, 0);
> 
> OK, so this is where we read the result. Which again, only makes sense
> for send-pack. And in theory we've synchronized the protocol through the

The existing code already tries to read the report regardless of
pushing or fetch in the non-err case.

> packet reads above (are we sure that we always enter the read loop above
> from a predictable synchronization point in the protocol, given that we
> saw an error?).

That's what the loop is trying to do. It reads the rest of the
packfile, and it's trying to get to the flush at the end. This is what
the while loop above does, that copies between packet_read and
post_rpc. Given that send-pack is still sending the rest of the
packfile, it should be the same as if we had been sending the data over
the network.

It doesn't quite work which is why there's an extra read there but I
take your point that I forgot that we also run fetches through this so
it's probably going to be more complicated anyway.

> 
> What if send-pack doesn't send us anything useful (e.g., it hangs up
> without sending the report). Shouldn't we take the presence of "err"
> being non-zero as an indication that things are not well, even if we
> never get to the send-pack report?

In this case err is non-zero because we got a non-200 HTTP response, if
I followed the code correctly, so it does mean there was an error, and
it's true that we don't necessarily know why with just that variable.

It's still nicer if we can try to get more data out of send-pack (and
fetch-pack if it does have more information there), but yes this code
isn't quite it.

I might just step back on this a bit and split my fixes here as a
change to make send-pack return an error message should just need a
change to still forward non-2xx responses if the server claims it to be
the Git protocol. That still shows the error message from the server
even if we provide the "Everything up-to-date" message as well.

Cheers,
   cmn