Re: [PATCH v2 1/5] remote-curl: avoid hang if curl asks for more data after eof

Jonathan Tan <jonathantanmy@xxxxxxxxxx> · Wed, 15 Nov 2023 11:20:26 -0800

Jiří Hruška <jirka@xxxxxx> writes:
> It has been observed that under some circumstances, libcurl can call
> our `CURLOPT_READFUNCTION` callback `rpc_out()` again even after
> already getting a return value of zero (EOF) back once before.
> 
> Because `rpc->flush_read_but_not_sent` is reset to false immediately
> the first time an EOF is returned, the repeated call goes again to
> `rpc_read_from_out()`, which tries to read more from the child process
> pipe and the whole operation gets stuck - the child process is already
> trying to read a response back and will not write anything to the
> output pipe anymore, while the parent/remote process is now blocked
> waiting to read more too and never even finishes sending the request.
> 
> The bug is fixed by moving the reset of the `flush_read_but_not_sent`
> flag to `post_rpc()`, only before `rpc_out()` would be potentially used
> the next time. This makes the callback behave like fread() and return
> a zero any number of times at the end of a finished upload.
> 
> Signed-off-by: Jiri Hruska <jirka@xxxxxx>

Thanks for splitting this out - this makes it much easier to review. I
can see that the patch indeed works, but some things need to be clearer
(described below).

In the commit message, I think we should add a historical note,
something like:

	`flush_read_but_not_sent` was introduced in a97d00799a (remote-curl:
	use post_rpc() for protocol v2 also, 2019-02-21), which needed a way to
	indicate that Git should not read from a certain child process anymore
	once "flush" has been received from the child process. This field was
	scoped to what was believed the minimum necessary - the interval between
	"flush" being received and EOF being reported to Curl.

	However, as described at the beginning of the commit message, we need
	the scope of this to be greater than that - in case Curl continues to
	ask us for more data, we still need to remember that "flush" has been
	received. Therefore, change this field from `flush_read_but_not_sent` to
	`flush_read`, which both is simpler and is a fix for this bug.

Feel free to use this verbatim, or adapt it as you wish (or write your
own).

As described in the historical note, I also think we need to change the
name of this field, since we are changing its semantics.

I am trying to be better at indicating when I think a prefix subset of
a patch set is worth merging on its own (so that, if some patches in
a patch set are good and some still need to be discussed, we have the
option of merging a prefix of a patch set). So, assuming my comments are
addressed, I think this patch is good to go in on its own, even if we
don't want some of the later patches. I'll review the rest of this patch
set later.