Re: [PATCH] remote-curl: avoid hang if curl asks for more data after eof

Jonathan Tan <jonathantanmy@xxxxxxxxxx> · Mon, 13 Nov 2023 13:22:42 -0800

Jiří Hruška <jirka@xxxxxx> writes:
> It has been observed that under some circumstances, libcurl can call
> our `CURLOPT_READFUNCTION` callback `rpc_out()` again even after
> already getting a return value of zero (EOF) back once before.
> 
> This results in `rpc_read_from_out()` trying to read more from the
> child process pipe, because `rpc->flush_read_but_not_sent` is reset to
> false already the first time EOF is returned, and then the whole
> operation gets stuck - the child process is already trying to read
> a response back and will not write anything to the output pipe anymore,
> while the parent/remote process is now blocked waiting to read more too
> and never even finishes sending the request.

Ah, thanks for finding this bug. It sounds worthwhile to make Git more
resilient in this situation.

I'll just make some preliminary comments.

> This commit addresses the bug with the following changes:

[snip]

This seems like a long list of changes, when from the description of the
bug above, I would have assumed it sufficient to record somewhere when
the CURLOPT_READFUNCTION callback returns zero, and then always return
zero after that if the callback is ever re-invoked. If this is indeed
not sufficient, we should describe why.

Also, if multiple changes are needed, please split them into several
commits.

> A trivial solution would be to just take the line which resets the flag
> 
>     /*
>      * The line length either does not need to be sent at
>      * all or has already been completely sent. Now we can
>      * return 0, indicating EOF, meaning that the flush has
>      * been fully sent.
>      */
> -   rpc->flush_read_but_not_sent = 0;
>     return 0;
> 
> from rpc_out() and reset it only in post_rpc(), before the next time a large
> request is being sent out and rpc_out() will go into play again:
> 
>     if (large_request) {
>         ...
>         rpc->initial_buffer = 1;
> +       rpc->flush_read_but_not_sent = 0;
>         curl_easy_setopt(slot->curl, CURLOPT_READFUNCTION, rpc_out);
> 
> This way the CURLOPT_READFUNCTION would be returning zeros at the end of the
> upload as long as needed, just like fread() at the end of a real file.
> 
> Hence, the bug could be fixed with just that two-lines change.

Ah, you describe the straightforward fix here. I haven't looked at this
closely enough to see if this would work, though.

> But while trying to figure out the above, I noticed a few things that prolonged
> the time I needed to understand what was going on, so I would like to propose
> some more changes to make the code perhaps a bit easier to read for the next
> person who comes to hack on it after me.
> 
> The description of the extra modifications is in the commit message. All of
> these changes are obviously optional and naturally subjective. I think that we
> can all agree on some points (less indentation = good), but naming is hard,
> and so is balance between unclear and too verbose, or when to split all
> non-functional changes to a separate commit. So let me know if there are things
> to do differently and I will gladly obey, it is your codebase after all.

Yes, please split non-functional changes into a separate commit
(preferably one for each concern). I do envision reviewers saying "let's
put patches X, Y, and Z in, but not patches A, B, and C", so splitting
would make it easier to decide what's worthwhile to have.

> Which brings me to the next topic, testing.
> 
> Validating the fix would be trivial with a mocked libcurl, but turns out to be
> much harder with the integration-level test suite of this project.

[snip]

> But outside of that, and as far as the bundled test suite goes, I have failed to
> write a test that could validate this problem does not ever occur again.

Due to the nature of the bug and the fix, I do agree that it's hard to
test and I would be OK with including the fix without associated tests.