Jiří Hruška <jirka@xxxxxx> writes: > It has been observed that under some circumstances, libcurl can call > our `CURLOPT_READFUNCTION` callback `rpc_out()` again even after > already getting a return value of zero (EOF) back once before. > > This results in `rpc_read_from_out()` trying to read more from the > child process pipe, because `rpc->flush_read_but_not_sent` is reset to > false already the first time EOF is returned, and then the whole > operation gets stuck - the child process is already trying to read > a response back and will not write anything to the output pipe anymore, > while the parent/remote process is now blocked waiting to read more too > and never even finishes sending the request. Ah, thanks for finding this bug. It sounds worthwhile to make Git more resilient in this situation. I'll just make some preliminary comments. > This commit addresses the bug with the following changes: [snip] This seems like a long list of changes, when from the description of the bug above, I would have assumed it sufficient to record somewhere when the CURLOPT_READFUNCTION callback returns zero, and then always return zero after that if the callback is ever re-invoked. If this is indeed not sufficient, we should describe why. Also, if multiple changes are needed, please split them into several commits. > A trivial solution would be to just take the line which resets the flag > > /* > * The line length either does not need to be sent at > * all or has already been completely sent. Now we can > * return 0, indicating EOF, meaning that the flush has > * been fully sent. > */ > - rpc->flush_read_but_not_sent = 0; > return 0; > > from rpc_out() and reset it only in post_rpc(), before the next time a large > request is being sent out and rpc_out() will go into play again: > > if (large_request) { > ... > rpc->initial_buffer = 1; > + rpc->flush_read_but_not_sent = 0; > curl_easy_setopt(slot->curl, CURLOPT_READFUNCTION, rpc_out); > > This way the CURLOPT_READFUNCTION would be returning zeros at the end of the > upload as long as needed, just like fread() at the end of a real file. > > Hence, the bug could be fixed with just that two-lines change. Ah, you describe the straightforward fix here. I haven't looked at this closely enough to see if this would work, though. > But while trying to figure out the above, I noticed a few things that prolonged > the time I needed to understand what was going on, so I would like to propose > some more changes to make the code perhaps a bit easier to read for the next > person who comes to hack on it after me. > > The description of the extra modifications is in the commit message. All of > these changes are obviously optional and naturally subjective. I think that we > can all agree on some points (less indentation = good), but naming is hard, > and so is balance between unclear and too verbose, or when to split all > non-functional changes to a separate commit. So let me know if there are things > to do differently and I will gladly obey, it is your codebase after all. Yes, please split non-functional changes into a separate commit (preferably one for each concern). I do envision reviewers saying "let's put patches X, Y, and Z in, but not patches A, B, and C", so splitting would make it easier to decide what's worthwhile to have. > Which brings me to the next topic, testing. > > Validating the fix would be trivial with a mocked libcurl, but turns out to be > much harder with the integration-level test suite of this project. [snip] > But outside of that, and as far as the bundled test suite goes, I have failed to > write a test that could validate this problem does not ever occur again. Due to the nature of the bug and the fix, I do agree that it's hard to test and I would be OK with including the fix without associated tests.