On 10.08.2022 at 21:53, Jeff King wrote:
> On Wed, Aug 10, 2022 at 07:39:34AM +0200, René Scharfe wrote:
>
>>> So it's weird that you'd see EAGAIN in this instance. Either the
>>> underlying write() is refusing to do a partial write (and just returning
>>> an error with EAGAIN in the first place), or the poll emulation is wrong
>>> (telling us the descriptor is ready for writing when it isn't).
>>
>> You're right, Windows' write needs two corrections. The helper below
>> reports what happens when we feed a pipe with writes of different sizes.
>> On Debian on WSL 2 (Windows Subsystem for Linux) it says:
>> [...]
>
> Thanks for digging into this further. What you found makes sense to me
> and explains what we're seeing.
>
>> The two corrections mentioned above together with the enable_nonblock()
>> implementation for Windows (and the removal of "false") suffice to let
>> t3701 pass when started directly, but it still hangs when running the
>> whole test suite using prove.
>
> Interesting. I wish there was an easy way for me to poke at this, too. I
> tried installing the Git for Windows SDK under wine, but unsurprisingly
> it did not get very far.
>
> Possibly I could try connecting to a running CI instance, but the test
> did not seem to fail there! (Plus I'd have to figure out how to do
> that... ;) ).
>
>> I don't have time to investigate right now, but I still don't
>> understand how xwrite() can possibly work against a non-blocking pipe.
>> It loops on EAGAIN, which is bad if the only way forward is to read
>> from a different fd to allow the other process to drain the pipe
>> buffer so that xwrite() can write again. I suspect pump_io_round()
>> must not use xwrite() and should instead handle EAGAIN by skipping to
>> the next fd.
>
> Right, it's susceptible to looping forever in such a case. _But_ a
> blocking write is likewise susceptible to blocking forever. In either
> case, we're relying on the reading side to pull some bytes out of the
> pipe so we can make forward progress.
>
> The key thing is that pump_io() is careful never to initiate a write()
> unless poll() has just told us that the descriptor is ready for writing.

Right, and Windows breaks it by refusing to write data bigger than the
buffer even if it's empty. What does "ready for writing" mean? PIPE_BUF
bytes are free, right?

> If something unexpected happens there (i.e., the descriptor is not
> really ready), a blocking descriptor is going to be stuck. And with
> xwrite(), we're similarly stuck (just looping instead of blocking).
> Without xwrite(), a non-blocking one _could_ be better off, because that
> EAGAIN would make it up to pump_io(). But what is it supposed to do? I
> guess it could go back into its main loop and hope that whatever bug
> caused the mismatch between poll() and write() goes away.

It should check other fds to let the other side make some progress on
them, so that it eventually gets to drain the pipe we want to write to.

> But even that would not have fixed the problem here on Windows. From my
> understanding, mingw_write() in this case would never write _any_ bytes.
> So we'd never make forward progress, and just loop writing 0 bytes and
> returning EAGAIN over and over.

Right, we need to teach it to break up large writes. It must make at
least some progress, if possible.

> So I dunno. We could try to be a bit more defensive about non-blocking
> descriptors by avoiding xwrite() in this instance, but it only helps for
> a particular class of weird OS behavior/bugs. I'd prefer to see a real
> case that it would help before moving in that direction.

Makes sense.

René