Am 07.08.22 um 19:41 schrieb Jeff King: > On Sun, Aug 07, 2022 at 12:15:06PM +0200, René Scharfe wrote: > >>> This adds "error: pumping io failed: No space left on device" to output. >>> Which kinda makes sense: With the pipe no longer blocking, there can be >>> a moment when the buffer is full and writes have to be rejected. This >>> condition should be reported with EAGAIN, though. >>> >>> Adding "if (len < 0 && errno == ENOSPC) continue;" after the xwrite() >>> call in pump_io_round() lets the test pass. >>> >>> Perhaps the translation from Windows error code to POSIX is wrong here? >> >> So if we fix that with the patch below, t3701.57 still hangs, but this >> time it goes through wrapper.c::handle_nonblock() again and again. >> Replacing the "errno = EAGAIN" with a "return 0" to fake report a >> successful write of nothing instead lets the test pass. >> >> This seems to make sense -- looping in xwrite() won't help, as we need >> to read from the other fd first, to allow the process on the other end >> of the pipe to make some progress first, as otherwise the pipe buffer >> will stay full in this scenario. Shouldn't that be a problem on other >> systems as well? > > It doesn't happen on Linux; I suspect there's something funny either > about partial writes, or about poll() on Windows. What's supposed to > happen is: > > 1. pump_io() calls poll(), which tells us the descriptor is ready to > write > > 2. we call xwrite(), and our actual write() call returns a partial > write (i.e., reports "ret" bytes < "len" we passed in) > > 3. we return back to pump_io() do another round of poll(). If the > other side consumed some bytes from the pipe, then we may get > triggered to do another (possibly partial) write. If it didn't, and > we'd get EAGAIN writing, then poll shouldn't trigger at all! > > So it's weird that you'd see EAGAIN in this instance. Either the > underlying write() is refusing to do a partial write (and just returning > an error with EAGAIN in the first place), or the poll emulation is wrong > (telling us the descriptor is ready for writing when it isn't). You're right, Windows' write needs two corrections. The helper below reports what happens when we feed a pipe with writes of different sizes. On Debian on WSL 2 (Windows Subsystem for Linux) it says: chunk size: 1 bytes 65536 total bytes written, then got EAGAIN chunk size: 1000 bytes 64000 total bytes written, then got EAGAIN chunk size: 1024 bytes 65536 total bytes written, then got EAGAIN chunk size: 100000 bytes 0 total bytes written, then got a partial write of 65536 bytes 65536 total bytes written, then got EAGAIN On Windows directly I get: chunk size: 1 bytes 8192 total bytes written, then got ENOSPC chunk size: 1000 bytes 8000 total bytes written, then got ENOSPC chunk size: 1024 bytes 8192 total bytes written, then got ENOSPC chunk size: 100000 bytes 0 total bytes written, then got ENOSPC https://pubs.opengroup.org/onlinepubs/9699919799/functions/write.html documents what we should get: Writes smaller than the buffer should be atomic, bigger writes bigger should be broken up, and the error code for a full buffer should be EAGAIN. I.e. the first example is right. So mingw_write() needs to translate ENOSPC to EAGAIN and break up huge writes instead of giving up outright. > Can you instrument pump_io_round() (or use some strace equivalent, if > there is one) to see if we do a successful partial write first (which > implies poll() is wrong in telling us we can write more for the second > round), or if the very first write() is failing (which implies write() > is wrong for returning EAGAIN when it could do a partial write). The two corrections mentioned above together with the enable_nonblock() implementation for Windows (and the removal of "false") suffice to let t3701 pass when started directly, but it still hangs when running the whole test suite using prove. I don't have time to investigate right now, but I still don't understand how xwrite() can possibly work against a non-blocking pipe. It loops on EAGAIN, which is bad if the only way forward is to read from a different fd to allow the other process to drain the pipe buffer so that xwrite() can write again. I suspect pump_io_round() must not use xwrite() and should instead handle EAGAIN by skipping to the next fd. René --- Makefile | 1 + t/helper/test-nonblock.c | 51 ++++++++++++++++++++++++++++++++++++++++ t/helper/test-tool.c | 1 + t/helper/test-tool.h | 1 + 4 files changed, 54 insertions(+) create mode 100644 t/helper/test-nonblock.c diff --git a/Makefile b/Makefile index d9c00cc05d..0bc028ca00 100644 --- a/Makefile +++ b/Makefile @@ -751,6 +751,7 @@ TEST_BUILTINS_OBJS += test-lazy-init-name-hash.o TEST_BUILTINS_OBJS += test-match-trees.o TEST_BUILTINS_OBJS += test-mergesort.o TEST_BUILTINS_OBJS += test-mktemp.o +TEST_BUILTINS_OBJS += test-nonblock.o TEST_BUILTINS_OBJS += test-oid-array.o TEST_BUILTINS_OBJS += test-oidmap.o TEST_BUILTINS_OBJS += test-oidtree.o diff --git a/t/helper/test-nonblock.c b/t/helper/test-nonblock.c new file mode 100644 index 0000000000..c9288ea6ac --- /dev/null +++ b/t/helper/test-nonblock.c @@ -0,0 +1,51 @@ +#include "test-tool.h" +#include "compat/nonblock.h" + +static void fill_pipe(size_t write_len) +{ + void *buf = xcalloc(1, write_len); + int fds[2]; + size_t total_written = 0; + int last = 0; + + if (pipe(fds)) + die("pipe failed"); + if (enable_nonblock(fds[1])) + die("enable_nonblock failed"); + + printf("chunk size: %"PRIuMAX" bytes\n", write_len); + for (;;) { + ssize_t written = write(fds[1], buf, write_len); + if (written != write_len) + printf("%"PRIuMAX" total bytes written, then got ", + (uintmax_t)total_written); + if (written < 0) { + switch (errno) { + case EAGAIN: printf("EAGAIN\n"); break; + case ENOSPC: printf("ENOSPC\n"); break; + default: printf("errno %d\n", errno); + } + break; + } else if (written != write_len) + printf("a partial write of %"PRIuMAX" bytes\n", + (uintmax_t)written); + if (last) + break; + if (written > 0) + total_written += written; + last = !written; + }; + + close(fds[1]); + close(fds[0]); + free(buf); +} + +int cmd__nonblock(int argc, const char **argv) +{ + fill_pipe(1); + fill_pipe(1000); + fill_pipe(1024); + fill_pipe(100000); + return 0; +} diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c index 318fdbab0c..562d7a9161 100644 --- a/t/helper/test-tool.c +++ b/t/helper/test-tool.c @@ -45,6 +45,7 @@ static struct test_cmd cmds[] = { { "match-trees", cmd__match_trees }, { "mergesort", cmd__mergesort }, { "mktemp", cmd__mktemp }, + { "nonblock", cmd__nonblock }, { "oid-array", cmd__oid_array }, { "oidmap", cmd__oidmap }, { "oidtree", cmd__oidtree }, diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h index bb79927163..d9006a5298 100644 --- a/t/helper/test-tool.h +++ b/t/helper/test-tool.h @@ -36,6 +36,7 @@ int cmd__lazy_init_name_hash(int argc, const char **argv); int cmd__match_trees(int argc, const char **argv); int cmd__mergesort(int argc, const char **argv); int cmd__mktemp(int argc, const char **argv); +int cmd__nonblock(int argc, const char **argv); int cmd__oidmap(int argc, const char **argv); int cmd__oidtree(int argc, const char **argv); int cmd__online_cpus(int argc, const char **argv); -- 2.37.1