Am 11.08.2022 um 20:20 schrieb Jeff King: > On Thu, Aug 11, 2022 at 07:35:33PM +0200, René Scharfe wrote: > >> OK, but we can't just split anything up as we like: POSIX wants us to >> keep writes up to PIPE_BUF atomic. When I read that name I mistakenly >> thought it was the size of the pipe buffer, but it's a different value. >> The minimum value according to POSIX is 512 bytes; on Linux it's 4KB. >> >> And Windows doesn't seem to define it. Bash's ulimit -p returns 8, >> which is in units of 512 bytes, so it's 4KB like on Linux. But that's >> apparently not respected by Windows' write. >> >> I just realized that we can interprete the situation slightly >> differently. Windows has effectively PIPE_BUF = 2^32, which means all >> writes are atomic. I don't see POSIX specifying a maximum allowed >> value, so this must be allowed. Which means you cannot rely on write >> performing partial writes. Makes sense? > > Hmm, I'm not sure. POSIX does allow writing a single byte if that's all > the space there is, but only if the original _request_ was for more than > PIPE_BUF. Which I guess matches what you're saying. > > If the original request is smaller than PIPE_BUF, it does seem to say > that EAGAIN is the correct output. But it also says: > > There is no exception regarding partial writes when O_NONBLOCK is set. > With the exception of writing to an empty pipe, this volume of > POSIX.1-2017 does not specify exactly when a partial write is > performed since that would require specifying internal details of the > implementation. Every application should be prepared to handle partial > writes when O_NONBLOCK is set and the requested amount is greater than > {PIPE_BUF}, just as every application should be prepared to handle > partial writes on other kinds of file descriptors. > > The intent of forcing writing at least one byte if any can be written > is to assure that each write makes progress if there is any room in > the pipe. If the pipe is empty, {PIPE_BUF} bytes must be written; if > not, at least some progress must have been made. > > So they are clearly aware of the "we need to make forward progress" > problem. But how are you supposed to do that if you only have less than > PIPE_BUF bytes left to write? And likewise, even if it is technically > legal, having a PIPE_BUF of 2^32 seems like a quality-of-implementation > issue. > > And that doesn't really match what poll() is saying. The response from > poll() told us we could write without blocking, which implies at least > PIPE_BUF bytes available. But clearly it isn't available, since the > write fails (with EAGAIN, but that is equivalent to blocking in POSIX's > view here). Turns out that's not the case on Windows: 94f4d01932 (mingw: workaround for hangs when sending STDIN, 2020-02-17) changed the compatibility implementation to 'Make `poll()` always reply "writable" for write end of the pipe.'. An updated version of the test helper confirms it (patch below) by reporting on Windows: chunk size: 1 bytes EAGAIN after 8192 bytes chunk size: 500 bytes EAGAIN after 8000 bytes chunk size: 1000 bytes EAGAIN after 8000 bytes chunk size: 1024 bytes EAGAIN after 8192 bytes chunk size: 100000 bytes partial write of 8192 bytes after 0 bytes EAGAIN after 8192 bytes On Debian I get this instead: chunk size: 1 bytes POLLOUT gone after 61441 bytes EAGAIN after 65536 bytes chunk size: 500 bytes POLLOUT gone after 60500 bytes EAGAIN after 64000 bytes chunk size: 1000 bytes POLLOUT gone after 61000 bytes EAGAIN after 64000 bytes chunk size: 1024 bytes POLLOUT gone after 62464 bytes EAGAIN after 65536 bytes chunk size: 100000 bytes partial write of 65536 bytes after 0 bytes POLLOUT gone after 65536 bytes EAGAIN after 65536 bytes So on Windows the POLLOUT bit is indeed never unset. > Lawyering aside, I think it would be OK to move forward with cutting up > writes at least to a size of 512 bytes. Git runs on lots of platforms, > and none of the code can make an assumption that PIPE_BUF is larger than > that. I.e., we are reducing atomicity provided by Windows, but that's > OK. > > I don't think that solves our problem fully, though. We might need to > write 5 bytes, and telling mingw_write() to do so means it's supposed to > abide by PIPE_BUF conventions. But again, we are in control of the > calling code here. I don't think there's any reason that we care about > PIPE_BUF atomicity. Certainly we don't get those atomicity guarantees on > the socket side, which is where much of our writing happens, and our > code is prepared to handle partial writes of any size. So we could just > ignore that guarantee here. > >> If we accept that, then we need a special write function for >> non-blocking pipes that chops the data into small enough chunks. > > I'm not sure what "small enough" we can rely on, though. Really it is > the interplay between poll() and write() that we care about here. We > would like to know at what point poll() will tell us it's OK to write(). > But we don't know what the OS thinks of that. Based on the output above I think Linux' poll() won't consider a pipe writable that has less than PIPE_BUF (4096) available bytes. > Or maybe we do, since I think poll() is our own compat layer. But it > just seems to be based on select(). We do seem to use PeekNamedPipe() > there to look on the reading side, but I don't know if there's an > equivalent for writing. This has been elusive so far (see 94f4d01932). Perhaps we should take the advice about PIPE_NOWAIT in the docs serious and use overlapping (asynchronous) writes on Windows instead. This would mean reimplementing the whole pipe_command() with Windows API commands, I imagine. >> Another oddity: t3701 with yesterday's patch finishes fine with -i, but >> hangs without it (not just when running it via prove). O_o > > Yes, definitely strange. I'd expect weirdness with "-v", for example, > because of terminal stuff, but "-i" should have no impact on the running > environment. It's especially grating because test runs also take very looong. Avoiding xwrite() in pump_io_round() on top lets the test suite finish successfully. René --- Makefile | 1 + t/helper/test-nonblock.c | 73 ++++++++++++++++++++++++++++++++++++++++ t/helper/test-tool.c | 1 + t/helper/test-tool.h | 1 + 4 files changed, 76 insertions(+) create mode 100644 t/helper/test-nonblock.c diff --git a/Makefile b/Makefile index d9c00cc05d..0bc028ca00 100644 --- a/Makefile +++ b/Makefile @@ -751,6 +751,7 @@ TEST_BUILTINS_OBJS += test-lazy-init-name-hash.o TEST_BUILTINS_OBJS += test-match-trees.o TEST_BUILTINS_OBJS += test-mergesort.o TEST_BUILTINS_OBJS += test-mktemp.o +TEST_BUILTINS_OBJS += test-nonblock.o TEST_BUILTINS_OBJS += test-oid-array.o TEST_BUILTINS_OBJS += test-oidmap.o TEST_BUILTINS_OBJS += test-oidtree.o diff --git a/t/helper/test-nonblock.c b/t/helper/test-nonblock.c new file mode 100644 index 0000000000..c9350ce894 --- /dev/null +++ b/t/helper/test-nonblock.c @@ -0,0 +1,73 @@ +#include "test-tool.h" +#include "compat/nonblock.h" + +static void fill_pipe(size_t write_len) +{ + void *buf = xcalloc(1, write_len); + int fds[2]; + size_t total_written = 0; + int last = 0; + struct pollfd pfd; + int writable = 1; + + if (pipe(fds)) + die_errno("pipe failed"); + if (enable_nonblock(fds[1])) + die_errno("enable_nonblock failed"); + pfd.fd = fds[1]; + pfd.events = POLLOUT; + + printf("chunk size: %"PRIuMAX" bytes\n", write_len); + for (;;) { + ssize_t written; + if (poll(&pfd, 1, 0) < 0) { + if (errno == EINTR) + continue; + die_errno("poll failed"); + } + if (writable && !(pfd.revents & POLLOUT)) { + writable = 0; + printf(" POLLOUT gone after %"PRIuMAX" bytes\n", + total_written); + } + written = write(fds[1], buf, write_len); + if (written < 0) { + switch (errno) { + case EAGAIN: + printf(" EAGAIN after %"PRIuMAX" bytes\n", + total_written); + break; + case ENOSPC: + printf(" ENOSPC after %"PRIuMAX" bytes\n", + total_written); + break; + default: + printf(" errno %d after %"PRIuMAX" bytes\n", + errno, total_written); + } + break; + } else if (written != write_len) + printf(" partial write of %"PRIuMAX" bytes" + " after %"PRIuMAX" bytes\n", + (uintmax_t)written, total_written); + if (last) + break; + if (written > 0) + total_written += written; + last = !written; + }; + + close(fds[1]); + close(fds[0]); + free(buf); +} + +int cmd__nonblock(int argc, const char **argv) +{ + fill_pipe(1); + fill_pipe(500); + fill_pipe(1000); + fill_pipe(1024); + fill_pipe(100000); + return 0; +} diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c index 318fdbab0c..562d7a9161 100644 --- a/t/helper/test-tool.c +++ b/t/helper/test-tool.c @@ -45,6 +45,7 @@ static struct test_cmd cmds[] = { { "match-trees", cmd__match_trees }, { "mergesort", cmd__mergesort }, { "mktemp", cmd__mktemp }, + { "nonblock", cmd__nonblock }, { "oid-array", cmd__oid_array }, { "oidmap", cmd__oidmap }, { "oidtree", cmd__oidtree }, diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h index bb79927163..d9006a5298 100644 --- a/t/helper/test-tool.h +++ b/t/helper/test-tool.h @@ -36,6 +36,7 @@ int cmd__lazy_init_name_hash(int argc, const char **argv); int cmd__match_trees(int argc, const char **argv); int cmd__mergesort(int argc, const char **argv); int cmd__mktemp(int argc, const char **argv); +int cmd__nonblock(int argc, const char **argv); int cmd__oidmap(int argc, const char **argv); int cmd__oidtree(int argc, const char **argv); int cmd__online_cpus(int argc, const char **argv); -- 2.37.1.windows.1