Re: [PATCH] mingw: handle writes to non-blocking pipe

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Am 11.08.2022 um 20:20 schrieb Jeff King:
> On Thu, Aug 11, 2022 at 07:35:33PM +0200, René Scharfe wrote:
>
>> OK, but we can't just split anything up as we like: POSIX wants us to
>> keep writes up to PIPE_BUF atomic.  When I read that name I mistakenly
>> thought it was the size of the pipe buffer, but it's a different value.
>> The minimum value according to POSIX is 512 bytes; on Linux it's 4KB.
>>
>> And Windows doesn't seem to define it.  Bash's ulimit -p returns 8,
>> which is in units of 512 bytes, so it's 4KB like on Linux.  But that's
>> apparently not respected by Windows' write.
>>
>> I just realized that we can interprete the situation slightly
>> differently.  Windows has effectively PIPE_BUF = 2^32, which means all
>> writes are atomic.  I don't see POSIX specifying a maximum allowed
>> value, so this must be allowed.  Which means you cannot rely on write
>> performing partial writes.  Makes sense?
>
> Hmm, I'm not sure. POSIX does allow writing a single byte if that's all
> the space there is, but only if the original _request_ was for more than
> PIPE_BUF. Which I guess matches what you're saying.
>
> If the original request is smaller than PIPE_BUF, it does seem to say
> that EAGAIN is the correct output. But it also says:
>
>   There is no exception regarding partial writes when O_NONBLOCK is set.
>   With the exception of writing to an empty pipe, this volume of
>   POSIX.1-2017 does not specify exactly when a partial write is
>   performed since that would require specifying internal details of the
>   implementation. Every application should be prepared to handle partial
>   writes when O_NONBLOCK is set and the requested amount is greater than
>   {PIPE_BUF}, just as every application should be prepared to handle
>   partial writes on other kinds of file descriptors.
>
>   The intent of forcing writing at least one byte if any can be written
>   is to assure that each write makes progress if there is any room in
>   the pipe. If the pipe is empty, {PIPE_BUF} bytes must be written; if
>   not, at least some progress must have been made.
>
> So they are clearly aware of the "we need to make forward progress"
> problem. But how are you supposed to do that if you only have less than
> PIPE_BUF bytes left to write? And likewise, even if it is technically
> legal, having a PIPE_BUF of 2^32 seems like a quality-of-implementation
> issue.
>
> And that doesn't really match what poll() is saying. The response from
> poll() told us we could write without blocking, which implies at least
> PIPE_BUF bytes available. But clearly it isn't available, since the
> write fails (with EAGAIN, but that is equivalent to blocking in POSIX's
> view here).

Turns out that's not the case on Windows: 94f4d01932 (mingw: workaround
for hangs when sending STDIN, 2020-02-17) changed the compatibility
implementation to 'Make `poll()` always reply "writable" for write end
of the pipe.'.

An updated version of the test helper confirms it (patch below) by
reporting on Windows:

chunk size: 1 bytes
 EAGAIN after 8192 bytes
chunk size: 500 bytes
 EAGAIN after 8000 bytes
chunk size: 1000 bytes
 EAGAIN after 8000 bytes
chunk size: 1024 bytes
 EAGAIN after 8192 bytes
chunk size: 100000 bytes
 partial write of 8192 bytes after 0 bytes
 EAGAIN after 8192 bytes

On Debian I get this instead:

chunk size: 1 bytes
 POLLOUT gone after 61441 bytes
 EAGAIN after 65536 bytes
chunk size: 500 bytes
 POLLOUT gone after 60500 bytes
 EAGAIN after 64000 bytes
chunk size: 1000 bytes
 POLLOUT gone after 61000 bytes
 EAGAIN after 64000 bytes
chunk size: 1024 bytes
 POLLOUT gone after 62464 bytes
 EAGAIN after 65536 bytes
chunk size: 100000 bytes
 partial write of 65536 bytes after 0 bytes
 POLLOUT gone after 65536 bytes
 EAGAIN after 65536 bytes

So on Windows the POLLOUT bit is indeed never unset.

> Lawyering aside, I think it would be OK to move forward with cutting up
> writes at least to a size of 512 bytes. Git runs on lots of platforms,
> and none of the code can make an assumption that PIPE_BUF is larger than
> that. I.e., we are reducing atomicity provided by Windows, but that's
> OK.
>
> I don't think that solves our problem fully, though. We might need to
> write 5 bytes, and telling mingw_write() to do so means it's supposed to
> abide by PIPE_BUF conventions. But again, we are in control of the
> calling code here. I don't think there's any reason that we care about
> PIPE_BUF atomicity. Certainly we don't get those atomicity guarantees on
> the socket side, which is where much of our writing happens, and our
> code is prepared to handle partial writes of any size. So we could just
> ignore that guarantee here.
>
>> If we accept that, then we need a special write function for
>> non-blocking pipes that chops the data into small enough chunks.
>
> I'm not sure what "small enough" we can rely on, though. Really it is
> the interplay between poll() and write() that we care about here. We
> would like to know at what point poll() will tell us it's OK to write().
> But we don't know what the OS thinks of that.

Based on the output above I think Linux' poll() won't consider a pipe
writable that has less than PIPE_BUF (4096) available bytes.

> Or maybe we do, since I think poll() is our own compat layer. But it
> just seems to be based on select(). We do seem to use PeekNamedPipe()
> there to look on the reading side, but I don't know if there's an
> equivalent for writing.

This has been elusive so far (see 94f4d01932).

Perhaps we should take the advice about PIPE_NOWAIT in the docs serious
and use overlapping (asynchronous) writes on Windows instead.  This
would mean reimplementing the whole pipe_command() with Windows API
commands, I imagine.

>> Another oddity: t3701 with yesterday's patch finishes fine with -i, but
>> hangs without it (not just when running it via prove).  O_o
>
> Yes, definitely strange. I'd expect weirdness with "-v", for example,
> because of terminal stuff, but "-i" should have no impact on the running
> environment.
It's especially grating because test runs also take very looong.

Avoiding xwrite() in pump_io_round() on top lets the test suite
finish successfully.

René


---
 Makefile                 |  1 +
 t/helper/test-nonblock.c | 73 ++++++++++++++++++++++++++++++++++++++++
 t/helper/test-tool.c     |  1 +
 t/helper/test-tool.h     |  1 +
 4 files changed, 76 insertions(+)
 create mode 100644 t/helper/test-nonblock.c

diff --git a/Makefile b/Makefile
index d9c00cc05d..0bc028ca00 100644
--- a/Makefile
+++ b/Makefile
@@ -751,6 +751,7 @@ TEST_BUILTINS_OBJS += test-lazy-init-name-hash.o
 TEST_BUILTINS_OBJS += test-match-trees.o
 TEST_BUILTINS_OBJS += test-mergesort.o
 TEST_BUILTINS_OBJS += test-mktemp.o
+TEST_BUILTINS_OBJS += test-nonblock.o
 TEST_BUILTINS_OBJS += test-oid-array.o
 TEST_BUILTINS_OBJS += test-oidmap.o
 TEST_BUILTINS_OBJS += test-oidtree.o
diff --git a/t/helper/test-nonblock.c b/t/helper/test-nonblock.c
new file mode 100644
index 0000000000..c9350ce894
--- /dev/null
+++ b/t/helper/test-nonblock.c
@@ -0,0 +1,73 @@
+#include "test-tool.h"
+#include "compat/nonblock.h"
+
+static void fill_pipe(size_t write_len)
+{
+	void *buf = xcalloc(1, write_len);
+	int fds[2];
+	size_t total_written = 0;
+	int last = 0;
+	struct pollfd pfd;
+	int writable = 1;
+
+	if (pipe(fds))
+		die_errno("pipe failed");
+	if (enable_nonblock(fds[1]))
+		die_errno("enable_nonblock failed");
+	pfd.fd = fds[1];
+	pfd.events = POLLOUT;
+
+	printf("chunk size: %"PRIuMAX" bytes\n", write_len);
+	for (;;) {
+		ssize_t written;
+		if (poll(&pfd, 1, 0) < 0) {
+			if (errno == EINTR)
+				continue;
+			die_errno("poll failed");
+		}
+		if (writable && !(pfd.revents & POLLOUT)) {
+			writable = 0;
+			printf(" POLLOUT gone after %"PRIuMAX" bytes\n",
+			       total_written);
+		}
+		written = write(fds[1], buf, write_len);
+		if (written < 0) {
+			switch (errno) {
+			case EAGAIN:
+				printf(" EAGAIN after %"PRIuMAX" bytes\n",
+				       total_written);
+				break;
+			case ENOSPC:
+				printf(" ENOSPC after %"PRIuMAX" bytes\n",
+				       total_written);
+				break;
+			default:
+				printf(" errno %d after %"PRIuMAX" bytes\n",
+				       errno, total_written);
+			}
+			break;
+		} else if (written != write_len)
+			printf(" partial write of %"PRIuMAX" bytes"
+			       " after %"PRIuMAX" bytes\n",
+			       (uintmax_t)written, total_written);
+		if (last)
+			break;
+		if (written > 0)
+			total_written += written;
+		last = !written;
+	};
+
+	close(fds[1]);
+	close(fds[0]);
+	free(buf);
+}
+
+int cmd__nonblock(int argc, const char **argv)
+{
+	fill_pipe(1);
+	fill_pipe(500);
+	fill_pipe(1000);
+	fill_pipe(1024);
+	fill_pipe(100000);
+	return 0;
+}
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index 318fdbab0c..562d7a9161 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -45,6 +45,7 @@ static struct test_cmd cmds[] = {
 	{ "match-trees", cmd__match_trees },
 	{ "mergesort", cmd__mergesort },
 	{ "mktemp", cmd__mktemp },
+	{ "nonblock", cmd__nonblock },
 	{ "oid-array", cmd__oid_array },
 	{ "oidmap", cmd__oidmap },
 	{ "oidtree", cmd__oidtree },
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index bb79927163..d9006a5298 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -36,6 +36,7 @@ int cmd__lazy_init_name_hash(int argc, const char **argv);
 int cmd__match_trees(int argc, const char **argv);
 int cmd__mergesort(int argc, const char **argv);
 int cmd__mktemp(int argc, const char **argv);
+int cmd__nonblock(int argc, const char **argv);
 int cmd__oidmap(int argc, const char **argv);
 int cmd__oidtree(int argc, const char **argv);
 int cmd__online_cpus(int argc, const char **argv);
--
2.37.1.windows.1




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux