Hi,

The issue "Jobserver hangs due to full pipe" was recently reported against Cargo, the Rust package manager. It was diagnosed as pipe writes hanging under certain circumstances. Specifically, if two or more threads write to a pipe simultaneously, all of the writers can hang even though the pipe has plenty of free space.

I have translated the Rust example to C, with some small adjustments:

#define _GNU_SOURCE /* for F_GETPIPE_SZ and F_SETPIPE_SZ */
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static int pipefd[2];

/* Each thread repeatedly takes one byte out of the pipe and puts it back. */
void *thread_start(void *arg)
{
	char buf[1];
	(void)arg;
	for (int i = 0; i < 1000000; i++) {
		read(pipefd[0], buf, sizeof(buf));
		write(pipefd[1], buf, sizeof(buf));
	}
	puts("done");
	return NULL;
}

int main(void)
{
	pipe(pipefd);
	printf("init buffer: %d\n", fcntl(pipefd[1], F_GETPIPE_SZ));
	/* A requested size of 0 is rounded up to the minimum, one page. */
	printf("new buffer: %d\n", fcntl(pipefd[1], F_SETPIPE_SZ, 0));
	/* Seed the pipe with one byte per thread. */
	write(pipefd[1], "aa", 2);

	pthread_t thread1, thread2;
	pthread_create(&thread1, NULL, thread_start, NULL);
	pthread_create(&thread2, NULL, thread_start, NULL);
	pthread_join(thread1, NULL);
	pthread_join(thread2, NULL);
	return 0;
}

The expected behavior of this program is to print:

init buffer: 65536
new buffer: 4096
done
done

and then exit. On Linux 5.14-rc4, compiling and running this program prints the following about half the time:

init buffer: 65536
new buffer: 4096
done

and then hangs. This is unexpected, since the pipe never holds more than two bytes at any given time. /proc/x/stack shows that the remaining thread is hanging at pipe.c:560.

It appears that a write needs not only free space in the pipe but also a free slot. At pipe.c:1306, a one-page pipe has only one slot. This led me to test nthreads=2, which also hangs. Running git blame on the pipe_write comment shows that it was added in a194dfe, whose commit message says, among other things:

> We just abandon the preallocated slot if we get a copy error. Future
> writes may continue it and a future read will eventually recycle it.
The behavior described in that commit message matches what I observe: in this case there are no readers on the pipe, so the abandoned slot is lost.

In my opinion (as expressed on the issue), the pipe is being misused here. As the pipe(7) manual page explains:

> Applications should not rely on a particular capacity: an application
> should be designed so that a reading process consumes data as soon as
> it is available, so that a writing process does not remain blocked.

Despite the misuse, I am reporting this for the following reasons:

1. I am reasonably confident that this is a regression, and the kernel makes reasonable efforts to maintain backwards compatibility, even with broken programs.

2. Even if this is not a regression, it seems like this situation could be handled more gracefully. We are not writing 4095 bytes and then expecting a one-byte write to succeed; the pipe is almost entirely empty.

3. Pipe sizes shrink dynamically in Linux. So although this case is unlikely to occur when two or more slots are available, even a program that does not explicitly request a one-page pipe buffer may end up with one if the user already has 1024 or more pipes open. This significantly exacerbates the next point:

4. GNU make's jobserver uses pipes in a similar manner. By my reading of the paper, it is theoretically possible for N simultaneous writes to occur without any readers, where N is the maximum number of concurrent jobs permitted.

Consider the following example with make -j2: two compile jobs are to be performed, one at the top level and one in a subdirectory. The top-level make invokes one make and one cc, costing two tokens. The sub-make invokes one cc with its free token. The pipe is now empty.

Now suppose the two compilers return at exactly the same time. Both copies of make will attempt to write a token to the pipe simultaneously. This does not yet cause a deadlock: at least one write will always succeed on an empty pipe.
Suppose the sub-make's write goes through, and the sub-make then exits. The top-level make, however, is still blocked on its original write, since that write was not merged with the other one. The build is now deadlocked.

I think the only reason this does not happen in practice is a coincidental design decision: when the sub-make exits, the top-level make receives a SIGCHLD. GNU make registers a SIGCHLD handler with SA_RESTART, so the blocked write is interrupted and restarted. This is only a coincidence, however: make does not actually expect a write to the control pipe to ever block, and it could just as well de-register the signal handler while performing the write and still be fully correct.

Regards,
Alex.