I hit a hang with fstest btrfs/187, which does a btrfs send into /dev/null. This works by creating a pipe: the write side is given to the kernel to write into, and the read side is handed to a thread that splices into a file, in this case /dev/null.

The box that was hung had the write side stuck here

[<0>] pipe_write+0x38b/0x600
[<0>] new_sync_write+0x108/0x180
[<0>] __kernel_write+0xd4/0x160
[<0>] kernel_write+0x74/0x110
[<0>] btrfs_ioctl_send+0xb51/0x1867
[<0>] _btrfs_ioctl_send+0xbf/0x100
[<0>] btrfs_ioctl+0x1d4c/0x3090
[<0>] __x64_sys_ioctl+0x83/0xb0
[<0>] do_syscall_64+0x33/0x40
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

and the read side stuck here

[<0>] pipe_wait+0xa4/0x100
[<0>] splice_from_pipe_next.part.0+0x33/0xe0
[<0>] __splice_from_pipe+0x6a/0x200
[<0>] splice_from_pipe+0x50/0x70
[<0>] do_splice+0x35c/0x7e0
[<0>] __x64_sys_splice+0x92/0x110
[<0>] do_syscall_64+0x33/0x40
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

Using drgn to debug, I found that the pipe's rd_wait was empty, but the wr_wait was not

>>> pipe.rd_wait.head
(struct list_head){
	.next = (struct list_head *)0xffff9d2e5a3ad8d0,
	.prev = (struct list_head *)0xffff9d2e5a3ad8d0,
}
>>> pipe.wr_wait.head
(struct list_head){
	.next = (struct list_head *)0xffffa8934111bd88,
	.prev = (struct list_head *)0xffffa89341143ba8,
}
>>> for e in list_for_each_entry('struct wait_queue_entry', pipe.wr_wait.head.address_of_(), 'entry'):
...     task = Object(prog, 'struct task_struct', address=e.private.value_())
...     print("pid {} state {}".format(task.pid, task.state))
...
pid (pid_t)3080640 state (volatile long)1
pid (pid_t)3080638 state (volatile long)1
>>> for e in list_for_each_entry('struct wait_queue_entry', pipe.rd_wait.head.address_of_(), 'entry'):
...     task = Object(prog, 'struct task_struct', address=e.private.value_())
...     print("pid {} state {}".format(task.pid, task.state))
...
The wr_wait has both the writer and the reader waiting, which is expected since the pipe is full

>>> pipe.head
(unsigned int)179612
>>> pipe.tail
(unsigned int)179596
>>> pipe.max_usage
(unsigned int)16

but the read side is only waiting on wr_wait; rd_wait doesn't have our entry any more. This can happen in the following way

WRITER					READER

pipe_write()				splice
  pipe_lock()				  pipe_lock()
  was_empty = true
  do the write
  break out of the loop
  pipe_unlock()
					  consume what was written
					  pipe_wait()
					    prepare_to_wait(rd_wait)
					      set_task_state(INTERRUPTIBLE)
  wake_up(rd_wait)
    set_task_state(READER, RUNNING)
					    prepare_to_wait(wr_wait)
					      set_task_state(INTERRUPTIBLE)
					    pipe_unlock()
					    schedule()

The problem is that prepare_to_wait() sets our task state each time it is called, but we can be woken up by either readers or writers. In the case above we race with the WRITER waking us up: the second prepare_to_wait() re-sets our state to TASK_INTERRUPTIBLE, so we never break out of schedule(). Instead we need to set our state once, and then add ourselves to our respective wait queues. This way, if either of our wait queues wakes us up, we'll have TASK_RUNNING set when we enter schedule() and will go right back to check for whatever it is we're waiting on.

I tested this patch with the test that hung, but this only happened on one VM last night; these tests run twice on 3 VMs every night and this is the first time I've hit the problem, so it's a rare occurrence.
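To make the lost-wakeup window concrete, here is a minimal, purely illustrative Python replay of the interleaving in the diagram above (the state constants and function names are stand-ins for the kernel's task states, not kernel code). It shows why the buggy ordering enters schedule() with TASK_INTERRUPTIBLE set, while the fixed ordering enters it with TASK_RUNNING:

```python
# Illustrative stand-ins for the kernel task states (values are arbitrary).
TASK_RUNNING = 0
TASK_INTERRUPTIBLE = 1

def buggy_reader_steps():
    """Replay the racy ordering: prepare_to_wait() on each queue
    re-sets the task state, clobbering the writer's wakeup."""
    state = TASK_INTERRUPTIBLE   # reader: prepare_to_wait(rd_wait)
    state = TASK_RUNNING         # writer: wake_up(rd_wait) lands here
    state = TASK_INTERRUPTIBLE   # reader: prepare_to_wait(wr_wait) re-sets state
    return state                 # schedule() sees INTERRUPTIBLE -> sleeps forever

def fixed_reader_steps():
    """Replay the fixed ordering: set the state once, then only add the
    wait queue entries (add_wait_queue() does not touch the task state)."""
    state = TASK_INTERRUPTIBLE   # reader: set_current_state() once,
                                 # then add_wait_queue(rd_wait) and
                                 # add_wait_queue(wr_wait) follow
    state = TASK_RUNNING         # writer: wake_up() on either queue
    return state                 # schedule() sees RUNNING -> returns at once

print(buggy_reader_steps())  # 1: the wakeup was lost, the reader hangs
print(fixed_reader_steps())  # 0: the wakeup sticks, schedule() returns
```

This is only a sequential sketch of the state transitions, not a concurrent reproduction, but it captures why the order of set_current_state() relative to the queue additions matters.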
Fixes: 0ddad21d3e99 ("pipe: use exclusive waits when reading or writing")
Signed-off-by: Josef Bacik <josef@xxxxxxxxxxxxxx>
---
 fs/pipe.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/pipe.c b/fs/pipe.c
index 60dbee457143..8803a11cbc1b 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -116,8 +116,9 @@ void pipe_wait(struct pipe_inode_info *pipe)
 	 * Pipes are system-local resources, so sleeping on them
 	 * is considered a noninteractive wait:
 	 */
-	prepare_to_wait(&pipe->rd_wait, &rdwait, TASK_INTERRUPTIBLE);
-	prepare_to_wait(&pipe->wr_wait, &wrwait, TASK_INTERRUPTIBLE);
+	set_current_state(TASK_INTERRUPTIBLE);
+	add_wait_queue(&pipe->rd_wait, &rdwait);
+	add_wait_queue(&pipe->wr_wait, &wrwait);
 	pipe_unlock(pipe);
 	schedule();
 	finish_wait(&pipe->rd_wait, &rdwait);
-- 
2.26.2