On Thu, Dec 5, 2019 at 9:22 AM David Sterba <dsterba@xxxxxxx> wrote:
>
> I rerun the test again (with a different address where it's stuck), there's
> nothing better I can get from the debug info, it always points to pipe_wait,
> disassembly points to.

Hah. I see another bug.

"pipe_wait()" depends on the fact that all events that wake it up
happen with the pipe lock held.

But we do some of the "do_wakeup()" handling outside the pipe lock now
on the reader side

        __pipe_unlock(pipe);

        /* Signal writers asynchronously that there is more room. */
        if (do_wakeup) {
                wake_up_interruptible_poll(&pipe->wait, EPOLLOUT | EPOLLWRNORM);
                kill_fasync(&pipe->fasync_writers, SIGIO, POLL_OUT);
        }

However, that isn't new to this series _either_, so I don't think
that's it. It does wake up things inside the lock _too_ if it ended up
emptying a whole buffer.

So it could be triggered by timing and behavior changes, but I doubt
this pipe_wait() thing is it either. The fact that it bisects to the
thing that changes things to use head/tail pointers makes me think
there's some other incorrect update or comparison somewhere.

That said, "pipe_wait()" is an abomination. It should use a proper
wait condition and use wait_event(), but the code predates all of
that. I suspect pipe_wait() goes back to the dark ages with the BKL
and no actual races between kernel code.

               Linus
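
For illustration only, here is a rough userspace analogy of the "proper wait condition" idea mentioned above. It is not kernel code and not the actual pipe implementation: it uses POSIX threads instead of wait_event(), and every name in it (demo_pipe, demo_wait_for_room, demo_consume_one, demo_run) is made up for this sketch. The point it tries to show is that when the sleeper re-checks its condition in a loop under the lock, a wakeup issued after the lock has been dropped should not get lost.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a pipe; field names are illustrative only. */
struct demo_pipe {
	pthread_mutex_t lock;
	pthread_cond_t wait;
	unsigned int head;      /* next slot to be written */
	unsigned int tail;      /* next slot to be read */
	unsigned int ring_size; /* number of slots */
};

static bool demo_pipe_full(const struct demo_pipe *p)
{
	return p->head - p->tail >= p->ring_size;
}

/*
 * Writer side: sleep until there is room, then take one slot.
 * The condition is re-checked under the lock after every wakeup,
 * so a spurious or early wakeup is harmless.
 */
static void demo_wait_for_room(struct demo_pipe *p)
{
	pthread_mutex_lock(&p->lock);
	while (demo_pipe_full(p))
		pthread_cond_wait(&p->wait, &p->lock);
	p->head++;
	pthread_mutex_unlock(&p->lock);
}

/*
 * Reader side: the state change happens under the lock, the wakeup
 * is sent after the lock is dropped (as in the quoted reader path).
 */
static void demo_consume_one(struct demo_pipe *p)
{
	pthread_mutex_lock(&p->lock);
	p->tail++;
	pthread_mutex_unlock(&p->lock);
	pthread_cond_broadcast(&p->wait);
}

static void *demo_writer(void *arg)
{
	demo_wait_for_room(arg);
	return NULL;
}

/* Start with a full one-slot pipe; return the slots in use at the end. */
static unsigned int demo_run(void)
{
	struct demo_pipe p = { .head = 1, .tail = 0, .ring_size = 1 };
	pthread_t writer;
	int rc;

	pthread_mutex_init(&p.lock, NULL);
	pthread_cond_init(&p.wait, NULL);

	rc = pthread_create(&writer, NULL, demo_writer, &p);
	assert(rc == 0);
	(void)rc;

	demo_consume_one(&p);
	pthread_join(writer, NULL);

	pthread_cond_destroy(&p.wait);
	pthread_mutex_destroy(&p.lock);
	return p.head - p.tail;
}
```

Calling demo_run() blocks one writer on a full pipe, reads one slot, and lets the writer proceed. Whichever thread runs first, the writer either sees room immediately or is woken by the broadcast, because pthread_cond_wait() releases the mutex and starts waiting atomically.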