On Thu, Feb 27, 2025 at 1:51 PM Oleg Nesterov <oleg@xxxxxxxxxx> wrote: > > Hmm... > > Suppose that pipe is full, a writer W tries to write a single byte > and sleeps on pipe->wr_wait. > > A reader reads PAGE_SIZE bytes, updates pipe->tail, and wakes W up. > > But, before the woken W takes pipe->mutex, another writer comes and > writes 1 byte. This updates ->head and makes pipe_full() true again. > > Now, W could happily merge its "small" write into the last buffer, > but it will sleep again, despite the fact the last buffer has room > for 4095 bytes. > > Sapkal, I don't think this can explain the hang, receiver()->read() > should wake this writer later anyway. But could you please retest > with the patch below? > > Thanks, > > Oleg. > --- > > diff --git a/fs/pipe.c b/fs/pipe.c > index b0641f75b1ba..222881559c30 100644 > --- a/fs/pipe.c > +++ b/fs/pipe.c > @@ -455,6 +455,7 @@ anon_pipe_write(struct kiocb *iocb, struct iov_iter *from) > * page-aligns the rest of the writes for large writes > * spanning multiple pages. > */ > +again: > head = pipe->head; > was_empty = pipe_empty(head, pipe->tail); > chars = total_len & (PAGE_SIZE-1); > @@ -559,8 +560,8 @@ anon_pipe_write(struct kiocb *iocb, struct iov_iter *from) > kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN); > wait_event_interruptible_exclusive(pipe->wr_wait, pipe_writable(pipe)); > mutex_lock(&pipe->mutex); > - was_empty = pipe_empty(pipe->head, pipe->tail); > wake_next_writer = true; > + goto again; > } > out: > if (pipe_full(pipe->head, pipe->tail, pipe->max_usage)) > I think this is buggy. You get wakeups also when the last reader goes away. The for loop you are jumping out of makes sure to check for the condition, same for the first mutex acquire. With this goto you can get a successful write instead of getting SIGPIPE. iow this should goto few lines higher. I am not sure about the return value. The for loop bumps ret with each write, but the section you are jumping to overwrites it. So if the thread wrote some data within the loop, went to sleep and woke up to a state where it can do a write in the section you are jumping to, it is going to return the wrong number of bytes. Unless I'm misreading something. However, I do think something may be going on with the "split" ops, which is why I suggested going from 100 bytes where the bug was encountered to 128 for testing purposes. If that cleared it, that would be nice for sure. :> -- Mateusz Guzik <mjguzik gmail.com>