Re: [PATCH] pipe_read: don't wake up the writer if the pipe is still full

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Feb 27, 2025 at 1:51 PM Oleg Nesterov <oleg@xxxxxxxxxx> wrote:
>
> Hmm...
>
> Suppose that pipe is full, a writer W tries to write a single byte
> and sleeps on pipe->wr_wait.
>
> A reader reads PAGE_SIZE bytes, updates pipe->tail, and wakes W up.
>
> But, before the woken W takes pipe->mutex, another writer comes and
> writes 1 byte. This updates ->head and makes pipe_full() true again.
>
> Now, W could happily merge its "small" write into the last buffer,
> but it will sleep again, despite the fact the last buffer has room
> for 4095 bytes.
>
> Sapkal, I don't think this can explain the hang, receiver()->read()
> should wake this writer later anyway. But could you please retest
> with the patch below?
>
> Thanks,
>
> Oleg.
> ---
>
> diff --git a/fs/pipe.c b/fs/pipe.c
> index b0641f75b1ba..222881559c30 100644
> --- a/fs/pipe.c
> +++ b/fs/pipe.c
> @@ -455,6 +455,7 @@ anon_pipe_write(struct kiocb *iocb, struct iov_iter *from)
>          * page-aligns the rest of the writes for large writes
>          * spanning multiple pages.
>          */
> +again:
>         head = pipe->head;
>         was_empty = pipe_empty(head, pipe->tail);
>         chars = total_len & (PAGE_SIZE-1);
> @@ -559,8 +560,8 @@ anon_pipe_write(struct kiocb *iocb, struct iov_iter *from)
>                 kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN);
>                 wait_event_interruptible_exclusive(pipe->wr_wait, pipe_writable(pipe));
>                 mutex_lock(&pipe->mutex);
> -               was_empty = pipe_empty(pipe->head, pipe->tail);
>                 wake_next_writer = true;
> +               goto again;
>         }
>  out:
>         if (pipe_full(pipe->head, pipe->tail, pipe->max_usage))
>

I think this is buggy.

You get wakeups also when the last reader goes away. The for loop you
are jumping out of makes sure to check for the condition, same for the
first mutex acquire. With this goto you can get a successful write
instead of getting SIGPIPE. iow this should goto few lines higher.

I am not sure about the return value. The for loop bumps ret with each
write, but the section you are jumping to overwrites it. So if the
thread wrote some data within the loop, went to sleep and woke up to a
state where it can do a write in the section you are jumping to, it is
going to return the wrong number of bytes.

Unless I'm misreading something.

However, I do think something may be going on with the "split" ops,
which is why I suggested going from 100 bytes where the bug was
encountered to 128 for testing purposes. If that cleared it, that
would be nice for sure. :>

-- 
Mateusz Guzik <mjguzik gmail.com>





[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux