Re: [PATCH] pipe_read: don't wake up the writer if the pipe is still full

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Mar 3, 2025 at 10:46 AM Sapkal, Swapnil <swapnil.sapkal@xxxxxxx> wrote:
> But in the meanwhile me and Prateek tried some of the experiments in the weekend.
> We were able to reproduce this issue on a third generation EPYC system as well as
> on an Intel Emerald Rapids (2 X INTEL(R) XEON(R) PLATINUM 8592+).
>
> We tried heavy hammered tracing approach over the weekend on top of your debug patch.
> I have attached the debug patch below. With tracing we found the following case for
> pipe_writable():
>
>    hackbench-118768  [206] .....  1029.550601: pipe_write: 000000005eea28ff: 0: 37 38 16: 1
>
> Here,
>
> head = 37
> tail = 38
> max_usage = 16
> pipe_full() returns 1.
>

AFAICT the benchmark has one reader per fd, but multiple writers.

Maybe I'm misunderstanding something, but for such a case I think this
is expected as a possible transient condition and while not ideal, it
should not lead to the bug at hand.

Suppose there is only one reader and one writer and a wakeup-worthy
condition showed up. Then both sides perform wakeups *after* dropping
the pipe mutex, meaning their state is published before whoever they
intend to wake up gets on CPU. At the same time any new arrivals which
did not sleep start with taking the mutex.

Suppose there are two or more writers (one of which is blocked) and
still one reader and the pipe transitions to no longer full. Before
the woken up writer reaches pipe_writable() the pipe could have
transitioned to any state an arbitrary number of times, but someone
had to observe the correct state. In particular it is legitimate for a
non-sleeping writer to sneak in and fill in the pipe and the reader to
have time to get back empty it again etc.

Or to put it differently, if the patch does correct the bug, it needs
to explain how everyone ends up blocked. Per the above, there always
should be at least one writer and one reader who make progress -- this
somehow breaks (hence the bug), but I don't believe the memory
ordering explains it.

Consequently I think the patch happens to hide the bug, just like the
now removed spurious wakeup used to do.

> Between reading of head and later the tail, the tail seems to have moved ahead of the
> head leading to wraparound. Applying the following changes I have not yet run into a
> hang on the original machine where I first saw it:
>
> diff --git a/fs/pipe.c b/fs/pipe.c
> index ce1af7592780..a1931c817822 100644
> --- a/fs/pipe.c
> +++ b/fs/pipe.c
> @@ -417,9 +417,19 @@ static inline int is_packetized(struct file *file)
>   /* Done while waiting without holding the pipe lock - thus the READ_ONCE() */
>   static inline bool pipe_writable(const struct pipe_inode_info *pipe)
>   {
> -       unsigned int head = READ_ONCE(pipe->head);
> -       unsigned int tail = READ_ONCE(pipe->tail);
>         unsigned int max_usage = READ_ONCE(pipe->max_usage);
> +       unsigned int head, tail;
> +
> +       tail = READ_ONCE(pipe->tail);
> +       /*
> +        * Since the unsigned arithmetic in this lockless preemptible context
> +        * relies on the fact that the tail can never be ahead of head, read
> +        * the head after the tail to ensure we've not missed any updates to
> +        * the head. Reordering the reads can cause wraparounds and give the
> +        * illusion that the pipe is full.
> +        */
> +       smp_rmb();
> +       head = READ_ONCE(pipe->head);
>
>         return !pipe_full(head, tail, max_usage) ||
>                 !READ_ONCE(pipe->readers);
> ---
>
> smp_rmb() on x86 is a nop and even without the barrier we were not able to
> reproduce the hang even after 10000 iterations.
>
> If you think this is a genuine bug fix, I will send a patch for this.
>


-- 
Mateusz Guzik <mjguzik gmail.com>





[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux