Re: [PATCH] pipe_read: don't wake up the writer if the pipe is still full

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, 3 Mar 2025 15:16:34 +0530 "Sapkal, Swapnil" <swapnil.sapkal@xxxxxxx>
> On 2/28/2025 10:03 PM, Oleg Nesterov wrote:
> > And... I know, I know you already hate me ;)
> > 
> 
> Not at all :)
> 
> > but if you have time, could you check if this patch (with or without the
> > previous debugging patch) makes any difference? Just to be sure.
> > 
> 
> Sure, I will give this a try.
> 
> But in the meanwhile me and Prateek tried some of the experiments in the weekend.
> We were able to reproduce this issue on a third generation EPYC system as well as
> on an Intel Emerald Rapids (2 X INTEL(R) XEON(R) PLATINUM 8592+).
> 
> We tried heavy hammered tracing approach over the weekend on top of your debug patch.
> I have attached the debug patch below. With tracing we found the following case for
> pipe_writable():
> 
>    hackbench-118768  [206] .....  1029.550601: pipe_write: 000000005eea28ff: 0: 37 38 16: 1
> 
> Here,
> 
> head = 37
> tail = 38
> max_usage = 16
> pipe_full() returns 1.
> 
> Between reading of head and later the tail, the tail seems to have moved ahead of the
> head leading to wraparound. Applying the following changes I have not yet run into a
> hang on the original machine where I first saw it:
> 
> diff --git a/fs/pipe.c b/fs/pipe.c
> index ce1af7592780..a1931c817822 100644
> --- a/fs/pipe.c
> +++ b/fs/pipe.c
> @@ -417,9 +417,19 @@ static inline int is_packetized(struct file *file)
>   /* Done while waiting without holding the pipe lock - thus the READ_ONCE() */
>   static inline bool pipe_writable(const struct pipe_inode_info *pipe)
>   {
> -	unsigned int head = READ_ONCE(pipe->head);
> -	unsigned int tail = READ_ONCE(pipe->tail);
>   	unsigned int max_usage = READ_ONCE(pipe->max_usage);
> +	unsigned int head, tail;
> +
> +	tail = READ_ONCE(pipe->tail);
> +	/*
> +	 * Since the unsigned arithmetic in this lockless preemptible context
> +	 * relies on the fact that the tail can never be ahead of head, read
> +	 * the head after the tail to ensure we've not missed any updates to
> +	 * the head. Reordering the reads can cause wraparounds and give the
> +	 * illusion that the pipe is full.
> +	 */
> +	smp_rmb();
> +	head = READ_ONCE(pipe->head);
>   
>   	return !pipe_full(head, tail, max_usage) ||
>   		!READ_ONCE(pipe->readers);
> ---
> 
> smp_rmb() on x86 is a nop and even without the barrier we were not able to
> reproduce the hang even after 10000 iterations.
>
My $.02 that changes the wait condition.
Not sure it makes sense for you.

--- x/fs/pipe.c
+++ y/fs/pipe.c
@@ -430,7 +430,7 @@ pipe_write(struct kiocb *iocb, struct io
 {
 	struct file *filp = iocb->ki_filp;
 	struct pipe_inode_info *pipe = filp->private_data;
-	unsigned int head;
+	unsigned int head, tail;
 	ssize_t ret = 0;
 	size_t total_len = iov_iter_count(from);
 	ssize_t chars;
@@ -573,11 +573,13 @@ pipe_write(struct kiocb *iocb, struct io
 		 * after waiting we need to re-check whether the pipe
 		 * become empty while we dropped the lock.
 		 */
+		tail = pipe->tail;
 		mutex_unlock(&pipe->mutex);
 		if (was_empty)
 			wake_up_interruptible_sync_poll(&pipe->rd_wait, EPOLLIN | EPOLLRDNORM);
 		kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN);
-		wait_event_interruptible_exclusive(pipe->wr_wait, pipe_writable(pipe));
+		wait_event_interruptible_exclusive(pipe->wr_wait,
+				!READ_ONCE(pipe->readers) || tail != READ_ONCE(pipe->tail));
 		mutex_lock(&pipe->mutex);
 		was_empty = pipe_empty(pipe->head, pipe->tail);
 		wake_next_writer = true;
--




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux