Re: wakeup_pipe_readers/writers() && pipe_poll()

Oleg Nesterov <oleg@xxxxxxxxxx> · Tue, 7 Jan 2025 18:25:12 +0100

On 01/06, Linus Torvalds wrote:
>
> On Mon, 6 Jan 2025 at 11:34, Oleg Nesterov <oleg@xxxxxxxxxx> wrote:
> >
> > 1. pipe_read() says
> >
> >          * But when we do wake up writers, we do so using a sync wakeup
> >          * (WF_SYNC), because we want them to get going and generate more
> >          * data for us.
> >
> > OK, WF_SYNC makes sense if pipe_read() or pipe_write() is going to do wait_event()
> > after wake_up(). But wake_up_interruptible_sync_poll() looks at bit misleading if
> > we are going to wakeup the writer or next_reader before return.
>
> This heuristic has always been a bit iffy. And honestly, I think it's
> been driven by benchmarks that aren't necessarily always realistic (ie
> for ping-pong benchmarks, the best behavior is often to stay on the
> same CPU and just schedule between the reader/writer).

Agreed. But my question was not about performance, I just tried to
understand this logic. So in the case of

	wake_up_interruptible_sync_poll(wr_wait);
	wait_event_interruptible_exclusive(wr_read);

WF_SYNC is understandable, "stay on the same CPU" looks like the right
thing, and "_sync_" matches the comment above.

But if we are going to return, wake_up_interruptible_sync_poll() looks
a bit misleading to me.

> > 2. I can't understand this code in pipe_write()
> >
> >         if (ret > 0 && sb_start_write_trylock(file_inode(filp)->i_sb)) {
> >                 int err = file_update_time(filp);
> >                 if (err)
> >                         ret = err;
> >                 sb_end_write(file_inode(filp)->i_sb);
> >         }
> >
> >         - it only makes sense in the "fifo" case, right? When
> >           i_sb->s_magic != PIPEFS_MAGIC...
>
> I think we've done it for regular pipes too. You can see it with
> 'fstat()', after all.

Ah, indeed, thanks for correcting me...

And thanks for your other explanations. Again, it is not that I thought
this needs changes, just I was a bit confused. In particular by

	err = file_update_time();
	if (err)
		ret = err;

which doesn't match the usage of file_accessed() in pipe_read().

Oleg.