On Wed, Dec 25, 2024 at 05:04:46PM +0100, Mateusz Guzik wrote:
> On Wed, Dec 25, 2024 at 08:53:05AM -0500, Kent Overstreet wrote:
> > On Wed, Dec 25, 2024 at 03:30:05PM +0200, Andy Shevchenko wrote:
> > > Don't you think the Cc list is a bit overloaded?
> > 
> > Indeed, my mail server doesn't let me reply-all.
> > 
> > > On Wed, Dec 25, 2024 at 05:42:02PM +0800, WangYuli wrote:
> > > > +config PIPE_SKIP_SLEEPER
> > > > +	bool "Skip sleeping processes during pipe read/write"
> > > > +	default n
> > > 
> > > 'n' is the default 'default', no need to have this line.
> > 
> > Actually, I'd say to skip the kconfig option for this. Kconfig options
> > that affect the behaviour of core code increase our testing burden, and
> > are another variable to account for when chasing down bugs, and the
> > potential overhead looks negligible.
> 
> I agree the behavior should not be guarded by an option. However,
> because of how wq_has_sleeper is implemented (see below) I would argue
> this needs to show how often locking can be avoided in real workloads.
> 
> The commit message does state this comes with a slowdown for cases which
> can't avoid wakeups, but as is I thought the submitter just meant an
> extra branch.
> 
> > Also, did you look at adding this optimization to wake_up()? No-op
> > wakeups are very common, I think this has wider applicability.
> 
> I was going to suggest it myself, but then:
> 
> static inline bool wq_has_sleeper(struct wait_queue_head *wq_head)
> {
> 	/*
> 	 * We need to be sure we are in sync with the
> 	 * add_wait_queue modifications to the wait queue.
> 	 *
> 	 * This memory barrier should be paired with one on the
> 	 * waiting side.
> 	 */
> 	smp_mb();
> 	return waitqueue_active(wq_head);
> }
> 
> Which means this is in fact quite expensive.
> 
> Since wakeup is a lock + an interrupt trip, it would still be
> cheaper single-threaded to "merely" suffer a full fence, and for cases
> where the queue is empty often enough this is definitely the right thing
> to do.
We're comparing against no-op wakeup. A real wakeup does an IPI, which
completely dwarfs the cost of a barrier.

And note that wake_up() is spin_lock_irqsave(), not spin_lock(). I
assume it's gotten better, but back when I was looking at waitqueues,
nested pushf/popf was horrifically expensive.

But perhaps we can do this with just a release barrier? Similar to how
list_empty_careful() works.

> On the other hand this executing when the queue is mostly *not* empty
> would combat the point.
> 
> So unfortunately embedding this in wake_up is a no-go.

You definitely can't say that without knowing how often no-op wake_up()s
occur. It wouldn't be hard to gather that (write a patch to add a pair
of percpu counters, throw it on a few machines running random workloads)
and I think the results might surprise you.