Re: [PATCH 1/2] percpu-rwsem: use synchronize_sched_expedited

"Paul E. McKenney" <paulmck@xxxxxxxxxxxxxxxxxx> · Fri, 30 Nov 2012 05:42:13 -0800

On Thu, Nov 29, 2012 at 10:00:53PM -0500, Mikulas Patocka wrote:
> On Thu, 29 Nov 2012, Andrew Morton wrote:
> > On Tue, 27 Nov 2012 22:59:52 -0500 (EST)
> > Mikulas Patocka <mpatocka@xxxxxxxxxx> wrote:
> > 
> > > percpu-rwsem: use synchronize_sched_expedited
> > > 
> > > Use synchronize_sched_expedited() instead of synchronize_sched()
> > > to improve mount speed.
> > > 
> > > This patch improves mount time from 0.500s to 0.013s.
> > > 
> > > Note: if realtime people complain about the use of
> > > synchronize_sched_expedited() and synchronize_rcu_expedited(), I suggest
> > > that they introduce an option CONFIG_REALTIME or
> > > /proc/sys/kernel/realtime and turn off these *_expedited functions if
> > > the option is enabled (i.e. turn synchronize_sched_expedited into
> > > synchronize_sched and synchronize_rcu_expedited into synchronize_rcu).
> > > 
> > > Signed-off-by: Mikulas Patocka <mpatocka@xxxxxxxxxx>
> > 
> > So I read through this thread but I really didn't see a clear
> > description of why mount() got slower.  The changelog for 4b05a1c74d1
> > is spectacularly awful :(
> > 
> > 
> > Apparently the slowdown occurred because a blockdev mount patch
> > 62ac665ff9fc07497ca524 ("blockdev: turn a rw semaphore into a percpu rw
> > semaphore") newly uses percpu rwsems, and percpu rwsems are slow on the
> > down_write() path.
> > 
> > And using synchronize_sched_expedited() rather than synchronize_sched()
> > makes percpu_down_write() somewhat less slow.  Correct?
> 
> Yes.
> 
> > Why is it OK to use synchronize_sched_expedited() here?  If it's
> > faster, why can't we use synchronize_sched_expedited() everywhere and
> > zap synchronize_sched()?
> 
> Because synchronize_sched_expedited() sends interrupts to all processors,
> which is bad for realtime workloads.
> 
> Peter Zijlstra once complained when I used synchronize_rcu_expedited in 
> bdi_remove_from_list (but he left it there).
> 
> I suggest that if it really hurts real time response for someone, let them
> introduce a switch to turn it into a non-expedited call.
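
For illustration only, the switch suggested above could be sketched roughly
as follows. This is a userspace model, not kernel code: realtime_mode,
sync_sched_maybe_expedited() and the two stub_*() functions are hypothetical
names that stand in for the proposed CONFIG_REALTIME (or
/proc/sys/kernel/realtime) knob and for the real synchronize_sched() and
synchronize_sched_expedited() primitives. No such wrapper is part of the
patch under discussion.

```c
#include <stdbool.h>

/*
 * Hypothetical knob standing in for the proposed CONFIG_REALTIME or
 * /proc/sys/kernel/realtime setting.  Assumption: not a real kernel symbol.
 */
static bool realtime_mode;

/* Call counters so that the userspace model can be observed. */
static int normal_calls;
static int expedited_calls;

/* Stubs standing in for the real grace-period primitives. */
static void stub_synchronize_sched(void)
{
	normal_calls++;
}

static void stub_synchronize_sched_expedited(void)
{
	expedited_calls++;
}

/*
 * Hypothetical wrapper: fall back to the non-expedited primitive when the
 * realtime knob is set, otherwise use the expedited one.
 */
static void sync_sched_maybe_expedited(void)
{
	if (realtime_mode)
		stub_synchronize_sched();
	else
		stub_synchronize_sched_expedited();
}
```

With realtime_mode clear, a call goes to the expedited stub; with it set,
the same call goes to the ordinary one, so callers such as
percpu_down_write() would not need to know which variant is in effect.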

Once Frederic's adaptive-ticks work reaches mainline, it will be possible
to avoid the IPIs to CPUs that are executing in user mode, in addition to
the current code's avoiding sending IPIs to CPUs that are idle.  That said,
it will still be necessary to send IPIs to CPUs that are executing in
the kernel.
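
As a rough model of the rule above (and only a model, not the actual
expedited grace-period code), the decision of which CPUs get an IPI might be
written like this. The enum cpu_state, cpu_needs_ipi() and count_ipis() are
hypothetical names made up for the sketch, and the adaptive_ticks flag simply
stands for "Frederic's work has been merged".

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU state, reduced to the three cases discussed. */
enum cpu_state { CPU_IDLE, CPU_USER, CPU_KERNEL };

/*
 * Hypothetical predicate: does this CPU need an IPI for an expedited
 * grace period?
 */
static bool cpu_needs_ipi(enum cpu_state state, bool adaptive_ticks)
{
	switch (state) {
	case CPU_IDLE:
		return false;		/* idle CPUs are already skipped */
	case CPU_USER:
		return !adaptive_ticks;	/* skippable only with adaptive ticks */
	case CPU_KERNEL:
		return true;		/* in-kernel CPUs still need the IPI */
	}
	return true;			/* unknown state: be conservative */
}

/* Count how many of the n CPUs described by cpus[] would be interrupted. */
static size_t count_ipis(const enum cpu_state *cpus, size_t n,
			 bool adaptive_ticks)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++)
		if (cpu_needs_ipi(cpus[i], adaptive_ticks))
			count++;
	return count;
}
```

For a four-CPU system with one idle CPU, two in user mode, and one in the
kernel, this model sends three IPIs without adaptive ticks and one with it.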

So things will get better, but won't be perfect.  Sort of like this was
real life or something.  ;-)

							Thanx, Paul
