Re: [RFC PATCH] locking/percpu-rwsem: use this_cpu_{inc|dec}() for read_count

Jan Kara <jack@xxxxxxx> · Fri, 18 Sep 2020 11:07:02 +0200

On Thu 17-09-20 14:01:33, Oleg Nesterov wrote:
> On 09/17, Boaz Harrosh wrote:
> >
> > On 16/09/2020 15:32, Hou Tao wrote:
> > <>
> > >However the performance degradation is huge under aarch64 (4 sockets, 24 cores per socket): nearly 60% lost.
> > >
> > >v4.19.111
> > >no writer, reader cn                               | 24        | 48        | 72        | 96
> > >the rate of down_read/up_read per second           | 166129572 | 166064100 | 165963448 | 165203565
> > >the rate of down_read/up_read per second (patched) |  63863506 |  63842132 |  63757267 |  63514920
> > >
> >
> > I believe perhaps Peter Z's suggestion of an additional
> > percpu_down_read_irqsafe() API is the answer, so that only the
> > IRQ-context users pay the penalty.
> >
> > Peter Z wrote:
> > >My leading alternative was adding: percpu_down_read_irqsafe() /
> > >percpu_up_read_irqsafe(), which use local_irq_save() instead of
> > >preempt_disable().
> 
> This means that __sb_start/end_write() and probably more users in fs/super.c
> will have to use this API, not good.
> 
> IIUC, file_end_write() was never IRQ safe (at least if !CONFIG_SMP), even
> before 8129ed2964 ("change sb_writers to use percpu_rw_semaphore"), but this
> doesn't matter...
> 
> Perhaps we can change aio.c, io_uring.c and fs/overlayfs/file.c to avoid
> file_end_write() in IRQ context, but I am not sure it's worth the trouble.

Well, that would IMO be rather difficult. We need to call file_end_write()
after the IO has completed, so if we don't want to do that in IRQ context,
we'd have to queue work to a workqueue or something like that. And that's
going to be expensive compared to a pure per-cpu inc/dec...

If people really wanted to avoid irq-safe inc/dec on archs where it is
more expensive, one idea I had was that we could add 'read_count_in_irq' to
percpu_rw_semaphore. Callers in normal context would use read_count and
callers in IRQ context would use read_count_in_irq. The writer side would
sum over both counters, but we don't care much about the performance of
that path.
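
To make the idea a bit more concrete, here is a rough userspace model of
the two-counter scheme. It is only a sketch, not a patch: the names
pcpu_counts, percpu_rwsem_model, model_read_inc(), model_read_dec() and
model_readers_active() are made up for illustration, NR_CPUS is an
arbitrary constant, and the "cpu" and "in_irq" arguments stand in for
what the kernel would know implicitly. It also assumes that an IRQ-context
update of read_count_in_irq cannot itself be interrupted by another update
of the same counter on that CPU.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4	/* hypothetical value, for this userspace model only */

/* One slot per CPU, with a separate counter for each calling context. */
struct pcpu_counts {
	long read_count;	/* updated only from normal (task) context */
	long read_count_in_irq;	/* updated only from IRQ context */
};

struct percpu_rwsem_model {
	struct pcpu_counts cpu[NR_CPUS];
};

/*
 * Reader acquire: bump the counter that matches the calling context.
 * An IRQ that interrupts a task-context update never touches the
 * counter being updated, so neither update needs to be irq-safe.
 */
static void model_read_inc(struct percpu_rwsem_model *sem, int cpu, bool in_irq)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	if (in_irq)
		sem->cpu[cpu].read_count_in_irq++;
	else
		sem->cpu[cpu].read_count++;
}

/* Reader release: may run on a different CPU and in a different context. */
static void model_read_dec(struct percpu_rwsem_model *sem, int cpu, bool in_irq)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	if (in_irq)
		sem->cpu[cpu].read_count_in_irq--;
	else
		sem->cpu[cpu].read_count--;
}

/*
 * Writer side: individual counters can be negative, only the sum over
 * all CPUs and both counters says whether any reader is still active.
 */
static long model_readers_active(const struct percpu_rwsem_model *sem)
{
	long sum = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		sum += sem->cpu[cpu].read_count +
		       sem->cpu[cpu].read_count_in_irq;
	return sum;
}
```

So a reader that does the increment in task context on one CPU and the
decrement from an IRQ on another CPU leaves read_count at 1 on the first
CPU and read_count_in_irq at -1 on the second, and the writer still sees
a sum of zero. The cost is that the writer walks twice as many counters.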

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR