On 05/03, Peter Zijlstra wrote: > > On Thu, May 02, 2019 at 12:09:32PM +0200, Oleg Nesterov wrote: > > > > +static void readers_block(struct percpu_rw_semaphore *sem) > > > +{ > > > + wait_event_cmd(sem->writer, !sem->readers_block, > > > + __up_read(&sem->rw_sem), __down_read(&sem->rw_sem)); > > > +} > > > + > > > +static void block_readers(struct percpu_rw_semaphore *sem) > > > +{ > > > + wait_event_exclusive_cmd(sem->writer, !sem->readers_block, > > > + __up_write(&sem->rw_sem), > > > + __down_write(&sem->rw_sem)); > > > + /* > > > + * Notify new readers to block; up until now, and thus throughout the > > > + * longish rcu_sync_enter() above, new readers could still come in. > > > + */ > > > + WRITE_ONCE(sem->readers_block, 1); > > > +} > > > > So iiuc, despite it name block_readers() also serializes the writers, ->rw_sem > > can be dropped by down_write_non_owner() so the new writer can take this lock. > > I don't think block_readers() is sufficient to serialize writers; > suppose two concurrent callers when !->readers_block. Without ->rwsem > that case would not serialize. Of course. I meant that the next writer can enter block_readers() if up_non_owner() drops ->rw_sem, but it will block in wait_event(!readers_block). (And if we change this code to use wait_event(xchg(readers_block) == 0) we can remove rw_sem altogether). The main problem is that this is sub-optimal. We can have a lot of readers sleeping in __down_read() when percpu_down_write() succeeds, then after percpu_down_write_non_owner() does up_write() they all will be woken just to hang in readers_block(). Plus the new readers will need to pass the lock-check-unlock-schedule path. Peter, just in case... I see another patch from you but I need to run away till Monday. Oleg.