On Tue, 2018-09-11 at 13:40 +0200, Peter Zijlstra wrote:
> On Tue, Sep 11, 2018 at 02:48:17AM +0100, Dmitry Safonov wrote:
> > There is a couple of reports about lockup in ldsem_down_read() without
> > anyone holding write end of ldisc semaphore:
> > lkml.kernel.org/r/<20171121132855.ajdv4k6swzhvktl6@xxxxxxxxxxxxxxxxel.com>
> > lkml.kernel.org/r/<20180907045041.GF1110@shao2-debian>
> >
> > They all looked like a missed wake up.
> > I wasn't lucky enough to reproduce it, but it seems like reader on
> > another CPU can miss waiter->task update and schedule again, resulting
> > in indefinite (MAX_SCHEDULE_TIMEOUT) sleep.
> >
> > Make sure waked up reader will see waiter->task == NULL.
>
> > --- a/drivers/tty/tty_ldsem.c
> > +++ b/drivers/tty/tty_ldsem.c
> > @@ -118,6 +118,8 @@ static void __ldsem_wake_readers(struct ld_semaphore *sem)
> >  		tsk = waiter->task;
> >  		smp_mb();
> >  		waiter->task = NULL;
> > +		/* Make sure down_read_failed() will see !waiter->task update */
> > +		smp_wmb();
> >  		wake_up_process(tsk);
>
> This is 'wrong', wake_up_process() should imply sufficient for this to
> already be true.

Yeah, thanks.
It was stupid of me not to check that..
Saw the smoke that would describe the reports and made too long-going
conjectures. Need more covfefe and staring into that code.

>
> >  		put_task_struct(tsk);
> >  	}

-- 
Thanks,
Dmitry