On Tue, Feb 18, 2020 at 03:56:46PM +0100, Oleg Nesterov wrote: > On 02/18, Peter Zijlstra wrote: > > > > But this has me wondering about task_work_run(), as it is it will > > unconditionally take pi_lock, > > because spin_unlock_wait() was removed ;) task_work_run() doesn't > really need to take pi_lock at all. Right. > > --- a/kernel/task_work.c > > +++ b/kernel/task_work.c > > @@ -93,16 +93,20 @@ void task_work_run(void) > > struct callback_head *work, *head, *next; > > > > for (;;) { > > + work = READ_ONCE(task->task_work); > > + if (!work) > > + break > > This is wrong if PF_EXITING is set, in this case we must set > task->task_works = work_exited. Indeed! > > + > > /* > > * work->func() can do task_work_add(), do not set > > * work_exited unless the list is empty. > > */ > > raw_spin_lock_irq(&task->pi_lock); > > do { > > - work = READ_ONCE(task->task_works); > > - head = !work && (task->flags & PF_EXITING) ? > > - &work_exited : NULL; > > - } while (cmpxchg(&task->task_works, work, head) != work); > > + head = NULL; > > + if (unlikely(!work && (task->flags & PF_EXITING))) > > + head = &work_exited; > > + } while (!try_cmpxchg(&task->task_works, &work, head)); > > raw_spin_unlock_irq(&task->pi_lock); > > > > if (!work) > > otherwise I think this is correct, but how about the patch below? > Then this code can be changed to use try_cmpxchg(). Works for me. Thanks!