On 02/18, Peter Zijlstra wrote: > > But this has me wondering about task_work_run(), as it is it will > unconditionally take pi_lock, because spin_unlock_wait() was removed ;) task_work_run() doesn't really need to take pi_lock at all. > would not something like this make sense? I think yes, but see below. > --- a/kernel/task_work.c > +++ b/kernel/task_work.c > @@ -93,16 +93,20 @@ void task_work_run(void) > struct callback_head *work, *head, *next; > > for (;;) { > + work = READ_ONCE(task->task_work); > + if (!work) > + break This is wrong if PF_EXITING is set, in this case we must set task->task_works = work_exited. > + > /* > * work->func() can do task_work_add(), do not set > * work_exited unless the list is empty. > */ > raw_spin_lock_irq(&task->pi_lock); > do { > - work = READ_ONCE(task->task_works); > - head = !work && (task->flags & PF_EXITING) ? > - &work_exited : NULL; > - } while (cmpxchg(&task->task_works, work, head) != work); > + head = NULL; > + if (unlikely(!work && (task->flags & PF_EXITING))) > + head = &work_exited; > + } while (!try_cmpxchg(&task->task_works, &work, head)); > raw_spin_unlock_irq(&task->pi_lock); > > if (!work) otherwise I think this is correct, but how about the patch below? Then this code can be changed to use try_cmpxchg(). Oleg. --- a/kernel/task_work.c +++ b/kernel/task_work.c @@ -97,17 +97,24 @@ void task_work_run(void) * work->func() can do task_work_add(), do not set * work_exited unless the list is empty. */ - raw_spin_lock_irq(&task->pi_lock); do { + head = NULL; work = READ_ONCE(task->task_works); - head = !work && (task->flags & PF_EXITING) ? - &work_exited : NULL; + if (!work) { + if (task->flags & PF_EXITING) + head = &work_exited; + else + break; + } } while (cmpxchg(&task->task_works, work, head) != work); - raw_spin_unlock_irq(&task->pi_lock); if (!work) break; + // Synchronize with task_work_cancel() + raw_spin_lock_irq(&task->pi_lock); + raw_spin_unlock_irq(&task->pi_lock); + do { next = work->next; work->func(work);