On Tue, Aug 11, 2020 at 12:01 AM Jens Axboe <axboe@xxxxxxxxx> wrote:
> On 8/10/20 3:28 PM, Jens Axboe wrote:
> > On 8/10/20 3:26 PM, Jann Horn wrote:
> >> On Mon, Aug 10, 2020 at 11:12 PM Jens Axboe <axboe@xxxxxxxxx> wrote:
> >>> On 8/10/20 3:10 PM, Peter Zijlstra wrote:
> >>>> On Mon, Aug 10, 2020 at 03:06:49PM -0600, Jens Axboe wrote:
> >>>>
> >>>>> should work as far as I can tell, but I don't even know if there's a
> >>>>> reliable way to do task_in_kernel().
> >>>>
> >>>> Only on NOHZ_FULL, and tracking that is one of the things that makes it
> >>>> so horribly expensive.
> >>>
> >>> Probably no other way than to bite the bullet and just use TWA_SIGNAL
> >>> unconditionally...
> >>
> >> Why are you trying to avoid using TWA_SIGNAL? Is there a specific part
> >> of handling it that's particularly slow?
> >
> > Not particularly slow, but it's definitely heavier than TWA_RESUME. And
> > as we're driving any pollable async IO through this, just trying to
> > ensure it's as light as possible.
> >
> > It's not a functional thing, just efficiency.
>
> Ran some quick testing in a vm, which is worst case for this kind of
> thing as any kind of mucking with interrupts is really slow. And the hit
> is substantial. Though with the below, we're basically at parity again.
> Just for discussion...
>
>
> diff --git a/kernel/task_work.c b/kernel/task_work.c
> index 5c0848ca1287..ea2c683c8563 100644
> --- a/kernel/task_work.c
> +++ b/kernel/task_work.c
> @@ -42,7 +42,8 @@ task_work_add(struct task_struct *task, struct callback_head *work, int notify)
>  		set_notify_resume(task);
>  		break;
>  	case TWA_SIGNAL:
> -		if (lock_task_sighand(task, &flags)) {
> +		if (!(task->jobctl & JOBCTL_TASK_WORK) &&
> +		    lock_task_sighand(task, &flags)) {
>  			task->jobctl |= JOBCTL_TASK_WORK;
>  			signal_wake_up(task, 0);
>  			unlock_task_sighand(task, &flags);

I think that should work in theory, but if you want to be able to do a
proper unlocked read of task->jobctl here, then I think you'd have to
use READ_ONCE() here and make all existing writes to ->jobctl use
WRITE_ONCE().

Also, I think that to make this work, stuff like get_signal() will
need to use memory barriers to ensure that reads from ->task_works are
ordered after ->jobctl has been cleared - ideally written such that on
the fastpath, the memory barrier doesn't execute.
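
To make the ordering concern concrete, here is a rough userspace sketch
using C11 atomics. It is not kernel code and not a proposed patch: every
name prefixed model_ or MODEL_ is hypothetical, relaxed atomic accesses
stand in for READ_ONCE()/WRITE_ONCE(), atomic_thread_fence() stands in
for a full memory barrier, and an atomic_flag stands in for the sighand
lock. It only illustrates the pairing I have in mind: the adder
publishes work, then checks the flag; the consumer clears the flag, then
reads the work list, with the barrier executed only when the flag was
actually set.

```c
/* Userspace model only; C11 atomics stand in for the kernel primitives. */
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define MODEL_TASK_WORK 0x1u	/* hypothetical stand-in for JOBCTL_TASK_WORK */

struct model_work {
	struct model_work *next;
	void (*func)(struct model_work *);
};

struct model_task {
	_Atomic unsigned int jobctl;		/* models task->jobctl */
	_Atomic(struct model_work *) works;	/* models task->task_works */
	atomic_flag lock;			/* models the sighand lock */
	int wakeups;				/* counts simulated wakeups, under lock */
};

static int model_calls;	/* bumped by the example callback below */

static void model_count_cb(struct model_work *w)
{
	(void)w;
	model_calls++;
}

static void model_task_init(struct model_task *t)
{
	atomic_init(&t->jobctl, 0u);
	atomic_init(&t->works, NULL);
	atomic_flag_clear(&t->lock);
	t->wakeups = 0;
}

static void model_lock(struct model_task *t)
{
	while (atomic_flag_test_and_set_explicit(&t->lock, memory_order_acquire))
		;	/* spin */
}

static void model_unlock(struct model_task *t)
{
	atomic_flag_clear_explicit(&t->lock, memory_order_release);
}

/* Adder side: publish the work first, then decide whether to notify. */
static void model_task_work_add(struct model_task *t, struct model_work *w)
{
	struct model_work *head;

	assert(w != NULL && w->func != NULL);
	head = atomic_load_explicit(&t->works, memory_order_relaxed);
	do {
		w->next = head;
	} while (!atomic_compare_exchange_weak_explicit(&t->works, &head, w,
			memory_order_release, memory_order_relaxed));

	/* Full barrier: the flag check must not move before the publish. */
	atomic_thread_fence(memory_order_seq_cst);

	/* Unlocked check, in the role of READ_ONCE(task->jobctl). */
	if (atomic_load_explicit(&t->jobctl, memory_order_relaxed) & MODEL_TASK_WORK)
		return;	/* a notification is already pending */

	model_lock(t);
	atomic_fetch_or_explicit(&t->jobctl, MODEL_TASK_WORK, memory_order_relaxed);
	t->wakeups++;	/* stands in for signal_wake_up() */
	model_unlock(t);
}

/* Consumer side: clear the flag, then read the list. Returns items run. */
static int model_run_pending(struct model_task *t)
{
	struct model_work *w, *next;
	int ran = 0;

	if (atomic_load_explicit(&t->jobctl, memory_order_relaxed) & MODEL_TASK_WORK) {
		model_lock(t);
		atomic_fetch_and_explicit(&t->jobctl, ~MODEL_TASK_WORK,
					  memory_order_relaxed);
		model_unlock(t);
		/* Barrier only on this path: order the clear before the list read. */
		atomic_thread_fence(memory_order_seq_cst);
	}

	w = atomic_exchange_explicit(&t->works, NULL, memory_order_acquire);
	for (; w; w = next) {
		next = w->next;	/* read before the callback may reuse w */
		w->func(w);
		ran++;
	}
	return ran;
}
```

With the two fences paired like this, either the adder observes the
cleared flag and notifies again, or the consumer observes the newly
added item, so in this model a work item should not be left behind
without a pending notification. Whether the real get_signal() path can
be arranged the same way is exactly the open question.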