On 8/10/20 4:41 PM, Jann Horn wrote:
> On Tue, Aug 11, 2020 at 12:01 AM Jens Axboe <axboe@xxxxxxxxx> wrote:
>> On 8/10/20 3:28 PM, Jens Axboe wrote:
>>> On 8/10/20 3:26 PM, Jann Horn wrote:
>>>> On Mon, Aug 10, 2020 at 11:12 PM Jens Axboe <axboe@xxxxxxxxx> wrote:
>>>>> On 8/10/20 3:10 PM, Peter Zijlstra wrote:
>>>>>> On Mon, Aug 10, 2020 at 03:06:49PM -0600, Jens Axboe wrote:
>>>>>>
>>>>>>> should work as far as I can tell, but I don't even know if there's a
>>>>>>> reliable way to do task_in_kernel().
>>>>>>
>>>>>> Only on NOHZ_FULL, and tracking that is one of the things that makes it
>>>>>> so horribly expensive.
>>>>>
>>>>> Probably no other way than to bite the bullet and just use TWA_SIGNAL
>>>>> unconditionally...
>>>>
>>>> Why are you trying to avoid using TWA_SIGNAL? Is there a specific part
>>>> of handling it that's particularly slow?
>>>
>>> Not particularly slow, but it's definitely heavier than TWA_RESUME. And
>>> as we're driving any pollable async IO through this, just trying to
>>> ensure it's as light as possible.
>>>
>>> It's not a functional thing, just efficiency.
>>
>> Ran some quick testing in a VM, which is the worst case for this kind of
>> thing, as any kind of mucking with interrupts is really slow. And the hit
>> is substantial. Though with the below, we're basically at parity again.
>> Just for discussion...
>>
>>
>> diff --git a/kernel/task_work.c b/kernel/task_work.c
>> index 5c0848ca1287..ea2c683c8563 100644
>> --- a/kernel/task_work.c
>> +++ b/kernel/task_work.c
>> @@ -42,7 +42,8 @@ task_work_add(struct task_struct *task, struct callback_head *work, int notify)
>>  		set_notify_resume(task);
>>  		break;
>>  	case TWA_SIGNAL:
>> -		if (lock_task_sighand(task, &flags)) {
>> +		if (!(task->jobctl & JOBCTL_TASK_WORK) &&
>> +		    lock_task_sighand(task, &flags)) {
>>  			task->jobctl |= JOBCTL_TASK_WORK;
>>  			signal_wake_up(task, 0);
>>  			unlock_task_sighand(task, &flags);
>
> I think that should work in theory, but if you want to be able to do a
> proper unlocked read of task->jobctl here, then I think you'd have to
> use READ_ONCE() here and make all existing writes to ->jobctl use
> WRITE_ONCE().
>
> Also, I think that to make this work, stuff like get_signal() will
> need to use memory barriers to ensure that reads from ->task_works are
> ordered after ->jobctl has been cleared - ideally written such that on
> the fastpath, the memory barrier doesn't execute.

I wonder if it's possible to just make it safe for the io_uring case,
since a bigger change would make this performance regression persistent
in this release... Would still require the split add/notification patch,
but that one is trivial.

-- 
Jens Axboe