On 8/10/20 4:41 PM, Jann Horn wrote:
> On Tue, Aug 11, 2020 at 12:01 AM Jens Axboe <axboe@xxxxxxxxx> wrote:
>> On 8/10/20 3:28 PM, Jens Axboe wrote:
>>> On 8/10/20 3:26 PM, Jann Horn wrote:
>>>> On Mon, Aug 10, 2020 at 11:12 PM Jens Axboe <axboe@xxxxxxxxx> wrote:
>>>>> On 8/10/20 3:10 PM, Peter Zijlstra wrote:
>>>>>> On Mon, Aug 10, 2020 at 03:06:49PM -0600, Jens Axboe wrote:
>>>>>>
>>>>>>> should work as far as I can tell, but I don't even know if there's a
>>>>>>> reliable way to do task_in_kernel().
>>>>>>
>>>>>> Only on NOHZ_FULL, and tracking that is one of the things that makes it
>>>>>> so horribly expensive.
>>>>>
>>>>> Probably no other way than to bite the bullet and just use TWA_SIGNAL
>>>>> unconditionally...
>>>>
>>>> Why are you trying to avoid using TWA_SIGNAL? Is there a specific part
>>>> of handling it that's particularly slow?
>>>
>>> Not particularly slow, but it's definitely heavier than TWA_RESUME. And
>>> as we're driving any pollable async IO through this, just trying to
>>> ensure it's as light as possible.
>>>
>>> It's not a functional thing, just efficiency.
>>
>> Ran some quick testing in a VM, which is the worst case for this kind of
>> thing, as any kind of mucking with interrupts is really slow. And the hit
>> is substantial. Though with the below, we're basically at parity again.
>> Just for discussion...
>>
>>
>> diff --git a/kernel/task_work.c b/kernel/task_work.c
>> index 5c0848ca1287..ea2c683c8563 100644
>> --- a/kernel/task_work.c
>> +++ b/kernel/task_work.c
>> @@ -42,7 +42,8 @@ task_work_add(struct task_struct *task, struct callback_head *work, int notify)
>>  		set_notify_resume(task);
>>  		break;
>>  	case TWA_SIGNAL:
>> -		if (lock_task_sighand(task, &flags)) {
>> +		if (!(task->jobctl & JOBCTL_TASK_WORK) &&
>> +		    lock_task_sighand(task, &flags)) {
>>  			task->jobctl |= JOBCTL_TASK_WORK;
>>  			signal_wake_up(task, 0);
>>  			unlock_task_sighand(task, &flags);
>
> I think that should work in theory, but if you want to be able to do a
> proper unlocked read of task->jobctl here, then I think you'd have to
> use READ_ONCE() here and make all existing writes to ->jobctl use
> WRITE_ONCE().
>
> Also, I think that to make this work, stuff like get_signal() will
> need to use memory barriers to ensure that reads from ->task_works are
> ordered after ->jobctl has been cleared - ideally written such that on
> the fastpath, the memory barrier doesn't execute.

I wonder if it's possible to just make it safe for the io_uring case,
since a bigger change would make this performance regression persistent
in this release... Would still require the split add/notification patch,
but that one is trivial.

-- 
Jens Axboe