Re: [PATCH 2/2] io_uring: use TWA_SIGNAL for task_work if the task isn't running

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 8/10/20 2:35 PM, Jann Horn wrote:
> On Mon, Aug 10, 2020 at 10:25 PM Jens Axboe <axboe@xxxxxxxxx> wrote:
>> On 8/10/20 2:13 PM, Jens Axboe wrote:
>>>> Would it be clearer to write it like so perhaps?
>>>>
>>>>      /*
>>>>       * Optimization; when the task is RUNNING we can do with a
>>>>       * cheaper TWA_RESUME notification because,... <reason goes
>>>>       * here>. Otherwise do the more expensive, but always correct
>>>>       * TWA_SIGNAL.
>>>>       */
>>>>      if (READ_ONCE(tsk->state) == TASK_RUNNING) {
>>>>              __task_work_notify(tsk, TWA_RESUME);
>>>>              if (READ_ONCE(tsk->state) == TASK_RUNNING)
>>>>                      return;
>>>>      }
>>>>      __task_work_notify(tsk, TWA_SIGNAL);
>>>>      wake_up_process(tsk);
>>>
>>> Yeah that is easier to read, wasn't a huge fan of the loop since it's
>>> only a single retry kind of condition. I'll adopt this suggestion,
>>> thanks!
>>
>> Re-write it a bit on top of that, just turning it into two separate
>> READ_ONCE, and added appropriate comments. For the SQPOLL case, the
>> wake_up_process() is enough, so we can clean up that if/else.
>>
>> https://git.kernel.dk/cgit/linux-block/commit/?h=io_uring-5.9&id=49bc5c16483945982cf81b0109d7da7cd9ee55ed
> 
> I think I'm starting to understand the overall picture here, and I
> think if my understanding is correct, your solution isn't going to
> work properly.
> 
> My understanding of the scenario you're trying to address is:
> 
>  - task A starts up io_uring
>  - task A tells io_uring to bump the counter of an eventfd E when work
> has been completed
>  - task A submits some work ("read a byte from file descriptor X", or
> something like that)
>  - io_uring internally starts an asynchronous I/O operation, with a callback C
>  - task A calls read(E, &counter, sizeof(counter)) to wait for events
> to be processed
>  - the async I/O operation finishes, C is invoked, and C schedules
> task_work for task A
> 
> And here you run into a deadlock, because the task_work will only run
> when task A returns from the syscall, but the syscall will only return
> once the task_work is executing and has finished the I/O operation.
> 
> 
> If that is the scenario you're trying to solve here (where you're
> trying to force a task that's in the middle of some syscall that's
> completely unrelated to io_uring to return back to syscall context), I
> don't think this will work: It might well be that the task has e.g.
> just started entering the read() syscall, and is *about to* block, but
> is currently still running.

Your understanding of the scenario appears to be correct, and as far as
I can tell, also your analysis of why the existing approach doesn't
fully close it. You're right in that the task could currently be on its
way to blocking, but still running. And for that case, TWA_RESUME isn't
going to cut it.

Ugh. This basically means I have to use TWA_SIGNAL regardless of state,
unless SQPOLL is used. That's not optimal.

Alternatively:

if (tsk->state != TASK_RUNNING || task_in_kernel(tsk))
	notify = TWA_SIGNAL;
else
	notify = TWA_RESUME;

should work as far as I can tell, but I don't even know if there's a
reliable way to do task_in_kernel(). But I suspect this kind of check
would still save the day, as we're not really expecting the common case
to be that the task is in the kernel on the way to blocking. And it'd be
kind of annoying to have to cater to that scenario by slowing down the
fast path.

Suggestions?

-- 
Jens Axboe




[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux