Jens Axboe <axboe@xxxxxxxxx> writes:

> On 8/23/22 12:22 PM, Eric W. Biederman wrote:
>> Olivier Langlois <olivier@xxxxxxxxxxxxxx> writes:
>>
>>> On Mon, 2022-08-22 at 17:16 -0400, Olivier Langlois wrote:
>>>>
>>>> What is stopping the task calling do_coredump() to be interrupted and
>>>> call task_work_add() from the interrupt context?
>>>>
>>>> This is precisely what I was experiencing last summer when I did work
>>>> on this issue.
>>>>
>>>> My understanding of how async I/O works with io_uring is that the task
>>>> is added to a wait queue without being put to sleep and when the
>>>> io_uring callback is called from the interrupt context, task_work_add()
>>>> is called so that the next time io_uring syscall is invoked, pending
>>>> work is processed to complete the I/O.
>>>>
>>>> So if:
>>>>
>>>> 1. io_uring request is initiated AND the task is in a wait queue
>>>> 2. do_coredump() is called before the I/O is completed
>>>>
>>>> IMHO, this is how you end up having task_work_add() called while the
>>>> coredump is generated.
>>>>
>>> I forgot to add that I have experienced the issue with TCP/IP I/O.
>>>
>>> I suspect that with a TCP socket, the race condition window is much
>>> larger than if it was disk I/O and this might make it easier to
>>> reproduce the issue this way...
>>
>> I was under the apparently mistaken impression that the io_uring
>> task_work_add only comes from the io_uring userspace helper threads.
>> Those are definitely suppressed by my change.
>>
>> Do you have any idea in the code where io_uring code is being called in
>> an interrupt context? I would really like to trace that code path so I
>> have a better grasp on what is happening.
>>
>> If task_work_add is being called from interrupt context then something
>> additional from what I have proposed certainly needs to be done.
>
> task_work may come from the helper threads, but generally it does not.
> One example would be doing a read from a socket. There's no data there,
> poll is armed to trigger a retry. When we get the poll notification that
> there's now data to be read, then we kick that off with task_work. Since
> it's from the poll handler, it can trigger from interrupt context. See
> the path from io_uring/poll.c:io_poll_wake() -> __io_poll_execute() ->
> io_req_task_work_add() -> task_work_add().

But that is a task_work to the helper thread, correct?

> It can also happen for regular IRQ based reads from regular files, where
> the completion is actually done via task_work added from the potentially
> IRQ based completion path.

I can see that.

Which leaves me with the question: do these task_works directly wake up
the thread that submitted the I/O request? Or is there likely to be
something for an I/O thread to do before an ordinary program thread is
notified?

I am asking because it is only the case of notifying ordinary program
threads that is interesting in the case of a coredump. As I understand
it, a data-to-read notification would typically be handled by the
io_uring worker thread, to trigger reading the data before letting
userspace know that everything it asked to be done is complete.

Eric
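
For anyone tracing the path Jens describes, here is a rough user-space sketch of the shape of it: a read finds no data, poll is armed, a later wakeup queues a callback, and the callback retries the read when the owning task gets around to running its pending work. This is only a toy model, not kernel code. Every toy_* name is made up for illustration, the "notified" flag is just a stand-in, and the sketch ASSUMES the work is queued to the task that submitted the request, which is exactly the open question in this message.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy user-space model. Names echo the kernel's, but none of this is kernel code. */
struct callback_head {
    struct callback_head *next;
    void (*func)(struct callback_head *);
};

struct toy_task {
    _Atomic(struct callback_head *) task_works; /* pending callbacks */
    atomic_bool notified;                       /* stand-in "you have work" flag */
    int completions;                            /* completions the task has seen */
};

struct toy_request;

struct toy_sock {
    int bytes_ready;
    struct toy_request *armed;  /* request waiting on poll, if any */
};

struct toy_request {
    struct callback_head work;  /* first member, so a cast recovers the request */
    struct toy_task *owner;     /* ASSUMPTION: the task that submitted the read */
    struct toy_sock *sock;
    int result;                 /* -1 while the read is still pending */
};

/* Lock-free push, so it could be called from a context that cannot sleep. */
static void toy_task_work_add(struct toy_task *t, struct callback_head *work)
{
    struct callback_head *head = atomic_load(&t->task_works);
    do {
        work->next = head;
    } while (!atomic_compare_exchange_weak(&t->task_works, &head, work));
    atomic_store(&t->notified, true);
}

/* Called by the owning task itself; returns how many callbacks ran. */
static int toy_task_work_run(struct toy_task *t)
{
    struct callback_head *work;
    int ran = 0;

    atomic_store(&t->notified, false);  /* clear first so a racing add is not lost */
    work = atomic_exchange(&t->task_works, (struct callback_head *)NULL);
    while (work) {
        struct callback_head *next = work->next;
        work->func(work);
        work = next;
        ran++;
    }
    return ran;
}

/* The deferred retry: consume whatever the socket has and complete the request. */
static void toy_retry_read(struct callback_head *cb)
{
    struct toy_request *req = (struct toy_request *)cb;
    req->result = req->sock->bytes_ready;
    req->sock->bytes_ready = 0;
    req->owner->completions++;
}

/* Submit a read: complete inline if data is there, otherwise arm poll. */
static void toy_submit_read(struct toy_task *t, struct toy_sock *s,
                            struct toy_request *req)
{
    req->work.next = NULL;
    req->work.func = toy_retry_read;
    req->owner = t;
    req->sock = s;
    req->result = -1;
    if (s->bytes_ready > 0)
        toy_retry_read(&req->work);
    else
        s->armed = req;
}

/* Stand-in for the poll wakeup, which may run in "interrupt context". */
static void toy_poll_wake(struct toy_sock *s, int nbytes)
{
    struct toy_request *req = s->armed;

    s->bytes_ready += nbytes;
    if (req) {
        s->armed = NULL;
        toy_task_work_add(req->owner, &req->work);
    }
}

/* Walk through the sequence once and return the completed read's result. */
int toy_demo(void)
{
    struct toy_task task;
    struct toy_sock sock = { 0, NULL };
    struct toy_request req;

    atomic_init(&task.task_works, (struct callback_head *)NULL);
    atomic_init(&task.notified, false);
    task.completions = 0;

    toy_submit_read(&task, &sock, &req);    /* no data: poll gets armed */
    assert(req.result == -1);
    toy_poll_wake(&sock, 64);               /* data arrives, work is queued */
    assert(atomic_load(&task.notified) && req.result == -1);
    assert(toy_task_work_run(&task) == 1);  /* the task runs the retry itself */
    assert(task.completions == 1);
    return req.result;
}
```

Calling toy_demo() from a main() runs the whole sequence once. The thing to notice is that between toy_poll_wake() and toy_task_work_run() the task has been flagged but the read is not yet complete. If the flagged task happens to be the one in do_coredump(), that is the window Olivier describes.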