On Sat, Jan 11, 2025 at 4:33 AM Jens Axboe <axboe@xxxxxxxxx> wrote: > On 1/10/25 3:26 PM, Jann Horn wrote: > > Hi! > > > > I think there is some brittle interaction between futex and io_uring; > > but to be clear, I don't think that there is actually a bug here. > > > > In io_uring, when a IORING_OP_FUTEX_WAIT SQE is submitted with > > IOSQE_ASYNC, an io_uring worker thread can queue up futex waiters via > > the following path: > > > > ret_from_fork -> io_wq_worker -> io_worker_handle_work -> > > io_wq_submit_work[called as ->do_work] -> io_issue_sqe -> > > io_futex_wait[called as .issue] -> futex_queue -> __futex_queue > > > > futex_q instances normally describe synchronously waiting tasks, and > > __futex_queue() records the identity of the calling task (which is > > normally the waiter) in futex_q::task. But io_uring waits on futexes > > asynchronously instead; from io_uring's perspective, a pending futex > > wait is not tied to the task that called into futex_queue(), it is > > just tied to the userspace task on behalf of which the io_uring worker > > is acting (I think). So when a futex wait operation is started by an > > io_uring worker task, I think that worker task could go away while the > > futex_q is still queued up on the futex, and so I think we can end up > > with a futex_q whose "task" member points to a freed task_struct. > > > > The good part is that (from what I understand) that "task" member is > > only used for two purposes: > > > > 1. futexes that are either created through the normal futex syscalls > > use futex_wake_mark as their .wake callback, which needs the task > > pointer to know which task should be woken. > > 2. PI futexes use it for priority inheritance magic (and AFAICS there > > is no way for io_uring to interface with PI futexes) > > > > I'm not sure what is the best thing to do is here - maybe the current > > situation is fine, and I should just send a patch that adds a comment > > describing this to the definition of the "task" member? Or maybe it > > would be better for robustness to ensure that the "task" member is > > NULLed out in those cases, though that would probably make the > > generated machine code a little bit more ugly? (Or maybe I totally > > misunderstand what's going on and there isn't actually a dangling > > pointer...) > > Good find. And yes the io-wq task could go away, if there's more of > them. > > While it isn't an issue, dangling pointers make me nervous. We could do > something like the (totally untested) below patch, where io_uring just > uses __futex_queue() and make __futex_queue() take a task_struct as > well. Other callers pass in 'current'. > > It's quite possible that it'd be safe to just use __futex_queue() and > clear ->task after the queue, but if we pass in NULL from the get-go, > then there's definitely not a risk of anything being dangling. Yeah, this looks like a nice cleanup that makes this safer, thanks!