Pavel, Jens, On Tue, Jun 01 2021 at 17:29, Pavel Begunkov wrote: > On 6/1/21 5:01 PM, Jens Axboe wrote: >>> Yes, that would be preferable, but looks futexes don't use >>> waitqueues but some manual enqueuing into a plist_node, see >>> futex_wait_queue_me() or mark_wake_futex(). >>> Did I miss it somewhere? >> >> Yes, we'd need to augment that with a callback. I do think that's going > > Yeah, that was the first idea, but it's also more intrusive for the > futex codebase. Can be piled on top for next revision of patches. > > A question to futex maintainers, how much resistance to merging > something like that I may expect? Adding a waitqueue like callback or the proposed thing? TBH. Neither one has a charm. 1) The proposed solution: I can't figure out from the changelogs or the cover letter what kind of problems it solves and what the exact semantics are. If you ever consider to submit futex patches, may I recommend to study Documentation/process and get some inspiration from git-log? What are the lifetime rules, what's the interaction with regular futexes, what's the interaction with robust list ....? Without interaction with regular futexes such a functionality does not make any sense at all. Also once we'd open that can of worms where is this going to end and where can we draw the line? This is going to be a bottomless pit because I don't believe for a split second that this simple interface is going to be sufficient. Aside of that we are definitely _not_ going to expose any of the internal functions simply because they evade any sanity check which happens at the syscall wrappers and I have zero interest to deal with the fallout of unfiltered input which comes via io-uring interfaces or try to keep those up to date when the core filtering changes. 2) Adding a waitqueue like callback is daft. a) Lifetime rules See mark_wake_futex(). b) Locking The wakeup mechanism is designed to avoid hb->lock contention as much as possible. The dequeue/mark for wakeup happens under hb->lock and the actual wakeup happens after dropping hb->lock. This is not going to change. It's not even debatable. Aside of that this is optimized for minimal hb->lock hold time in general. So the only way to do that would be to invoke the callback from mark_wake_futex() _before_ invalidating futex_q and the callback plus the argument(s) would have to be stored in futex_q. Where does this information come from? Which context would invoke the wait with callback and callback arguments? User space, io-uring state machine or what? Aside of the extra storage (on stack) and yet more undefined life time rules and no semantics for error cases etc., that also would enforce that the callback is invoked with hb->lock held. IOW, it's making the hb->lock held time larger, which is exactly what the existing code tries to avoid by all means. But what would that solve? I can't tell because the provided information is absolutely useless for anyone not familiar with your great idea: "Add futex wait requests, those always go through io-wq for simplicity." What am I supposed to read out of this? Doing it elsewhere would be more complex? Really useful information. And I can't tell either what Jens means here: "Not a huge fan of that, I think this should tap into the waitqueue instead and just rely on the wakeup callback to trigger the event. That would be a lot more efficient than punting to io-wq, both in terms of latency on trigger, but also for efficiency if the app is waiting on a lot of futexes." and here: "Yes, we'd need to augment that with a callback. I do think that's going to be necessary, I don't see the io-wq solution working well outside of the most basic of use cases. And even for that, it won't be particularly efficient for single waits." All of these quotes are useless word salad without context and worse without the minimal understanding how futexes work. So can you folks please sit down and write up a coherent description of: 1) The problem you are trying to solve 2) How this futex functionality should be integrated into io-uring including the contexts which invoke it. 3) Interaction with regular sys_futex() operations. 4) Lifetime and locking rules. Unless that materializes any futex related changes are not even going to be reviewed. I did not even try to review this stuff, I just tried to make sense out of it, but while skimming it, it was inevitable to spot this gem: +int futex_wake_op_single(u32 __user *uaddr, int nr_wake, unsigned int op, + bool shared, bool try); ... + ret = futex_wake_op_single(f->uaddr, f->nr_wake, f->wake_op_arg, + !(f->flags & IORING_FUTEX_SHARED), + nonblock); You surely made your point that this is well thought out. Thanks, tglx