On 12/9/24 23:43, Gabriel Krisman Bertazi wrote:
> During LPC 2022, Josh Triplett proposed io_uring_spawn as a mechanism to fork and exec new processes through io_uring [1]. The goal, according to him, was to have a very efficient mechanism to quickly execute tasks, eliminating the multiple roundtrips to userspace required to fork, perform multiple $PATH lookup and finally execve. In addition, he mentioned this would allow for a more simple implementation of preparatory tasks, such as file redirection configuration, and handling of stuff like posix_spawn_file_actions_t.
>
> This RFC revives his original patchset. I fixed all the pending issues I found with task submission, including the issue blocking the work at the time, a kernel corruption after a few spawns, converted the execve command into execveat* variant, cleaned up the code and surely introduced a few bugs of my own along the way.
>
> At this point, I made it an RFC because I have a few outstanding questions about the design, in particular whether the CLONE context would be better implemented as a special io-wq case to avoid the exposure of io_issue_sqe and duplication of the dispatching logic.
>
> I'm also providing the liburing support in a separate patchset, including a testcase that exemplifies the $PATH lookup mechanism proposed by Josh.

Sorry to say, but the series is rather concerning.

1) It creates a special path that tries to mimic the core path, but not without a bunch of troubles and in quite a special way.

2) There would be a special set of ops that can only be run from that special path.

3) And I don't believe that path can ever be allowed to run anything but these ops from (2) and maybe a very limited subset of normal ops like nop requests, but no read/write/send/etc. (?)

4) And it all requires links, which is already a bad sign for a bunch of reasons.

At this point it raises the question of why it even needs the io_uring infra. I don't think it's really helping you. E.g. why not do it as a list of operations in a custom format instead of links? That can be run by a single io_uring request or can even be a normal syscall.

    struct clone_op ops[] = { { CLONE }, { SET_CRED, cred_id }, ..., { EXEC, path } };

It also makes me wonder about different ways of handling this. E.g. why should it be run in the created task context (apart from the final exec)? Could the requests be run as normal by the original task, each taking the half-created and not yet launched task as a parameter (in some form) and modifying it, with the final exec launching it?

-- 
Pavel Begunkov
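
For illustration only, the "list of operations in a custom format" idea can be sketched in userspace. Everything below is an assumption made for the example: the opcode names, the layout of struct clone_op, and the run_clone_ops() helper are hypothetical and are not part of the patchset, the kernel, or liburing. It uses plain fork()/dup2()/execvp() to show a single call consuming the whole list, not how an in-kernel version would work.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical opcodes, invented for this sketch. */
enum clone_opcode { OP_CLONE, OP_CHDIR, OP_DUP2, OP_EXEC };

/* One entry of the operation list (hypothetical layout). */
struct clone_op {
	enum clone_opcode code;
	const char *path;     /* OP_CHDIR: directory to switch to */
	int fd_from, fd_to;   /* OP_DUP2: redirect fd_to to fd_from */
	char *const *argv;    /* OP_EXEC: NULL-terminated argument vector */
};

/*
 * Consume the whole list in one call. The first op must be OP_CLONE;
 * the remaining ops are applied in order in the child, ending in exec.
 * Returns the child's pid in the parent, or -1 on error.
 */
static pid_t run_clone_ops(const struct clone_op *ops, size_t n)
{
	pid_t pid;
	size_t i;

	if (n == 0 || ops[0].code != OP_CLONE)
		return -1;

	pid = fork();
	if (pid != 0)
		return pid;	/* parent, or fork() failure (-1) */

	/* Child: apply the preparatory ops, then exec. */
	for (i = 1; i < n; i++) {
		switch (ops[i].code) {
		case OP_CHDIR:
			if (chdir(ops[i].path) < 0)
				_exit(126);
			break;
		case OP_DUP2:
			if (dup2(ops[i].fd_from, ops[i].fd_to) < 0)
				_exit(126);
			break;
		case OP_EXEC:
			execvp(ops[i].argv[0], ops[i].argv);
			_exit(127);	/* only reached if exec failed */
		default:
			_exit(126);	/* unknown or misplaced opcode */
		}
	}
	_exit(0);	/* list ended without an OP_EXEC */
}
```

A caller would build an array such as { { .code = OP_CLONE }, { .code = OP_CHDIR, .path = "/tmp" }, { .code = OP_EXEC, .argv = argv } }, pass it to run_clone_ops(), and then waitpid() on the returned pid. No links or per-op requests are involved, since the list itself fixes the ordering.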