On Mon, Dec 09, 2024 at 06:43:11PM -0500, Gabriel Krisman Bertazi wrote:
> From: Josh Triplett <josh@xxxxxxxxxxxxxxxx>
>
> This command executes the equivalent of an execveat(2) in a previously
> spawned io_uring context, causing the execution to return to a new
> program indicated by the SQE.
>
> As an io_uring command, it is special in a few ways, requiring some
> quirks. First, it can only be executed from the spawned context linked
> after the IORING_OP_CLONE command; In addition, the first successful
> IORING_OP_EXEC command will terminate the link chain, causing
> further operations to fail with -ECANCELED.
>
> There are a few reason for the first limitation: First, it wouldn't make
> much sense to execute IORING_OP_EXEC in an io-wq, as it would simply
> mean "stealing" the worker thread from io_uring; It would also be
> questionable to execute inline or in a task work, as it would terminate
> the execution of the ring. Another technical reason is that we'd
> immediately deadlock (fixable), because we'd need to complete the
> command and release the reference after returning from the execve, but
> the context has already been invalidated by terminating the process.
> All in all, considering io_uring's purpose to provide an asynchronous
> interface, I'd (Gabriel) like to focus on the simple use-case first,
> limiting it to the cloned context for now.

This seems like a reasonable limitation for now. I'd eventually like to
handle things like "install these fds, do some other setup calls, then
execveat" as a ring submission (perhaps as a synchronous one), but
leaving that out for now seems reasonable.

The combination of clone and exec should probably get advertised as a
new capability. If we add exec-without-clone in the future, that can be
a second new capability.

The commit message should probably also document the rationale for dfd
accepting only an installed fd (for now) rather than a ring index. That
*also* seems like a perfectly reasonable limitation for now, just one
that needs documenting.

Otherwise, LGTM, and thank you again for updating this!