On Thu, Apr 23, 2020 at 08:04:25AM +0200, Miklos Szeredi wrote:
> On Thu, Apr 23, 2020 at 6:42 AM Josh Triplett <josh@xxxxxxxxxxxxxxxx> wrote:
> >
> > On Thu, Apr 23, 2020 at 06:24:14AM +0200, Miklos Szeredi wrote:
> > > On Thu, Apr 23, 2020 at 2:48 AM Josh Triplett <josh@xxxxxxxxxxxxxxxx> wrote:
> > > > On Wed, Apr 22, 2020 at 09:55:56AM +0200, Miklos Szeredi wrote:
> > > > > On Wed, Apr 22, 2020 at 8:06 AM Michael Kerrisk (man-pages)
> > > > > <mtk.manpages@xxxxxxxxx> wrote:
> > > > > >
> > > > > > [CC += linux-api]
> > > > > >
> > > > > > On Wed, 22 Apr 2020 at 07:20, Josh Triplett <josh@xxxxxxxxxxxxxxxx> wrote:
> > > > > > >
> > > > > > > Inspired by the X protocol's handling of XIDs, allow userspace to select
> > > > > > > the file descriptor opened by openat2, so that it can use the resulting
> > > > > > > file descriptor in subsequent system calls without waiting for the
> > > > > > > response to openat2.
> > > > > > >
> > > > > > > In io_uring, this allows sequences like openat2/read/close without
> > > > > > > waiting for the openat2 to complete. Multiple such sequences can
> > > > > > > overlap, as long as each uses a distinct file descriptor.
> > > > >
> > > > > If this is primarily an io_uring feature, then why burden the normal
> > > > > openat2 API with this?
> > > >
> > > > This feature was inspired by io_uring; it isn't exclusively of value
> > > > with io_uring. (And io_uring doesn't normally change the semantics of
> > > > syscalls.)
> > >
> > > What's the use case of O_SPECIFIC_FD beyond io_uring?
> >
> > Avoiding a call to dup2 and close, if you need something as a specific
> > file descriptor, such as when setting up to exec something, or when
> > debugging a program.
> >
> > I don't expect it to be as widely used as with io_uring, but I also
> > don't want io_uring versions of syscalls to diverge from the underlying
> > syscalls, and this would be a heavy divergence.
>
> What are the plans for those syscalls that don't easily lend
> themselves to this modification (such as accept(2))?

accept4 has a flags argument with more flags available, so it'd be
entirely possible to cleanly extend it further without introducing a new
version. The same goes for other fd-producing syscalls that still have
flag space available. This may or may not provide sufficient motivation
on its own to introduce a new syscall variant of a syscall that isn't
currently extensible.

> Compared to that, having a common flag for file ops to enable the use
> of fixed and private file descriptors is a clean and well contained
> interface.

"private" is not a desirable property here. See below.

Even if the userspace-specified fd mechanism were to become something
only accessible via io_uring (which I'd prefer to avoid), that's not a
reason to avoid generating real file descriptors that work anywhere a
file descriptor works.

> > > > > This would also allow Implementing a private fd table for io_uring.
> > > > > I.e. add a flag interpreted by file ops (IORING_PRIVATE_FD), including
> > > > > openat2 and freely use the private fd space without having to worry
> > > > > about interactions with other parts of the system.
> > > >
> > > > I definitely don't want to add a special kind of file descriptor that
> > > > doesn't work in normal syscalls taking file descriptors. A file
> > > > descriptor allocated via O_SPECIFIC_FD is an entirely normal file
> > > > descriptor, and works anywhere a file descriptor normally works.
> > >
> > > What's the use case of allocating a file descriptor within io_uring
> > > and using it outside of io_uring?
> >
> > Calling a syscall not provided via io_uring. Calling a library that
> > doesn't use io_uring. Passing the file descriptor via UNIX socket to
> > another program. Passing the file descriptor via exec to another
> > program. Userspace is modular, and file descriptors are widely used.
>
> I mean, you could open the file descriptor outside of io_uring in such
> cases, no?

I would prefer to not introduce that limitation in the first place, and
instead open normal file descriptors.

> The point of O_SPECIFIC_FD is to be able to perform short
> sequences of open/dosomething/close without having to block and having
> to issue separate syscalls.

"close" is not a required component. It's entirely possible to use
io_uring to open a file descriptor, do various things with it, and then
leave it open for subsequent usage via either other io_uring chains or
standalone syscalls.

> If you're going to issue separate
> syscalls anyway, then I see no point in doing the open within
> io_uring. Or?

io_uring is not an all-or-nothing proposition. There's value in using
io_uring for some operations without converting an entire program (and
every library it might potentially use on a file descriptor) entirely to
io_uring. Userspace is modular, and file descriptors are a common
element used by many different bits of userspace.
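For illustration, here is a minimal sketch of the open/dup2/close
sequence that userspace needs today to get a file at a specific
descriptor number, which is the sequence a caller-selected descriptor
would avoid. It uses only existing syscalls and does not use the
proposed O_SPECIFIC_FD flag. The helper name open_at_fd is hypothetical
and is not part of the patch series.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper (not an existing API): open `path` and make it
 * available as descriptor `target_fd` using open + dup2 + close.
 * Returns target_fd on success, or -1 with errno set by the failing call.
 * `flags` must not include O_CREAT, since no mode argument is passed.
 */
static int open_at_fd(const char *path, int flags, int target_fd)
{
    assert(target_fd >= 0);

    /* First syscall: the kernel picks the descriptor number. */
    int fd = open(path, flags);
    if (fd < 0)
        return -1;

    /* Nothing more to do if we happened to get the number we wanted. */
    if (fd == target_fd)
        return fd;

    /* Second syscall: dup2 atomically replaces whatever target_fd was. */
    if (dup2(fd, target_fd) < 0) {
        int saved_errno = errno;
        close(fd);
        errno = saved_errno;
        return -1;
    }

    /* Third syscall: drop the temporary descriptor. */
    close(fd);
    return target_fd;
}
```

A caller setting up for exec might use something like
open_at_fd("/dev/null", O_RDONLY, 3), where the number 3 is only an
example. Note that dup2 does not carry FD_CLOEXEC over to the new
descriptor, so the result stays open across exec even if O_CLOEXEC was
passed to open.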