On Fri, Mar 29, 2019 at 11:25 PM Jonathan Kowalski <bl0pbl33p@xxxxxxxxx> wrote: > > On Sat, Mar 30, 2019 at 5:35 AM Daniel Colascione <dancol@xxxxxxxxxx> wrote: > > > > On Thu, Mar 28, 2019 at 3:38 AM Christian Brauner <christian@xxxxxxxxxx> wrote: > > > > > > > All that said, thanks for the work on this once again. My intention is > > > > just that we don't end up with an API that could have been done better > > > > and be cleaner to use for potential users in the coming years. > > > > > > Thanks for your input on all of this. I still don't find multiplexers in > > > the style of seccomp()/fsconfig()/keyctl() to be a problem since they > > > deal with a specific task. They are very much different from ioctl()s in > > > that regard. But since Joel, you, and Daniel found the pidctl() approach > > > not very nice I dropped it. The interface needs to be satisfactory for > > > all of us especially since Android and other system managers will be the > > > main consumers. > > > > Thanks. > > > > > So let's split this into pidfd_open(pid_t pid, unsigned int flags) which > > > allows to cleanly get pidfds independent procfs and do the translation > > > to procpidfds in an ioctl() as we've discussed in prior threads. This > > > > I sustain my objection to adding an ioctl. Compared to a system call, > > an ioctl has a more rigid interface, greater susceptibility to > > programmer error (due to the same ioctl control code potentially doing > > different things for different file types), longer path length, and > > more awkward filtering/monitoring/auditing/tracing. We've discussed > > this issue at length before, and I thought we all agreed to use system > > calls, not ioctl, for core kernel functionality. So why is an ioctl > > suddenly back on the table? The way I see it, an ioctl has no > > advantages except for 1) conserving system call numbers, which are not > > scarce, and 2) avoiding the system call number coordination problem > > (and the coordination problem isn't a factor for core kernel code). I > > don't understand everyone's reluctance to add new system calls. What > > am I missing? Why would we give up all the advantages that a system > > call gives us? > > > > I agree in general, but in this particular case a system call or an > ioctl doesn't matter much, all it does is take the pidfd, the command, > and /proc's dir fd. Thanks again. I agree that the operation we're discussing has a simple signature, but signature flexibility isn't the only reason to prefer a system call over an ioctl. There are other reasons for preferring system calls to ioctls (safety, tracing, etc.) that apply even if the operation we're discussing has a relatively simple signature: for example, every system call has a distinct and convenient ftrace event, but ioctls don't; strace filtering Just Works on a system-call-by-system-call basis, but it doesn't for ioctls; and documentation for system calls is much more discoverable (e.g., man -k) than documentation for ioctls. Even if the distinction doesn't matter much, IMHO, it still matters a little, enough to favor a system call without an offsetting advantage for the ioctl option. > If you start adding a system call for every specific operation on file > descriptors, it *will* become a problem. I'm not sure what you mean. Do you mean that adding a top-level system call for every operation that might apply to one specific kind of file descriptor would lead, as the overall result, to the kernel having enough system calls to cause negative consequences? I'm not sure I agree, but accepting this idea for the sake of discussion: shouldn't we be more okay with system calls for features present on almost all systems --- like procfs --- even if we punt to ioctls very rarely-used functionality, e.g., some hypothetical special squeak noise that you could get some specific 1995 adlib clone to make? > Besides, the translation is > just there because it is racy to do in userspace, it is not some well > defined core kernel functionality. > Therefore, it is just a way to > enter the kernel to do the openat in a race free and safe manner. I agree that the translation has to be done in the kernel, not userspace, and that the kernel must provide to userspace some interface for requesting that the translation happen: we're just discussing the shape of this interface. Shouldn't all interfaces provided by the kernel to userspace be equally well defined? I'm not sure that the internal simplicity of the operation matters much either. There are already explicit system calls for some simple-to-implement things, e.g., timerfd_gettime. It's worth noting that timerfd is (IIRC), like procfs, a feature that's both ubiquitous and optional. > As is, the facility being provided through an ioctl on the pidfd is > not something I'd consider a problem. You're right that from a signature perspective, using an ioctl isn't a problem. I just want to make sure we take into account the other, non-signature advantages that system calls have over ioctls. > I think the translation stuff > should also probably be an extension of ioctl_ns(2) (but I wouldn't be > opposed if translate_pid is resurrected as is). > For anything more involved than ioctl(pidfd, PIDFD_TO_PROCFD, > procrootfd), I'd agree that a system call would be a cleaner > interface, otherwise, if you cannot generalise it, using ioctls as a > command interface is probably the better tradeoff here. Sure: I just want to better understand everyone else's thought process here, having been frustrated with things like the termios ioctls being ioctls.