On Sat, Mar 14, 2015 at 7:29 PM, Josh Triplett <josh@xxxxxxxxxxxxxxxx> wrote:
> On Sat, Mar 14, 2015 at 12:03:12PM -0700, Thiago Macieira wrote:
>> On Friday 13 March 2015 18:11:32 Thiago Macieira wrote:
>> > On Friday 13 March 2015 14:51:47 Andy Lutomirski wrote:
>> > > In any event, we should find out what FreeBSD does in response to
>> > > read(2) on the fd.
>> >
>> > I've just successfully installed FreeBSD and compiled qtbase (main package
>> > of Qt 5) on it.
>> >
>> > I'll test pdfork during the weekend and report its behaviour.
>>
>> Here are my findings about pdfork.
>>
>> Source: http://fxr.watson.org/fxr/source/kern/sys_procdesc.c?v=FREEBSD10
>> Qt adaptations: https://codereview.qt-project.org/108561
>>
>> Processes created with pdfork() are normal processes that still send SIGCHLD
>> to their parents. The only difference is that you get the extra file descriptor
>> that can be passed to the pdgetpid() system call and works on select()/poll().
>> Trying to read from that file descriptor will result in EOPNOTSUPP.
>
> OK, since read() doesn't work on a pdfork() file descriptor, we don't
> have to worry about compatibility with pdfork()'s read result.
>
> However, if the expectation is that pdfork()ed child processes still
> send SIGCHLD, then I don't see how we can be compatible there, nor do I
> think we want to; as you mention below, that breaks the ability to
> encapsulate management of the created process entirely within a library.

I didn't think that was the case -- my understanding was that
pdfork()ed children would not generate SIGCHLD (and that does seem to
be the case with a quick test program).

As an aside, I do think there are some aspects of FreeBSD's process
descriptors that aren't quite right yet, particularly their
interaction with waitpid(-1, ...) -- IIRC pdfork()ed children are
visible to it, but I'd expect them not to be (to allow libraries to
use sub-processes invisibly to the programs using them).
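
To make the waitpid(-1, ...) concern concrete, here is a minimal POSIX
sketch. It uses plain fork() rather than pdfork(), so it says nothing
about FreeBSD's actual behaviour; it only shows why a child that is
visible to waitpid(-1, ...) cannot be managed invisibly by a library.
The function name is made up for illustration, and the code assumes the
calling process has no other children.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Illustrative only: a "library" forks a helper, the "application"
 * reaps every child with waitpid(-1, ...), and the library then fails
 * to collect its own helper's exit status.
 *
 * Returns the errno seen by the library's waitpid() call, 0 if that
 * call unexpectedly succeeded, or -1 if fork() failed.
 */
int library_wait_after_global_reap(void)
{
    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0)
        _exit(42);              /* the library's helper process */

    /* Application side: reap any child, as a global SIGCHLD handler would. */
    int status;
    pid_t reaped;
    do {
        reaped = waitpid(-1, &status, 0);
    } while (reaped < 0 && errno == EINTR);

    /*
     * Assumption: no other children exist. Either the helper was reaped
     * here, or SIGCHLD is ignored and waitpid() failed with ECHILD.
     */
    assert(reaped == child || (reaped < 0 && errno == ECHILD));

    /* Library side: the helper is already gone, so this fails. */
    errno = 0;
    if (waitpid(child, &status, 0) < 0)
        return errno;
    return 0;
}
```

Calling library_wait_after_global_reap() from a process with no other
children should return ECHILD: the exit status went to whoever called
waitpid(-1, ...) first.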
There's a thread at:
  https://lists.cam.ac.uk/pipermail/cl-capsicum-discuss/2014-March/thread.html
but I'm not sure that anything came of that discussion. As it
happens, I'm meeting Robert Watson (one of the progenitors of
Capsicum/process descriptors) tomorrow, so I'll chase further.

>> Since they've never implemented pdwait4() (it's not even declared in the
>> headers), the only way to reap a child if you only have the file descriptor is
>> to first pdgetpid() and then call wait4() or wait6().
>
> Which suggests that we shouldn't try to implement pdwait4() in glibc
> until FreeBSD implements it in their kernel, since we won't know the
> exact semantics they expect.

By the way, I should point out one part of the FreeBSD design which
might help explain some of the semantics. Process descriptors are
particularly designed to be used with Capsicum, which is a security
framework where file descriptors get extra rights associated with
them, and the kernel polices the use of those rights (e.g. you need
CAP_READ for read(2) operations; normal file descriptors implicitly
have all of the rights for back-compatibility).
  https://www.freebsd.org/cgi/man.cgi?query=capsicum&sektion=4

Capsicum also includes 'capability mode', where system calls that
access global namespaces are disabled -- including the pid namespace.
So process descriptors are the only way to manipulate child processes
when a program is in capability mode -- and this means that pdkill()
is then genuinely needed over and above kill(pdgetpid(),...).

>> If you don't pass PD_DAEMON, the child process gets killed with SIGKILL when
>> the file closes.
>
> OK, that makes sense. We could certainly implement a
> CLONE_FD_KILL_ON_CLOSE flag with those semantics, if we want one in the
> future.
>
>> Conclusion:
>> Pros: this is the bare minimum that we'd need to disentangle the SIGCHLD mess.
>> As long as all child process activations use this feature, the problem is
>> solved.
>>
>> Cons: it requires cooperation from all child starters. If some other library
>> or the application installs a global SIGCHLD handler that waits on all child
>> processes, like libvlc used to do and Glib and Ecore still do, you won't be
>> able to get the child exit status.
>>
>> I have not tested what happens if you try to pass the file descriptor to other
>> processes (can you even do that on FreeBSD?). But even if you could and got
>> notifications, you couldn't wait on the child to get its exit status -- unless
>> they implement pdwait4.
>
> Even if they do implement pdwait4, they might not bypass the "must be
> the parent process" restriction. Let's wait to see what semantics they
> go with.

Hmm, interesting point. FreeBSD certainly allows FD passing, but I'm
not sure what the interactions are when it's a process descriptor
that's passed. Given the object-capability background to Capsicum,
I'd assume that a holder of the process descriptor should be able to
do whatever operations are allowed by the rights associated with the
descriptor (CAP_PDGETPID, CAP_PDKILL and CAP_PDWAIT exist as specific
rights allowing those operations, and a non-restricted descriptor will
have all of them by default).

But I'll add some test cases for this to the Capsicum test suite to
check whether theory matches practice...
  https://github.com/google/capsicum-test/blob/dev/procdesc.cc

>> - pdfork: can be emulated with clone4 + CLONE_FD (+ CLONEFD_KILL_ON_CLOSE)
>> - pdwait4: can be emulated with read()
>> - pdgetpid: needs an ioctl
>> - pdkill: needs an ioctl [or just write()]
>
> I think that should be a dedicated syscall, not an ioctl.
>
> It's unfortunate that rt_sigqueueinfo doesn't take a flags argument.
> However, I just realized that it takes a 32-bit "int" for the signal
> number, yet signal numbers fit in 8 bits.
> So we could just add flags in
> the high 24 bits of that argument, and in particular add a flag
> indicating that the first argument is a file descriptor rather than a
> PID.
>
> - Josh Triplett
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
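
For reference, Josh's suggestion of carrying flags in the high 24 bits
of the signal-number argument amounts to plain bit packing, roughly as
sketched below. Nothing here is an existing kernel ABI: SIGQ_FLAG_FD,
SIGQ_SIGNO_MASK and the sigq_* helpers are hypothetical names chosen
for illustration, and the bit position of the flag is an arbitrary
assumption.

```c
#include <assert.h>
#include <signal.h>   /* SIGTERM etc. for callers */
#include <stdint.h>

/* Low 8 bits carry the signal number, as proposed in the thread. */
#define SIGQ_SIGNO_MASK 0xffu
/* Hypothetical flag: the first syscall argument is an fd, not a PID. */
#define SIGQ_FLAG_FD    (1u << 8)

/* Combine a signal number and flags into one 32-bit argument. */
static inline uint32_t sigq_pack(int signo, uint32_t flags)
{
    assert(signo >= 0 && signo <= (int)SIGQ_SIGNO_MASK);
    assert((flags & SIGQ_SIGNO_MASK) == 0);  /* flags live in the high 24 bits */
    return flags | (uint32_t)signo;
}

/* Extract the signal number from a packed argument. */
static inline int sigq_signo(uint32_t arg)
{
    return (int)(arg & SIGQ_SIGNO_MASK);
}

/* Extract the flag bits from a packed argument. */
static inline uint32_t sigq_flags(uint32_t arg)
{
    return arg & ~SIGQ_SIGNO_MASK;
}
```

A caller would pass sigq_pack(SIGTERM, SIGQ_FLAG_FD) where it passes
the bare signal number today, and the receiving side would split it
again with sigq_signo() and sigq_flags(). Existing callers that pass
only a signal number would see no flags set, which is what would keep
such a scheme backward compatible.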