Hello Christian,

On 9/16/19 9:40 AM, Christian Brauner wrote:
> On Wed, Sep 11, 2019 at 10:58:57AM +0200, Michael Kerrisk (man-pages) wrote:
>> Hello Christian,
>>
>> On 5/11/19 8:49 AM, Christian Brauner wrote:
>>> From: Christian Brauner <christian@xxxxxxxxxx>
>>>
>>> Add an entry for CLONE_PIDFD. This flag is available starting with
>>> kernel 5.2. If specified, a process file descriptor ("pidfd") referring
>>> to the child process will be returned in the ptid argument.
>>
>> I've applied this patch in a local branch, and made some minor edits
>
> Thank you! :)
>
>> and added a piece. And I have some questions. See below.
>>
>>> Signed-off-by: Christian Brauner <christian@xxxxxxxxxx>
>>> ---
[...]
>>> Note, that the kernel verifies that the value for
>>> +.I ptid
>>> +is zero. If it is not an error will be returned. This ensures that
>>> +.I ptid
>>> +can potentially be used to specify additional options for
>>> +.B CLONE_PIDFD
>>> +in the future.
>>
>> This piece is no longer true, right? At least I can't see such
>
> Correct.

Thanks. Page amended.

>> a check in the kernel code, and my testing doesn't yield an error
>> when ptid != 0 before the call. (No need to send me a patch; if I'm
>> correct just let me know and I'll edit out this piece.)
>>
>>> +.IP
>>> +Since the
>>> +.I ptid
>>> +argument is used to return the pidfd,
>>> +.B CLONE_PIDFD
>>> +cannot be used with
>>> +.B CLONE_PARENT_SETTID.
>>> +.IP
>>> +It is currently not possible to use this flag together with
>>> +.B CLONE_THREAD.
>>> +This means that the process identified by the pidfd will always be a
>>> +thread-group leader.
>>> +.IP
>>> +For a while there was a
>>> +.B CLONE_DETACHED
>>> +flag. This flag is usually ignored when passed along with other flags.
>>> +However, when passed alongside
>>> +.B CLONE_PIDFD
>>> +an error will be returned. This ensures that this flag can be reused
>>> +for further pidfd features in the future.
>>> +.TP
>>> .BR CLONE_PTRACE " (since Linux 2.2)"
>>> If
>>> .B CLONE_PTRACE
>>> @@ -1122,6 +1158,21 @@ For example, on aarch64,
>>> .I child_stack
>>> must be a multiple of 16.
>>> .TP
>>> +.B EINVAL
>>> +.B CLONE_PIDFD
>>> +was specified together with
>>> +.B CLONE_DETACHED.
>>> +.TP
>>> +.B EINVAL
>>> +.B CLONE_PIDFD
>>> +was specified together with
>>> +.B CLONE_PARENT_SETTID.
>>> +.TP
>>> +.B EINVAL
>>> +.B CLONE_PIDFD
>>> +was specified together with
>>> +.B CLONE_THREAD.
>>> +.TP
>>> .B ENOMEM
>>> Cannot allocate sufficient memory to allocate a task structure for the
>>> child, or to copy those parts of the caller's context that need to be
>>
>> One other piece seems to be missing: the returned file descriptor can
>> be fed to poll()/select()/epoll and the FD will test as readable when
>> the child terminates. Right? Did that functionality also land in
>> kernel 5.2? And did it get implemented as a separate commit, or did
>> the behavior just fall naturally out of the implementation of pidfd's?
>> Let me know the details, and I will craft a patch.
>
> It landed in 5.3. The relevant commit is:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b53b0b9d9a613c418057f6cb921c2f40a6f78c24
> and belongs to the following merge:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5450e8a316a64cddcbc15f90733ebc78aa736545

Thanks for that info. One other question springs to mind. I haven't
looked at the source or tried testing this, but can anything actually
be read() from a PIDFD? Presumably, it might be useful to have data
generated on the FD, since different values could (ultimately) be used
to distinguish between terminate/stop/continue transitions.

>> Also, as far as I can see (from testing) the FD only gives pollable
>> events on process termination, not on other process transitions such
>> as stop and continue. Right? (Are there any plans to implement such
>
> Correct.
>
>> functionality for stop/continue transitions?
>
> Yes, at some point we will likely want this.
>
>>
>> By the way, when do you expect the pidfd-wait functionality to land
>> in the kernel?
>
> I've sent a PR for 5.4:
> https://lkml.org/lkml/2019/9/10/682
> which contains the P_PIDFD extension to waitid().

Thanks for that pointer. I see that the code is now merged.

> Thanks for the work, Michael!

You're welcome!

Cheers,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
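
The poll() behavior discussed in this thread can be illustrated with a
small C sketch. It is not code from the thread: run_child_and_poll() is
a hypothetical helper, and it assumes a 5.3 or later kernel (CLONE_PIDFD
needs 5.2, pollable pidfds need 5.3) and the raw clone() argument order
used on x86-64 and aarch64 (flags, stack, parent_tid, ...). The child is
reaped with plain waitpid() rather than the P_PIDFD extension.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <poll.h>
#include <signal.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef CLONE_PIDFD
#define CLONE_PIDFD 0x00001000  /* fallback for older userspace headers */
#endif

/*
 * Hypothetical helper: create a child with CLONE_PIDFD, block in poll()
 * until the pidfd tests as readable (child terminated), then reap it.
 * Returns the child's exit status, or -1 on error.
 */
static int run_child_and_poll(int exit_code)
{
    int pidfd = -1;
    int status = 0;
    int ready;

    /* Raw clone with a NULL stack behaves like fork(). The pidfd is
       returned via the parent_tid (ptid) argument.
       Assumption: x86-64/aarch64 argument order. */
    pid_t pid = (pid_t) syscall(SYS_clone,
                                (unsigned long) (CLONE_PIDFD | SIGCHLD),
                                NULL, &pidfd, NULL, 0UL);
    if (pid == -1)
        return -1;
    if (pid == 0)
        _exit(exit_code);           /* child: terminate immediately */

    /* Parent: the pidfd becomes readable when the child terminates. */
    struct pollfd pfd = { .fd = pidfd, .events = POLLIN };
    do {
        ready = poll(&pfd, 1, -1);
    } while (ready == -1 && errno == EINTR);
    close(pidfd);

    /* Always reap the child, even if poll() failed. */
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    if (ready != 1 || !(pfd.revents & POLLIN) || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Calling run_child_and_poll(7) from main() should return 7 on a kernel
that supports both features; on an older kernel the helper is expected
to fail or behave differently, which the sketch does not try to detect.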