On Wed, Sep 18, 2019 at 08:49:59AM +0200, Michael Kerrisk (man-pages) wrote: > Hello Christian, > > On 9/16/19 9:40 AM, Christian Brauner wrote: > > On Wed, Sep 11, 2019 at 10:58:57AM +0200, Michael Kerrisk (man-pages) wrote: > >> Hello Christian, > >> > >> On 5/11/19 8:49 AM, Christian Brauner wrote: > >>> From: Christian Brauner <christian@xxxxxxxxxx> > >>> > >>> Add an entry for CLONE_PIDFD. This flag is available starting with > >>> kernel 5.2. If specified, a process file descriptor ("pidfd") referring > >>> to the child process will be returned in the ptid argument. > >> > >> I've applied this patch in a local branch, and made some minor edits > > > > Thank you! :) > > > >> and added a piece. And I have some questions. See below. > >> > >>> Signed-off-by: Christian Brauner <christian@xxxxxxxxxx> > >>> --- > > [...] > > >>> Note, that the kernel verifies that the value for > >>> +.I ptid > >>> +is zero. If it is not an error will be returned. This ensures that > >>> +.I ptid > >>> +can potentially be used to specify additional options for > >>> +.B CLONE_PIDFD > >>> +in the future. > >> > >> This piece is no longer true, right? At least I can't see such > > > > Correct. > > Thanks. Page amended. > > >> a check in the kernel code, and my testing doesn't yield an error > >> when ptid != 0 before the call.(No need to send me a patch; if I'm > >> correct just let me know and I'll edit out this piece.) > >> > >>> +.IP > >>> +Since the > >>> +.I ptid > >>> +argument is used to return the pidfd, > >>> +.B CLONE_PIDFD > >>> +cannot be used with > >>> +.B CLONE_PARENT_SETTID. > >>> +.IP > >>> +It is currently not possible to use this flag together with > >>> +.B CLONE_THREAD. > >>> +This means that the process identified by the pidfd will always be a > >>> +thread-group leader. > >>> +.IP > >>> +For a while there was a > >>> +.B CLONE_DETACHED > >>> +flag. This flag is usually ignored when passed along with other flags. > >>> +However, when passed alongside > >>> +.B CLONE_PIDFD > >>> +an error will be returned. This ensures that this flag can be reused > >>> +for further pidfd features in the future. > >>> +.TP > >>> .BR CLONE_PTRACE " (since Linux 2.2)" > >>> If > >>> .B CLONE_PTRACE > >>> @@ -1122,6 +1158,21 @@ For example, on aarch64, > >>> .I child_stack > >>> must be a multiple of 16. > >>> .TP > >>> +.B EINVAL > >>> +.B CLONE_PIDFD > >>> +was specified together with > >>> +.B CLONE_DETACHED. > >>> +.TP > >>> +.B EINVAL > >>> +.B CLONE_PIDFD > >>> +was specified together with > >>> +.B CLONE_PARENT_SETTID. > >>> +.TP > >>> +.B EINVAL > >>> +.B CLONE_PIDFD > >>> +was specified together with > >>> +.B CLONE_THREAD. > >>> +.TP > >>> .B ENOMEM > >>> Cannot allocate sufficient memory to allocate a task structure for the > >>> child, or to copy those parts of the caller's context that need to be > >> > >> One other piece seems to be missing: the returned file descriptor can > >> be fed to poll()/select()/epoll and the FD will test as readable when > >> the child terminates. Right? Did that functionality also land in > >> kernel 5.2? And did it get implemented as a separate commit, or did > >> the behavior just fall naturally out of the implementation of pidfd's? > >> Let me know the details, and I will craft a patch. > > > > It landed in 5.3. The relevant commit is: > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b53b0b9d9a613c418057f6cb921c2f40a6f78c24 > > and belongs to the following merge: > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5450e8a316a64cddcbc15f90733ebc78aa736545 > > Thanks for that info. One other questions springs to mind. > I haven't looked at the source or tried testing this, > but can anything actually be read() from a PIDFD? Presumably, We had discussed this but decided to not implement this right away. Mainly, because we did not have a clear picture what the semantics should be. But it is something that we will probably want in the future... > it might be useful to have data generated on the FD, since > different values could (ultimately) be used to distinguish > between terminate/stopp/continue transitions. Yes. Thanks! Christian