On Wed, Jul 21, 2021 at 9:19 AM Matthew Bobrowski <repnop@xxxxxxxxxx> wrote:
>
> Introduce a new flag FAN_REPORT_PIDFD for fanotify_init(2) which
> allows userspace applications to control whether a pidfd info record
> containing a pidfd is to be returned with each event.
>
> If FAN_REPORT_PIDFD is enabled for a notification group, an additional
> struct fanotify_event_info_pidfd object will be supplied alongside the
> generic struct fanotify_event_metadata within a single event. This
> functionality is analogous to that of FAN_REPORT_FID in terms of how
> the event structure is supplied to the userspace application. Usage of
> FAN_REPORT_PIDFD with FAN_REPORT_FID/FAN_REPORT_DFID_NAME is
> permitted, and in this case a struct fanotify_event_info_pidfd object
> will follow any struct fanotify_event_info_fid object.
>
> Currently, the usage of FAN_REPORT_TID is not permitted along with
> FAN_REPORT_PIDFD as the pidfd API only supports the creation of pidfds
> for thread-group leaders. Additionally, the FAN_REPORT_PIDFD is
> limited to privileged processes only i.e. listeners that are running
> with the CAP_SYS_ADMIN capability. Attempting to supply either of
> these initialization flags with FAN_REPORT_PIDFD will result with
> EINVAL being returned to the caller.
>
> In the event of a pidfd creation error, there are two types of error
> values that can be reported back to the listener. There is
> FAN_NOPIDFD, which will be reported in cases where the process
> responsible for generating the event has terminated prior to fanotify
> being able to create pidfd for event->pid via pidfd_create(). The

I think that "...prior to the event listener reading the event..." is a
more accurate description of the situation.

> there is FAN_EPIDFD, which will be reported if a more generic pidfd
> creation error occurred when calling pidfd_create().
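
For concreteness, here is a minimal userspace sketch of how a listener
might tell these cases apart while walking the info records that follow
struct fanotify_event_metadata. It is an illustration under assumptions
only: the sketch_* names and the numeric values used for the record type,
FAN_NOPIDFD and FAN_EPIDFD are placeholders that are not taken from the
hunks quoted here, and find_pidfd() is a hypothetical helper, not part of
any existing API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * ASSUMPTION: these values and layouts stand in for what the patch adds
 * to include/uapi/linux/fanotify.h. The quoted hunks do not show them,
 * so treat every definition here as a placeholder.
 */
#define SKETCH_INFO_TYPE_PIDFD	4	/* assumed FAN_EVENT_INFO_TYPE_PIDFD */
#define SKETCH_NOPIDFD		(-1)	/* assumed FAN_NOPIDFD */
#define SKETCH_EPIDFD		(-2)	/* assumed FAN_EPIDFD */

struct sketch_info_header {
	uint8_t info_type;
	uint8_t pad;
	uint16_t len;		/* length of the whole record, header included */
};

struct sketch_info_pidfd {
	struct sketch_info_header hdr;
	int32_t pidfd;
};

enum pidfd_status {
	PIDFD_ABSENT,		/* no pidfd record in this event */
	PIDFD_VALID,		/* usable descriptor, caller must close it */
	PIDFD_PROCESS_GONE,	/* process had already terminated */
	PIDFD_ERROR,		/* some other pidfd creation failure */
};

/*
 * Walk the info records in buf[0..len) and classify the first pidfd
 * record found. The raw pidfd value is stored in *pidfd_out.
 */
static enum pidfd_status find_pidfd(const unsigned char *buf, size_t len,
				    int32_t *pidfd_out)
{
	size_t off = 0;

	assert(buf != NULL && pidfd_out != NULL);

	while (len - off >= sizeof(struct sketch_info_header)) {
		struct sketch_info_header hdr;

		memcpy(&hdr, buf + off, sizeof(hdr));
		if (hdr.len < sizeof(hdr) || hdr.len > len - off)
			break;	/* malformed or truncated record */

		if (hdr.info_type == SKETCH_INFO_TYPE_PIDFD &&
		    hdr.len >= sizeof(struct sketch_info_pidfd)) {
			struct sketch_info_pidfd rec;

			memcpy(&rec, buf + off, sizeof(rec));
			*pidfd_out = rec.pidfd;
			if (rec.pidfd == SKETCH_NOPIDFD)
				return PIDFD_PROCESS_GONE;
			if (rec.pidfd < 0)
				return PIDFD_ERROR;
			return PIDFD_VALID;
		}
		off += hdr.len;	/* skip records of other types, e.g. FID */
	}
	return PIDFD_ABSENT;
}
```

A real listener would presumably hand such a helper the bytes between
metadata->metadata_len and metadata->event_len of each event, and would
expect PIDFD_PROCESS_GONE fairly often for short-lived commands.
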
>
> Signed-off-by: Matthew Bobrowski <repnop@xxxxxxxxxx>
> ---
>
> Changes since v2:
>
>  * The FAN_REPORT_PIDFD flag value has been changed from 0x00001000 to
>    0x00000080. This was so that future FID related initialization flags
>    could be grouped nicely.
>
>  * Fixed pidfd clean up at out_close_fd label in
>    copy_event_to_user(). Reversed the conditional and it now uses the
>    close_fd() helper instead of put_unused_fd() as we also need to close the
>    backing file, not just just mark the pidfd free in the fdtable.
>
>  * Shuffled around the WARN_ON_ONCE(FAN_REPORT_TID) within
>    copy_event_to_user() so that it's inside the if (pidfd_mode) branch. It
>    makes more sense to be as close to pidfd creation as possible.
>
>  * Fixed up the comment block within the if (pidfd_mode) branch.
>
>  fs/notify/fanotify/fanotify_user.c | 88 ++++++++++++++++++++++++++++--
>  include/linux/fanotify.h           |  3 +-
>  include/uapi/linux/fanotify.h      | 13 +++++
>  3 files changed, 98 insertions(+), 6 deletions(-)
>
[...]
>
> @@ -489,8 +526,9 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group,
>  	struct path *path = fanotify_event_path(event);
>  	struct fanotify_info *info = fanotify_event_info(event);
>  	unsigned int info_mode = FAN_GROUP_FLAG(group, FANOTIFY_INFO_MODES);
> +	unsigned int pidfd_mode = info_mode & FAN_REPORT_PIDFD;
>  	struct file *f = NULL;
> -	int ret, fd = FAN_NOFD;
> +	int ret, pidfd = FAN_NOPIDFD, fd = FAN_NOFD;
>
>  	pr_debug("%s: group=%p event=%p\n", __func__, group, event);
>
> @@ -524,6 +562,34 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group,
>  	}
>  	metadata.fd = fd;
>
> +	if (pidfd_mode) {
> +		/*
> +		 * Complain if the FAN_REPORT_PIDFD and FAN_REPORT_TID mutual
> +		 * exclusion is ever lifted. At the time of incoporating pidfd
> +		 * support within fanotify, the pidfd API only supported the
> +		 * creation of pidfds for thread-group leaders.
> +		 */
> +		WARN_ON_ONCE(FAN_GROUP_FLAG(group, FAN_REPORT_TID));
> +
> +		/*
> +		 * The PIDTYPE_TGID check for an event->pid is performed
> +		 * preemptively in attempt to catch those rare instances where
> +		 * the process responsible for generating the event has
> +		 * terminated prior to calling into pidfd_create() and acquiring

I find the description above to be "over dramatic".
An event listener reading events after the generating process has
terminated could be quite common in the case of one-shot tools like
mv, touch, etc.

Thanks,
Amir.