On Wed, Jul 21, 2021 at 8:21 AM Matthew Bobrowski <repnop@xxxxxxxxxx> wrote: > Introduce a new flag FAN_REPORT_PIDFD for fanotify_init(2) which > allows userspace applications to control whether a pidfd info record > containing a pidfd is to be returned with each event. > > If FAN_REPORT_PIDFD is enabled for a notification group, an additional > struct fanotify_event_info_pidfd object will be supplied alongside the > generic struct fanotify_event_metadata within a single event. This > functionality is analogous to that of FAN_REPORT_FID in terms of how > the event structure is supplied to the userspace application. Usage of > FAN_REPORT_PIDFD with FAN_REPORT_FID/FAN_REPORT_DFID_NAME is > permitted, and in this case a struct fanotify_event_info_pidfd object > will follow any struct fanotify_event_info_fid object. > > Currently, the usage of FAN_REPORT_TID is not permitted along with > FAN_REPORT_PIDFD as the pidfd API only supports the creation of pidfds > for thread-group leaders. Additionally, the FAN_REPORT_PIDFD is > limited to privileged processes only i.e. listeners that are running > with the CAP_SYS_ADMIN capability. Attempting to supply either of > these initialization flags with FAN_REPORT_PIDFD will result with > EINVAL being returned to the caller. > > In the event of a pidfd creation error, there are two types of error > values that can be reported back to the listener. There is > FAN_NOPIDFD, which will be reported in cases where the process > responsible for generating the event has terminated prior to fanotify > being able to create pidfd for event->pid via pidfd_create(). The > there is FAN_EPIDFD, which will be reported if a more generic pidfd > creation error occurred when calling pidfd_create(). [...] > @@ -524,6 +562,34 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group, > } > metadata.fd = fd; > > + if (pidfd_mode) { > + /* > + * Complain if the FAN_REPORT_PIDFD and FAN_REPORT_TID mutual > + * exclusion is ever lifted. At the time of incoporating pidfd > + * support within fanotify, the pidfd API only supported the > + * creation of pidfds for thread-group leaders. > + */ > + WARN_ON_ONCE(FAN_GROUP_FLAG(group, FAN_REPORT_TID)); > + > + /* > + * The PIDTYPE_TGID check for an event->pid is performed > + * preemptively in attempt to catch those rare instances where > + * the process responsible for generating the event has > + * terminated prior to calling into pidfd_create() and acquiring > + * a valid pidfd. Report FAN_NOPIDFD to the listener in those > + * cases. All other pidfd creation errors are represented as > + * FAN_EPIDFD. > + */ > + if (metadata.pid == 0 || > + !pid_has_task(event->pid, PIDTYPE_TGID)) { > + pidfd = FAN_NOPIDFD; > + } else { > + pidfd = pidfd_create(event->pid, 0); > + if (pidfd < 0) > + pidfd = FAN_EPIDFD; > + } > + } > + As a general rule, f_op->read callbacks aren't allowed to mess with the file descriptor table of the calling process. A process should be able to receive a file descriptor from an untrusted source and call functions like read() on it without worrying about affecting its own file descriptor table state with that. I realize that existing fanotify code appears to be violating that rule already, and that you're limiting creation of fanotify file descriptors that can hit this codepath to CAP_SYS_ADMIN, but still, I think fanotify_read() probably ought to be an ioctl, or something along those lines, instead of an f_op->read handler if it messes with the caller's fd table?