Re: [PATCH 0/5] Add pidfd support to the fanotify API

Matthew Bobrowski <repnop@xxxxxxxxxx> · Thu, 3 Jun 2021 11:24:08 +1000

On Wed, Jun 02, 2021 at 03:18:03PM +0300, Amir Goldstein wrote:
> > > I still don't understand what's racy about this. Won't the event reader
> > > get a valid pidfd?
> >
> > I guess this depends, right?
> >
> > As the logic/implementation currently stands in this specific patch series,
> > pidfd_create() will _NOT_ return a valid pidfd unless the struct pid still
> > holds reference to a task of type PIDTYPE_TGID. This is thanks to the extra
> > pid_hash_task() check that I thought was appropriate to incorporate into
> > pidfd_create() seeing as though:
> >
> >  - With the pidfd_create() declaration now being added to linux/pid.h, we
> >    effectively are giving the implicit OK for it to be called from other
> >    kernel subsystems, and hence why the caller should be subject to the
> >    same restrictions/verifications imposed by the API specification
> >    i.e. "Currently, the process identified by @pid must be a thread-group
> >    leader...". Not enforcing the pid_has_task() check in pidfd_create()
> >    effectively says that the pidfd creation can be done for any struct pid
> >    types i.e. PIDTYPE_PID, PIDTYPE_PGID, etc. This leads to assumptions
> >    being made by the callers, which effectively then could lead to
> >    undefined/unexpected behavior.
> >
> > There definitely can be cases whereby the underlying task(s) associated
> > with a struct pid have been freed as a result of process being killed
> > early. As in, the process is killed before the pid_has_task() check is
> > performed from within pidfd_create() when called from fanotify. This is
> > precisely the race that I'm referring to here, and in such cases as the
> > code currently stands, the event listener will _NOT_ receive a valid pidfd.
> >
> > > Can't the event reader verify that the pidfd points to a dead process?
> >
> > This was the idea, as in, the burden of checking whether a process has been
> > killed before the event listener receives the event should be on the event
> > listener. However, we're trying to come up with a good way to effectively
> > communicate that the above race I've attempted to articulate has actually
> > occurred. As in, do we:
> >
> > a) Drop the pid_has_task() check in pidfd_create() so that a pidfd can be
> >    returned for all passed struct pids? That way, even if the above race is
> >    experienced the caller will still receive a pidfd and the event listener
> >    can do whatever it needs to with it.
> >
> > b) Before calling into pidfd_create(), perform an explicit pid_has_task()
> >    check for PIDTYPE_TGID and if that returns false, then set FAN_NOPIDFD
> >    and save ourselves from calling into pidfd_create() all together. The
> >    event listener in this case doesn't have to perform the signal check to
> >    determine whether the process has already been killed.
> >
> > c) Scrap calling into pidfd_create() all together and write a simple
> >    fanotify wrapper that contains the pidfd creation logic we need.
> >
> > > I don't mind returning FAN_NOPIDFD for convenience, but user
> > > will have to check the pidfd that it got anyway, because process
> > > can die at any time between reading the event and acting on the
> > > pidfd.
> >
> > Well sort of, as it depends on the approach that we decide to go ahead with
> > to report such early process termination cases. If we return FAN_NOPIDFD,
> > which signifies that the process died before a pidfd could be created, then
> > there's no point for the listener to step into checking the pidfd because
> > the pidfd already == FAN_NOPIDFD. If we simply return a pidfd regardless of
> > the early termination of the process, then sure the event listener will
> > need to check each pidfd that is supplied.
> >
> 
> I don't see any problem with the fact that the listener would have to check the
> reported pidfd. I don't see how or why that should be avoided.
> If we already know there is no process, we can be nice and return NOPIDFD,
> because that doesn't add any complexity.

Yes, agree. Going ahead with returning FAN_NOPIDFD for early process
termination i.e. before fanotify calls into pidfd creation code. All other
pidfd creation errors will be reported as FAN_EPIDFD.

> Not to mention that if pid_vnr() returns 0 (process is outside of
> pidns of caller)
> then I think you MUST either report FAN_NOPIDFD or no pid info record,
> because getting 0 event->pid  with a valid pidfd is weird IMO and could be
> considered as a security breach.

Good point. I feel as though reporting FAN_NOPIDFD in such cases would be
more appropriate, rather than leaving off a pidfd info record completely.

> > > > because we perform the pidfd creation during the notification queue
> > > > processing and not in the event allocation stages (for reasons that Jan has
> > > > already covered here [1]). So, tl;dr there is the case where the fanotify
> > > > calls pidfd_create() and the check for pid_has_task() fails because the
> > > > struct pid that we're hanging onto within an event no longer contains a
> > > > task of type PIDTYPE_TGID...
> > > >
> > > > [0] https://www.spinics.net/lists/linux-api/msg48630.html
> > > > [1] https://www.spinics.net/lists/linux-api/msg48632.html
> >
> > Maybe I'm going down a rabbit hole and overthinking this whole thing,
> > IDK... :(
> >
> 
> That is the feeling I get as well.
> Suggestion: write the man page - that will make it clear to yourself
> and to code reviewers if the API is sane and if it is going to end up
> being confusing to end users.

Yeah, that's a good approach. I'll consider doing this for future API
changes.

/M