On Wed, Jun 02, 2021 at 03:18:03PM +0300, Amir Goldstein wrote: > > > I still don't understand what's racy about this. Won't the event reader > > > get a valid pidfd? > > > > I guess this depends, right? > > > > As the logic/implementation currently stands in this specific patch series, > > pidfd_create() will _NOT_ return a valid pidfd unless the struct pid still > > holds reference to a task of type PIDTYPE_TGID. This is thanks to the extra > > pid_hash_task() check that I thought was appropriate to incorporate into > > pidfd_create() seeing as though: > > > > - With the pidfd_create() declaration now being added to linux/pid.h, we > > effectively are giving the implicit OK for it to be called from other > > kernel subsystems, and hence why the caller should be subject to the > > same restrictions/verifications imposed by the API specification > > i.e. "Currently, the process identified by @pid must be a thread-group > > leader...". Not enforcing the pid_has_task() check in pidfd_create() > > effectively says that the pidfd creation can be done for any struct pid > > types i.e. PIDTYPE_PID, PIDTYPE_PGID, etc. This leads to assumptions > > being made by the callers, which effectively then could lead to > > undefined/unexpected behavior. > > > > There definitely can be cases whereby the underlying task(s) associated > > with a struct pid have been freed as a result of process being killed > > early. As in, the process is killed before the pid_has_task() check is > > performed from within pidfd_create() when called from fanotify. This is > > precisely the race that I'm referring to here, and in such cases as the > > code currently stands, the event listener will _NOT_ receive a valid pidfd. > > > > > Can't the event reader verify that the pidfd points to a dead process? > > > > This was the idea, as in, the burden of checking whether a process has been > > killed before the event listener receives the event should be on the event > > listener. However, we're trying to come up with a good way to effectively > > communicate that the above race I've attempted to articulate has actually > > occurred. As in, do we: > > > > a) Drop the pid_has_task() check in pidfd_create() so that a pidfd can be > > returned for all passed struct pids? That way, even if the above race is > > experienced the caller will still receive a pidfd and the event listener > > can do whatever it needs to with it. > > > > b) Before calling into pidfd_create(), perform an explicit pid_has_task() > > check for PIDTYPE_TGID and if that returns false, then set FAN_NOPIDFD > > and save ourselves from calling into pidfd_create() all together. The > > event listener in this case doesn't have to perform the signal check to > > determine whether the process has already been killed. > > > > c) Scrap calling into pidfd_create() all together and write a simple > > fanotify wrapper that contains the pidfd creation logic we need. > > > > > I don't mind returning FAN_NOPIDFD for convenience, but user > > > will have to check the pidfd that it got anyway, because process > > > can die at any time between reading the event and acting on the > > > pidfd. > > > > Well sort of, as it depends on the approach that we decide to go ahead with > > to report such early process termination cases. If we return FAN_NOPIDFD, > > which signifies that the process died before a pidfd could be created, then > > there's no point for the listener to step into checking the pidfd because > > the pidfd already == FAN_NOPIDFD. If we simply return a pidfd regardless of > > the early termination of the process, then sure the event listener will > > need to check each pidfd that is supplied. > > > > I don't see any problem with the fact that the listener would have to check the > reported pidfd. I don't see how or why that should be avoided. > If we already know there is no process, we can be nice and return NOPIDFD, > because that doesn't add any complexity. Yes, agree. Going ahead with returning FAN_NOPIDFD for early process termination i.e. before fanotify calls into pidfd creation code. All other pidfd creation errors will be reported as FAN_EPIDFD. > Not to mention that if pid_vnr() returns 0 (process is outside of > pidns of caller) > then I think you MUST either report FAN_NOPIDFD or no pid info record, > because getting 0 event->pid with a valid pidfd is weird IMO and could be > considered as a security breach. Good point. I feel as though reporting FAN_NOPIDFD in such cases would be more appropriate, rather than leaving off a pidfd info record completely. > > > > because we perform the pidfd creation during the notification queue > > > > processing and not in the event allocation stages (for reasons that Jan has > > > > already covered here [1]). So, tl;dr there is the case where the fanotify > > > > calls pidfd_create() and the check for pid_has_task() fails because the > > > > struct pid that we're hanging onto within an event no longer contains a > > > > task of type PIDTYPE_TGID... > > > > > > > > [0] https://www.spinics.net/lists/linux-api/msg48630.html > > > > [1] https://www.spinics.net/lists/linux-api/msg48632.html > > > > Maybe I'm going down a rabbit hole and overthinking this whole thing, > > IDK... :( > > > > That is the feeling I get as well. > Suggestion: write the man page - that will make it clear to yourself > and to code reviewers if the API is sane and if it is going to end up > being confusing to end users. Yeah, that's a good approach. I'll consider doing this for future API changes. /M