Re: [PATCH 0/5] Add pidfd support to the fanotify API

Matthew Bobrowski <repnop@xxxxxxxxxx> · Sat, 22 May 2021 09:32:36 +1000

On Fri, May 21, 2021 at 12:40:56PM +0200, Jan Kara wrote:
> On Fri 21-05-21 20:15:35, Matthew Bobrowski wrote:
> > On Thu, May 20, 2021 at 03:55:27PM +0200, Jan Kara wrote:
> > There's one thing that I'd like to mention, and it's something in
> > regards to the overall approach we've taken that I'm not particularly
> > happy about and I'd like to hear all your thoughts. Basically, with
> > this approach the pidfd creation is done only once an event has been
> > queued and the notification worker wakes up and picks up the event
> > from the queue processes it. There's a subtle latency introduced when
> > taking such an approach which at times leads to pidfd creation
> > failures. As in, by the time pidfd_create() is called the struct pid
> > has already been reaped, which then results in FAN_NOPIDFD being
> > returned in the pidfd info record.
> > 
> > Having said that, I'm wondering what the thoughts are on doing pidfd
> > creation earlier on i.e. in the event allocation stages? This way, the
> > struct pid is pinned earlier on and rather than FAN_NOPIDFD being
> > returned in the pidfd info record because the struct pid has been
> > already reaped, userspace application will atleast receive a valid
> > pidfd which can be used to check whether the process still exists or
> > not. I think it'll just set the expectation better from an API
> > perspective.
> 
> Yes, there's this race. OTOH if FAN_NOPIDFD is returned, the listener can
> be sure the original process doesn't exist anymore. So is it useful to
> still receive pidfd of the dead process?

Well, you're absolutely right. However, FWIW I was approaching this
from two different angles:

1) I wanted to keep the pattern in which the listener checks for the
   existence/recycling of the process consistent. As in, the listener
   would receive the pidfd, then send the pidfd a signal via
   pidfd_send_signal() and check for -ESRCH which clearly indicates
   that the target process has terminated.

2) I didn't want to mask failed pidfd creation because of early
   process termination and other possible failures behind a single
   FAN_NOPIDFD. IOW, if we take the -ESRCH approach above, the
   listener can take clear corrective branches as what's to be done
   next if a race is to have been detected, whereas simply returning
   FAN_NOPIDFD at this stage can mean multiple things.

Now that I've written the above and keeping in mind that we'd like to
refrain from doing anything in the event allocation stages, perhaps we
could introduce a different error code for detecting early process
termination while attempting to construct the info record. WDYT?

> Also opening pidfd in the context of event generation is problematic
> for two reasons:
>
> 1) Technically, the context under which events are generated can be rather
> constrained (various locks held etc.). Adding relatively complex operations
> such as pidfd creation is going to introduce strange lock dependencies,
> possibly deadlocks.
> 
> 2) Doing pidfd generation in the context of the process generating event is
> problematic - you don't know in which fd_table the fd will live. Also that
> process is unfairly penalized (performance wise) because someone else is
> listening. We try to keep overhead of event generation as low as possible
> for this reason.

Fair points, thanks for sharing your view. :)

/M