On Tue, Nov 19, 2024 at 2:10 AM Song Liu <songliubraving@xxxxxxxx> wrote: > > Hi Alexei, > > > On Nov 18, 2024, at 4:10 PM, Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> wrote: > [...] > >>> > >>> Agreed. This is actually something I have been thinking > >>> since the beginning of this work: Shall it be fanotify-bpf > >>> or fsnotify-bpf. Given we have more materials, this is a > >>> good time to have broader discussions on this. > >>> > >>> @all, please chime in whether we should redo this as > >>> fsnotify-bpf. AFAICT: > >>> > >>> Pros of fanotify-bpf: > >>> - There is existing user space that we can leverage/reuse. > >>> > >>> Pros of fsnotify-bpf: > >>> - Faster fast path. > >>> > >>> Another major pros/cons did I miss? Sorry, I forgot to address this earlier. First and foremost, I don't like the brand name "fast path" for this feature. The word "fast" implies that avoiding an indirect call is meaningful and I don't think this is the case here. I would rather rebrand this feature as "fanotify filter program". I am going to make a controversial argument: I anticipate that in the more common use of kernel filter for fanotify the filter will be *likely* to skip sending events to userspace, for example, when watching a small subtree in a large filesystem. In that case, the *likely* result is that a context switch to userspace is avoided, hence, the addition of one indirect call is insignificant in comparison. If we later decide that my assumption is wrong and it is important to optimize out the indirect call when skipping userspace, we could do that - it is not a problem, but for now, this discussion seems like premature optimization to me. > >> > >> Adding more thoughts on this: I think it makes more sense to > >> go with fanotify-bpf. This is because one of the benefits of > >> fsnotify/fanotify over LSM solutions is the built-in event > >> filtering of events. While this call chain is a bit long: > >> > >> fsnotify_open_perm->fsnotify->send_to_group->fanotify_handle_event. > >> > >> There are built-in filtering in fsnotify() and > >> send_to_group(), so logics in the call chain are useful. > > > > fsnotify_marks based filtering happens in fsnotify. > > No need to do more indirect calls to get to fanotify. > > > > I would add the bpf struct_ops hook right before send_to_group > > or inside of it. > > Not sure whether fsnotify_group concept should be reused > > or avoided. > > Per inode mark/mask filter should stay. > > We still need fsnotify_group. It matches each fanotify > file descriptor. > We need the filter to be per fsnotify_group. Unlike LSM, fsnotify/fanotify is a pubsub service for many fs event consumers (a.k.a groups). The feature is a filter per event consumer, not a global filter for all event consumers. > Moving struct_ops hook inside send_to_group does save > us an indirect call. But this also means we need to > introduce the fastpath concept to both fsnotify and > fanotify. I personally don't really like duplications > like this (see the big BUILD_BUG_ON array in > fanotify_handle_event). > > OTOH, maybe the benefit of one fewer indirect call > justifies the extra complexity. Let me think more > about it. > I need to explain something about fsnotify vs. fanotify in order to argue why the feature should be "fanotify", but the bottom line is that is should not be too hard to avoid the indirect call even if the feature is introduced through fanotify API as I think that it should be. TLDR: The fsnotify_backend abstraction has become somewhat of a theater of abstraction over time, because the feature distance between fanotify backend and all the rest has grew quite large. The logic in send_to_group() is *seemingly* the generic fsnotify logic, but not really, because only fanotify has ignore masks and only fanotify has mark types (inode,mount,sb). This difference is encoded by the group->ops->handle_event() operation that only fanotify implements. All the rest of the backends implement the simpler ->handle_inode_event(). Similarly, the group->private union is always dominated by the size of group->fanotify_data, so there is no big difference if we place group->fp_hook (or ->filter_hook) outside of fanotify_data, so that we can query and call it from send_to_group() saving the indirect call to ->handle_event(). That still leaves the question if we need to call fanotify_group_event_mask() before the filter hook. fanotify_group_event_mask() does several things, but I think that the only thing relevant before the filter hook is this line: /* * Send the event depending on event flags in mark mask. */ if (!fsnotify_mask_applicable(mark->mask, ondir, type)) continue; This code is related to the two "built-in fanotify filters", namely FAN_ONDIR and FAN_EVENT_ON_CHILD. These built-in filters are so lame that they serve as a good example why a programmable filter is a better idea. For example, users need to opt-in for events on directories, but they cannot request events only on directories. Historically, the "generic" abstraction in send_to_group() has dealt with the non-generic fanotify ignore mask, but has not dealt with these non-generic fanotify built-in filters. However, since commit 31a371e419c8 ("fanotify: prepare for setting event flags in ignore mask"), send_to_group() is already aware of those fanotify built-in filters. So unless I am missing something, if we align the marks iteration loop in send_to_group() to look exactly like the marks iteration loop in fanotify_group_event_mask(), there should be no problem to call the filter hook directly, right before calling ->handle_event(). Hope this brain dump helps. Thanks, Amir.