On Thu, Nov 14, 2024 at 11:26 PM Amir Goldstein <amir73il@xxxxxxxxx> wrote: > > On Thu, Nov 14, 2024 at 9:14 PM Alexei Starovoitov > <alexei.starovoitov@xxxxxxxxx> wrote: > > > > On Thu, Nov 14, 2024 at 12:44 AM Song Liu <song@xxxxxxxxxx> wrote: > > > > > > + > > > + if (bpf_is_subdir(dentry, v->dentry)) > > > + ret = FAN_FP_RET_SEND_TO_USERSPACE; > > > + else > > > + ret = FAN_FP_RET_SKIP_EVENT; > > > > It seems to me that all these patches and feature additions > > to fanotify, new kfuncs, etc are done just to do the above > > filtering by subdir ? > > > > If so, just hard code this logic as an extra flag to fanotify ? > > So it can filter all events by subdir. > > bpf programmability makes sense when it needs to express > > user space policy. Here it's just a filter by subdir. > > bpf hammer doesn't look like the right tool for this use case. > > Good question. > > Speaking as someone who has made several attempts to design > efficient subtree filtering in fanotify, it is not as easy as it sounds. > > I recently implemented a method that could be used for "practical" > subdir filtering in userspace, not before Jan has questioned if we > should go directly to subtree filtering with bpf [1]. > > This is not the only filter that was proposed for fanotify, where bpf > filter came as an alternative proposal [2], but subtree filtering is by far > the most wanted filter. Interesting :) Thanks for the links. Long threads back then. Looks like bpf filtering was proposed 2 and 4 years ago and Song's current attempt is the first real prototype that actually implements what was discussed for so long? Cool. I guess it has legs then. > The problem with implementing a naive is_subtree() filter in fanotify > is the unbounded cost to be paid by every user for every fs access > when M such filters are installed deep in the fs tree. Agree that the concern is real. I just don't see how filtering in bpf vs filtering in the kernel is going to be any different. is_subdir() is fast, so either kernel or bpf prog calling it on every file_open doesn't seem like a big deal. Are you saying that 'naive is_subdir' would call is_subdir M times for every such filter ? I guess it can be optimized. When these M filters are installed the kernel can check whether they're subdirs of each other and filter a subset of M filters with a single is_subdir() call. Same logic can be in either bpf prog or in the kernel. > Making this more efficient then becomes a matter of trading of > memory (inode/path cache size) and performance and depends > on the size and depth of the watched filesystem. > This engineering decision *is* the userspace policy that can be > expressed by a bpf program. > > As you may know, Linux is lagging behind Win and MacOS w.r.t > subtree filtering for fs events. > > MacOS/FreeBSD took the userspace approach with fseventsd [3]. > If you Google "fseventsd", you will get results with "High CPU and > Memory Usage" for as far as the browser can scroll. Yeah. Fun. Looking at your 2 year old attempt it seems the key part was to make sure inodes are not pinned for filtering purpose? How would you call is_subdir() then if parent dentry is not dget() ? Or meaning of 'pinning' is different? > I hope the bpf-aided early subtree filtering technology would be > able to reduce some of this overhead to facilitate a better engineering > solution, but that remains to be proven... Sure. Let's explore this further. > Thanks, > Amir. > > [1] https://lore.kernel.org/linux-fsdevel/20220228140556.ae5rhgqsyzm5djbp@xxxxxxxxxx/ > [2] https://lore.kernel.org/linux-fsdevel/20200828084603.GA7072@xxxxxxxxxxxxxx/ > [3] https://developer.apple.com/library/archive/documentation/Darwin/Conceptual/FSEvents_ProgGuide/TechnologyOverview/TechnologyOverview.html#//apple_ref/doc/uid/TP40005289-CH3-SW1