Re: [RFC bpf-next fanotify 5/5] selftests/bpf: Add test for BPF based fanotify fastpath handler

Amir Goldstein <amir73il@xxxxxxxxx> · Thu, 7 Nov 2024 21:33:56 +0100

On Thu, Nov 7, 2024 at 8:53 PM Song Liu <songliubraving@xxxxxxxx> wrote:
>
>
>
> > On Nov 7, 2024, at 3:10 AM, Amir Goldstein <amir73il@xxxxxxxxx> wrote:
> >
> > On Wed, Oct 30, 2024 at 12:13 AM Song Liu <song@xxxxxxxxxx> wrote:
> >>
> >> This test shows a simplified logic that monitors a subtree. This is
> >> simplified as it doesn't handle all the scenarios, such as:
> >>
> >>  1) moving a subsubtree into/outof the being monitoring subtree;
> >
> > There is a solution for that (see below)
> >
> >>  2) mount point inside the being monitored subtree
> >
> > For that we will need to add the MOUNT/UNMOUNT/MOVE_MOUNT events,
> > but those have been requested by userspace anyway.
> >
> >>
> >> Therefore, this is not to show a way to reliably monitor a subtree.
> >> Instead, this is to test the functionalities of bpf based fastpath.
> >> To really monitor a subtree reliably, we will need more complex logic.
> >
> > Actually, this example is the foundation of my vision for efficient and race
> > free subtree filtering:
> >
> > 1. The inode map is to be treated as a cache for the is_subdir() query
>
> Using is_subdir() as the truth and managing the cache in inode map seems
> promising to me.
>
> > 2. Cache entries can also have a "distance from root" (i.e. depth) value
> > 3. Each unknown queried path can call is_subdir() and populate the cache
> >    entries for all ancestors
> > 4. The cache/map size should be limited and when limit is reached,
> >    evicting entries by depth priority makes sense
> > 5. A rename event for a directory whose inode is in the map and whose
> >   new parent is not in the map or has a different value than old parent
> >   needs to invalidate the entire map
> > 6. fast_path also needs a hook from inode evict to clear cache entries
>
> The inode map is physically attached to the inode itself. So the evict
> event is automatically handled. IOW, an inode's entry in the inode map
> is automatically removed when the inode is freed. For the same reason,
> we don't need to set a limit in map size and add evicting logic. Of
> course, this works based on the assumption that we don't use too much
> memory for each inode. I think this assumption is true.

Oh no, it is definitely wrong each inode is around 1K and this was the
main incentive to implement FAN_MARK_EVICTABLE and
FAN_MARK_FILESYSTEK in the first place, because recursive
pinning of all inodes in a large tree is not scalable to large trees.

Thanks,
Amir.