On Mon 28-02-22 19:40:07, Amir Goldstein wrote: > On Mon, Feb 28, 2022 at 4:06 PM Jan Kara <jack@xxxxxxx> wrote: > > > > Hi Amir! > > > > On Wed 23-02-22 20:42:37, Amir Goldstein wrote: > > > I wanted to get your feedback on an idea I have been playing with. > > > It started as a poor man's alternative to the old subtree watch problem. > > > For my employer's use case, we are watching the entire filesystem using > > > a filesystem mark, but would like to exclude events on a subtree > > > (i.e. all files underneath .private/). > > > > > > At the moment, those events are filtered in userspace. > > > I had considered adding directory marks with an ignored mask on every > > > event that is received for a directory path under .private/, but that has the > > > undesired side effect of pinning those directory inodes to cache. > > > > > > I have this old fsnotify-volatile branch [1] that I am using for an overlayfs > > > kernel internal fsnotify backend. I wonder what are your thoughts on > > > exposing this functionally to fanotify UAPI (i.e. FAN_MARK_VOLATILE). > > > > Interesting idea. I have some reservations wrt to the implementation (e.g. > > fsnotify_add_mark_list() convention of returning EEXIST when it updated > > mark's mask, or the fact that inode reclaim should now handle freeing of > > mark connector and attached marks - which may get interesting locking wise) > > but they are all fixable. > > Can you give me a hint as to how to implement the freeing of marks? OK, now I can see that fsnotify_inode_delete() gets called from __destroy_inode() and thus all marks should be freed even for inodes released by inode reclaim. Good. > > I'm wondering a bit whether this is really useful enough (and consequently > > whether we will not get another request to extend fanotify API in some > > other way to cater better to some other usecase related to subtree watches > > in the near future). I understand ignore marks are mainly a performance > > optimization and as such allowing inodes to be reclaimed (which means they > > are not used much and hence ignored mark is not very useful anyway) makes > > The problem is that we do not know in advance which of the many dirs in > the subtree are accessed often and which are accessed rarely (and that may > change over time), so volatile ignore marks are a way to set up ignore marks > on the most accessed dirs dynamically. Yes, I understand. > > sense. Thinking about this more, I guess it is useful to improve efficiency > > when you want to implement any userspace event-filtering scheme. > > > > The only remaining pending question I have is whether we should not go > > further and allow event filtering to happen using an eBPF program. That > > would be even more efficient (both in terms of memory and CPU). What do you > > think? > > > > I think that is an unrelated question. > > I do agree that we should NOT add "subtree filter" functionality to fanotify > (or any other filter) and that instead, we should add support for attaching an > eBPF program that implements is_subdir(). > I found this [1] convection with Tycho where you had suggested this idea. > I wonder if Tycho got to explore this path further? > > But I think that it is one thing to recommend users to implement their > filters as > eBPF programs and another thing to stand in the way of users that prefer to > implement userspace event filtering. It could be that the filter > cannot be easily > described by static rules to an eBPF program (e.g. need to query a database). > > In my POV, FAN_MARK_VOLATILE does not add any new logic/filtering rule. > It adds resource control by stating that the ignore mark is "best effort". > > Does it make sense? OK, makes sense. So I agree the functionality is worth it. Will you post the patches for review of technical details? Honza -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR