On Mon, Feb 28, 2022 at 4:06 PM Jan Kara <jack@xxxxxxx> wrote: > > Hi Amir! > > On Wed 23-02-22 20:42:37, Amir Goldstein wrote: > > I wanted to get your feedback on an idea I have been playing with. > > It started as a poor man's alternative to the old subtree watch problem. > > For my employer's use case, we are watching the entire filesystem using > > a filesystem mark, but would like to exclude events on a subtree > > (i.e. all files underneath .private/). > > > > At the moment, those events are filtered in userspace. > > I had considered adding directory marks with an ignored mask on every > > event that is received for a directory path under .private/, but that has the > > undesired side effect of pinning those directory inodes to cache. > > > > I have this old fsnotify-volatile branch [1] that I am using for an overlayfs > > kernel internal fsnotify backend. I wonder what are your thoughts on > > exposing this functionally to fanotify UAPI (i.e. FAN_MARK_VOLATILE). > > Interesting idea. I have some reservations wrt to the implementation (e.g. > fsnotify_add_mark_list() convention of returning EEXIST when it updated > mark's mask, or the fact that inode reclaim should now handle freeing of > mark connector and attached marks - which may get interesting locking wise) > but they are all fixable. Can you give me a hint as to how to implement the freeing of marks? > > I'm wondering a bit whether this is really useful enough (and consequently > whether we will not get another request to extend fanotify API in some > other way to cater better to some other usecase related to subtree watches > in the near future). I understand ignore marks are mainly a performance > optimization and as such allowing inodes to be reclaimed (which means they > are not used much and hence ignored mark is not very useful anyway) makes The problem is that we do not know in advance which of the many dirs in the subtree are accessed often and which are accessed rarely (and that may change over time), so volatile ignore marks are a way to set up ignore marks on the most accessed dirs dynamically. > sense. Thinking about this more, I guess it is useful to improve efficiency > when you want to implement any userspace event-filtering scheme. > > The only remaining pending question I have is whether we should not go > further and allow event filtering to happen using an eBPF program. That > would be even more efficient (both in terms of memory and CPU). What do you > think? > I think that is an unrelated question. I do agree that we should NOT add "subtree filter" functionality to fanotify (or any other filter) and that instead, we should add support for attaching an eBPF program that implements is_subdir(). I found this [1] convection with Tycho where you had suggested this idea. I wonder if Tycho got to explore this path further? But I think that it is one thing to recommend users to implement their filters as eBPF programs and another thing to stand in the way of users that prefer to implement userspace event filtering. It could be that the filter cannot be easily described by static rules to an eBPF program (e.g. need to query a database). In my POV, FAN_MARK_VOLATILE does not add any new logic/filtering rule. It adds resource control by stating that the ignore mark is "best effort". Does it make sense? Thanks, Amir. [1] https://lore.kernel.org/linux-fsdevel/20200828084603.GA7072@xxxxxxxxxxxxxx/