On Tue, Mar 1, 2022 at 1:07 PM Jan Kara <jack@xxxxxxx> wrote: > > On Mon 28-02-22 19:40:07, Amir Goldstein wrote: > > On Mon, Feb 28, 2022 at 4:06 PM Jan Kara <jack@xxxxxxx> wrote: > > > > > > Hi Amir! > > > > > > On Wed 23-02-22 20:42:37, Amir Goldstein wrote: > > > > I wanted to get your feedback on an idea I have been playing with. > > > > It started as a poor man's alternative to the old subtree watch problem. > > > > For my employer's use case, we are watching the entire filesystem using > > > > a filesystem mark, but would like to exclude events on a subtree > > > > (i.e. all files underneath .private/). > > > > > > > > At the moment, those events are filtered in userspace. > > > > I had considered adding directory marks with an ignored mask on every > > > > event that is received for a directory path under .private/, but that has the > > > > undesired side effect of pinning those directory inodes to cache. > > > > > > > > I have this old fsnotify-volatile branch [1] that I am using for an overlayfs > > > > kernel internal fsnotify backend. I wonder what are your thoughts on > > > > exposing this functionally to fanotify UAPI (i.e. FAN_MARK_VOLATILE). > > > > > > Interesting idea. I have some reservations wrt to the implementation (e.g. > > > fsnotify_add_mark_list() convention of returning EEXIST when it updated > > > mark's mask, or the fact that inode reclaim should now handle freeing of > > > mark connector and attached marks - which may get interesting locking wise) > > > but they are all fixable. > > > > Can you give me a hint as to how to implement the freeing of marks? > > OK, now I can see that fsnotify_inode_delete() gets called from > __destroy_inode() and thus all marks should be freed even for inodes > released by inode reclaim. Good. > > > > I'm wondering a bit whether this is really useful enough (and consequently > > > whether we will not get another request to extend fanotify API in some > > > other way to cater better to some other usecase related to subtree watches > > > in the near future). I understand ignore marks are mainly a performance > > > optimization and as such allowing inodes to be reclaimed (which means they > > > are not used much and hence ignored mark is not very useful anyway) makes > > > > The problem is that we do not know in advance which of the many dirs in > > the subtree are accessed often and which are accessed rarely (and that may > > change over time), so volatile ignore marks are a way to set up ignore marks > > on the most accessed dirs dynamically. > > Yes, I understand. > > > > sense. Thinking about this more, I guess it is useful to improve efficiency > > > when you want to implement any userspace event-filtering scheme. > > > > > > The only remaining pending question I have is whether we should not go > > > further and allow event filtering to happen using an eBPF program. That > > > would be even more efficient (both in terms of memory and CPU). What do you > > > think? > > > > > > > I think that is an unrelated question. > > > > I do agree that we should NOT add "subtree filter" functionality to fanotify > > (or any other filter) and that instead, we should add support for attaching an > > eBPF program that implements is_subdir(). > > I found this [1] convection with Tycho where you had suggested this idea. > > I wonder if Tycho got to explore this path further? > > > > But I think that it is one thing to recommend users to implement their > > filters as > > eBPF programs and another thing to stand in the way of users that prefer to > > implement userspace event filtering. It could be that the filter > > cannot be easily > > described by static rules to an eBPF program (e.g. need to query a database). > > > > In my POV, FAN_MARK_VOLATILE does not add any new logic/filtering rule. > > It adds resource control by stating that the ignore mark is "best effort". > > > > Does it make sense? > > OK, makes sense. So I agree the functionality is worth it. Will you post > the patches for review of technical details? Of course, I need to add a patch for UAPI change and write a test. I just wanted to get a tentative ACK before I put in more effort. I will address your comment about -EEXIST return value. Thanks, Amir.