On Wed 28-09-22 15:29:13, Amir Goldstein wrote: > On Mon, Sep 26, 2022 at 6:27 PM Jan Kara <jack@xxxxxxx> wrote: > > > > On Thu 22-09-22 16:03:41, Amir Goldstein wrote: > > > On Thu, Sep 22, 2022 at 1:48 PM Jan Kara <jack@xxxxxxx> wrote: > > > > On Tue 20-09-22 21:19:25, Amir Goldstein wrote: > > > > > For the next steps of POC, I could do: > > > > > - Report FAN_ACCESS_PERM range info to implement random read > > > > > patterns (e.g. unzip -l) > > > > > - Introduce FAN_MODIFY_PERM, so file content could be downloaded > > > > > before modifying a read-write HSM cache > > > > > - Demo conversion of a read-write FUSE HSM implementation > > > > > (e.g. https://github.com/volga629/davfs2) > > > > > - Demo HSM with filesystem mark [*] and a hardcoded test filter > > > > > > > > > > [*] Note that unlike the case with recursive inotify, this POC HSM > > > > > implementation is not racy, because of the lookup permission events. > > > > > A filesystem mark is still needed to avoid pinning all the unpopulated > > > > > cache tree leaf entries to inode cache, so that this HSM could work on > > > > > a very large scale tree, the same as my original use case for implementing > > > > > filesystem mark. > > > > > > > > Sounds good! Just with your concern about pinning - can't you use evictable > > > > marks added on lookup for files / dirs you want to track? Maybe it isn't > > > > great design for other reasons but it would save you some event > > > > filtering... > > > > > > > > > > With the current POC, there is no trigger to re-establish the evicted mark, > > > because the parent is already populated and has no mark. > > > > So my original thinking was that you'd place FAN_LOOKUP_PERM mark on top of > > the directory tree and then you'd add evictable marks to all the subdirs > > that are looked up from the FAN_LOOKUP_PERM event handler. That way I'd > > imagine you can place evictable marks on all directories that are used in a > > race-free manner. > > > > Maybe I am missing something. > I don't see how that can scale up to provide penalty free fast path lookup > for fully populated subtrees. No, you are right that this scheme would have non-trivial overhead in processing the lookup events even when the tree is fully populated. > > > A hook on instantiate of inode in inode cache could fill that gap. > > > It could still be useful to filter FAN_INSTANTIATE_PERM events in the > > > kernel but it is not a must because instantiate is more rare than (say) lookup > > > and then the fast lookup path (RCU walk) on populated trees suffers almost > > > no overhead when the filesystem is watched. > > > > > > Please think about this and let me know if you think that this is a direction > > > worth pursuing, now, or as a later optimization. > > > > I think an event on instantiate seems to be depending too much on kernel > > internals instead of obvious filesystem operations. Also it might be a bit > > challenging during startup when you don't know what is cached and what not > > so you cannot rely on instantiate events for placing marks. So I'd leave > > this for future optimization. > > > > Perhaps a user FAN_INSTANTIATE_PERM is too much, but I was > trying to figure out a way to make automatic inode marks work. > If we can define reasonable use cases for automatic inode marks that > kernel can implement (e.g. inheriting from parent on dentry instantiate) > then this could possibly get us something useful. > Maybe that is what you meant with the suggestion above? Well, my suggestion was pondering if we can implement something like automatic inode marks in userspace using FAN_LOOKUP_PERM event. But you are right the overhead in the fast path does not make this very attractive. So we'll have to look more into the in-kernel solution. > The other use case of automatic inode marks I was thinking about, > which are even more relevant for $SUBJECT is this: > When instantiating a dentry with an inode that has xattr > "security.fanotify.mask" (a.k.a. persistent inode mark), an inode > mark could be auto created and attached to a group with a special sb > mark (we can limit a single special mark per sb). > This could be implemented similar to get_acl(), where i_fsnotify_mask > is always initialized with a special value (i.e. FS_UNINITIALIZED) > which is set to either 0 or non-zero if "security.fanotify.mask" exists. > > The details of how such an API would look like are very unclear to me, > so I will try to focus on the recursive auto inode mark idea. Yeah, although initializing fanotify marks based on xattrs does not look completely crazy I can see a lot of open questions there so I think automatic inode mark idea has more chances for success at this point :). Honza -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR