On Tue, Oct 08, 2024 at 01:23:44PM GMT, Jan Kara wrote: > On Tue 08-10-24 10:57:22, Amir Goldstein wrote: > > On Tue, Oct 8, 2024 at 1:33 AM Dave Chinner <david@xxxxxxxxxxxxx> wrote: > > > > > > On Mon, Oct 07, 2024 at 01:37:19PM -0700, Linus Torvalds wrote: > > > > On Thu, 3 Oct 2024 at 04:57, Jan Kara <jack@xxxxxxx> wrote: > > > > > > > > > > Fair enough. If we go with the iterator variant I've suggested to Dave in > > > > > [1], we could combine the evict_inodes(), fsnotify_unmount_inodes() and > > > > > Landlocks hook_sb_delete() into a single iteration relatively easily. But > > > > > I'd wait with that convertion until this series lands. > > > > > > > > Honza, I looked at this a bit more, particularly with an eye of "what > > > > happens if we just end up making the inode lifetimes subject to the > > > > dentry lifetimes" as suggested by Dave elsewhere. > > > > > > .... > > > > > > > which makes the fsnotify_inode_delete() happen when the inode is > > > > removed from the dentry. > > > > > > There may be other inode references being held that make > > > the inode live longer than the dentry cache. When should the > > > fsnotify marks be removed from the inode in that case? Do they need > > > to remain until, e.g, writeback completes? > > > > > > > fsnotify inode marks remain until explicitly removed or until sb > > is unmounted (*), so other inode references are irrelevant to > > inode mark removal. > > > > (*) fanotify has "evictable" inode marks, which do not hold inode > > reference and go away on inode evict, but those mark evictions > > do not generate any event (i.e. there is no FAN_UNMOUNT). > > Yes. Amir beat me with the response so let me just add that FS_UMOUNT event > is for inotify which guarantees that either you get an event about somebody > unlinking the inode (e.g. IN_DELETE_SELF) or event about filesystem being > unmounted (IN_UMOUNT) if you place mark on some inode. I also don't see how > we would maintain this behavior with what Linus proposes. > > > > > Then at umount time, the dentry shrinking will deal with all live > > > > dentries, and at most the fsnotify layer would send the FS_UNMOUNT to > > > > just the root dentry inodes? > > > > > > I don't think even that is necessary, because > > > shrink_dcache_for_umount() drops the sb->s_root dentry after > > > trimming the dentry tree. Hence the dcache drop would cleanup all > > > inode references, roots included. > > > > > > > Wouldn't that make things much cleaner, and remove at least *one* odd > > > > use of the nasty s_inodes list? > > > > > > Yes, it would, but someone who knows exactly when the fsnotify > > > marks can be removed needs to chime in here... > > So fsnotify needs a list of inodes for the superblock which have marks > attached and for which we hold inode reference. We can keep it inside > fsnotify code although it would practically mean another list_head for the > inode for this list (probably in our fsnotify_connector structure which > connects list of notification marks to the inode). If we actually get rid > of i_sb_list in struct inode, this will be a win for the overall system, > otherwise it is a net loss IMHO. So if we can figure out how to change > other s_inodes owners we can certainly do this fsnotify change. > > > > > And I wonder if the quota code (which uses the s_inodes list to enable > > > > quotas on already mounted filesystems) could for all the same reasons > > > > just walk the dentry tree instead (and remove_dquot_ref similarly > > > > could just remove it at dentry_unlink_inode() time)? > > > > > > I don't think that will work because we have to be able to modify > > > quota in evict() processing. This is especially true for unlinked > > > inodes being evicted from cache, but also the dquots need to stay > > > attached until writeback completes. > > > > > > Hence I don't think we can remove the quota refs from the inode > > > before we call iput_final(), and so I think quotaoff (at least) > > > still needs to iterate inodes... > > Yeah, I'm not sure how to get rid of the s_inodes use in quota code. One of > the things we need s_inodes list for is during quotaoff on a mounted > filesystem when we need to iterate all inodes which are referencing quota > structures and free them. In theory we could keep a list of inodes > referencing quota structures but that would require adding list_head to > inode structure for filesystems that support quotas. Now for the sake of > full context I'll also say that enabling / disabling quotas on a mounted > filesystem is a legacy feature because it is quite easy that quota > accounting goes wrong with it. So ext4 and f2fs support for quite a few > years a mode where quota tracking is enabled on mount and disabled on > unmount (if appropriate fs feature is enabled) and you can only enable / > disable enforcement of quota limits during runtime. So I could see us > deprecating this functionality altogether although jfs never adapted to > this new way we do quotas so we'd have to deal with that somehow. But one > way or another it would take a significant amount of time before we can > completely remove this so it is out of question for this series. I still maintain that we don't need to solve the fsnotify and lsm rework as part of this particular series.