On Tue 08-10-24 10:57:22, Amir Goldstein wrote:
> On Tue, Oct 8, 2024 at 1:33 AM Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> >
> > On Mon, Oct 07, 2024 at 01:37:19PM -0700, Linus Torvalds wrote:
> > > On Thu, 3 Oct 2024 at 04:57, Jan Kara <jack@xxxxxxx> wrote:
> > > >
> > > > Fair enough. If we go with the iterator variant I've suggested to
> > > > Dave in [1], we could combine the evict_inodes(),
> > > > fsnotify_unmount_inodes() and Landlock's hook_sb_delete() into a
> > > > single iteration relatively easily. But I'd wait with that
> > > > conversion until this series lands.
> > >
> > > Honza, I looked at this a bit more, particularly with an eye of "what
> > > happens if we just end up making the inode lifetimes subject to the
> > > dentry lifetimes" as suggested by Dave elsewhere.
> >
> > ....
> >
> > > which makes the fsnotify_inode_delete() happen when the inode is
> > > removed from the dentry.
> >
> > There may be other inode references being held that make the inode
> > live longer than the dentry cache. When should the fsnotify marks be
> > removed from the inode in that case? Do they need to remain until,
> > e.g., writeback completes?
> >
>
> fsnotify inode marks remain until explicitly removed or until the sb
> is unmounted (*), so other inode references are irrelevant to inode
> mark removal.
>
> (*) fanotify has "evictable" inode marks, which do not hold an inode
> reference and go away on inode evict, but those mark evictions do not
> generate any event (i.e. there is no FAN_UNMOUNT).

Yes. Amir beat me to the response, so let me just add that the FS_UNMOUNT
event exists for inotify, which guarantees that if you place a mark on some
inode, you either get an event about somebody unlinking the inode (e.g.
IN_DELETE_SELF) or an event about the filesystem being unmounted
(IN_UNMOUNT). I also don't see how we would maintain this behavior with
what Linus proposes.
> > > Then at umount time, the dentry shrinking will deal with all live
> > > dentries, and at most the fsnotify layer would send the FS_UNMOUNT to
> > > just the root dentry inodes?
> >
> > I don't think even that is necessary, because
> > shrink_dcache_for_umount() drops the sb->s_root dentry after
> > trimming the dentry tree. Hence the dcache drop would clean up all
> > inode references, roots included.
> >
> > > Wouldn't that make things much cleaner, and remove at least *one* odd
> > > use of the nasty s_inodes list?
> >
> > Yes, it would, but someone who knows exactly when the fsnotify
> > marks can be removed needs to chime in here...

So fsnotify needs a list of inodes for the superblock which have marks
attached and for which we hold an inode reference. We can keep it inside
fsnotify code, although it would practically mean another list_head for the
inode (probably in our fsnotify_connector structure, which connects the
list of notification marks to the inode). If we actually get rid of
i_sb_list in struct inode, this will be a win for the overall system;
otherwise it is a net loss IMHO. So if we can figure out how to convert the
other s_inodes users, we can certainly do this fsnotify change.

> > > And I wonder if the quota code (which uses the s_inodes list to enable
> > > quotas on already mounted filesystems) could for all the same reasons
> > > just walk the dentry tree instead (and remove_dquot_ref similarly
> > > could just remove it at dentry_unlink_inode() time)?
> >
> > I don't think that will work because we have to be able to modify
> > quota in evict() processing. This is especially true for unlinked
> > inodes being evicted from cache, but also the dquots need to stay
> > attached until writeback completes.
> >
> > Hence I don't think we can remove the quota refs from the inode
> > before we call iput_final(), and so I think quotaoff (at least)
> > still needs to iterate inodes...
Yeah, I'm not sure how to get rid of the s_inodes use in the quota code.
One of the things we need the s_inodes list for is quotaoff on a mounted
filesystem, when we need to iterate all inodes which reference quota
structures and free them. In theory we could keep a list of inodes
referencing quota structures, but that would require adding a list_head to
the inode structure for filesystems that support quotas.

Now, for the sake of full context, I'll also say that enabling / disabling
quotas on a mounted filesystem is a legacy feature, because it is quite
easy for quota accounting to go wrong with it. So ext4 and f2fs have for
quite a few years supported a mode where quota tracking is enabled on mount
and disabled on unmount (if the appropriate fs feature is enabled), and you
can only enable / disable enforcement of quota limits during runtime. So I
could see us deprecating this functionality altogether, although jfs never
adapted to this new way we do quotas, so we'd have to deal with that
somehow. But one way or another, it would take a significant amount of time
before we can completely remove this, so it is out of the question for this
series.

I see one problem with the idea "whoever has a need to iterate inodes needs
to keep track of the inodes it needs to iterate through". It is fine
conceptually, but with the s_inodes list we pay the cost only once and
multiple users benefit. With each subsystem tracking inodes, we pay the
cost for each user (both in terms of memory and CPU). So if you don't use
any of the subsystems that need iteration, you win, but if you use two or
more of these subsystems, in particular ones which need to track a
significant portion of all inodes, you lose.

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR