On Mon, Oct 07, 2024 at 05:28:57PM -0700, Linus Torvalds wrote: > On Mon, 7 Oct 2024 at 16:33, Dave Chinner <david@xxxxxxxxxxxxx> wrote: > > > > There may be other inode references being held that make > > the inode live longer than the dentry cache. When should the > > fsnotify marks be removed from the inode in that case? Do they need > > to remain until, e.g, writeback completes? > > Note that my idea is to just remove the fsnotify marks when the dentry > discards the inode. > > That means that yes, the inode may still have a lifetime after the > dentry (because of other references, _or_ just because I_DONTCACHE > isn't set and we keep caching the inode). > > BUT - fsnotify won't care. There won't be any fsnotify marks on that > inode any more, and without a dentry that points to it, there's no way > to add such marks. > > (A new dentry may be re-attached to such an inode, and then fsnotify > could re-add new marks, but that doesn't change anything - the next > time the dentry is detached, the marks would go away again). > > And yes, this changes the timing on when fsnotify events happen, but > what I'm actually hoping for is that Jan will agree that it doesn't > actually matter semantically. > > > > Then at umount time, the dentry shrinking will deal with all live > > > dentries, and at most the fsnotify layer would send the FS_UNMOUNT to > > > just the root dentry inodes? > > > > I don't think even that is necessary, because > > shrink_dcache_for_umount() drops the sb->s_root dentry after > > trimming the dentry tree. Hence the dcache drop would cleanup all > > inode references, roots included. > > Ahh - even better. > > I didn't actually look very closely at the actual umount path, I was > looking just at the fsnotify_inoderemove() place in > dentry_unlink_inode() and went "couldn't we do _this_ instead?" > > > > Wouldn't that make things much cleaner, and remove at least *one* odd > > > use of the nasty s_inodes list? > > > > Yes, it would, but someone who knows exactly when the fsnotify > > marks can be removed needs to chime in here... > > Yup. Honza? > > (Aside: I don't actually know if you prefer Jan or Honza, so I use > both randomly and interchangeably?) > > > > I have this feeling that maybe we can just remove the other users too > > > using similar models. I think the LSM layer use (in landlock) is bogus > > > for exactly the same reason - there's really no reason to keep things > > > around for a random cached inode without a dentry. > > > > Perhaps, but I'm not sure what the landlock code is actually trying > > to do. In Landlock, inodes (see landlock_object) may be referenced by several rulesets, either tied to a task's cred or a ruleset's file descriptor. A ruleset may outlive its referenced inodes, and this should not block related umounts. security_sb_delete() is used to gracefully release such references. > > Yeah, I wouldn't be surprised if it's just confused - it's very odd. > > But I'd be perfectly happy just removing one use at a time - even if > we keep the s_inodes list around because of other users, it would > still be "one less thing". > > > Hence, to me, the lifecycle and reference counting of inode related > > objects in landlock doesn't seem quite right, and the use of the > > security_sb_delete() callout appears to be papering over an internal > > lifecycle issue. > > > > I'd love to get rid of it altogether. I'm not sure to fully understand the implications for now, but it would definitely be good to simplify this lifetime management. The only requirement for Landlock is that inodes references should live as long as the related inodes are accessible by user space or already in use. The sooner these references are removed from related ruleset, the better. > > Yeah, I think the inode lifetime is just so random these days that > anything that depends on it is questionable. > > The quota case is probably the only thing where the inode lifetime > *really* makes sense, and that's the one where I looked at the code > and went "I *hope* this can be converted to traversing the dentry > tree", but at the same time it did look sensible to make it be about > inodes. > > If we can convert the quota side to be based on dentry lifetimes, it > will almost certainly then have to react to the places that do > "d_add()" when re-connecting an inode to a dentry at lookup time. > > So yeah, the quota code looks worse, but even if we could just remove > fsnotify and landlock, I'd still be much happier. > > Linus