On Wed, Oct 09, 2024 at 11:21:10AM +1100, Dave Chinner wrote: > On Tue, Oct 08, 2024 at 02:59:07PM +0200, Mickaël Salaün wrote: > > On Mon, Oct 07, 2024 at 05:28:57PM -0700, Linus Torvalds wrote: > > > On Mon, 7 Oct 2024 at 16:33, Dave Chinner <david@xxxxxxxxxxxxx> wrote: > > > > > > > > There may be other inode references being held that make > > > > the inode live longer than the dentry cache. When should the > > > > fsnotify marks be removed from the inode in that case? Do they need > > > > to remain until, e.g, writeback completes? > > > > > > Note that my idea is to just remove the fsnotify marks when the dentry > > > discards the inode. > > > > > > That means that yes, the inode may still have a lifetime after the > > > dentry (because of other references, _or_ just because I_DONTCACHE > > > isn't set and we keep caching the inode). > > > > > > BUT - fsnotify won't care. There won't be any fsnotify marks on that > > > inode any more, and without a dentry that points to it, there's no way > > > to add such marks. > > > > > > (A new dentry may be re-attached to such an inode, and then fsnotify > > > could re-add new marks, but that doesn't change anything - the next > > > time the dentry is detached, the marks would go away again). > > > > > > And yes, this changes the timing on when fsnotify events happen, but > > > what I'm actually hoping for is that Jan will agree that it doesn't > > > actually matter semantically. > > > > > > > > Then at umount time, the dentry shrinking will deal with all live > > > > > dentries, and at most the fsnotify layer would send the FS_UNMOUNT to > > > > > just the root dentry inodes? > > > > > > > > I don't think even that is necessary, because > > > > shrink_dcache_for_umount() drops the sb->s_root dentry after > > > > trimming the dentry tree. Hence the dcache drop would cleanup all > > > > inode references, roots included. > > > > > > Ahh - even better. > > > > > > I didn't actually look very closely at the actual umount path, I was > > > looking just at the fsnotify_inoderemove() place in > > > dentry_unlink_inode() and went "couldn't we do _this_ instead?" > > > > > > > > Wouldn't that make things much cleaner, and remove at least *one* odd > > > > > use of the nasty s_inodes list? > > > > > > > > Yes, it would, but someone who knows exactly when the fsnotify > > > > marks can be removed needs to chime in here... > > > > > > Yup. Honza? > > > > > > (Aside: I don't actually know if you prefer Jan or Honza, so I use > > > both randomly and interchangeably?) > > > > > > > > I have this feeling that maybe we can just remove the other users too > > > > > using similar models. I think the LSM layer use (in landlock) is bogus > > > > > for exactly the same reason - there's really no reason to keep things > > > > > around for a random cached inode without a dentry. > > > > > > > > Perhaps, but I'm not sure what the landlock code is actually trying > > > > to do. > > > > In Landlock, inodes (see landlock_object) may be referenced by several > > rulesets, either tied to a task's cred or a ruleset's file descriptor. > > A ruleset may outlive its referenced inodes, and this should not block > > related umounts. security_sb_delete() is used to gracefully release > > such references. > > Ah, there's the problem. The ruleset is persistent, not the inode. > Like fsnotify, the life cycle and reference counting is upside down. > The inode should cache the ruleset rather than the ruleset pinning > the inode. A ruleset needs to takes a reference to the inode as for an opened file and keep it "alive" as long as it may be re-used by user space (i.e. as long as the superblock exists). One of the goal of a ruleset is to identify inodes as long as they are accessible. When a sandboxed process request to open a file, its sandbox's ruleset checks against the referenced inodes (in a nutshell). In practice, rulesets reference a set of struct landlock_object which references an inode or not (if it vanished). There is only one landlock_object referenced per inode. This makes it possible to have a dynamic N:M mapping between rulesets and inodes which enables a ruleset to be deleted before its referenced inodes, or the other way around. > > See my reply to Jan about fsnotify. > > > > Yeah, I wouldn't be surprised if it's just confused - it's very odd. > > > > > > But I'd be perfectly happy just removing one use at a time - even if > > > we keep the s_inodes list around because of other users, it would > > > still be "one less thing". > > > > > > > Hence, to me, the lifecycle and reference counting of inode related > > > > objects in landlock doesn't seem quite right, and the use of the > > > > security_sb_delete() callout appears to be papering over an internal > > > > lifecycle issue. > > > > > > > > I'd love to get rid of it altogether. > > > > I'm not sure to fully understand the implications for now, but it would > > definitely be good to simplify this lifetime management. The only > > requirement for Landlock is that inodes references should live as long > > as the related inodes are accessible by user space or already in use. > > The sooner these references are removed from related ruleset, the > > better. > > I'm missing something. Inodes are accessible to users even when > they are not in cache - we just read them from disk and instantiate > a new VFS inode. > > So how do you attach the correct ruleset to a newly instantiated > inode? We can see a Landlock ruleset as a set of weakly opened files/inodes. A Landolck ruleset call iget() to keep the related VFS inodes alive, which means that when user space opens a file pointing to the same inode, the same VFS inode will be re-used and then we can match it against a ruleset. > > i.e. If you can find the ruleset for any given inode that is brought > into cache (e.g. opening an existing, uncached file), then why do > you need to take inode references so they are never evicted? A landlock_object only keep a reference to an inode, not to the rulesets pointing to it: * inode -> 1 landlock_object or NULL * landlock_object -> 1 inode or NULL * ruleset -> N landlock_object There are mainly two different operations: 1. Match 1 inode against a set of N inode references (i.e. a ruleset). 2. Drop the references of N rulesets (in practice 1 intermediate landlock_object) pointing to 1 inode. > > -Dave. > -- > Dave Chinner > david@xxxxxxxxxxxxx >