On Tue, Mar 04, 2025 at 10:02:57AM -0500, Josef Bacik wrote: > On Tue, Mar 04, 2025 at 07:04:49PM +1100, Dave Chinner wrote: > > On Mon, Mar 03, 2025 at 12:00:29PM -0500, Josef Bacik wrote: > > > CONCLUSION > > > > > > I'd love some feedback on my potential problems and solutions, as well as any > > > other problems people may see. If we can get some discussion beforehand I can > > > finish up these patches and get some testing in before LSFMMBPF and we can have > > > a proper in-person discussion about the realities of the patchset. Thanks, > > > > I think there's several overlapping issues here: > > > > 1. everything that stores state on/in the VFS inode needs to take a > > reference to the inode. > > 2. Hacks like "walk the superblock inode list to visit every cached > > inode without holding references to them" need to go away > > 3. inode eviction needs to be reworked to allow decoupled processing > > of the inode once all VFS references go away. > > 4. The inode LRU (i.e. unreferenced inode caching) needs to go away > > > > Anything else I missed? :) > > Nope that's all what I've got on my list. I want to run my plan for decoupling > the inode eviction by you before I send my patches out to see if you have a > better idea. I know XFS has all this delaying stuff, but a lot of file systems > don't have that infrastructure, and I don't want to go around adding it to > everybody, so I still want to have a VFS hook way to do the final truncate. The > question is where to put it, and will it at the very least not mess you guys up, > or in the best case be useful. What I've been looking at locally if for the iput_final() processing to be done asynchronously rather than in iput() context. i.e. once the refcount goes to zero, we do all the cleanup stuff we can do without blocking, then punt it to an async context to finish the cleanup. The problems I've had with this are entirely from the "inodes with refcount of zero are still visible and accessed by the VFS code", and I hadn't got to solving that problem. The sb list iter work I posted a while back (and need to finish off) was the first step I'd made in that direction. > We're agreed on the active/passive refcount, so the active refcount will be the > arbiter of when we can do the final truncate. My current patchset adds a new > ->final_unlink() method to the inode_operations, and if it's set we call it when > the active refcount goes to 0. Obviously a passive refcount is still held on > the inode while this operation is occurring. Yes, I think this allows async processing of the inode after the active refcount reaches zero and iput_final() is called. The async processing retains a passive reference all the way through until the inode is completely destroyed and then freed. > I just want to spell it out, because some of what you've indicated is you want > the file system to control when the final unlink happens, even outside of the > VFS. What I really want is for all iput_final/evict handling that may need to block to be punted to an async processing context. That's what we already do with XFS, and what I'd really like for the VFS to natively support. i.e. passive reference counts allow decoupling of inode eviction work from the task context that drops the last active reference. We need to do this to allow the nasty mm LRU hacks to be turned into active reference counts because those hacks require iput() to be safe to call from any context... > I'm not sure if you intend to say that it happens without a struct inode, > or you just mean that we'll delay it for whenever the file system wants to do > it, and we'll be holding the struct inode for that entire time. Yes, XFS holds the struct inode via the struct xfs_inode wrapped around the outside of it. i.e. the VFS inode lifecycle is a subset of the XFS inode lifecycle. i.e. I would expect that a filesystem with such a VFs vs FS lifecycle setup to maintain a passive struct inode reference count for it's own inode wrapper structure. This means the object will remain valid for as long as the filesystem needs it's inode wrapper structure to live. The filesystem would add the passive reference at inode allocation. This passive reference then gets removed when the filesystem is done tearing down it's internal inode structure and at that point the whole object can be freed. Also, we already do a -lot- of work behind the scenes without inode references in the internal XFS code (e.g. inode writeback does lockless, cache coherent unreferenced lookups of inodes). Converting these internal lookups to use passive references would greatly simplify the internal lookup vs reclaim interactions that we currently have to handle.... Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx