On Tue, Mar 26, 2019 at 09:48:23AM +1100, Dave Chinner wrote: > And when it comes to VFS inode reclaim, XFS does not implement > ->evict_inode because there is nothing at the VFS level to do. > And ->destroy_inode ends up doing cleanup work (e.g. freeing on-disk > inodes) which is non-trivial, blocking work, but then still requires > the struct xfs_inode to be written back to disk before it can bei > freed. So it just gets marked "reclaimable" and background reclaim > then takes care of it from there so we avoid synchronous IO in inode > reclaim... > > This works because don't track dirty inode metadata in the VFS > writeback code (it's tracked with much more precision in the XFS log > infrastructure) and we don't write back inodes from the VFS > infrastructure, either. It's all done based on internal state > outside the VFS. > > And, because of this, the VFS cannot assume that it can free > the struct inode after calling ->destroy_inode or even use > call_rcu() to run a filesystem destructor because the filesystem > may need to do work that needs to block and that's not allowed in an > RCU callback... In Linus' patch that's what you get with non-NULL ->destroy_inode + NULL ->destroy_inode_rcu, so XFS won't be screwed by that. Said that, yes, XFS adds another fun twist there (AFAICS, it's the only in-tree filesystem that pulls that off). I would really like some comments from f2fs and ocfs2 folks, as well as Jan - he's had much more recent contact with writeback code than I have... Could somebody explain what's going on in f2fs and ocfs2 ->drop_inode()? It _should_ be just a predicate; looks like both are playing very odd games to work around writeback problems and I wonder if there's a cleaner solution for that. I can try and dig through maillist(s) archives, but I would really appreciate it if somebody could give a braindump on the issues dealt with in there...