On Mon, Apr 15, 2024 at 10:35:26PM -0700, Christoph Hellwig wrote: > > out_free: > > + /* > > + * If we're holding the only reference to an inode opened via handle, > > + * mark it dontcache so that we don't pollute the cache. > > + */ > > + if (handle_ip) { > > + if (atomic_read(&VFS_I(handle_ip)->i_count) == 1) > > + d_mark_dontcache(VFS_I(handle_ip)); > > + xfs_irele(handle_ip); > > This still feels the wrong way around vs just using IGET_UNCACHED? > > Or did I miss the killer argument why that can't work? No, I missed the killer argument. In fact, I think the fsdax reorganization into d_mark_dontcache makes this counterproductive. Initially, I had thought that we'd be doing users a favor by only marking inodes dontcache at the end of a scrub operation, and only if there's only one reference to that inode. This was more or less true back when I_DONTCACHE was an XFS iflag and the only thing it did was change the outcome of xfs_fs_drop_inode to 1. If there are dentries pointing to the inode when scrub finishes, the inode will have positive i_count and stay around in cache until dentry reclaim. But now we have d_mark_dontcache, which cause the inode *and* the dentries attached to it all to be marked I_DONTCACHE, which means that we drop the dentries ASAP, which drops the inode ASAP. This is bad if scrub found problems with the inode, because now they can be scheduled for inactivation, which can cause inodegc to trip on it and shut down the filesystem. Even if the inode isn't bad, this is still suboptimal because phases 3-7 each initiate inode scans. Dropping the inode immediately during phase 3 is silly because phase 5 will reload it and drop it immediately, etc. It's fine to mark the inodes dontcache, but if there have been accesses to the file that set up dentries, we should keep them. I validated this by setting up ftrace to capture xfs_iget_recycle* tracepoints and ran xfs/285 for 30 seconds. With current djwong-wtf I saw ~30,000 recycle events. I then dropped the d_mark_dontcache calls and set XFS_IGET_DONTCACHE, and the recycle events dropped to ~5,000 per 30 seconds. So I think I want to change xchk_irele to clear I_DONTCACHE if we're in transaction context or if corruption was found; and to use XFS_IGET_DONTCACHE instead of the i_count-based d_mark_dontcache calls. --D