On Tue, Apr 16, 2024 at 03:31:48PM -0700, Darrick J. Wong wrote: > On Mon, Apr 15, 2024 at 10:35:26PM -0700, Christoph Hellwig wrote: > > > out_free: > > > + /* > > > + * If we're holding the only reference to an inode opened via handle, > > > + * mark it dontcache so that we don't pollute the cache. > > > + */ > > > + if (handle_ip) { > > > + if (atomic_read(&VFS_I(handle_ip)->i_count) == 1) > > > + d_mark_dontcache(VFS_I(handle_ip)); > > > + xfs_irele(handle_ip); > > > > This still feels the wrong way around vs just using IGET_UNCACHED? > > > > Or did I miss the killer argument why that can't work? > > No, I missed the killer argument. In fact, I think the fsdax > reorganization into d_mark_dontcache makes this counterproductive. > > Initially, I had thought that we'd be doing users a favor by only > marking inodes dontcache at the end of a scrub operation, and only if > there's only one reference to that inode. This was more or less true > back when I_DONTCACHE was an XFS iflag and the only thing it did was > change the outcome of xfs_fs_drop_inode to 1. If there are dentries > pointing to the inode when scrub finishes, the inode will have positive > i_count and stay around in cache until dentry reclaim. > > But now we have d_mark_dontcache, which cause the inode *and* the > dentries attached to it all to be marked I_DONTCACHE, which means that > we drop the dentries ASAP, which drops the inode ASAP. > > This is bad if scrub found problems with the inode, because now they can > be scheduled for inactivation, which can cause inodegc to trip on it and > shut down the filesystem. > > Even if the inode isn't bad, this is still suboptimal because phases 3-7 > each initiate inode scans. Dropping the inode immediately during phase > 3 is silly because phase 5 will reload it and drop it immediately, etc. > It's fine to mark the inodes dontcache, but if there have been accesses > to the file that set up dentries, we should keep them. > > I validated this by setting up ftrace to capture xfs_iget_recycle* > tracepoints and ran xfs/285 for 30 seconds. With current djwong-wtf I > saw ~30,000 recycle events. I then dropped the d_mark_dontcache calls > and set XFS_IGET_DONTCACHE, and the recycle events dropped to ~5,000 per > 30 seconds. > > So I think I want to change xchk_irele to clear I_DONTCACHE if we're in > transaction context or if corruption was found; and to use > XFS_IGET_DONTCACHE instead of the i_count-based d_mark_dontcache calls. Actually we don't even need to clear DONTCACHE on sick inodes; commit 7975e465af6b4 ("xfs: drop IDONTCACHE on inodes when we mark them sick") already took care of that. --D > --D >