Re: [PATCH 3/3] xfs: only iget the file once when doing vectored scrub-by-handle

"Darrick J. Wong" <djwong@xxxxxxxxxx> · Wed, 10 Apr 2024 18:15:02 -0700

On Wed, Apr 10, 2024 at 08:12:16AM -0700, Christoph Hellwig wrote:
> > +	/*
> > +	 * If the caller wants us to do a scrub-by-handle and the file used to
> > +	 * call the ioctl is not the same file, load the incore inode and pin
> > +	 * it across all the scrubv actions to avoid repeated UNTRUSTED
> > +	 * lookups.  The reference is not passed to deeper layers of scrub
> > +	 * because each scrubber gets to decide its own strategy for getting an
> > +	 * inode.
> > +	 */
> > +	if (vhead->svh_ino && vhead->svh_ino != ip_in->i_ino)
> > +		handle_ip = xchk_scrubv_open_by_handle(mp, vhead);
> 
> Oh.  So we read the inode, keep a reference to it, but still hit the
> inode cache every time.  A little non-onvious and not perfect for
> performance, but based on your numbers probably good enough.
> 
> Curious: what is the reason the scrubbers want/need different ways to
> get at the inode?

I don't remember the exact reason why we don't pass this ip into
xfs_scrub_metadata, but iirc the inode scrub setup functions react
differently (from the bmap/dir/attr/symlink scrubbers) when iget
failures occur.

Also this way xfs_scrub_metadata owns the refcount to whatever inode it
picks up, and can do whatever it wants with that reference.

> > +	/*
> > +	 * If we're holding the only reference to an inode opened via handle
> > +	 * and the scan was clean, mark it dontcache so that we don't pollute
> > +	 * the cache.
> > +	 */
> > +	if (handle_ip) {
> > +		if (set_dontcache &&
> > +		    atomic_read(&VFS_I(handle_ip)->i_count) == 1)
> > +			d_mark_dontcache(VFS_I(handle_ip));
> > +		xfs_irele(handle_ip);
> > +	}
> 
> This looks a little weird to me.  Can't we simply use XFS_IGET_DONTCACHE
> at iget time and then clear I_DONTCACHE here if we want to keep the
> inode around?

Not anymore, because other threads can mess around with the dontcache
state (yay fsdax access path changes!!) while we are scrubbing the
inode.

>                Given that we only set the uncached flag from
> XFS_IGET_DONTCACHE on a cache miss, we won't have set
> DCACHE_DONTCACHE anywhere (and don't really care about the dentries to
> start with).
> 
> But why do we care about keeping the inodes with errors in memory
> here, but not elsewhere?

We actually, do, but it's not obvious...

> Maybe this can be explained in an expanded comment.

...because this bit here is basically the same as xchk_irele, but we
don't have a xfs_scrub object to pass in, so it's opencoded.  I could
pull this logic out into:

void xfs_scrub_irele(struct xfs_inode *ip)
{
	if (atomic_read(&VFS_I(ip)->i_count) == 1) {
		/*
		 * If this is the last reference to the inode and the caller
		 * permits it, set DONTCACHE to avoid thrashing.
		 */
		d_mark_dontcache(VFS_I(ip));
	}
	xfs_irele(ip);
}

and change xchk_irele to:

void
xchk_irele(
	struct xfs_scrub	*sc,
	struct xfs_inode	*ip)
{
	if (sc->tp) {
		spin_lock(&VFS_I(ip)->i_lock);
		VFS_I(ip)->i_state &= ~I_DONTCACHE;
		spin_unlock(&VFS_I(ip)->i_lock);
		xfs_irele(ip);
		return;
	}

	xfs_scrub_irele(ip);
}