Re: [PATCH 7/9] xfs: ignore stale buffers when scanning the buffer cache

Dave Chinner <david@xxxxxxxxxxxxx> · Tue, 11 Jul 2023 08:07:46 +1000

On Sun, Jul 09, 2023 at 04:32:16PM -0700, Darrick J. Wong wrote:
> On Mon, Jul 10, 2023 at 09:15:06AM +1000, Dave Chinner wrote:
> > On Wed, Jul 05, 2023 at 04:17:36PM -0700, Darrick J. Wong wrote:
> > > Concrete example:
> > > 
> > > So let's pretend that we have an xfs with fs blocksize 4k and dir
> > > blocksize 8k.  Let's suppose the directory's data fork maps fsblock 16,
> > > resulting in a buffer (daddr 128, length 16).  Let us further suppose
> > > the inobt then allocates fsblock 17, resulting in a buffer (daddr 136,
> > > length 8).  These are crosslinked.
> > 
> > RIght, that's an intersection of {128,16} and {136,8}. The search
> > region for buffer invalidation is {128, 8} {128, 16} and {136, 8}.
> > They are the only possible buffers that can be cached in that region
> > for a 4kB block size filesystem, as there can be no metadata buffers
> > starting at daddrs 129-135 or 137-143...
> 
> <nod> There's nothing in this flag that *prevents* someone from spending
> a lot of time pounding on the buffer cache with a per-daddr scan.  But
> note that the next patch only ever calls it with fsblock-aligned daddr
> and length values.  So in the end, the only xfs_buf queries will indded
> be for {128, 8} {128, 16} and {136, 8} like you said.
> 
> See the next patch for the actual usage.  Sorry that I've been unclear
> about all this and compounding it with not very good examples.  It's
> been a /very/ long time since I actually touched this code (this rewrite
> has been waiting for merge since ... 2019?) and I'm basically coming in
> cold. :(
> 
> Ultimately I make invalidation scan code look like this:
> 
> /* Buffer cache scan context. */
> struct xrep_bufscan {
> 	/* Disk address for the buffers we want to scan. */
> 	xfs_daddr_t		daddr;
> 
> 	/* Maximum number of sectors to scan. */
> 	xfs_daddr_t		max_sectors;
> 
> 	/* Each round, increment the search length by this number of sectors. */
> 	xfs_daddr_t		daddr_step;
> 
> 	/* Internal scan state; initialize to zero. */
> 	xfs_daddr_t		__sector_count;
> };
> 
> /*
>  * Return an incore buffer from a sector scan, or NULL if there are no buffers
>  * left to return.
>  */
> struct xfs_buf *
> xrep_bufscan_advance(
> 	struct xfs_mount	*mp,
> 	struct xrep_bufscan	*scan)
> {
> 	scan->__sector_count += scan->daddr_step;
> 	while (scan->__sector_count <= scan->max_sectors) {
> 		struct xfs_buf	*bp = NULL;
> 		int		error;
> 
> 		error = xfs_buf_incore(mp->m_ddev_targp, scan->daddr,
> 				scan->__sector_count, XBF_LIVESCAN, &bp);
> 		if (!error)
> 			return bp;
> 
> 		scan->__sector_count += scan->daddr_step;
> 	}
> 
> 	return NULL;
> }

.....

Yup, that's much better, will do for now.

I suspect this could be smarter with a custom rhashtable walker; all
the buffers at a given daddr should have the same key (i.e. daddr)
so should hash to the same list, so we should be able to walk an
entire list in one pass doing a bjoin/binval callback on each
non-stale buffer that matches the key....

But that can be done later, it's not necessary for correctness and
this looks a lot better than the original code.

Cheers,

Dave.

-- 
Dave Chinner
david@xxxxxxxxxxxxx