On Thu, May 25, 2023 at 05:45:03PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@xxxxxxxxxx>
> 
> When we're freeing extents that have been set in a bitmap, break the
> bitmap extent into multiple sub-extents organized by fate, and reap the
> extents.  This enables us to dispose of old resources more efficiently
> than doing them block by block.
> 
> While we're at it, rename the reaping functions to make it clear that
> they're reaping per-AG extents.
> 
> Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx>
> ---

.....

> +xreap_agextent_binval(
> +	struct xreap_state	*rs,
> +	xfs_agblock_t		agbno,
> +	xfs_extlen_t		*aglenp)
>  {
> -	struct xfs_buf		*bp = NULL;
> -	int			error;
> +	struct xfs_scrub	*sc = rs->sc;
> +	struct xfs_perag	*pag = sc->sa.pag;
> +	struct xfs_mount	*mp = sc->mp;
> +	xfs_agnumber_t		agno = sc->sa.pag->pag_agno;
> +	xfs_agblock_t		agbno_next = agbno + *aglenp;
> +	xfs_agblock_t		bno = agbno;
>  
>  	/*
> -	 * If there's an incore buffer for exactly this block, invalidate it.
>  	 * Avoid invalidating AG headers and post-EOFS blocks because we never
>  	 * own those.
>  	 */
> -	if (!xfs_verify_fsbno(sc->mp, fsbno))
> +	if (!xfs_verify_agbno(pag, agbno) ||
> +	    !xfs_verify_agbno(pag, agbno_next - 1))
>  		return;
>  
>  	/*
> -	 * We assume that the lack of any other known owners means that the
> -	 * buffer can be locked without risk of deadlocking.
> +	 * If there are incore buffers for these blocks, invalidate them.  We
> +	 * assume that the lack of any other known owners means that the buffer
> +	 * can be locked without risk of deadlocking.  The buffer cache cannot
> +	 * detect aliasing, so employ nested loops to scan for incore buffers
> +	 * of any plausible size.
>  	 */
> -	error = xfs_buf_incore(sc->mp->m_ddev_targp,
> -			XFS_FSB_TO_DADDR(sc->mp, fsbno),
> -			XFS_FSB_TO_BB(sc->mp, 1), XBF_BCACHE_SCAN, &bp);
> -	if (error)
> -		return;
> -
> -	xfs_trans_bjoin(sc->tp, bp);
> -	xfs_trans_binval(sc->tp, bp);
> +	while (bno < agbno_next) {
> +		xfs_agblock_t	fsbcount;
> +		xfs_agblock_t	max_fsbs;
> +
> +		/*
> +		 * Max buffer size is the max remote xattr buffer size, which
> +		 * is one fs block larger than 64k.
> +		 */
> +		max_fsbs = min_t(xfs_agblock_t, agbno_next - bno,
> +				xfs_attr3_rmt_blocks(mp, XFS_XATTR_SIZE_MAX));
> +
> +		for (fsbcount = 1; fsbcount < max_fsbs; fsbcount++) {
> +			struct xfs_buf	*bp = NULL;
> +			xfs_daddr_t	daddr;
> +			int		error;
> +
> +			daddr = XFS_AGB_TO_DADDR(mp, agno, bno);
> +			error = xfs_buf_incore(mp->m_ddev_targp, daddr,
> +					XFS_FSB_TO_BB(mp, fsbcount),
> +					XBF_BCACHE_SCAN, &bp);
> +			if (error)
> +				continue;
> +
> +			xfs_trans_bjoin(sc->tp, bp);
> +			xfs_trans_binval(sc->tp, bp);
> +			rs->invalidated++;

Hmmm. An O(N^2) brute force lookup walk to find any buffer at that
specific daddr? That's going to have an impact on running systems by
hammering the perag hash lock.

I didn't know this was being done before I suggested XBF_ANY_MATCH,
but now I'm wondering how we can even have multiple buffers in the
cache at a given address. The whole point of the ASSERT() in the match
function is to indicate this should not ever happen.

i.e. xfs_buf_find_insert() uses rhashtable_lookup_get_insert_fast(),
which will return an existing buffer only if it has a matching length.
The compare function used at insert time will throw an assert fail if
any other buffer exists at that address that has a mismatched length
and is not stale. There is no way to avoid that ASSERT check on
insert.

Hence, AFAICT, the only way we can get multiple buffers into the cache
at the same daddr with different lengths is for all the existing
buffers at that daddr to be stale at insert time.
In which case, why do we need to invalidate buffers that are already
stale (i.e. have already been invalidated)? What am I missing here?

i.e. this appears to cater for behaviour that doesn't currently exist
in the buffer cache, and I'm not sure we even want to allow it to
occur in the buffer cache, given that it generally indicates in-memory
metadata corruption.....

Hmmmm. What if the buffer was already stale? The lookup will then
clear all the stale state from it, leaving it with no valid contents,
which we then invalidate again and log a new buffer cancel item for.
What are the implications of that? (My brain is too full of stuff
trying to understand what this code is doing to be able to think this
through right now.)

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx