[PATCHSET 1/5] xfs_scrub: fixes and cleanups for inode iteration

Hi all,

Christoph and I were investigating some performance problems in
xfs_scrub on filesystems that have a lot of rtgroups, and we noticed
several problems and inefficiencies in the existing inode iteration
code.

The first observation is that two of the three callers of
scrub_all_inodes (phases 5 and 6) just want to walk all the user files
in the filesystem.  They don't care about metadir directories, and they
don't care about matching inumbers data to bulkstat data for the purpose
of finding broken files.  The third caller (phase 3) does, so it makes
more sense to create a much simpler iterator for phases 5 and 6 that
only calls bulkstat.

But then I started noticing other problems in the phase 3 inode
iteration code -- if the per-inumbers bulkstat iterator races with
other threads that are creating or deleting files, we can walk off the
end of the bulkstat array, miss newly allocated files, miss older
allocated inodes when newer ones appear, pointlessly try to scan
deleted files, and redundantly scan files belonging to another inobt
record.

These races rarely happen, but they all need fixing.

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

With a bit of luck, this should all go splendidly.
Comments and questions are, as always, welcome.

--D

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-inode-iteration-fixes
---
Commits in this patchset:
 * libxfs: unmap xmbuf pages to avoid disaster
 * libxfs: mark xmbuf_{un,}map_page static
 * man: document new XFS_BULK_IREQ_METADIR flag to bulkstat
 * libfrog: wrap handle construction code
 * xfs_scrub: don't report data loss in unlinked inodes twice
 * xfs_scrub: call bulkstat directly if we're only scanning user files
 * xfs_scrub: remove flags argument from scrub_scan_all_inodes
 * xfs_scrub: selectively re-run bulkstat after re-running inumbers
 * xfs_scrub: actually iterate all the bulkstat records
 * xfs_scrub: don't double-scan inodes during phase 3
 * xfs_scrub: don't (re)set the bulkstat request icount incorrectly
 * xfs_scrub: don't complain if bulkstat fails
 * xfs_scrub: return early from bulkstat_for_inumbers if no bulkstat data
 * xfs_scrub: don't blow away new inodes in bulkstat_single_step
 * xfs_scrub: hoist the phase3 bulkstat single stepping code
 * xfs_scrub: ignore freed inodes when single-stepping during phase 3
 * xfs_scrub: try harder to fill the bulkstat array with bulkstat()
---
 include/cache.h               |    6 
 io/parent.c                   |    9 -
 libfrog/Makefile              |    1 
 libfrog/bitmask.h             |    6 
 libfrog/handle_priv.h         |   55 ++++
 libxfs/buf_mem.c              |  159 +++++++++---
 libxfs/buf_mem.h              |    3 
 libxfs/cache.c                |   11 +
 man/man2/ioctl_xfs_bulkstat.2 |    8 +
 scrub/common.c                |    9 -
 scrub/inodes.c                |  552 ++++++++++++++++++++++++++++++++++++-----
 scrub/inodes.h                |   12 +
 scrub/phase3.c                |    7 -
 scrub/phase5.c                |   14 -
 scrub/phase6.c                |   14 +
 spaceman/health.c             |    9 -
 16 files changed, 726 insertions(+), 149 deletions(-)
 create mode 100644 libfrog/handle_priv.h
