On Tue, Apr 02, 2024 at 10:07:18PM -0700, Darrick J. Wong wrote: > From: Darrick J. Wong <djwong@xxxxxxxxxx> > > On a 10TB filesystem where the free space in each AG is heavily > fragmented, I noticed some very high runtimes on a FITRIM call for the > entire filesystem. xfs_scrub likes to report progress information on > each phase of the scrub, which means that a strace for the entire > filesystem: > > ioctl(3, FITRIM, {start=0x0, len=10995116277760, minlen=0}) = 0 <686.209839> > > shows that scrub is uncommunicative for the entire duration. Reducing > the size of the FITRIM requests to a single AG at a time produces lower > times for each individual call, but even this isn't quite acceptable, > because the time between progress reports are still very high: > > Strace for the first 4x 1TB AGs looks like (2): > ioctl(3, FITRIM, {start=0x0, len=1099511627776, minlen=0}) = 0 <68.352033> > ioctl(3, FITRIM, {start=0x10000000000, len=1099511627776, minlen=0}) = 0 <68.760323> > ioctl(3, FITRIM, {start=0x20000000000, len=1099511627776, minlen=0}) = 0 <67.235226> > ioctl(3, FITRIM, {start=0x30000000000, len=1099511627776, minlen=0}) = 0 <69.465744> > > I then had the idea to limit the length parameter of each call to a > smallish amount (~11GB) so that we could report progress relatively > quickly, but much to my surprise, each FITRIM call still took ~68 > seconds! > > Unfortunately, the by-length fstrim implementation handles this poorly > because it walks the entire free space by length index (cntbt), which is > a very inefficient way to walk a subset of the blocks of an AG. > > Therefore, create a second implementation that will walk the bnobt and > perform the trims in block number order. This implementation avoids the > worst problems of the original code, though it lacks the desirable > attribute of freeing the biggest chunks first. > > On the other hand, this second implementation will be much easier to > constrain the system call latency, and makes it much easier to report > fstrim progress to anyone who's running xfs_scrub. > > Inspired-by: Dave Chinner <dchinner@xxxxxxxxxx> > Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx> > --- > fs/xfs/xfs_discard.c | 153 ++++++++++++++++++++++++++++++-------------------- > 1 file changed, 93 insertions(+), 60 deletions(-) Looks OK. Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx> -- Dave Chinner david@xxxxxxxxxxxxx