On Wed, Aug 14, 2024 at 06:23:58AM +0200, Christoph Hellwig wrote: > Fix the newly added discard support to only offer a FITRIM range that > spans the RT device in addition to the main device if the RT device > actually supports discard. Without this we'll incorrectly accept > a larger range than actually supported and confuse user space if the > RT device does not support discard. This can easily happen when the > main device is a SSD but the RT device is a hard driver. > > Move the code around a bit to keep the max_blocks and granularity > assignments together and to handle the case where only the RT device > supports discard as well. > > Fixes: 3ba3ab1f6719 ("xfs: enable FITRIM on the realtime device") > Signed-off-by: Christoph Hellwig <hch@xxxxxx> > --- > fs/xfs/xfs_discard.c | 37 ++++++++++++++++++++++++------------- > 1 file changed, 24 insertions(+), 13 deletions(-) > > diff --git a/fs/xfs/xfs_discard.c b/fs/xfs/xfs_discard.c > index 6516afecce0979..46e28499a7d3ce 100644 > --- a/fs/xfs/xfs_discard.c > +++ b/fs/xfs/xfs_discard.c > @@ -654,9 +654,9 @@ xfs_ioc_trim( > struct xfs_mount *mp, > struct fstrim_range __user *urange) > { > - unsigned int granularity = > - bdev_discard_granularity(mp->m_ddev_targp->bt_bdev); > + struct block_device *bdev = mp->m_ddev_targp->bt_bdev; > struct block_device *rt_bdev = NULL; > + unsigned int granularity = 0; > struct fstrim_range range; > xfs_daddr_t start, end; > xfs_extlen_t minlen; > @@ -666,15 +666,6 @@ xfs_ioc_trim( > > if (!capable(CAP_SYS_ADMIN)) > return -EPERM; > - if (mp->m_rtdev_targp && > - bdev_max_discard_sectors(mp->m_rtdev_targp->bt_bdev)) > - rt_bdev = mp->m_rtdev_targp->bt_bdev; > - if (!bdev_max_discard_sectors(mp->m_ddev_targp->bt_bdev) && !rt_bdev) > - return -EOPNOTSUPP; Does this still return EOPNOTSUPP if there's no rt device and the data device doesn't support discard? I only see an EOPNOTSUPP return if there is a rt device and both devices don't support discard. > - > - if (rt_bdev) > - granularity = max(granularity, > - bdev_discard_granularity(rt_bdev)); > > /* > * We haven't recovered the log, so we cannot use our bnobt-guided > @@ -683,13 +674,33 @@ xfs_ioc_trim( > if (xfs_has_norecovery(mp)) > return -EROFS; > > + if (bdev_max_discard_sectors(bdev)) { > + max_blocks = mp->m_sb.sb_dblocks; > + granularity = bdev_discard_granularity(bdev); > + } else { > + bdev = NULL; > + } > + > + if (mp->m_rtdev_targp) { > + rt_bdev = mp->m_rtdev_targp->bt_bdev; > + if (!bdev_max_discard_sectors(rt_bdev)) { > + if (!bdev) > + return -EOPNOTSUPP; > + rt_bdev = NULL; > + } > + } > + if (rt_bdev) { > + max_blocks += mp->m_sb.sb_rblocks; I think this will break xfs_scrub, which (unlike fstrim) breaks up its FITRIM requests into smaller pieces. The (afwul) FITRIM interface says that [0, dblocks) trims the data device, and [dblocks, dblocks + rblocks) trims the realtime device. If the data device doesn't support discard, max_blocks will be rblocks, and that's what we use to validate the @start parameter. For example, if the data device is 10T spinning rust and the rt device is a 10G SSD, max_blocks will be 10G. A FITRIM request for just the rt device will be [10T, 10G), which now fails with EINVAL. I don't have a fix to suggest for this yet, but let me play around with this tomorrow and see if I can come up with something better, or figure out how I'm being thick. ;) My guess is that what we really want is if either device supports discard we allow the full range, but if a specific device doesn't support discard then we skip it and don't add anything to the outgoing range.len. But that's what I thought the current code does. <shrug> --D > + granularity = > + max(granularity, bdev_discard_granularity(rt_bdev)); > + } > + > if (copy_from_user(&range, urange, sizeof(range))) > return -EFAULT; > > range.minlen = max_t(u64, granularity, range.minlen); > minlen = XFS_B_TO_FSB(mp, range.minlen); > > - max_blocks = mp->m_sb.sb_dblocks + mp->m_sb.sb_rblocks; > if (range.start >= XFS_FSB_TO_B(mp, max_blocks) || > range.minlen > XFS_FSB_TO_B(mp, mp->m_ag_max_usable) || > range.len < mp->m_sb.sb_blocksize) > @@ -698,7 +709,7 @@ xfs_ioc_trim( > start = BTOBB(range.start); > end = start + BTOBBT(range.len) - 1; > > - if (bdev_max_discard_sectors(mp->m_ddev_targp->bt_bdev)) { > + if (bdev) { > error = xfs_trim_datadev_extents(mp, start, end, minlen, > &blocks_trimmed); > if (error) > -- > 2.43.0 > >