On 3/7/24 03:46, Christoph Hellwig wrote: > On Thu, Mar 07, 2024 at 07:51:35AM +1100, Dave Chinner wrote: >> On Wed, Mar 06, 2024 at 07:18:02AM -0800, Christoph Hellwig wrote: >>> Lookings at this a bit more I'm not sure my fix is enough as the error >>> handling is really complex. Also given that some discard callers are >>> from kernel threads messing with interruptibility I'm not entirely >>> sure that having this check in the common helper is a good idea. >> >> Yeah, this seems like a problem. The only places that userspace >> should be issuing discards directly and hence be interruptible from >> are FITRIM, BLKDISCARD and fallocate() on block devices. > > Yes. > >> Filesystems already handle fatal signals in FITRIM (e.g. see >> xfs_trim_should_stop(), ext4_trim_interrupted(), >> btrfs_trim_free_extents(), etc), so it seems to me that the only >> non-interruptible call from userspace are operations directly on >> block devices which have no higher level iteration over the range to >> discard and the user controls the range directly. > > Yeah. > >> Perhaps the solution is to change BLKDISCARD/fallocate() on bdev to >> look more like xfs_discard_extents() where it breaks the range up >> into smaller chunks and intersperses bio chaining with signal >> checks. > > Well, xfs_discard_extents has different extents from the higher > layers. __blkdev_issue_discard than breaks it up based on what > fits into the bio (and does some alignment against our normal > rule of leaving that to the splitting code). But I suspect moving > the loop in __blkdev_issue_discard into the callers could really > help with this. > >> >> I suspect the same solution is necessary for blkdev_issue_zeroout() >> and blkdev_issue_secure_erase(), because both of them have user >> controlled lengths... > > Yes. (or rather two sub cases of the former and the latter) > How about adding a new parameter named "interruptible" to __blkdev_issue_discard() and then using that parameter to deduce whether we need to intercept fatal signal or not? We can ensure that it's only when __blkdev_issue_discard() is invoked from the userspace (BLKDISCARD/fallocate()) we would have "interruptible" set to "1" otherwise for all other code path it could be set to "0". Yes we may need one helper which would help set the "interruptible" to "1" when invoked from BLKDISCARD/fallocate(). Probably the code should look as below (not tested): diff --git a/block/blk-lib.c b/block/blk-lib.c index dc8e35d0a51d..4b17f8b9dec1 100644 --- a/block/blk-lib.c +++ b/block/blk-lib.c @@ -56,7 +56,7 @@ static void await_bio_chain(struct bio *bio) } int __blkdev_issue_discard(struct block_device *bdev, sector_t sector, - sector_t nr_sects, gfp_t gfp_mask, struct bio **biop) + sector_t nr_sects, gfp_t gfp_mask, struct bio **biop, int interruptible) { struct bio *bio = *biop; sector_t bs_mask; @@ -97,8 +97,9 @@ int __blkdev_issue_discard(struct block_device *bdev, sector_t sector, * is disabled. */ cond_resched(); - if (fatal_signal_pending(current)) { + if (interruptible && fatal_signal_pending(current)) { await_bio_chain(bio); + *biop = NULL; return -EINTR; } } @@ -126,7 +127,7 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector, int ret; blk_start_plug(&plug); - ret = __blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask, &bio); + ret = __blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask, &bio, 0); if (!ret && bio) { ret = submit_bio_wait(bio); if (ret == -EOPNOTSUPP) @@ -139,6 +140,26 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector, } EXPORT_SYMBOL(blkdev_issue_discard); +int blkdev_issue_discard_interruptible(struct block_device *bdev, sector_t sector, + sector_t nr_sects, gfp_t gfp_mask) +{ + struct bio *bio = NULL; + struct blk_plug plug; + int ret; + + blk_start_plug(&plug); + ret = __blkdev_issue_discard(bdev, sector, nr_sects, gfp_mask, &bio, 1); + if (!ret && bio) { + ret = submit_bio_wait(bio); + if (ret == -EOPNOTSUPP) + ret = 0; + bio_put(bio); + } + blk_finish_plug(&plug); + + return ret; +} + static int __blkdev_issue_write_zeroes(struct block_device *bdev, sector_t sector, sector_t nr_sects, gfp_t gfp_mask, struct bio **biop, unsigned flags) diff --git a/block/fops.c b/block/fops.c index 0cf8cf72cdfa..f9399a59cf4e 100644 --- a/block/fops.c +++ b/block/fops.c @@ -828,7 +828,7 @@ static long blkdev_fallocate(struct file *file, int mode, loff_t start, if (error) goto fail; - error = blkdev_issue_discard(bdev, start >> SECTOR_SHIFT, + error = blkdev_issue_discard_interruptible(bdev, start >> SECTOR_SHIFT, len >> SECTOR_SHIFT, GFP_KERNEL); break; default: diff --git a/block/ioctl.c b/block/ioctl.c index 438f79c564cf..e869f2859eb1 100644 --- a/block/ioctl.c +++ b/block/ioctl.c @@ -117,7 +117,8 @@ static int blk_ioctl_discard(struct block_device *bdev, blk_mode_t mode, err = truncate_bdev_range(bdev, mode, start, start + len - 1); if (err) goto fail; - err = blkdev_issue_discard(bdev, start >> 9, len >> 9, GFP_KERNEL); + err = blkdev_issue_discard_interruptible(bdev, start >> 9, + len >> 9, GFP_KERNEL); fail: filemap_invalidate_unlock(inode->i_mapping); return err; And yes we would need similar changes for blkdev_issue_zeroout() and blkdev_issue_secure_erase(). Thanks, --Nilay