On 2/1/23 00:58, Mike Snitzer wrote: > Subject: > Re: [PATCH v2 02/21] block, blkfilter: Block Device Filtering Mechanism > From: > Mike Snitzer <snitzer@xxxxxxxxxx> > Date: > 2/1/23, 00:58 > > To: > Sergei Shtepa <sergei.shtepa@xxxxxxxxx> > CC: > axboe@xxxxxxxxx, corbet@xxxxxxx, linux-block@xxxxxxxxxxxxxxx, linux-doc@xxxxxxxxxxxxxxx, linux-kernel@xxxxxxxxxxxxxxx, dm-devel@xxxxxxxxxx > > > On Fri, Dec 09 2022 at 9:23P -0500, > Sergei Shtepa <sergei.shtepa@xxxxxxxxx> wrote: > >> Allows to attach block device filters to the block devices. Kernel >> modules can use this functionality to extend the capabilities of the >> block layer. >> >> Signed-off-by: Sergei Shtepa <sergei.shtepa@xxxxxxxxx> >> --- >> block/bdev.c | 70 ++++++++++++++++++++++++++++++++++++++ >> block/blk-core.c | 19 +++++++++-- >> include/linux/blk_types.h | 2 ++ >> include/linux/blkdev.h | 71 +++++++++++++++++++++++++++++++++++++++ >> 4 files changed, 160 insertions(+), 2 deletions(-) >> >> diff --git a/block/bdev.c b/block/bdev.c >> index d699ecdb3260..b820178824b2 100644 >> --- a/block/bdev.c >> +++ b/block/bdev.c >> @@ -427,6 +427,7 @@ static void init_once(void *data) >> >> static void bdev_evict_inode(struct inode *inode) >> { >> + bdev_filter_detach(I_BDEV(inode)); >> truncate_inode_pages_final(&inode->i_data); >> invalidate_inode_buffers(inode); /* is it needed here? */ >> clear_inode(inode); >> @@ -502,6 +503,7 @@ struct block_device *bdev_alloc(struct gendisk *disk, u8 partno) >> return NULL; >> } >> bdev->bd_disk = disk; >> + bdev->bd_filter = NULL; >> return bdev; >> } >> >> @@ -1092,3 +1094,71 @@ void bdev_statx_dioalign(struct inode *inode, struct kstat *stat) >> >> blkdev_put_no_open(bdev); >> } >> + >> +/** >> + * bdev_filter_attach - Attach the filter to the original block device. >> + * @bdev: >> + * Block device. >> + * @flt: >> + * Filter that needs to be attached to the block device. >> + * >> + * Before adding a filter, it is necessary to initialize &struct bdev_filter >> + * using a bdev_filter_init() function. >> + * >> + * The bdev_filter_detach() function allows to detach the filter from the block >> + * device. >> + * >> + * Return: 0 if succeeded, or -EALREADY if the filter already exists. >> + */ >> +int bdev_filter_attach(struct block_device *bdev, >> + struct bdev_filter *flt) >> +{ >> + int ret = 0; >> + >> + blk_mq_freeze_queue(bdev->bd_queue); >> + blk_mq_quiesce_queue(bdev->bd_queue); >> + >> + if (bdev->bd_filter) >> + ret = -EALREADY; >> + else >> + bdev->bd_filter = flt; >> + >> + blk_mq_unquiesce_queue(bdev->bd_queue); >> + blk_mq_unfreeze_queue(bdev->bd_queue); >> + >> + return ret; >> +} >> +EXPORT_SYMBOL(bdev_filter_attach); >> + >> +/** >> + * bdev_filter_detach - Detach the filter from the block device. >> + * @bdev: >> + * Block device. >> + * >> + * The filter should be added using the bdev_filter_attach() function. >> + * >> + * Return: 0 if succeeded, or -ENOENT if the filter was not found. >> + */ >> +int bdev_filter_detach(struct block_device *bdev) >> +{ >> + int ret = 0; >> + struct bdev_filter *flt = NULL; >> + >> + blk_mq_freeze_queue(bdev->bd_queue); >> + blk_mq_quiesce_queue(bdev->bd_queue); >> + >> + flt = bdev->bd_filter; >> + if (flt) >> + bdev->bd_filter = NULL; >> + else >> + ret = -ENOENT; >> + >> + blk_mq_unquiesce_queue(bdev->bd_queue); >> + blk_mq_unfreeze_queue(bdev->bd_queue); >> + >> + if (flt) >> + bdev_filter_put(flt); >> + >> + return ret; >> +} >> +EXPORT_SYMBOL(bdev_filter_detach); > What about bio-based devices? (DM, MD, etc) > > DM uses freeze_bdev() and thaw_bdev(), seems like you're missing some > work here. Thanks, Mike. We are planning to add a freeze_bdev() function call in bdev_filter_attach(). But for the bdev_filter_detach() function, it doesn't seem to make sense. I think enough to call blk_mq_freeze_queue(). As Fabio already wrote, I use a public repository on github to work with the patch: https://github.com/SergeiShtepa/linux/commits/blksnap-master The current state can be viewed there. Feedback is welcome as usual. > >> diff --git a/block/blk-core.c b/block/blk-core.c >> index 5487912befe8..284b295a7b23 100644 >> --- a/block/blk-core.c >> +++ b/block/blk-core.c >> @@ -678,9 +678,24 @@ void submit_bio_noacct_nocheck(struct bio *bio) >> * to collect a list of requests submited by a ->submit_bio method while >> * it is active, and then process them after it returned. >> */ >> - if (current->bio_list) >> + if (current->bio_list) { >> bio_list_add(¤t->bio_list[0], bio); >> - else if (!bio->bi_bdev->bd_disk->fops->submit_bio) >> + return; >> + } >> + >> + if (bio->bi_bdev->bd_filter && !bio_flagged(bio, BIO_FILTERED)) { > Shouldn't this be: if (unlikely(...))? > > But that obviously assumes a fair amount about the only consumer > (temporary filter that lasts as long as it takes to do a backup). Yes, at the moment the code is being created so that only one filter is possible. In the summer, I offered a more complex solution, in which there were altitudes. See: https://lore.kernel.org/linux-block/1655135593-1900-2-git-send-email-sergei.shtepa@xxxxxxxxx/ But this is redundant code for this task at the moment, since only one filter is offered now. I think it will be possible to implement something similar later. > >> + bool pass; >> + >> + pass = bio->bi_bdev->bd_filter->fops->submit_bio_cb(bio); >> + bio_set_flag(bio, BIO_FILTERED); >> + if (!pass) { >> + bio->bi_status = BLK_STS_OK; >> + bio_endio(bio); >> + return; >> + } >> + } >> + >> + if (!bio->bi_bdev->bd_disk->fops->submit_bio) >> __submit_bio_noacct_mq(bio); >> else >> __submit_bio_noacct(bio); > And you currently don't allow for blkfilter to be involved if a bio > recurses (which is how bio splitting works now). Not sure it > matters, just mentioning it... > > But taking a step back, in the hopes of stepping out of your way: > > Myself and others on the DM team (past and present) have always hoped > all block devices could have the flexibility of DM. It was that hope > that caused my frustration when I first saw your blkfilter approach. > > But I was too idealistic that a byproduct of your efforts > (blk-interposer before and blkfilter now) would usher in _all_ block > devices being able to comprehensively change their identity (and IO > processing) like DM enjoys. > > DM showcases all the extra code needed to achieve its extreme IO > remapping and stacking flexibilty -- I don't yet see a way to distill > the essence of what DM achieves without imposing too much on all block > core. > > So I do think blkfilter is a pragmatic way to achieve your goals. > > Mike >