Re: [PATCH v2 02/21] block, blkfilter: Block Device Filtering Mechanism

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2/1/23 00:58, Mike Snitzer wrote:
> Subject:
> Re: [PATCH v2 02/21] block, blkfilter: Block Device Filtering Mechanism
> From:
> Mike Snitzer <snitzer@xxxxxxxxxx>
> Date:
> 2/1/23, 00:58
> 
> To:
> Sergei Shtepa <sergei.shtepa@xxxxxxxxx>
> CC:
> axboe@xxxxxxxxx, corbet@xxxxxxx, linux-block@xxxxxxxxxxxxxxx, linux-doc@xxxxxxxxxxxxxxx, linux-kernel@xxxxxxxxxxxxxxx, dm-devel@xxxxxxxxxx
> 
> 
> On Fri, Dec 09 2022 at  9:23P -0500,
> Sergei Shtepa <sergei.shtepa@xxxxxxxxx> wrote:
> 
>> Allows to attach block device filters to the block devices. Kernel
>> modules can use this functionality to extend the capabilities of the
>> block layer.
>>
>> Signed-off-by: Sergei Shtepa <sergei.shtepa@xxxxxxxxx>
>> ---
>>  block/bdev.c              | 70 ++++++++++++++++++++++++++++++++++++++
>>  block/blk-core.c          | 19 +++++++++--
>>  include/linux/blk_types.h |  2 ++
>>  include/linux/blkdev.h    | 71 +++++++++++++++++++++++++++++++++++++++
>>  4 files changed, 160 insertions(+), 2 deletions(-)
>>
>> diff --git a/block/bdev.c b/block/bdev.c
>> index d699ecdb3260..b820178824b2 100644
>> --- a/block/bdev.c
>> +++ b/block/bdev.c
>> @@ -427,6 +427,7 @@ static void init_once(void *data)
>>  
>>  static void bdev_evict_inode(struct inode *inode)
>>  {
>> +	bdev_filter_detach(I_BDEV(inode));
>>  	truncate_inode_pages_final(&inode->i_data);
>>  	invalidate_inode_buffers(inode); /* is it needed here? */
>>  	clear_inode(inode);
>> @@ -502,6 +503,7 @@ struct block_device *bdev_alloc(struct gendisk *disk, u8 partno)
>>  		return NULL;
>>  	}
>>  	bdev->bd_disk = disk;
>> +	bdev->bd_filter = NULL;
>>  	return bdev;
>>  }
>>  
>> @@ -1092,3 +1094,71 @@ void bdev_statx_dioalign(struct inode *inode, struct kstat *stat)
>>  
>>  	blkdev_put_no_open(bdev);
>>  }
>> +
>> +/**
>> + * bdev_filter_attach - Attach the filter to the original block device.
>> + * @bdev:
>> + *	Block device.
>> + * @flt:
>> + *	Filter that needs to be attached to the block device.
>> + *
>> + * Before adding a filter, it is necessary to initialize &struct bdev_filter
>> + * using a bdev_filter_init() function.
>> + *
>> + * The bdev_filter_detach() function allows to detach the filter from the block
>> + * device.
>> + *
>> + * Return: 0 if succeeded, or -EALREADY if the filter already exists.
>> + */
>> +int bdev_filter_attach(struct block_device *bdev,
>> +				     struct bdev_filter *flt)
>> +{
>> +	int ret = 0;
>> +
>> +	blk_mq_freeze_queue(bdev->bd_queue);
>> +	blk_mq_quiesce_queue(bdev->bd_queue);
>> +
>> +	if (bdev->bd_filter)
>> +		ret = -EALREADY;
>> +	else
>> +		bdev->bd_filter = flt;
>> +
>> +	blk_mq_unquiesce_queue(bdev->bd_queue);
>> +	blk_mq_unfreeze_queue(bdev->bd_queue);
>> +
>> +	return ret;
>> +}
>> +EXPORT_SYMBOL(bdev_filter_attach);
>> +
>> +/**
>> + * bdev_filter_detach - Detach the filter from the block device.
>> + * @bdev:
>> + *	Block device.
>> + *
>> + * The filter should be added using the bdev_filter_attach() function.
>> + *
>> + * Return: 0 if succeeded, or -ENOENT if the filter was not found.
>> + */
>> +int bdev_filter_detach(struct block_device *bdev)
>> +{
>> +	int ret = 0;
>> +	struct bdev_filter *flt = NULL;
>> +
>> +	blk_mq_freeze_queue(bdev->bd_queue);
>> +	blk_mq_quiesce_queue(bdev->bd_queue);
>> +
>> +	flt = bdev->bd_filter;
>> +	if (flt)
>> +		bdev->bd_filter = NULL;
>> +	else
>> +		ret = -ENOENT;
>> +
>> +	blk_mq_unquiesce_queue(bdev->bd_queue);
>> +	blk_mq_unfreeze_queue(bdev->bd_queue);
>> +
>> +	if (flt)
>> +		bdev_filter_put(flt);
>> +
>> +	return ret;
>> +}
>> +EXPORT_SYMBOL(bdev_filter_detach);
> What about bio-based devices? (DM, MD, etc)
> 
> DM uses freeze_bdev() and thaw_bdev(), seems like you're missing some
> work here.

Thanks, Mike.

We are planning to add a freeze_bdev() function call in bdev_filter_attach().
But for the bdev_filter_detach() function, it doesn't seem to make sense.
I think enough to call blk_mq_freeze_queue().

As Fabio already wrote, I use a public repository on github to work with
the patch: https://github.com/SergeiShtepa/linux/commits/blksnap-master
The current state can be viewed there. Feedback is welcome as usual.

> 
>> diff --git a/block/blk-core.c b/block/blk-core.c
>> index 5487912befe8..284b295a7b23 100644
>> --- a/block/blk-core.c
>> +++ b/block/blk-core.c
>> @@ -678,9 +678,24 @@ void submit_bio_noacct_nocheck(struct bio *bio)
>>  	 * to collect a list of requests submited by a ->submit_bio method while
>>  	 * it is active, and then process them after it returned.
>>  	 */
>> -	if (current->bio_list)
>> +	if (current->bio_list) {
>>  		bio_list_add(&current->bio_list[0], bio);
>> -	else if (!bio->bi_bdev->bd_disk->fops->submit_bio)
>> +		return;
>> +	}
>> +
>> +	if (bio->bi_bdev->bd_filter && !bio_flagged(bio, BIO_FILTERED)) {
> Shouldn't this be: if (unlikely(...))?
> 
> But that obviously assumes a fair amount about the only consumer
> (temporary filter that lasts as long as it takes to do a backup).

Yes, at the moment the code is being created so that only one filter
is possible. In the summer, I offered a more complex solution, in which
there were altitudes. See:
https://lore.kernel.org/linux-block/1655135593-1900-2-git-send-email-sergei.shtepa@xxxxxxxxx/
But this is redundant code for this task at the moment, since only
one filter is offered now. I think it will be possible to implement
something similar later.

> 
>> +		bool pass;
>> +
>> +		pass = bio->bi_bdev->bd_filter->fops->submit_bio_cb(bio);
>> +		bio_set_flag(bio, BIO_FILTERED);
>> +		if (!pass) {
>> +			bio->bi_status = BLK_STS_OK;
>> +			bio_endio(bio);
>> +			return;
>> +		}
>> +	}
>> +
>> +	if (!bio->bi_bdev->bd_disk->fops->submit_bio)
>>  		__submit_bio_noacct_mq(bio);
>>  	else
>>  		__submit_bio_noacct(bio);
> And you currently don't allow for blkfilter to be involved if a bio
> recurses (which is how bio splitting works now).  Not sure it
> matters, just mentioning it...
> 
> But taking a step back, in the hopes of stepping out of your way:
> 
> Myself and others on the DM team (past and present) have always hoped
> all block devices could have the flexibility of DM. It was that hope
> that caused my frustration when I first saw your blkfilter approach.
> 
> But I was too idealistic that a byproduct of your efforts
> (blk-interposer before and blkfilter now) would usher in _all_ block
> devices being able to comprehensively change their identity (and IO
> processing) like DM enjoys.
> 
> DM showcases all the extra code needed to achieve its extreme IO
> remapping and stacking flexibilty -- I don't yet see a way to distill
> the essence of what DM achieves without imposing too much on all block
> core.
> 
> So I do think blkfilter is a pragmatic way to achieve your goals.
> 
> Mike
> 




[Index of Archives]     [Linux RAID]     [Linux SCSI]     [Linux ATA RAID]     [IDE]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Device Mapper]

  Powered by Linux