Re: [PATCH 06/26] block: Introduce zone write plugging

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2/5/24 19:06, Ming Lei wrote:
> On Mon, Feb 05, 2024 at 11:41:04AM +0900, Damien Le Moal wrote:
>> On 2/5/24 11:19, Ming Lei wrote:
>>>>>> +static inline void blk_zone_wplug_add_bio(struct blk_zone_wplug *zwplug,
>>>>>> +					  struct bio *bio, unsigned int nr_segs)
>>>>>> +{
>>>>>> +	/*
>>>>>> +	 * Keep a reference on the BIO request queue usage. This reference will
>>>>>> +	 * be dropped either if the BIO is failed or after it is issued and
>>>>>> +	 * completes.
>>>>>> +	 */
>>>>>> +	percpu_ref_get(&bio->bi_bdev->bd_disk->queue->q_usage_counter);
>>>>>
>>>>> It is fragile to get nested usage_counter, and same with grabbing/releasing it
>>>>> from different contexts or even functions, and it could be much better to just
>>>>> let block layer maintain it.
>>>>>
>>>>> From patch 23's change:
>>>>>
>>>>> +	 * Zoned block device information. Reads of this information must be
>>>>> +	 * protected with blk_queue_enter() / blk_queue_exit(). Modifying this
>>>>>
>>>>> Anytime if there is in-flight bio, the block device is opened, so both gendisk and
>>>>> request_queue are live, so not sure if this .q_usage_counter protection
>>>>> is needed.
>>>>
>>>> Hannes also commented about this. Let me revisit this.
>>>
>>> I think only queue re-configuration(blk_revalidate_zone) requires the
>>> queue usage counter. Otherwise, bdev open()/close() should work just
>>> fine.
>>
>> I want to check FS case though. No clear if mounting FS that supports zone
>> (btrfs) also uses bdev open ?
> 
> I feel the following delta change might be cleaner and easily documented:
> 
> - one IO takes single reference for both bio based and blk-mq,
> - no drop & re-grab
> - only grab extra reference for bio based
> - two code paths share same pattern
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 9520ccab3050..118dd789beb5 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -597,6 +597,10 @@ static void __submit_bio(struct bio *bio)
>  
>  	if (!bio->bi_bdev->bd_has_submit_bio) {
>  		blk_mq_submit_bio(bio);
> +	} else if (bio_zone_write_plugging(bio)) {
> +		struct gendisk *disk = bio->bi_bdev->bd_disk;
> +
> +		disk->fops->submit_bio(bio);
>  	} else if (likely(bio_queue_enter(bio) == 0)) {
>  		struct gendisk *disk = bio->bi_bdev->bd_disk;
>  
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index f0fc61a3ec81..fc6d792747dc 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -3006,8 +3006,12 @@ void blk_mq_submit_bio(struct bio *bio)
>  	if (blk_mq_attempt_bio_merge(q, bio, nr_segs))
>  		goto queue_exit;
>  
> +	/*
> +	 * Grab one reference for plugged zoned write and it will be reused in
> +	 * next real submission
> +	 */
>  	if (blk_queue_is_zoned(q) && blk_zone_write_plug_bio(bio, nr_segs))
> -		goto queue_exit;
> +		return;
>  
>  	if (!rq) {
>  new_request:
> diff --git a/block/blk-zoned.c b/block/blk-zoned.c
> index f6d4f511b664..87abb3f7ef30 100644
> --- a/block/blk-zoned.c
> +++ b/block/blk-zoned.c
> @@ -514,7 +514,8 @@ static inline void blk_zone_wplug_add_bio(struct blk_zone_wplug *zwplug,
>  	 * be dropped either if the BIO is failed or after it is issued and
>  	 * completes.
>  	 */
> -	percpu_ref_get(&bio->bi_bdev->bd_disk->queue->q_usage_counter);
> +	if (bio->bi_bdev->bd_has_submit_bio)
> +		percpu_ref_get(&bio->bi_bdev->bd_disk->queue->q_usage_counter);
>  
>  	/*
>  	 * The BIO is being plugged and thus will have to wait for the on-going
> @@ -760,15 +761,10 @@ static void blk_zone_wplug_bio_work(struct work_struct *work)
>  
>  	blk_zone_wplug_unlock(zwplug, flags);
>  
> -	/*
> -	 * blk-mq devices will reuse the reference on the request queue usage
> -	 * we took when the BIO was plugged, but the submission path for
> -	 * BIO-based devices will not do that. So drop this reference here.
> -	 */
> +	submit_bio_noacct_nocheck(bio);
> +
>  	if (bio->bi_bdev->bd_has_submit_bio)
>  		blk_queue_exit(bio->bi_bdev->bd_disk->queue);

Hmm... As-is, this is a potential use-after-free of the bio. But I get the idea.
This is indeed a little better. I will integrate this.

> -
> -	submit_bio_noacct_nocheck(bio);
>  }
>  
>  static struct blk_zone_wplug *blk_zone_alloc_write_plugs(unsigned int nr_zones)
> 
> Thanks,
> Ming
> 
> 

-- 
Damien Le Moal
Western Digital Research





[Index of Archives]     [DM Crypt]     [Fedora Desktop]     [ATA RAID]     [Fedora Marketing]     [Fedora Packaging]     [Fedora SELinux]     [Yosemite Discussion]     [KDE Users]     [Fedora Docs]

  Powered by Linux