On Mon, Feb 05, 2024 at 11:41:04AM +0900, Damien Le Moal wrote:
> On 2/5/24 11:19, Ming Lei wrote:
> >>>> +static inline void blk_zone_wplug_add_bio(struct blk_zone_wplug *zwplug,
> >>>> +					       struct bio *bio, unsigned int nr_segs)
> >>>> +{
> >>>> +	/*
> >>>> +	 * Keep a reference on the BIO request queue usage. This reference will
> >>>> +	 * be dropped either if the BIO is failed or after it is issued and
> >>>> +	 * completes.
> >>>> +	 */
> >>>> +	percpu_ref_get(&bio->bi_bdev->bd_disk->queue->q_usage_counter);
> >>>
> >>> It is fragile to take a nested usage_counter reference, and the same goes
> >>> for grabbing/releasing it from different contexts or even functions. It
> >>> would be much better to just let the block layer maintain it.
> >>>
> >>> From patch 23's change:
> >>>
> >>> + * Zoned block device information. Reads of this information must be
> >>> + * protected with blk_queue_enter() / blk_queue_exit(). Modifying this
> >>>
> >>> Any time there is an in-flight bio, the block device is open, so both the
> >>> gendisk and the request_queue are live; I am not sure this
> >>> .q_usage_counter protection is needed.
> >>
> >> Hannes also commented about this. Let me revisit this.
>
> > I think only queue re-configuration (blk_revalidate_zone) requires the
> > queue usage counter. Otherwise, bdev open()/close() should work just
> > fine.
>
> I want to check the FS case though. It is not clear if mounting an FS that
> supports zones (btrfs) also uses bdev open?

I feel the following delta change might be cleaner and more easily documented:

- one IO takes a single reference, for both bio-based and blk-mq
- no drop & re-grab
- only grab an extra reference for bio-based
- the two code paths share the same pattern

diff --git a/block/blk-core.c b/block/blk-core.c
index 9520ccab3050..118dd789beb5 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -597,6 +597,10 @@ static void __submit_bio(struct bio *bio)
 
 	if (!bio->bi_bdev->bd_has_submit_bio) {
 		blk_mq_submit_bio(bio);
+	} else if (bio_zone_write_plugging(bio)) {
+		struct gendisk *disk = bio->bi_bdev->bd_disk;
+
+		disk->fops->submit_bio(bio);
 	} else if (likely(bio_queue_enter(bio) == 0)) {
 		struct gendisk *disk = bio->bi_bdev->bd_disk;
 
diff --git a/block/blk-mq.c b/block/blk-mq.c
index f0fc61a3ec81..fc6d792747dc 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3006,8 +3006,12 @@ void blk_mq_submit_bio(struct bio *bio)
 	if (blk_mq_attempt_bio_merge(q, bio, nr_segs))
 		goto queue_exit;
 
+	/*
+	 * Grab one reference for the plugged zoned write; it will be reused
+	 * in the next real submission.
+	 */
 	if (blk_queue_is_zoned(q) && blk_zone_write_plug_bio(bio, nr_segs))
-		goto queue_exit;
+		return;
 
 	if (!rq) {
 new_request:
diff --git a/block/blk-zoned.c b/block/blk-zoned.c
index f6d4f511b664..87abb3f7ef30 100644
--- a/block/blk-zoned.c
+++ b/block/blk-zoned.c
@@ -514,7 +514,8 @@ static inline void blk_zone_wplug_add_bio(struct blk_zone_wplug *zwplug,
 	 * be dropped either if the BIO is failed or after it is issued and
 	 * completes.
 	 */
-	percpu_ref_get(&bio->bi_bdev->bd_disk->queue->q_usage_counter);
+	if (bio->bi_bdev->bd_has_submit_bio)
+		percpu_ref_get(&bio->bi_bdev->bd_disk->queue->q_usage_counter);
 
 	/*
 	 * The BIO is being plugged and thus will have to wait for the on-going
@@ -760,15 +761,10 @@ static void blk_zone_wplug_bio_work(struct work_struct *work)
 
 	blk_zone_wplug_unlock(zwplug, flags);
 
-	/*
-	 * blk-mq devices will reuse the reference on the request queue usage
-	 * we took when the BIO was plugged, but the submission path for
-	 * BIO-based devices will not do that. So drop this reference here.
-	 */
+	submit_bio_noacct_nocheck(bio);
+	if (bio->bi_bdev->bd_has_submit_bio)
 	blk_queue_exit(bio->bi_bdev->bd_disk->queue);
-
-	submit_bio_noacct_nocheck(bio);
 }
 
 static struct blk_zone_wplug *blk_zone_alloc_write_plugs(unsigned int nr_zones)

Thanks,
Ming
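
For illustration only, here is a minimal userspace sketch of the single-reference
pattern the bullet list above describes. It is not kernel code: the atomic counter
stands in for q_usage_counter, and enter_queue(), exit_queue(), mq_submit(),
bio_based_plug() and bio_based_replay() are hypothetical helpers modeling
bio_queue_enter()/blk_queue_exit() and the two submission paths.

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Stand-in for q->q_usage_counter. */
static atomic_int q_usage_counter;

static void enter_queue(void) { atomic_fetch_add(&q_usage_counter, 1); }
static void exit_queue(void)  { atomic_fetch_sub(&q_usage_counter, 1); }

/*
 * blk-mq path: the reference taken at submission time is kept while the
 * BIO sits in the zone write plug and reused at the next real submission,
 * so there is no drop & re-grab.
 */
static void mq_submit(bool plugged)
{
	enter_queue();
	if (plugged)
		return;		/* keep the reference for the later replay */
	exit_queue();		/* models the drop at request completion */
}

/*
 * bio-based path: only this path takes an extra reference, at plug time,
 * and drops it right after the plugged BIO has been resubmitted.
 */
static void bio_based_plug(void)
{
	enter_queue();
}

static void bio_based_replay(void)
{
	/* ->submit_bio() runs under the reference taken at plug time. */
	exit_queue();
}

int main(void)
{
	mq_submit(true);	/* plugged: the reference is kept */
	exit_queue();		/* the later replay/completion drops it */

	bio_based_plug();
	bio_based_replay();

	printf("outstanding queue references: %d\n",
	       atomic_load(&q_usage_counter));
	return 0;
}

Either way, each I/O holds exactly one queue reference from the moment it is
accepted until the queue no longer needs protecting, which is the property the
delta above tries to make easy to document.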