On 2/5/24 19:06, Ming Lei wrote: > On Mon, Feb 05, 2024 at 11:41:04AM +0900, Damien Le Moal wrote: >> On 2/5/24 11:19, Ming Lei wrote: >>>>>> +static inline void blk_zone_wplug_add_bio(struct blk_zone_wplug *zwplug, >>>>>> + struct bio *bio, unsigned int nr_segs) >>>>>> +{ >>>>>> + /* >>>>>> + * Keep a reference on the BIO request queue usage. This reference will >>>>>> + * be dropped either if the BIO is failed or after it is issued and >>>>>> + * completes. >>>>>> + */ >>>>>> + percpu_ref_get(&bio->bi_bdev->bd_disk->queue->q_usage_counter); >>>>> >>>>> It is fragile to get nested usage_counter, and same with grabbing/releasing it >>>>> from different contexts or even functions, and it could be much better to just >>>>> let block layer maintain it. >>>>> >>>>> From patch 23's change: >>>>> >>>>> + * Zoned block device information. Reads of this information must be >>>>> + * protected with blk_queue_enter() / blk_queue_exit(). Modifying this >>>>> >>>>> Anytime if there is in-flight bio, the block device is opened, so both gendisk and >>>>> request_queue are live, so not sure if this .q_usage_counter protection >>>>> is needed. >>>> >>>> Hannes also commented about this. Let me revisit this. >>> >>> I think only queue re-configuration(blk_revalidate_zone) requires the >>> queue usage counter. Otherwise, bdev open()/close() should work just >>> fine. >> >> I want to check FS case though. No clear if mounting FS that supports zone >> (btrfs) also uses bdev open ? > > I feel the following delta change might be cleaner and easily documented: > > - one IO takes single reference for both bio based and blk-mq, > - no drop & re-grab > - only grab extra reference for bio based > - two code paths share same pattern > > diff --git a/block/blk-core.c b/block/blk-core.c > index 9520ccab3050..118dd789beb5 100644 > --- a/block/blk-core.c > +++ b/block/blk-core.c > @@ -597,6 +597,10 @@ static void __submit_bio(struct bio *bio) > > if (!bio->bi_bdev->bd_has_submit_bio) { > blk_mq_submit_bio(bio); > + } else if (bio_zone_write_plugging(bio)) { > + struct gendisk *disk = bio->bi_bdev->bd_disk; > + > + disk->fops->submit_bio(bio); > } else if (likely(bio_queue_enter(bio) == 0)) { > struct gendisk *disk = bio->bi_bdev->bd_disk; > > diff --git a/block/blk-mq.c b/block/blk-mq.c > index f0fc61a3ec81..fc6d792747dc 100644 > --- a/block/blk-mq.c > +++ b/block/blk-mq.c > @@ -3006,8 +3006,12 @@ void blk_mq_submit_bio(struct bio *bio) > if (blk_mq_attempt_bio_merge(q, bio, nr_segs)) > goto queue_exit; > > + /* > + * Grab one reference for plugged zoned write and it will be reused in > + * next real submission > + */ > if (blk_queue_is_zoned(q) && blk_zone_write_plug_bio(bio, nr_segs)) > - goto queue_exit; > + return; > > if (!rq) { > new_request: > diff --git a/block/blk-zoned.c b/block/blk-zoned.c > index f6d4f511b664..87abb3f7ef30 100644 > --- a/block/blk-zoned.c > +++ b/block/blk-zoned.c > @@ -514,7 +514,8 @@ static inline void blk_zone_wplug_add_bio(struct blk_zone_wplug *zwplug, > * be dropped either if the BIO is failed or after it is issued and > * completes. > */ > - percpu_ref_get(&bio->bi_bdev->bd_disk->queue->q_usage_counter); > + if (bio->bi_bdev->bd_has_submit_bio) > + percpu_ref_get(&bio->bi_bdev->bd_disk->queue->q_usage_counter); > > /* > * The BIO is being plugged and thus will have to wait for the on-going > @@ -760,15 +761,10 @@ static void blk_zone_wplug_bio_work(struct work_struct *work) > > blk_zone_wplug_unlock(zwplug, flags); > > - /* > - * blk-mq devices will reuse the reference on the request queue usage > - * we took when the BIO was plugged, but the submission path for > - * BIO-based devices will not do that. So drop this reference here. > - */ > + submit_bio_noacct_nocheck(bio); > + > if (bio->bi_bdev->bd_has_submit_bio) > blk_queue_exit(bio->bi_bdev->bd_disk->queue); Hmm... As-is, this is a potential use-after-free of the bio. But I get the idea. This is indeed a little better. I will integrate this. > - > - submit_bio_noacct_nocheck(bio); > } > > static struct blk_zone_wplug *blk_zone_alloc_write_plugs(unsigned int nr_zones) > > Thanks, > Ming > > -- Damien Le Moal Western Digital Research