On Mon, Feb 05, 2024 at 11:41:04AM +0900, Damien Le Moal wrote: > On 2/5/24 11:19, Ming Lei wrote: > >>>> +static inline void blk_zone_wplug_add_bio(struct blk_zone_wplug *zwplug, > >>>> + struct bio *bio, unsigned int nr_segs) > >>>> +{ > >>>> + /* > >>>> + * Keep a reference on the BIO request queue usage. This reference will > >>>> + * be dropped either if the BIO is failed or after it is issued and > >>>> + * completes. > >>>> + */ > >>>> + percpu_ref_get(&bio->bi_bdev->bd_disk->queue->q_usage_counter); > >>> > >>> It is fragile to get nested usage_counter, and same with grabbing/releasing it > >>> from different contexts or even functions, and it could be much better to just > >>> let block layer maintain it. > >>> > >>> From patch 23's change: > >>> > >>> + * Zoned block device information. Reads of this information must be > >>> + * protected with blk_queue_enter() / blk_queue_exit(). Modifying this > >>> > >>> Anytime if there is in-flight bio, the block device is opened, so both gendisk and > >>> request_queue are live, so not sure if this .q_usage_counter protection > >>> is needed. > >> > >> Hannes also commented about this. Let me revisit this. > > > > I think only queue re-configuration(blk_revalidate_zone) requires the > > queue usage counter. Otherwise, bdev open()/close() should work just > > fine. > > I want to check FS case though. No clear if mounting FS that supports zone > (btrfs) also uses bdev open ? btrfs '-O zoned' shouldn't be one exception: mount -O zoned /dev/ublkb0 /mnt b'blkdev_get_whole' b'bdev_open_by_dev' b'bdev_open_by_path' b'btrfs_scan_one_device' b'btrfs_get_tree' b'vfs_get_tree' b'fc_mount' b'btrfs_get_tree' b'vfs_get_tree' b'path_mount' b'__x64_sys_mount' b'do_syscall_64' b'entry_SYSCALL_64_after_hwframe' b'[unknown]' 1 > > >>>> + /* > >>>> + * blk-mq devices will reuse the reference on the request queue usage > >>>> + * we took when the BIO was plugged, but the submission path for > >>>> + * BIO-based devices will not do that. So drop this reference here. > >>>> + */ > >>>> + if (bio->bi_bdev->bd_has_submit_bio) > >>>> + blk_queue_exit(bio->bi_bdev->bd_disk->queue); > >>> > >>> But I don't see where this reference is reused for blk-mq in this patch, > >>> care to point it out? > >> > >> This patch modifies blk_mq_submit_bio() to add a "goto new_request" at the top > >> for any BIO flagged with BIO_FLAG_ZONE_WRITE_PLUGGING. So when a plugged BIO is > >> unplugged and submitted again, the reference that was taken in > >> blk_zone_wplug_add_bio() is reused for the new request for that BIO. > > > > OK, this reference reuse may be worse, because queue freeze can't prevent new > > write zoned bio from being submitted any more given only percpu_ref_get() is > > called for all write zoned bios. > > New BIOs (BIOS that have never been plugged yet) will go through the normal > blk_queue_enter() in blk_mq_submit_bio(), so they will be stopped there if > another context asked for a queue freeze. I do not think there is any issue with > how things are currently done (we tested that *a lot* with many different drives > and drive configs with DM etc). Reference counting as it is is OK, even though > it most likely be simplified. I am looking at that now. Indeed, new zoned write bio is still covered by blk-mq's queue reference, and the trick is just played on old plugged bio. Thanks, Ming