On 8/10/23 05:23, Bart Van Assche wrote:
> Writes in sequential write required zones must happen at the write
> pointer. Even if the submitter of the write commands (e.g. a filesystem)
> submits writes for sequential write required zones in order, the block
> layer or the storage controller may reorder these write commands.
>
> The zone locking mechanism in the mq-deadline I/O scheduler serializes
> write commands for sequential zones. Some but not all storage controllers
> require this serialization. Introduce a new request queue limit member
> variable to allow block drivers to indicate that they preserve the order
> of write commands and thus do not require serialization of writes per
> zone.
>
> Cc: Damien Le Moal <dlemoal@xxxxxxxxxx>
> Cc: Christoph Hellwig <hch@xxxxxx>
> Cc: Ming Lei <ming.lei@xxxxxxxxxx>
> Signed-off-by: Bart Van Assche <bvanassche@xxxxxxx>
> ---
>  block/blk-settings.c   | 6 ++++++
>  include/linux/blkdev.h | 1 +
>  2 files changed, 7 insertions(+)
>
> diff --git a/block/blk-settings.c b/block/blk-settings.c
> index 0046b447268f..b75c97971860 100644
> --- a/block/blk-settings.c
> +++ b/block/blk-settings.c
> @@ -56,6 +56,7 @@ void blk_set_default_limits(struct queue_limits *lim)
>  	lim->alignment_offset = 0;
>  	lim->io_opt = 0;
>  	lim->misaligned = 0;
> +	lim->use_zone_write_lock = true;
>  	lim->zoned = BLK_ZONED_NONE;

Given that the default for zoned is BLK_ZONED_NONE, having
use_zone_write_lock default to true is strange. It would be better to set
the default to false and have disk_set_zoned() set it to true if needed,
with an additional argument to specify whether that should be the case.
E.g., for SMR drives, sd.c would call something like:

	disk_set_zoned(sdkp->disk, BLK_ZONED_HM, sdp->use_zone_write_lock);

sd.c would default to sdp->use_zone_write_lock == true and the UFS driver
can set it to false. That would be cleaner I think.

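To make the suggestion concrete, here is a minimal userspace sketch of the proposed defaulting logic. It is not kernel code: struct model_limits, enum model_zoned, model_set_default_limits() and model_disk_set_zoned() are hypothetical stand-ins for struct queue_limits, enum blk_zoned_model, blk_set_default_limits() and a three-argument disk_set_zoned(), and the helper acts on the limits directly rather than on a gendisk. Forcing the flag to false for a non-zoned model is an assumption of this sketch, not something the patch or the review specifies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified userspace stand-ins for the kernel types (illustration only). */
enum model_zoned { MODEL_ZONED_NONE, MODEL_ZONED_HA, MODEL_ZONED_HM };

struct model_limits {
	enum model_zoned zoned;
	bool use_zone_write_lock;
};

/* Defaults: the device is not zoned, so zone write locking is off too. */
static void model_set_default_limits(struct model_limits *lim)
{
	assert(lim != NULL);
	lim->zoned = MODEL_ZONED_NONE;
	lim->use_zone_write_lock = false;
}

/*
 * Hypothetical three-argument variant of disk_set_zoned(): the driver
 * states, together with the zoned model, whether writes to sequential
 * zones must be serialized with the zone write lock.
 */
static void model_disk_set_zoned(struct model_limits *lim,
				 enum model_zoned model,
				 bool use_zone_write_lock)
{
	assert(lim != NULL);
	lim->zoned = model;
	/* Assumption: the flag is only meaningful for zoned devices. */
	lim->use_zone_write_lock =
		model != MODEL_ZONED_NONE && use_zone_write_lock;
}
```

With this shape, an SMR-style caller would pass true and a driver that preserves write order would pass false, while a device that never calls the helper keeps both fields at their non-zoned defaults.
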
>  	lim->zone_write_granularity = 0;
>  	lim->dma_alignment = 511;
> @@ -685,6 +686,11 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
>  				   b->max_secure_erase_sectors);
>  	t->zone_write_granularity = max(t->zone_write_granularity,
>  					b->zone_write_granularity);
> +	/*
> +	 * Whether or not the zone write lock should be used depends on the
> +	 * bottom driver only.
> +	 */
> +	t->use_zone_write_lock = b->use_zone_write_lock;

Given that DM bio targets do not have a scheduler and do not have a zone
lock bitmap allocated, I do not think this is necessary at all. This can
remain false, which keeps it in sync with the fact that there is no IO
scheduler.

>  	t->zoned = max(t->zoned, b->zoned);
>  	return ret;
>  }
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 2f5371b8482c..deffa1f13444 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -316,6 +316,7 @@ struct queue_limits {
>  	unsigned char misaligned;
>  	unsigned char discard_misaligned;
>  	unsigned char raid_partial_stripes_expensive;
> +	bool use_zone_write_lock;
>  	enum blk_zoned_model zoned;
>
>  	/*

-- 
Damien Le Moal
Western Digital Research