On Thu, May 23, 2024 at 09:52:40AM +0800, Ming Lei wrote: > On Wed, May 22, 2024 at 12:48:59PM -0400, Mike Snitzer wrote: > > On Wed, May 22, 2024 at 04:24:58PM +0200, Christoph Hellwig wrote: > > > On Tue, May 21, 2024 at 10:51:17PM -0400, Mike Snitzer wrote: > > > > Otherwise, blk_validate_limits() will throw-away the max_sectors that > > > > was stacked from underlying device(s). In doing so it can set a > > > > max_sectors limit that violates underlying device limits. > > > > > > Hmm, yes it sort of is "throwing the limit away", but it really > > > recalculates it from max_hw_sectors, max_dev_sectors and user_max_sectors. > > > > Yes, but it needs to do that recalculation at each level of a stacked > > device. And then we need to combine them via blk_stack_limits() -- as > > is done with the various limits stacking loops in > > drivers/md/dm-table.c:dm_calculate_queue_limits > > This way looks one stacking specific requirement, just wondering why not > put the logic into blk_validate_limits() by passing 'stacking' parameter? > Then raid can benefit from it too. Sure, we could elevate it to blk_validate_limits (and callers) but adding a 'stacking' parameter is more intrusive on an API level. Best to just update blk_set_stacking_limits() to set a new 'stacking' flag in struct queue_limits, and update blk_stack_limits() to stack that flag up. I've verified this commit to work and have staged it in linux-next via linux-dm.git's 'for-next', see: https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=for-next&id=cedc03d697ff255dd5b600146521434e2e921815 Jens (and obviously: Christoph, Ming and others), I'm happy to send this to Linus tomorrow morning if you could please provide your Reviewed-by or Acked-by. I'd prefer to keep the intermediate DM fix just to "show the work and testing". Thanks, Mike >From cedc03d697ff255dd5b600146521434e2e921815 Mon Sep 17 00:00:00 2001 From: Mike Snitzer <snitzer@xxxxxxxxxx> Date: Thu, 23 May 2024 11:19:29 -0400 Subject: [PATCH] block: fix blk_validate_limits() to properly handle stacked devices For the benefit of other stacking block drivers, e.g. MD, elevate the DM fix from commit 0ead1c8e8e48 ("dm: retain stacked max_sectors when setting queue_limits") to block core. Switches to using a bool bitfield in struct queue_limits (for old member 'zoned' and new member 'stacking') to not grow that struct. Suggested-by: Ming Lei <ming.lei@xxxxxxxxxx> Signed-off-by: Mike Snitzer <snitzer@xxxxxxxxxx> --- block/blk-settings.c | 30 ++++++++++++++++++++++++++++-- drivers/md/dm-table.c | 8 -------- include/linux/blkdev.h | 3 ++- 3 files changed, 30 insertions(+), 11 deletions(-) diff --git a/block/blk-settings.c b/block/blk-settings.c index cdbaef159c4b..24c799072f6c 100644 --- a/block/blk-settings.c +++ b/block/blk-settings.c @@ -35,6 +35,7 @@ EXPORT_SYMBOL_GPL(blk_queue_rq_timeout); void blk_set_stacking_limits(struct queue_limits *lim) { memset(lim, 0, sizeof(*lim)); + lim->stacking = true; lim->logical_block_size = SECTOR_SIZE; lim->physical_block_size = SECTOR_SIZE; lim->io_min = SECTOR_SIZE; @@ -103,7 +104,7 @@ static int blk_validate_zoned_limits(struct queue_limits *lim) */ static int blk_validate_limits(struct queue_limits *lim) { - unsigned int max_hw_sectors; + unsigned int max_hw_sectors, stacked_max_hw_sectors = 0; /* * Unless otherwise specified, default to 512 byte logical blocks and a @@ -121,6 +122,23 @@ static int blk_validate_limits(struct queue_limits *lim) if (lim->io_min < lim->physical_block_size) lim->io_min = lim->physical_block_size; + + /* + * For stacked block devices, don't throw-away stacked max_sectors. + */ + if (lim->stacking && lim->max_hw_sectors) { + /* + * lim->max_sectors and lim->max_hw_sectors were already + * validated, relative underlying device(s) in this stacked + * block device. + */ + stacked_max_hw_sectors = lim->max_hw_sectors; + /* + * Impose stacked max_sectors as upper-bound for code below. + */ + lim->max_hw_sectors = lim->max_sectors; + } + /* * max_hw_sectors has a somewhat weird default for historical reason, * but driver really should set their own instead of relying on this @@ -155,6 +173,11 @@ static int blk_validate_limits(struct queue_limits *lim) lim->max_sectors = round_down(lim->max_sectors, lim->logical_block_size >> SECTOR_SHIFT); + if (stacked_max_hw_sectors) { + /* Restore previously validated stacked max_hw_sectors */ + lim->max_hw_sectors = max_hw_sectors; + } + /* * Random default for the maximum number of segments. Driver should not * rely on this and set their own. @@ -881,11 +904,14 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b, b->max_secure_erase_sectors); t->zone_write_granularity = max(t->zone_write_granularity, b->zone_write_granularity); - t->zoned = max(t->zoned, b->zoned); + t->zoned |= b->zoned; if (!t->zoned) { t->zone_write_granularity = 0; t->max_zone_append_sectors = 0; } + + t->stacking |= b->stacking; + return ret; } EXPORT_SYMBOL(blk_stack_limits); diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c index 6463b4afeaa4..88114719fe18 100644 --- a/drivers/md/dm-table.c +++ b/drivers/md/dm-table.c @@ -1961,7 +1961,6 @@ int dm_table_set_restrictions(struct dm_table *t, struct request_queue *q, struct queue_limits *limits) { bool wc = false, fua = false; - unsigned int max_hw_sectors; int r; if (dm_table_supports_nowait(t)) @@ -1982,16 +1981,9 @@ int dm_table_set_restrictions(struct dm_table *t, struct request_queue *q, if (!dm_table_supports_secure_erase(t)) limits->max_secure_erase_sectors = 0; - /* Don't allow queue_limits_set() to throw-away stacked max_sectors */ - max_hw_sectors = limits->max_hw_sectors; - limits->max_hw_sectors = limits->max_sectors; r = queue_limits_set(q, limits); if (r) return r; - /* Restore stacked max_hw_sectors */ - mutex_lock(&q->limits_lock); - limits->max_hw_sectors = max_hw_sectors; - mutex_unlock(&q->limits_lock); if (dm_table_supports_flush(t, (1UL << QUEUE_FLAG_WC))) { wc = true; diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index c3e8f7cf96be..ad1b00e5cc3e 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -307,7 +307,8 @@ struct queue_limits { unsigned char misaligned; unsigned char discard_misaligned; unsigned char raid_partial_stripes_expensive; - bool zoned; + bool zoned:1; + bool stacking:1; unsigned int max_open_zones; unsigned int max_active_zones; -- 2.44.0