Actually, in terms of this issue, I think the dilemma here is that
@chunk_sectors of a dm device mainly comes from two sources.

One source is the underlying devices, whose limits are combined into a
single composed limit in blk_stack_limits().

> commit 22ada802ede8 ("block: use lcm_not_zero() when stacking
> chunk_sectors") broke chunk_sectors limit stacking. chunk_sectors must
> reflect the most limited of all devices in the IO stack.
>
> Otherwise malformed IO may result. E.g.: prior to this fix,
> ->chunk_sectors = lcm_not_zero(8, 128) would result in
> blk_max_size_offset() splitting IO at 128 sectors rather than the
> required more restrictive 8 sectors.

For this part, technically I can't agree that 'chunk_sectors must
reflect the most limited of all devices in the IO stack'. Even if the
dm device advertises a chunk_sectors of 128K when the limits of the two
underlying devices are 8K and 128K, and thus no splitting is done at
the dm level, the underlying devices will still split the IO by
themselves.

> @@ -547,7 +547,10 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
>
>  	t->io_min = max(t->io_min, b->io_min);
>  	t->io_opt = lcm_not_zero(t->io_opt, b->io_opt);
> -	t->chunk_sectors = lcm_not_zero(t->chunk_sectors, b->chunk_sectors);
> +
> +	/* Set non-power-of-2 compatible chunk_sectors boundary */
> +	if (b->chunk_sectors)
> +		t->chunk_sectors = gcd(t->chunk_sectors, b->chunk_sectors);

This may introduce a regression. Suppose the @chunk_sectors limits of
the two underlying devices are 8K and 128K; then @chunk_sectors of the
dm device becomes 8K after this fix. So even when a 128K-sized bio is
actually redirected to the underlying device with the 128K
@chunk_sectors limit, this 128K bio will be split into 16 bios, each 8K
in size. Obviously this is excessive splitting, and I think this is
actually why lcm_not_zero(a, b) was used originally.
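Just to make the arithmetic concrete, here is a minimal userspace
sketch (not kernel code; the local gcd() helper and the example sector
values are my own assumptions for illustration) showing how many splits
a 128K bio sees at the dm level under a gcd()-stacked limit versus an
lcm()-stacked one:

#include <stdio.h>

/* toy gcd(), standing in for the kernel's gcd() helper */
static unsigned int gcd(unsigned int a, unsigned int b)
{
	while (b) {
		unsigned int t = a % b;
		a = b;
		b = t;
	}
	return a;
}

int main(void)
{
	unsigned int a = 16, b = 256;	/* 8K and 128K in 512-byte sectors */
	unsigned int bio = 256;		/* one chunk-aligned 128K bio */
	unsigned int g = gcd(a, b);	/* 16: stacked limit after the gcd() fix */
	unsigned int l = a / g * b;	/* 256: stacked limit with lcm_not_zero() */

	/* splits done at the dm level, before the bio reaches the 128K device */
	printf("gcd limit %u sectors -> %u splits\n", g, bio / g);	/* 16 */
	printf("lcm limit %u sectors -> %u splits\n", l, bio / l);	/* 1 */
	return 0;
}

With the gcd()-based limit the bio is cut into 16 pieces even though the
device it is finally mapped to could have handled it whole.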
The other source is the dm device itself. A dm target can set
@max_io_len through ->io_hints(), and @chunk_sectors is then set from
@max_io_len. This is actually the part where 'chunk_sectors must
reflect the most limited of all devices in the IO stack' is true, and
we have to apply the strictest limitation here. This is actually what
the following patch does.

On 12/2/20 11:38 AM, Jeffle Xu wrote:
> As it said in commit 7e7986f9d3ba ("block: use gcd() to fix
> chunk_sectors limit stacking"), chunk_sectors should reflect the most
> limited of all devices in the IO stack.
>
> The previous commit only fixes block/blk-settings.c:blk_stack_limits(),
> while leaving dm.c:dm_calculate_queue_limits() unfixed.
>
> Fixes: 882ec4e609c1 ("dm table: stack 'chunk_sectors' limit to account for target-specific splitting")
> cc: stable@xxxxxxxxxxxxxxx
> Reported-by: John Dorminy <jdorminy@xxxxxxxxxx>
> Reported-by: Bruce Johnston <bjohnsto@xxxxxxxxxx>
> Signed-off-by: Jeffle Xu <jefflexu@xxxxxxxxxxxxxxxxx>
> ---
>  drivers/md/dm-table.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
> index ce543b761be7..dcc0a27355d7 100644
> --- a/drivers/md/dm-table.c
> +++ b/drivers/md/dm-table.c
> @@ -22,6 +22,7 @@
>  #include <linux/blk-mq.h>
>  #include <linux/mount.h>
>  #include <linux/dax.h>
> +#include <linux/gcd.h>
>
>  #define DM_MSG_PREFIX "table"
>
> @@ -1457,7 +1458,7 @@ int dm_calculate_queue_limits(struct dm_table *table,
>
>  		/* Stack chunk_sectors if target-specific splitting is required */
>  		if (ti->max_io_len)
> -			ti_limits.chunk_sectors = lcm_not_zero(ti->max_io_len,
> +			ti_limits.chunk_sectors = gcd(ti->max_io_len,
>  						       ti_limits.chunk_sectors);
>  		/* Set I/O hints portion of queue limits */
>  		if (ti->type->io_hints)
> --

Thanks,
Jeffle