Re: [PATCH v4 1/3] block: Improve checks on zone resource limits

On Wed, Jun 05, 2024 at 04:51:42PM +0900, Damien Le Moal wrote:
> Make sure that the zone resource limits of a zoned block device are
> correct by checking that:
> (a) If the device has a max active zones limit, make sure that the max
>     open zones limit is lower than the max active zones limit.
> (b) If the device has zone resource limits, check that the limits
>     values are lower than the number of sequential zones of the device.
>     If it is not, assume that the zoned device has no limits by setting
>     the limits to 0.
> 
> For (a), a check is added to blk_validate_zoned_limits() and an error
> returned if the max open zones limit exceeds the value of the max active
> zone limit (if there is one).
> 
> For (b), given that we need the number of sequential zones of the zoned
> device, this check is added to disk_update_zone_resources(). This is
> safe to do as that function is executed with the disk queue frozen and
> the check executed after queue_limits_start_update() which takes the
> queue limits lock. Of note is that the early return in this function
> for zoned devices that do not use zone write plugging (e.g. DM devices
> using native zone append) is moved to after the new check and adjustment
> of the zone resource limits so that the check applies to any zoned
> device.
> 
> Signed-off-by: Damien Le Moal <dlemoal@xxxxxxxxxx>
> ---
>  block/blk-settings.c |  8 ++++++++
>  block/blk-zoned.c    | 20 ++++++++++++++++----
>  2 files changed, 24 insertions(+), 4 deletions(-)
> 
> diff --git a/block/blk-settings.c b/block/blk-settings.c
> index effeb9a639bb..474c709ea85b 100644
> --- a/block/blk-settings.c
> +++ b/block/blk-settings.c
> @@ -80,6 +80,14 @@ static int blk_validate_zoned_limits(struct queue_limits *lim)
>  	if (WARN_ON_ONCE(!IS_ENABLED(CONFIG_BLK_DEV_ZONED)))
>  		return -EINVAL;
>  
> +	/*
> +	 * Given that active zones include open zones, the maximum number of
> +	 * open zones cannot be larger than the maximum numbber of active zones.

s/numbber/number/


> +	 */
> +	if (lim->max_active_zones &&
> +	    lim->max_open_zones > lim->max_active_zones)
> +		return -EINVAL;
> +
>  	if (lim->zone_write_granularity < lim->logical_block_size)
>  		lim->zone_write_granularity = lim->logical_block_size;
>  
> diff --git a/block/blk-zoned.c b/block/blk-zoned.c
> index 52abebf56027..8f89705f5e1c 100644
> --- a/block/blk-zoned.c
> +++ b/block/blk-zoned.c
> @@ -1647,8 +1647,22 @@ static int disk_update_zone_resources(struct gendisk *disk,
>  		return -ENODEV;
>  	}
>  
> +	lim = queue_limits_start_update(q);
> +
> +	/*
> +	 * Some devices can advertize zone resource limits that are larger than
> +	 * the number of sequential zones of the zoned block device, e.g. a
> +	 * small ZNS namespace. For such case, assume that the zoned device has
> +	 * no zone resource limits.
> +	 */
> +	nr_seq_zones = disk->nr_zones - nr_conv_zones;
> +	if (lim.max_open_zones >= nr_seq_zones)
> +		lim.max_open_zones = 0;
> +	if (lim.max_active_zones >= nr_seq_zones)
> +		lim.max_active_zones = 0;
> +

Is it really correct to transform this into "no limits"?

The MAR and MOR limits are defined in the I/O Command Set Specific Identify
Namespace Data Structure for the Zoned Namespace Command Set.

However, the user has no ability to control these limits themselves
during a namespace management create ns, or for the format command
(and this still seems to be the case in the latest ZNS spec 1.1d).

Which means that the controller has no way of knowing the number of
resources to allocate to each namespace.

Right now, some (all?) controllers simply report the same MAR/MOR
for all namespaces.


So if I use the namespace management command to create two small
zoned namespaces, the number of sequential zones might be smaller
than the limits in each namespace, but the two namespaces together
could still exceed the limit.
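
To make the arithmetic concrete, here is a minimal sketch in plain C (not
kernel code). The struct and function names are hypothetical, and it
assumes that the MOR reported for every namespace is really one
controller-wide pool, which is my reading of current controller behavior
rather than something the spec guarantees:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one zoned namespace; not a kernel or NVMe structure. */
struct ns_model {
	unsigned int nr_seq_zones;	/* sequential zones in this namespace */
	unsigned int reported_mor;	/* MOR as reported for this namespace */
};

/*
 * Same comparison as the quoted hunk: a limit at or above the number of
 * sequential zones is replaced by 0, meaning "no limit".
 */
static unsigned int limit_after_patch(unsigned int limit,
				      unsigned int nr_seq_zones)
{
	return limit >= nr_seq_zones ? 0 : limit;
}

/* Most zones a host could have open on one namespace (0 = no limit). */
static unsigned int worst_case_open(const struct ns_model *ns)
{
	unsigned int lim = limit_after_patch(ns->reported_mor,
					     ns->nr_seq_zones);

	return lim ? lim : ns->nr_seq_zones;
}

/*
 * ASSUMPTION: shared_mor is a single pool shared by all namespaces of the
 * controller. Returns true if the namespaces together can have more zones
 * open than that pool allows.
 */
static bool can_exceed_shared_limit(const struct ns_model *ns, size_t n,
				    unsigned int shared_mor)
{
	unsigned int total = 0;
	size_t i;

	for (i = 0; i < n; i++)
		total += worst_case_open(&ns[i]);

	return total > shared_mor;
}
```

With two namespaces of 10 sequential zones each and a reported MOR of 14,
each namespace ends up with a limit of 0 after the check, while the
combined worst case of 20 open zones is above 14.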

How is ignoring the limit that we got from the device better than
actually exposing the limit which we got from the device?

AFAICT, this also means that we will expose 0 in sysfs
instead of the value that the device reported.



Perhaps we should only do this optimization if:
- the device is not ZNS, or
- the device is ZNS and does not support NS management, or
- the device is ZNS and supports NS management and implements TP4115
  (Zoned Namespace Resource Management supported bit is set, even if
   that TP does not seem to be part of a Ratified ZNS version yet...)
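
As a rough sketch of that condition, again in plain C rather than kernel
code: the struct and its fields below are hypothetical stand-ins for
whatever the driver would actually have to pass up to the block layer, so
treat this as an illustration of the logic only:

```c
#include <stdbool.h>

/* Hypothetical capability flags; none of these exist in the kernel as-is. */
struct zoned_dev_caps {
	bool is_zns;		/* device is an NVMe ZNS namespace */
	bool ns_mgmt;		/* controller supports namespace management */
	bool zns_res_mgmt;	/* TP4115 resource management bit is set */
};

/*
 * Returns true if it would be acceptable to replace a too-large zone
 * resource limit with 0 ("no limit"), following the three cases above.
 */
static bool may_clear_zone_limits(const struct zoned_dev_caps *caps)
{
	/* Not ZNS: limits are not shared between namespaces. */
	if (!caps->is_zns)
		return true;

	/* ZNS without NS management: the user cannot create namespaces. */
	if (!caps->ns_mgmt)
		return true;

	/* ZNS with NS management: only if resources are managed per NS. */
	return caps->zns_res_mgmt;
}
```

The only case that returns false is a ZNS device with NS management but
without the TP4115 bit, which is the case where the reported limit may be
shared by several namespaces.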


Kind regards,
Niklas



