Re: [PATCH v4 1/3] block: Improve checks on zone resource limits

On Thu, Jun 06, 2024 at 09:06:58AM +0900, Damien Le Moal wrote:
> On 6/6/24 2:25 AM, Niklas Cassel wrote:
> > On Wed, Jun 05, 2024 at 04:51:42PM +0900, Damien Le Moal wrote:
> 
> The problem you are raising is the reliability of the limits themselves, and
> for NVMe ZNS, given that MOR/MAR are not defined per namespace, we are in the
> same situation as with DM devices sharing the same zoned block dev through
> different targets: even if the user respects the limits, write errors may
> happen due to the backing dev limits (or controller limits for ZNS) being
> exceeded. Nothing much we can do to easily deal with this right now. We would
> need to constantly track zone states and implement a software driven zone state
> machine checking the limits all the time to actually provide guarantees.
> 
> > Since AFAICT, this also means that we will expose 0 to sysfs
> > instead of the value that the device reported.
> 
> Yes. But the value reported by the device is for the whole controller. The
> sysfs attributes are for the block device == namespace.

The limits are defined in the I/O Command Set Specific Identify Namespace
Data Structure for the Zoned Namespace Command Set, so they are per NS,
otherwise they would have been defined in the I/O Command Set Specific
Identify Controller Data Structure for the Zoned Namespace Command Set.


> 
> > Perhaps we should only do this optimization if:
> > - the device is not ZNS, or
> > - the device is ZNS and does not support NS management, or
> > - the device is ZNS and supports NS management and implements TP4115
> >   (Zoned Namespace Resource Management supported bit is set, even if
> >    that TP does not seem to be part of a Ratified ZNS version yet...)
> 
> Right now, this all works the same way for DM and nvme zns, so I think this is
> all good. If anything, we should probably add a warning in the nvme driver
> about the potentially unreliable mor/mar limits if we see a ZNS device with
> multiple zoned namespaces.

Well, it is only a problem for ZNS devices with NS management.

If there are two ZNS namespaces on the device, and the device does not
support NS management, the device vendor would have been seriously silly
not to allocate and set the limits correctly in the I/O Command Set
Specific Identify Namespace Data Structure for the Zoned Namespace
Command Set.

But yes, this concern cannot be solved in disk_update_zone_resources(),
which operates per gendisk (and there is one gendisk per namespace),
so there is not much that function can do. If we were to do something,
it would have to be done in the nvme driver.


Perhaps if the device is ZNS, and does support NS management, but does
not have the Zoned Namespace Resource Management supported bit set, we
could divide the MAR/MOR values reported by each namespace by the number
of ZNS namespaces?
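
Something like the following, as a rough sketch (not actual nvme driver
code; zns_scale_resource_limit and ZNS_NO_LIMIT are made-up names, and
it assumes MOR/MAR are 0's based values where 0xFFFFFFFF means "no
limit", per my reading of the ZNS spec):

```c
/*
 * Hypothetical helper: if a ZNS device supports NS management but does
 * not set the Zoned Namespace Resource Management supported bit, treat
 * the reported MOR/MAR as controller-wide and split it evenly across
 * the zoned namespaces to get a conservative per-namespace limit.
 */
#define ZNS_NO_LIMIT 0xffffffffu

static unsigned int zns_scale_resource_limit(unsigned int reported,
					     unsigned int nr_zoned_ns)
{
	unsigned int per_ns;

	if (!nr_zoned_ns)
		return reported;
	if (reported == ZNS_NO_LIMIT)
		return reported;	/* unlimited stays unlimited */

	/* MOR/MAR are 0's based: convert, divide, convert back */
	per_ns = (reported + 1) / nr_zoned_ns;

	/* don't underflow back to "no limit" when nr_zoned_ns is large */
	return per_ns ? per_ns - 1 : 0;
}
```

This would still be conservative (a namespace could legitimately use
more resources while the others are idle), but it at least guarantees
that the sum of the per-namespace limits never exceeds what the
controller can back.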


Kind regards,
Niklas



