Re: [LSF/MM/BPF TOPIC] Hybrid SMR HDDs / Zone Domains & Realms

On 3/2/23 08:34, Khazhy Kumykov wrote:
> HSMR HDDs are a type of SMR HDD that allow for a dynamic mixture of
> CMR and SMR zones, allowing users to convert regions of the disk
> between the two. The way this is implemented as specified by the SCSI
> ZAC-2 specification is there’s a set of “CMR” regions and “SMR”
> regions. These may be grouped into “realms” that may, as a group, be
> online or offline. Zone management can bring online a domain/zone and
> offline any corresponding domains/zones.
> 
> I’d like to discuss what path makes sense for supporting these
> devices, and also how to avoid potential issues specific to the “mixed
> CMR & SMR IO traffic” use case - particularly around latency due to
> potentially unneeded (from the perspective of an application) zone
> management commands.

Hard no on supporting these. See below.

> 
> Points of Discussion
> ====
> 
>  - There’s already support in the kernel for marking zones
> online/offline and cmr/smr, but this is fixed, not dynamic. Would
> there be hiccups with allowing zones to come online/offline while
> running?

No, there is no support for "marking" zones offline (or read-only): transitions
into these states are not the explicit result of any command execution. They are
decided by the drive and are asynchronous as far as the host is concerned. There
is support for *detecting* offline zones though, so that FSes do not attempt to
use these dead zones. But that is more a part of error processing than of the
regular IO path, because seeing offline zones is not expected: it is the result
of a drive going bad. HSMR would essentially allow users to explicitly offline
zones, wrecking the IO path and potentially generating lots of IO errors.
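
To make the "detecting" part concrete, here is a minimal user-space sketch, not
the kernel's or any FS's actual code. It assumes the zones were already fetched
into an array of struct blk_zone (the UAPI type from <linux/blkzoned.h>); the
helper names zone_is_unwritable() and count_unwritable_zones() are hypothetical.

```c
#include <stddef.h>
#include <linux/blkzoned.h>

/*
 * Hypothetical helper: true if the drive reported the zone as offline or
 * read-only. The host never asks for these conditions; it only sees them.
 */
static int zone_is_unwritable(const struct blk_zone *z)
{
	return z->cond == BLK_ZONE_COND_OFFLINE ||
	       z->cond == BLK_ZONE_COND_READONLY;
}

/* Count the zones in a report that must not be used for writes. */
static size_t count_unwritable_zones(const struct blk_zone *zones, size_t nr)
{
	size_t i, n = 0;

	for (i = 0; i < nr; i++)
		if (zone_is_unwritable(&zones[i]))
			n++;
	return n;
}
```

A check like this belongs in setup or error handling, not in the per-IO path.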

So HSMR support should only be allowed (if it ever is) to be controlled by a
file system, not by the user. And if the user wants to do raw block device IOs,
then they can use passthrough commands to control the activation state of zones.

>  - There may be multiple CMR “zones” that are contiguous in LBA space.
> A benefit of HSMR disks is, to a certain extent, software which is
> designed for all-CMR disks can work similarly on a contiguous CMR area
> of the HSMR disk (modulo handling “resizes”). This may result in IO
> that can straddle two CMR “zones”. It’s not a problem for writes to
> span CMR zones, but it is for SMR zones, so this distinction is useful
> to have in the block layer.

Writes to CMR zones on regular host-managed SMR can straddle CMR zone boundaries
too (but not a CMR-to-SMR boundary). We do not allow it because micro-optimizing
for this case is not worth the overhead it introduces. So hard no on this.
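
For illustration, the boundary arithmetic behind "do not allow it" can be
sketched as below. This is not the block layer's code: it assumes every zone has
the same size, given in 512-byte sectors, and that nr_sectors is non-zero. The
function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True if the IO [sector, sector + nr_sectors) touches more than one zone.
 * Assumes uniform zones of zone_sectors sectors and nr_sectors > 0.
 */
static bool io_straddles_zone(uint64_t sector, uint32_t nr_sectors,
			      uint64_t zone_sectors)
{
	uint64_t first_zone = sector / zone_sectors;
	uint64_t last_zone = (sector + nr_sectors - 1) / zone_sectors;

	return first_zone != last_zone;
}

/* Number of sectors that can be issued at sector without leaving its zone. */
static uint64_t sectors_to_zone_end(uint64_t sector, uint64_t zone_sectors)
{
	return zone_sectors - (sector % zone_sectors);
}
```

With 256 MiB zones (524288 sectors), an 8-sector IO at sector 524280 stays
inside zone 0, while a 16-sector IO at the same sector crosses into zone 1 and
would have to be split after 8 sectors.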

>  - What makes sense as an interface for managing these types of
> not-quite CMR and not quite SMR disks? Some of the featureset overlaps
> with existing SMR support in blkdev_zone_mgmt_ioctl, so perhaps the
> additional conversion commands could be added there?

Passthrough commands. There are no kernel internal users of this, so I do not
see any need to add an interface to activate/deactivate zones. libzbc v6 is
coming soon with an API for zone domains/zone realms commands (already available
with the zone-domains branch of the source code).

>  - mitigating & limiting tail latency effects due to report zones
> commands / limiting “unnecessary” zone management calls.

There are no implicit zone management commands issued by the kernel, except the
one report zones done on disk scan/revalidate. Any zone management command is
explicit, asked for by the user or FS using the drive. So it is up to the user
to limit these to control the overhead.
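
As an example of what "explicit" means from user space, a report zones is only
sent when the application itself calls the BLKREPORTZONE ioctl. The wrapper
below is a rough sketch with a hypothetical name (report_zones) and minimal
error handling; it assumes fd is an open zoned block device.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/blkzoned.h>

/*
 * Hypothetical wrapper: report up to nr zones starting at start_sector.
 * Returns the number of zones copied into zones[], or -1 with errno set.
 */
static int report_zones(int fd, __u64 start_sector,
			struct blk_zone *zones, unsigned int nr)
{
	struct blk_zone_report *rep;
	int ret, saved_errno;

	rep = calloc(1, sizeof(*rep) + (size_t)nr * sizeof(struct blk_zone));
	if (!rep)
		return -1;

	rep->sector = start_sector;
	rep->nr_zones = nr;

	/* This is the only point where a command goes to the drive. */
	ret = ioctl(fd, BLKREPORTZONE, rep);
	saved_errno = errno;
	if (ret == 0) {
		/* The kernel updates nr_zones to the number actually reported. */
		ret = (int)rep->nr_zones;
		memcpy(zones, rep->zones, (size_t)ret * sizeof(struct blk_zone));
	}

	free(rep);
	errno = saved_errno;
	return ret;
}
```

An application worried about tail latency can call this once, cache the result,
and refresh it only after an IO error instead of before every write.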

In general, support for hybrid SMR (Zone domains / zone realms) is a hard no
from me. This feature set is a total nightmare to deal with in the kernel. It
opens a ton of corner cases that will require lots of checks in the hot path. We
definitely do not want that.

-- 
Damien Le Moal
Western Digital Research




