On 9/23/22 04:37, Mike Snitzer wrote:
> On Wed, Sep 21 2022 at 7:55P -0400,
> Damien Le Moal <damien.lemoal@xxxxxxxxxxxxxxxxxx> wrote:
>
>> On 9/22/22 02:27, Mike Snitzer wrote:
>>> On Tue, Sep 20 2022 at 5:11P -0400,
>>> Pankaj Raghav <p.raghav@xxxxxxxxxxx> wrote:
>>>
>>>> - Background and Motivation:
>>>>
>>>> The zone storage implementation in Linux, introduced in v4.10, first
>>>> targeted SMR drives which have a power of 2 (po2) zone size alignment
>>>> requirement. The po2 zone size was further imposed implicitly by the
>>>> block layer's blk_queue_chunk_sectors(), used to prevent IO merging
>>>> across chunks beyond the specified size, since v3.16 through commit
>>>> 762380ad9322 ("block: add notion of a chunk size for request merging").
>>>> But this same general block layer po2 requirement for blk_queue_chunk_sectors()
>>>> was removed in v5.10 through commit 07d098e6bbad ("block: allow 'chunk_sectors'
>>>> to be non-power-of-2").
>>>>
>>>> NAND, which is the media used in newer zoned storage devices, does not
>>>> naturally align to po2. In these devices, zone capacity (cap) is not the
>>>> same as the po2 zone size. When the zone cap != zone size, then unmapped
>>>> LBAs are introduced to cover the space between the zone cap and zone size.
>>>> The po2 requirement does not make sense for these types of zone storage
>>>> devices. This patch series aims to remove these unmapped LBAs for zoned
>>>> devices when zone cap is npo2. This is done by relaxing the po2 zone size
>>>> constraint in the kernel and allowing zoned devices with npo2 zone sizes
>>>> if zone cap == zone size.
>>>>
>>>> Removing the po2 requirement from zone storage should be possible
>>>> now provided that no userspace regressions and no performance regressions
>>>> are introduced. Stop-gap patches have already been merged into f2fs-tools
>>>> to proactively not allow npo2 zone sizes until proper support is added [1].
>>>>
>>>> There were two efforts previously to add support to npo2 devices: 1) via
>>>> device level emulation [2] but that was rejected with a final conclusion
>>>> to add support for non po2 zoned device in the complete stack[3] 2)
>>>> adding support to the complete stack by removing the constraint in the
>>>> block layer and NVMe layer with support to btrfs, zonefs, etc which was
>>>> rejected with a conclusion to add a dm target for FS support [0]
>>>> to reduce the regression impact.
>>>>
>>>> This series adds support to npo2 zoned devices in the block and nvme
>>>> layer and a new **dm target** is added: dm-po2zoned-target. This new
>>>> target will be initially used for filesystems such as btrfs and
>>>> f2fs until native npo2 zone support is added.
>>>
>>> As this patchset nears the point of being "ready for merge" and DM's
>>> "zoned" oriented targets are multiplying, I need to understand: where
>>> are we collectively going? How long are we expecting to support the
>>> "stop-gap zoned storage" layers we've constructed?
>>>
>>> I know https://zonedstorage.io/docs/introduction exists... but it
>>> _seems_ stale given the emergence of ZNS and new permutations of zoned
>>> hardware. Maybe that isn't quite fair (it does cover A LOT!) but I'm
>>> still left wanting (e.g. "bring it all home for me!")...
>>>
>>> Damien, as the most "zoned storage" oriented engineer I know, can you
>>> please kick things off by shedding light on where Linux is now, and
>>> where it's going, for "zoned storage"?
>>
>> Let me first start with what we have seen so far with deployments in the
>> field.
>
> <snip>
>
> Thanks for all your insights on zoned storage, very appreciated!
>
>>> In addition, it was my understanding that WDC had yet another zoned DM
>>> target called "dm-zap" that is for ZNS based devices... It's all a bit
>>> messy in my head (that's on me for not keeping up, but I think we need
>>> a recap!)
>>
>> Since the ZNS specification does not define conventional zones, dm-zoned
>> cannot be used as a standalone DM target (read: single block device) with
>> NVMe zoned block devices. Furthermore, due to its block mapping scheme,
>> dm-zoned does not support devices with zones that have a capacity lower
>> than the zone size. So ZNS is really a big *no* for dm-zoned. dm-zap is a
>> prototype and in a nutshell is the equivalent of dm-zoned for ZNS. dm-zap
>> can deal with the smaller zone capacity and does not require conventional
>> zones. We are not trying to push for dm-zap to be merged for now as we are
>> still evaluating its potential use cases. We also have a different but
>> functionally equivalent approach implemented as a block device driver that
>> we are evaluating internally.
>>
>> Given the above mentioned usage pattern we have seen so far for zoned
>> storage, it is not yet clear if something like dm-zap for ZNS is needed
>> beside some niche use cases.
>
> OK, good to know. I do think dm-zoned should be trained to _not_
> allow use with ZNS NVMe devices (maybe that is in place and I just
> missed it?). Because there is some confusion with at least one
> customer that is asserting dm-zoned is somehow enabling them to use
> ZNS NVMe devices!

dm-zoned checks for conventional zones and also that all zones have a zone
capacity that is equal to the zone size. The first check alone would rule
out ZNS, although a second regular drive can be used to emulate conventional
zones. The second check is the real blocker: zone cap < zone size is pretty
much a given with ZNS, and that rules it out.

If anything, we should also add a check on the max number of active zones,
which is also a limitation that ZNS drives have, unlike SMR drives. Since
dm-zoned does not handle active zones at all, any drive with a limit should
be excluded. I will send patches for that.

>
> Maybe they somehow don't _need_ conventional zones (writes are handled
> by some other layer? and dm-zoned access is confined to read only)!?
> And might they also be using ZNS NVMe devices that do _not_ have a
> zone capacity lower than the zone size?

It is a possibility. Indeed, if the setup has:
1) a ZNS drive with zone capacity equal to zone size
2) a second regular drive used to emulate conventional zones
3) no limit on the max number of active zones on the ZNS drive

Then dm-zoned will work just fine. But again, I seriously doubt that point
(3) holds. And we should check that upfront in dm-zoned ctr.

> Or maybe they are mistaken and we should ask more specific questions
> of them?

Getting the exact drive characteristics (zone size, capacity and zone
resource limits) will tell you if dm-zoned can work or not.

-- 
Damien Le Moal
Western Digital Research
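The three conditions listed above can be written down as a small userspace check. This is only a sketch under stated assumptions, not dm-zoned code: `ZonedDriveInfo`, its fields, and `dm_zoned_blockers()` are hypothetical names chosen for illustration, the values are assumed to have been collected by hand (for example from a zone report and the queue limits), and a `max_active_zones` value of 0 is assumed to mean "no limit". The active-zone condition reflects the check proposed in this mail, not something dm-zoned is known to enforce today.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ZonedDriveInfo:
    """Hypothetical summary of the drive characteristics discussed above."""
    zone_size_sectors: int
    zone_capacity_sectors: int
    nr_conventional_zones: int
    max_active_zones: int  # assumption: 0 means "no limit"
    has_regular_companion_drive: bool = False  # second drive emulating conventional zones


def dm_zoned_blockers(info: ZonedDriveInfo) -> List[str]:
    """Return the reasons (if any) why dm-zoned could not work with this setup."""
    blockers = []

    # Condition 1: every zone must have capacity equal to the zone size.
    if info.zone_capacity_sectors != info.zone_size_sectors:
        blockers.append("zone capacity differs from zone size")

    # Condition 2: conventional zones are needed, either on the drive
    # itself or emulated by a second regular drive.
    if info.nr_conventional_zones == 0 and not info.has_regular_companion_drive:
        blockers.append("no conventional zones and no regular drive to emulate them")

    # Condition 3 (proposed check): dm-zoned does not track active zones,
    # so any limit on active zones excludes the drive.
    if info.max_active_zones != 0:
        blockers.append("drive limits the number of active zones")

    return blockers


# Illustrative values only, not taken from any real drive.
smr_like = ZonedDriveInfo(524288, 524288, nr_conventional_zones=100, max_active_zones=0)
zns_like = ZonedDriveInfo(4194304, 2207744, nr_conventional_zones=0, max_active_zones=14)

for name, drive in (("smr_like", smr_like), ("zns_like", zns_like)):
    reasons = dm_zoned_blockers(drive)
    print(name, "OK" if not reasons else "; ".join(reasons))
```

Returning the list of reasons rather than a bare boolean makes it easier to tell a customer which of the three characteristics rules their drive out.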