On Wed, Sep 21 2022 at 7:55P -0400,
Damien Le Moal <damien.lemoal@xxxxxxxxxxxxxxxxxx> wrote:

> On 9/22/22 02:27, Mike Snitzer wrote:
> > On Tue, Sep 20 2022 at 5:11P -0400,
> > Pankaj Raghav <p.raghav@xxxxxxxxxxx> wrote:
> >
> >> - Background and Motivation:
> >>
> >> The zoned storage implementation in Linux, introduced in v4.10, first
> >> targeted SMR drives, which have a power-of-2 (po2) zone size alignment
> >> requirement. The po2 zone size was further imposed implicitly by the
> >> block layer's blk_queue_chunk_sectors(), used to prevent IO merging
> >> across chunks beyond the specified size, since v3.16 through commit
> >> 762380ad9322 ("block: add notion of a chunk size for request merging").
> >> But this general block layer po2 requirement for blk_queue_chunk_sectors()
> >> was removed in v5.10 through commit 07d098e6bbad ("block: allow
> >> 'chunk_sectors' to be non-power-of-2").
> >>
> >> NAND, the media used in newer zoned storage devices, does not
> >> naturally align to po2. In these devices, the zone capacity (cap) is
> >> not the same as the po2 zone size. When zone cap != zone size,
> >> unmapped LBAs are introduced to cover the space between the zone cap
> >> and the zone size. The po2 requirement does not make sense for these
> >> types of zoned storage devices. This patch series aims to remove these
> >> unmapped LBAs for zoned devices whose zone cap is npo2. This is done
> >> by relaxing the po2 zone size constraint in the kernel and allowing
> >> zoned devices with npo2 zone sizes as long as zone cap == zone size.
> >>
> >> Removing the po2 requirement from zoned storage should be possible
> >> now, provided that no userspace regressions and no performance
> >> regressions are introduced. Stop-gap patches have already been merged
> >> into f2fs-tools to proactively reject npo2 zone sizes until proper
> >> support is added [1].
> >>
> >> There were two previous efforts to add support for npo2 devices:
> >> 1) via device-level emulation [2], which was rejected with the
> >> conclusion that non-po2 zoned devices should be supported across the
> >> complete stack [3]; 2) adding support across the complete stack by
> >> removing the constraint in the block layer and NVMe layer, with
> >> support for btrfs, zonefs, etc., which was rejected with the
> >> conclusion that a dm target should be added for FS support [0] to
> >> reduce the regression impact.
> >>
> >> This series adds support for npo2 zoned devices in the block and nvme
> >> layers, and a new **dm target** is added: dm-po2zoned-target. This new
> >> target will initially be used for filesystems such as btrfs and f2fs
> >> until native npo2 zone support is added.
> >
> > As this patchset nears the point of being "ready for merge" and DM's
> > "zoned" oriented targets are multiplying, I need to understand: where
> > are we collectively going?  How long are we expecting to support the
> > "stop-gap zoned storage" layers we've constructed?
> >
> > I know https://zonedstorage.io/docs/introduction exists... but it
> > _seems_ stale given the emergence of ZNS and new permutations of zoned
> > hardware.  Maybe that isn't quite fair (it does cover A LOT!) but I'm
> > still left wanting (e.g. "bring it all home for me!")...
> >
> > Damien, as the most "zoned storage" oriented engineer I know, can you
> > please kick things off by shedding light on where Linux is now, and
> > where it's going, for "zoned storage"?
>
> Let me first start with what we have seen so far with deployments in
> the field.

<snip>

Thanks for all your insights on zoned storage, much appreciated!
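(An aside, mostly to check my own reading of the cover letter's layout
description: the toy userspace sketch below is _not_ code from the
series, and the 128 MiB / 96 MiB sizes are made up for illustration. It
just shows how, with a po2 zone size and a smaller npo2 zone capacity,
part of every zone's LBA range is an unmapped gap, whereas an npo2 zone
size with cap == size has no gap, at the cost of a division/modulo
instead of a shift/mask.)

/* Toy illustration only -- sizes invented, not from the series. */
#include <stdio.h>

#define SECTOR_SHIFT	9
#define ZONE_SIZE	((128ULL << 20) >> SECTOR_SHIFT) /* po2 zone size: 128 MiB in 512B sectors */
#define ZONE_CAP	((96ULL << 20) >> SECTOR_SHIFT)	 /* npo2 zone capacity: 96 MiB */

int main(void)
{
	unsigned long long sector = (112ULL << 20) >> SECTOR_SHIFT; /* an LBA 112 MiB into the device */

	/*
	 * po2 layout: zone number/offset are effectively a shift and a
	 * mask, but offsets in [cap, size) of every zone are unmapped.
	 */
	unsigned long long zno = sector / ZONE_SIZE;	/* == sector >> ilog2(ZONE_SIZE) */
	unsigned long long off = sector % ZONE_SIZE;	/* == sector & (ZONE_SIZE - 1)  */

	/*
	 * npo2 layout the series allows: zone cap == zone size, so the
	 * LBA space is contiguous with no unmapped gap.
	 */
	unsigned long long zno_npo2 = sector / ZONE_CAP;
	unsigned long long off_npo2 = sector % ZONE_CAP;

	printf("po2  layout: zone %llu offset %llu -> %s\n", zno, off,
	       off >= ZONE_CAP ? "unmapped gap between cap and size" : "mapped");
	printf("npo2 layout: zone %llu offset %llu -> mapped, no gap\n",
	       zno_npo2, off_npo2);
	return 0;
}

If I have that right, the point of the series is simply to let the
second layout exist, so the unmapped LBAs go away entirely.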
> > In addition, it was my understanding that WDC had yet another zoned
> > DM target called "dm-zap" that is for ZNS-based devices...  It's all
> > a bit messy in my head (that's on me for not keeping up, but I think
> > we need a recap!)
>
> Since the ZNS specification does not define conventional zones,
> dm-zoned cannot be used as a standalone DM target (read: single block
> device) with NVMe zoned block devices. Furthermore, due to its block
> mapping scheme, dm-zoned does not support devices with zones that have
> a capacity lower than the zone size. So ZNS is really a big *no* for
> dm-zoned. dm-zap is a prototype and in a nutshell is the equivalent of
> dm-zoned for ZNS. dm-zap can deal with the smaller zone capacity and
> does not require conventional zones. We are not trying to push for
> dm-zap to be merged for now as we are still evaluating its potential
> use cases. We also have a different but functionally equivalent
> approach implemented as a block device driver that we are evaluating
> internally.
>
> Given the above-mentioned usage patterns we have seen so far for zoned
> storage, it is not yet clear if something like dm-zap for ZNS is needed
> besides some niche use cases.

OK, good to know.

I do think dm-zoned should be trained to _not_ allow use with ZNS NVMe
devices (maybe that is already in place and I just missed it?), because
there is some confusion with at least one customer who is asserting that
dm-zoned is somehow enabling them to use ZNS NVMe devices!

Maybe they somehow don't _need_ conventional zones (writes are handled
by some other layer? and dm-zoned access is confined to read-only)!?
And might they also be using ZNS NVMe devices that do _not_ have a zone
capacity lower than the zone size?

Or maybe they are mistaken and we should ask more specific questions of
them?

> > So please help me, and others, become more informed as quickly as
> > possible! ;)
>
> I hope the above helps. If you want me to elaborate further on any of
> the points above, feel free to let me know.

You've been extremely helpful, thanks!
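P.S. On the customer question above: rather than guessing, it may be
worth just asking them for a zone report from one of their ZNS
namespaces (a recent "blkzone report" should show the zone capacity), or
running something like the sketch below against the device. This is a
hypothetical diagnostic, not dm-zoned code; it only uses the
BLKREPORTZONE ioctl from <linux/blkzoned.h>, and error handling is
trimmed:

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/blkzoned.h>

#define NR_ZONES 16	/* only sample the first few zones */

int main(int argc, char **argv)
{
	struct blk_zone_report *rep;
	unsigned int i, conv = 0, cap_lt_size = 0;
	int fd;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <zoned blockdev>\n", argv[0]);
		return 1;
	}

	fd = open(argv[1], O_RDONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	rep = calloc(1, sizeof(*rep) + NR_ZONES * sizeof(struct blk_zone));
	rep->sector = 0;
	rep->nr_zones = NR_ZONES;

	if (ioctl(fd, BLKREPORTZONE, rep) < 0) {
		perror("BLKREPORTZONE");
		return 1;
	}

	for (i = 0; i < rep->nr_zones; i++) {
		struct blk_zone *z = &rep->zones[i];

		if (z->type == BLK_ZONE_TYPE_CONVENTIONAL)
			conv++;
		/* capacity field is only valid if the report says so */
		if ((rep->flags & BLK_ZONE_REP_CAPACITY) && z->capacity < z->len)
			cap_lt_size++;
	}

	printf("conventional zones in first %u: %u\n", rep->nr_zones, conv);
	printf("zones with capacity < size:    %u\n", cap_lt_size);
	close(fd);
	return 0;
}

That would at least tell us which of the two constraints you describe
(no conventional zones, zone capacity < zone size) actually applies to
their drives.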