"The on-going re-work of btrfs zone support for instance now relies 100% on zone append being supported.... So the approach is: mandate zone append support for ZNS devices.... To allow other ZNS drives, an emulation similar to SCSI can be implemented, ... While on a HDD the performance penalty is minimal, it will likely be *significant* on a SSD." Wow. Well as I said, I don't know much about Linux but it sounds like the ongoing re-work of btrfs zone support mandating zone append should be revisited. The reality is there will be flavors of ZNS drives in the market that do not support Append. As many of you know, the ZRWA technical proposal is well under-way in NVMe ZNS WG. Ensuring that the entire Linux zone support ecosystem deliberately locks these devices out / or at best consigns them to a severely performance-penalized path, especially given the MULTIPLE statements that have been made in the NVMe ZNS WG by multiple companies regarding the use cases for which Zone Append is an absolute disaster (not my words), seems pretty darn inappropriate. -----Original Message----- From: linux-nvme [mailto:linux-nvme-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Damien Le Moal Sent: Tuesday, June 16, 2020 5:36 AM To: Javier González; Matias Bjørling Cc: Jens Axboe; Niklas Cassel; Ajay Joshi; Sagi Grimberg; Keith Busch; Dmitry Fomichev; Aravind Ramesh; linux-nvme@xxxxxxxxxxxxxxxxxxx; linux-block@xxxxxxxxxxxxxxx; Hans Holmberg; Christoph Hellwig; Matias Bjorling Subject: Re: [PATCH 5/5] nvme: support for zoned namespaces On 2020/06/16 21:24, Javier González wrote: > On 16.06.2020 14:06, Matias Bjørling wrote: >> On 16/06/2020 14.00, Javier González wrote: >>> On 16.06.2020 13:18, Matias Bjørling wrote: >>>> On 16/06/2020 12.41, Javier González wrote: >>>>> On 16.06.2020 08:34, Keith Busch wrote: >>>>>> Add support for NVM Express Zoned Namespaces (ZNS) Command Set defined >>>>>> in NVM Express TP4053. 
>>>>>> Zoned namespaces are discovered based on their
>>>>>> Command Set Identifier reported in the namespaces Namespace
>>>>>> Identification Descriptor list. A successfully discovered Zoned
>>>>>> Namespace will be registered with the block layer as a host managed
>>>>>> zoned block device with Zone Append command support. A namespace that
>>>>>> does not support append is not supported by the driver.
>>>>>
>>>>> Why are we enforcing the append command? Append is optional on the
>>>>> current ZNS specification, so we should not make this mandatory in the
>>>>> implementation. See specifics below.
>>>
>>>>
>>>> There is already general support in the kernel for the zone append
>>>> command. Feel free to submit patches to emulate the support. It is
>>>> outside the scope of this patchset.
>>>>
>>>
>>> It is fine that the kernel supports append, but the ZNS specification
>>> does not impose the implementation for append, so the driver should not
>>> do that either.
>>>
>>> ZNS SSDs that choose to leave append as a non-implemented optional
>>> command should not rely on emulated SW support, specially when
>>> traditional writes work very fine for a large part of current ZNS use
>>> cases.
>>>
>>> Please, remove this virtual constraint.
>>
>> The Zone Append command is mandatory for zoned block devices. Please
>> see https://urldefense.proofpoint.com/v2/url?u=https-3A__lwn.net_Articles_818709_&d=DwIFAw&c=JfeWlBa6VbDyTXraMENjy_b_0yKWuqQ4qY-FPhxK4x8w-TfgRBDyeV4hVQQBEgL2&r=YJM_QPk2w1CRIo5NNBXnCXGzNnmIIfG_iTRs6chBf6s&m=-fIHWuFYU2GHiTJ2FuhTBgrypPIJW0FjLUWTaK4cH9c&s=kkJ8bJpiTjKS9PoobDPHTf11agXKNUpcw5AsIEyewZk&e= for the background.
>
> I do not see anywhere in the block layer that append is mandatory for
> zoned devices. Append is emulated on ZBC, but beyond that there is no
> mandatory bits. Please explain.

This is to allow a single write IO path for all types of zoned block device for higher layers, e.g file systems.
The on-going re-work of btrfs zone support for instance now relies 100% on zone append being supported. That significantly simplifies the file system support and more importantly removes the need for locking around block allocation and BIO issuing, making it possible to preserve a fully asynchronous write path that can include workqueues for efficient CPU usage of things like encryption and compression. Without zone append, file systems would either (1) have to reject these drives that do not support zone append, or (2) implement 2 different write IO paths (slower regular write and zone append). Neither of these options is ideal, to say the least.

So the approach is: mandate zone append support for ZNS devices. To allow other ZNS drives, an emulation similar to SCSI can be implemented, with that emulation ideally combined to work for both types of drives if possible. And note that this emulation would require the drive to be operated with mq-deadline to enable zone write locking for preserving write command order. While on a HDD the performance penalty is minimal, it will likely be significant on a SSD.

>
>> Please submit patches if you want to have support for ZNS devices that
>> do not implement the Zone Append command. It is outside the scope
>> of this patchset.
>
> That we will.
>
>
> _______________________________________________
> linux-nvme mailing list
> linux-nvme@xxxxxxxxxxxxxxxxxxx
> https://urldefense.proofpoint.com/v2/url?u=http-3A__lists.infradead.org_mailman_listinfo_linux-2Dnvme&d=DwIFAw&c=JfeWlBa6VbDyTXraMENjy_b_0yKWuqQ4qY-FPhxK4x8w-TfgRBDyeV4hVQQBEgL2&r=YJM_QPk2w1CRIo5NNBXnCXGzNnmIIfG_iTRs6chBf6s&m=-fIHWuFYU2GHiTJ2FuhTBgrypPIJW0FjLUWTaK4cH9c&s=HeBnGkcBM5OqESkW8yYYi2KtvVwbdamrbd_X5PgGKBk&e=
>

-- 
Damien Le Moal
Western Digital Research
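
The two write paths discussed in this thread can be illustrated with a toy model. What follows is a minimal sketch in Python, not kernel code and not anything posted to the list: `Zone`, `EmulatedAppend`, `UnalignedWriteError`, and `run_writers` are hypothetical names, and the host-side `zone_write_lock` is only a stand-in for the zone write locking that the message attributes to mq-deadline. It assumes one block per command and says nothing about real performance.

```python
import threading


class UnalignedWriteError(Exception):
    """A regular write did not target the zone's current write pointer."""


class Zone:
    """Toy model of one sequential-write zone (hypothetical, not a kernel API)."""

    def __init__(self, start, capacity):
        self.start = start
        self.capacity = capacity
        self.wp = start                # write pointer: next writable LBA
        self.blocks = {}               # LBA -> payload
        self._lock = threading.Lock()  # the device handles one command at a time

    def _store(self, lba, data):
        if self.wp >= self.start + self.capacity:
            raise OSError("zone full")
        self.blocks[lba] = data
        self.wp += 1

    def write(self, lba, data):
        """Regular write: the host picks the LBA; it must equal the write pointer."""
        with self._lock:
            if lba != self.wp:
                raise UnalignedWriteError(f"write at {lba}, write pointer at {self.wp}")
            self._store(lba, data)

    def append(self, data):
        """Zone append: the device picks the LBA and reports it back to the host."""
        with self._lock:
            lba = self.wp
            self._store(lba, data)
            return lba


class EmulatedAppend:
    """Append built from regular writes, serialized by a host-side per-zone lock."""

    def __init__(self, zone):
        self.zone = zone
        self.host_wp = zone.wp                   # host-side copy of the write pointer
        self.zone_write_lock = threading.Lock()  # assumption: models zone write locking

    def append(self, data):
        with self.zone_write_lock:               # one write in flight per zone
            lba = self.host_wp
            self.zone.write(lba, data)
            self.host_wp += 1
            return lba


def run_writers(append_fn, writers=4, per_writer=4):
    """Issue appends from several threads and return the sorted LBAs assigned."""
    lbas = []
    lbas_lock = threading.Lock()

    def worker(tag):
        for i in range(per_writer):
            lba = append_fn(f"{tag}-{i}".encode())
            with lbas_lock:
                lbas.append(lba)

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(writers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(lbas)


if __name__ == "__main__":
    print(run_writers(Zone(start=0, capacity=16).append))
    print(run_writers(EmulatedAppend(Zone(start=0, capacity=16)).append))
    try:
        Zone(start=0, capacity=16).write(1, b"arrived out of order")
    except UnalignedWriteError as err:
        print("regular write rejected:", err)
```

In this model the native `append` needs no coordination on the host side, because the caller learns the LBA only after the device has chosen it. The emulated path has to hold `zone_write_lock` for the whole submission so that regular writes reach the zone in write-pointer order, which is the serialization the message expects to be costly on an SSD.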