Re: [PATCH 5/5] nvme: support for zoned namespaces

On 16/06/2020 15.08, Judy Brock wrote:
     "The on-going re-work of btrfs zone support for instance now relies 100% on zone append being supported.... So the approach is: mandate zone append support for ZNS devices.... To allow other ZNS drives, an emulation similar to SCSI can be implemented, ...  While on a HDD the  performance penalty is minimal, it will likely be *significant* on a SSD."

Wow. Well as I said, I don't know much about Linux but it sounds like the ongoing re-work of btrfs zone support mandating zone append should be revisited.
Feel free to go ahead and suggest an alternative solution that shows the same performance benefits. It is open source, and if you can show and _implement_ a better solution, we will review it like any other contribution to the open-source ecosystem.
The reality is there will be flavors of ZNS drives in the market that do not support Append.  As many of you know, the ZRWA technical proposal is well under-way in NVMe ZNS WG.

Ensuring that the entire Linux zone support ecosystem deliberately locks these devices out / or at best consigns them to a severely performance-penalized path, especially given the MULTIPLE statements that have been made in the NVMe ZNS WG by multiple companies regarding the use cases for which Zone Append is an absolute disaster (not my words), seems pretty darn inappropriate.

First, a note: I appreciate you bringing up discussions that were held within the NVMe ZNS TG, but please note that those discussions happened in a forum that is under NDA. This is an open-source mailing list, and the content will be available online for many, many years. Please refrain from discussing things that are not deemed public by the NVMe board of directors.

On your statement: there is no deliberate locking out of devices, any more than there is when a specific feature has not been implemented or when a device driver is proprietary to a company. Everyone is free to contribute to open source. As Javier has previously pointed out, he intends to submit a patchset to add the necessary support for the zone append command API.

-----Original Message-----
From: linux-nvme [mailto:linux-nvme-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Damien Le Moal
Sent: Tuesday, June 16, 2020 5:36 AM
To: Javier González; Matias Bjørling
Cc: Jens Axboe; Niklas Cassel; Ajay Joshi; Sagi Grimberg; Keith Busch; Dmitry Fomichev; Aravind Ramesh; linux-nvme@xxxxxxxxxxxxxxxxxxx; linux-block@xxxxxxxxxxxxxxx; Hans Holmberg; Christoph Hellwig; Matias Bjorling
Subject: Re: [PATCH 5/5] nvme: support for zoned namespaces

On 2020/06/16 21:24, Javier González wrote:
On 16.06.2020 14:06, Matias Bjørling wrote:
On 16/06/2020 14.00, Javier González wrote:
On 16.06.2020 13:18, Matias Bjørling wrote:
On 16/06/2020 12.41, Javier González wrote:
On 16.06.2020 08:34, Keith Busch wrote:
Add support for NVM Express Zoned Namespaces (ZNS) Command Set defined
in NVM Express TP4053. Zoned namespaces are discovered based on their
Command Set Identifier reported in the namespaces Namespace
Identification Descriptor list. A successfully discovered Zoned
Namespace will be registered with the block layer as a host managed
zoned block device with Zone Append command support. A namespace that
does not support append is not supported by the driver.
Why are we enforcing the append command? Append is optional on the
current ZNS specification, so we should not make this mandatory in the
implementation. See specifics below.
There is already general support in the kernel for the zone append
command. Feel free to submit patches to emulate the support. It is
outside the scope of this patchset.

It is fine that the kernel supports append, but the ZNS specification
does not impose the implementation for append, so the driver should not
do that either.

ZNS SSDs that choose to leave append as a non-implemented optional
command should not rely on emulated SW support, especially when
traditional writes work just fine for a large part of current ZNS use
cases.

Please, remove this virtual constraint.
The Zone Append command is mandatory for zoned block devices. Please
see https://lwn.net/Articles/818709/ for the background.
I do not see anywhere in the block layer that append is mandatory for
zoned devices. Append is emulated on ZBC, but beyond that there is no
mandatory bits. Please explain.
This is to allow a single write IO path for all types of zoned block devices for
higher layers, e.g. file systems. The on-going re-work of btrfs zone support for
instance now relies 100% on zone append being supported. That significantly
simplifies the file system support and, more importantly, removes the need for
locking around block allocation and BIO issuing, which preserves a fully
asynchronous write path that can include workqueues for efficient CPU usage of
things like encryption and compression. Without zone append, file systems would
either (1) have to reject drives that do not support zone append, or (2)
implement 2 different write IO paths (slower regular write and zone append).
Neither of these options is ideal, to say the least.
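
To make the argument above concrete, here is a minimal sketch in Python of why
zone append removes the need for host-side ordering. It is a toy model only:
the Zone class, its fields, and the two submit_* helpers are hypothetical names
invented for this illustration and are not kernel or NVMe APIs. The only
behavior assumed is that a regular write must start exactly at the zone's write
pointer, while an append lets the device choose the location and report it back.

```python
from dataclasses import dataclass, field


class UnalignedWriteError(Exception):
    """A regular write did not start at the zone's write pointer."""


@dataclass
class Zone:
    """Toy model of one sequential-write-required zone (not a real device API)."""
    start: int                    # first LBA of the zone
    capacity: int                 # writable blocks in the zone
    wp: int = field(init=False)   # write pointer: next LBA the device accepts

    def __post_init__(self) -> None:
        self.wp = self.start

    def _reserve(self, nblocks: int) -> int:
        """Advance the write pointer and return the LBA that was written."""
        if self.wp + nblocks > self.start + self.capacity:
            raise ValueError("zone full")
        lba = self.wp
        self.wp += nblocks
        return lba

    def write(self, lba: int, nblocks: int) -> None:
        """Regular write: the host picks the LBA, which must equal the write pointer."""
        if lba != self.wp:
            raise UnalignedWriteError(f"write at {lba}, write pointer at {self.wp}")
        self._reserve(nblocks)

    def append(self, nblocks: int) -> int:
        """Zone append: the device picks the LBA and returns it on completion."""
        return self._reserve(nblocks)


def submit_reordered_writes(zone: Zone) -> bool:
    """Pre-allocate LBAs for two 8-block writes, then deliver them swapped."""
    first_lba, second_lba = zone.wp, zone.wp + 8
    try:
        zone.write(second_lba, 8)   # arrives first, but is not at the write pointer
        zone.write(first_lba, 8)
    except UnalignedWriteError:
        return False
    return True


def submit_appends(zone: Zone, count: int) -> list:
    """Issue `count` 8-block appends; no LBA is chosen by the host up front."""
    return [zone.append(8) for _ in range(count)]
```

In this model the reordered regular writes fail unless the host serializes
allocation and submission, whereas the appends succeed in whatever order they
arrive and the caller simply records the LBA each one returns.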

So the approach is: mandate zone append support for ZNS devices. To allow other
ZNS drives, an emulation similar to SCSI can be implemented, with that emulation
ideally combined to work for both types of drives if possible. And note that
this emulation would require the drive to be operated with mq-deadline to enable
zone write locking for preserving write command order. While on a HDD the
performance penalty is minimal, it will likely be significant on a SSD.
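
As a rough illustration of the emulation idea described above, the following
Python sketch turns appends into regular writes by tracking the write pointer on
the host and holding a per-zone lock for the duration of each write. All names
here (RegularWriteOnlyZone, EmulatedAppend, run_concurrent_appends) are
hypothetical and exist only for this example; the threading.Lock merely stands
in for zone write locking and is not how the kernel or mq-deadline implements
it. The sketch assumes the host is the only writer to the zone.

```python
import threading


class RegularWriteOnlyZone:
    """Toy device zone that accepts writes only at its write pointer."""

    def __init__(self, start: int, capacity: int) -> None:
        self.start = start
        self.capacity = capacity
        self.wp = start

    def write(self, lba: int, nblocks: int) -> None:
        if lba != self.wp:
            raise IOError(f"unaligned write: lba={lba}, wp={self.wp}")
        if self.wp + nblocks > self.start + self.capacity:
            raise IOError("zone full")
        self.wp += nblocks


class EmulatedAppend:
    """Host-side append emulation: one write in flight per zone at a time."""

    def __init__(self, zone: RegularWriteOnlyZone) -> None:
        self.zone = zone
        self.host_wp = zone.start      # host's copy of the write pointer
        self.lock = threading.Lock()   # stand-in for zone write locking

    def append(self, nblocks: int) -> int:
        # The lock is held across the whole write, so writes to this zone
        # are fully serialized; this is where the performance cost comes from.
        with self.lock:
            lba = self.host_wp
            self.zone.write(lba, nblocks)
            self.host_wp += nblocks
            return lba


def run_concurrent_appends(zone: RegularWriteOnlyZone, writers: int, nblocks: int) -> list:
    """Have `writers` threads each issue one emulated append; return the LBAs."""
    emu = EmulatedAppend(zone)
    results = []
    threads = [
        threading.Thread(target=lambda: results.append(emu.append(nblocks)))
        for _ in range(writers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Every caller still gets a unique LBA back, as with a native append, but only one
write per zone can be outstanding. That serialization costs little on a device
that is slow per command and is likely to cost far more on a device that relies
on deep queues for throughput.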

Please submit patches if you want to have support for ZNS devices that
do not implement the Zone Append command. It is outside the scope
of this patchset.
That we will.

