RE: [PATCH 5/5] nvme: support for zoned namespaces


 



    "The on-going re-work of btrfs zone support for instance now relies 100% on zone append being supported.... So the approach is: mandate zone append support for ZNS devices.... To allow other ZNS drives, an emulation similar to SCSI can be implemented, ... While on a HDD the performance penalty is minimal, it will likely be *significant* on a SSD."

Wow. Well, as I said, I don't know much about Linux, but it sounds like the decision to have the ongoing re-work of btrfs zone support mandate zone append should be revisited.

The reality is that there will be flavors of ZNS drives on the market that do not support Append. As many of you know, the ZRWA technical proposal is well under way in the NVMe ZNS WG.

Ensuring that the entire Linux zone support ecosystem deliberately locks these devices out, or at best consigns them to a severely performance-penalized path, seems pretty darn inappropriate, especially given the MULTIPLE statements made in the NVMe ZNS WG by multiple companies regarding the use cases for which Zone Append is an absolute disaster (not my words).





-----Original Message-----
From: linux-nvme [mailto:linux-nvme-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Damien Le Moal
Sent: Tuesday, June 16, 2020 5:36 AM
To: Javier González; Matias Bjørling
Cc: Jens Axboe; Niklas Cassel; Ajay Joshi; Sagi Grimberg; Keith Busch; Dmitry Fomichev; Aravind Ramesh; linux-nvme@xxxxxxxxxxxxxxxxxxx; linux-block@xxxxxxxxxxxxxxx; Hans Holmberg; Christoph Hellwig; Matias Bjorling
Subject: Re: [PATCH 5/5] nvme: support for zoned namespaces

On 2020/06/16 21:24, Javier González wrote:
> On 16.06.2020 14:06, Matias Bjørling wrote:
>> On 16/06/2020 14.00, Javier González wrote:
>>> On 16.06.2020 13:18, Matias Bjørling wrote:
>>>> On 16/06/2020 12.41, Javier González wrote:
>>>>> On 16.06.2020 08:34, Keith Busch wrote:
>>>>>> Add support for NVM Express Zoned Namespaces (ZNS) Command Set defined
>>>>>> in NVM Express TP4053. Zoned namespaces are discovered based on their
>>>>>> Command Set Identifier reported in the namespaces Namespace
>>>>>> Identification Descriptor list. A successfully discovered Zoned
>>>>>> Namespace will be registered with the block layer as a host managed
>>>>>> zoned block device with Zone Append command support. A namespace that
>>>>>> does not support append is not supported by the driver.
>>>>>
>>>>> Why are we enforcing the append command? Append is optional on the
>>>>> current ZNS specification, so we should not make this mandatory in the
>>>>> implementation. See specifics below.
>>>
>>>>
>>>> There is already general support in the kernel for the zone append 
>>>> command. Feel free to submit patches to emulate the support. It is 
>>>> outside the scope of this patchset.
>>>>
>>>
>>> It is fine that the kernel supports append, but the ZNS specification
>>> does not impose the implementation for append, so the driver should not
>>> do that either.
>>>
>>> ZNS SSDs that choose to leave append as a non-implemented optional
>>> command should not have to rely on emulated SW support, especially when
>>> traditional writes work fine for a large part of current ZNS use
>>> cases.
>>>
>>> Please, remove this virtual constraint.
>>
>> The Zone Append command is mandatory for zoned block devices. Please
>> see https://lwn.net/Articles/818709/ for the background.
> 
> I do not see anywhere in the block layer that append is mandatory for
> zoned devices. Append is emulated on ZBC, but beyond that there are no
> mandatory bits. Please explain.

This is to allow a single write IO path for all types of zoned block devices for
higher layers, e.g. file systems. The on-going re-work of btrfs zone support for
instance now relies 100% on zone append being supported. That significantly
simplifies the file system support and, more importantly, removes the need for
locking around block allocation and BIO issuing, which preserves a fully
asynchronous write path that can include workqueues for efficient CPU usage by
things like encryption and compression. Without zone append, file systems would
either (1) have to reject drives that do not support zone append, or (2)
implement 2 different write IO paths (slower regular write and zone append).
Neither of these options is ideal, to say the least.
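
As a rough illustration of the point above, here is a toy Python model (not
kernel code; the Zone class and its method names are invented for this sketch).
It assumes a zone that only accepts a regular write exactly at its write
pointer, so the submitter has to serialize allocation and issuing, whereas an
append lets the device pick the location and report it on completion.

```python
class Zone:
    """Toy model of a sequential-write-required zone (hypothetical, not a kernel API)."""

    def __init__(self, start, capacity):
        self.start = start          # first block of the zone
        self.capacity = capacity    # number of writable blocks
        self.wp = start             # write pointer: next block that may be written

    def write(self, lba, nblocks):
        """Regular write: only valid exactly at the write pointer."""
        if lba != self.wp:
            raise IOError(f"unaligned write: lba={lba}, wp={self.wp}")
        self._advance(nblocks)
        return lba

    def append(self, nblocks):
        """Zone append: the device chooses the location and returns it."""
        lba = self.wp
        self._advance(nblocks)
        return lba

    def _advance(self, nblocks):
        if self.wp + nblocks > self.start + self.capacity:
            raise IOError("zone full")
        self.wp += nblocks


def demo():
    # Two writes of 8 blocks each were allocated at LBAs 0 and 8,
    # but the second one reaches the device first.
    regular = Zone(start=0, capacity=64)
    try:
        regular.write(8, 8)
        reordered_ok = True
    except IOError:
        reordered_ok = False

    # With append, arrival order does not matter; the caller learns
    # the location from the completion.
    appended = Zone(start=0, capacity=64)
    locations = [appended.append(8), appended.append(8)]
    return reordered_ok, locations
```

In this model the reordered regular write is rejected, while both appends
succeed and return the locations they were given. That is why a file system
built on append does not need to hold a lock from block allocation through BIO
submission.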

So the approach is: mandate zone append support for ZNS devices. To allow other
ZNS drives, an emulation similar to SCSI can be implemented, with that emulation
ideally combined to work for both types of drives if possible. And note that
this emulation would require the drive to be operated with mq-deadline to enable
zone write locking for preserving write command order. While on a HDD the
performance penalty is minimal, it will likely be significant on a SSD.
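
A minimal sketch of what such an emulation amounts to, again as a toy Python
model rather than kernel code (EmulatedAppendZone, run and the other names are
invented here, and a plain mutex stands in for zone write locking): the host
tracks the write pointer itself and allows only one write per zone at a time.

```python
import threading


class EmulatedAppendZone:
    """Toy zone that only supports regular writes at the write pointer,
    with zone append emulated on the host side (hypothetical names)."""

    def __init__(self, start, capacity):
        self.start = start
        self.capacity = capacity
        self.wp = start                 # device write pointer
        self._lock = threading.Lock()   # stands in for zone write locking

    def _device_write(self, lba, nblocks):
        # The device rejects any write that is not at the write pointer.
        if lba != self.wp:
            raise IOError("write not at write pointer")
        if self.wp + nblocks > self.start + self.capacity:
            raise IOError("zone full")
        self.wp += nblocks

    def append(self, nblocks):
        # Only one write per zone may proceed at a time: pick the
        # location and issue the write while holding the zone lock.
        with self._lock:
            lba = self.wp
            self._device_write(lba, nblocks)
            return lba


def run(nthreads=8, nblocks=4):
    zone = EmulatedAppendZone(start=0, capacity=nthreads * nblocks)
    results = []
    results_lock = threading.Lock()

    def worker():
        lba = zone.append(nblocks)
        with results_lock:
            results.append(lba)

    threads = [threading.Thread(target=worker) for _ in range(nthreads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(results), zone.wp
```

Every emulated append succeeds, but all writers to a zone are funneled through
one lock. Under the assumptions of this sketch, that serialization is the cost
being discussed: small relative to HDD latency, but much more visible on a
device that could otherwise service many writes to a zone in parallel.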

> 
>> Please submit patches if you want to have support for ZNS devices that
>> do not implement the Zone Append command. It is outside the scope
>> of this patchset.
> 
> That we will.
> 
> 
> _______________________________________________
> linux-nvme mailing list
> linux-nvme@xxxxxxxxxxxxxxxxxxx
> http://lists.infradead.org/mailman/listinfo/linux-nvme
> 


-- 
Damien Le Moal
Western Digital Research




