Re: [PATCH 5/5] nvme: support for zoned namespaces

On 17/06/2020 21.09, Javier González wrote:
On 17.06.2020 18:55, Matias Bjorling wrote:
-----Original Message-----
From: Javier González <javier@xxxxxxxxxxx>
Sent: Wednesday, 17 June 2020 20.29
To: Matias Bjørling <mb@xxxxxxxxxxx>
Cc: Christoph Hellwig <hch@xxxxxx>; Keith Busch <Keith.Busch@xxxxxxx>;
linux-nvme@xxxxxxxxxxxxxxxxxxx; linux-block@xxxxxxxxxxxxxxx; Damien Le Moal
<Damien.LeMoal@xxxxxxx>; Matias Bjorling <Matias.Bjorling@xxxxxxx>;
Sagi Grimberg <sagi@xxxxxxxxxxx>; Jens Axboe <axboe@xxxxxxxxx>; Hans
Holmberg <Hans.Holmberg@xxxxxxx>; Dmitry Fomichev
<Dmitry.Fomichev@xxxxxxx>; Ajay Joshi <Ajay.Joshi@xxxxxxx>; Aravind
Ramesh <Aravind.Ramesh@xxxxxxx>; Niklas Cassel
<Niklas.Cassel@xxxxxxx>; Judy Brock <judy.brock@xxxxxxxxxxx>
Subject: Re: [PATCH 5/5] nvme: support for zoned namespaces

On 17.06.2020 19:57, Matias Bjørling wrote:
>On 17/06/2020 16.42, Javier González wrote:
>>On 17.06.2020 09:43, Christoph Hellwig wrote:
>>>On Tue, Jun 16, 2020 at 12:41:42PM +0200, Javier González wrote:
>>>>On 16.06.2020 08:34, Keith Busch wrote:
>>>>>Add support for NVM Express Zoned Namespaces (ZNS) Command Set
>>>>>defined in NVM Express TP4053. Zoned namespaces are discovered
>>>>>based on their Command Set Identifier reported in the namespaces
>>>>>Namespace Identification Descriptor list. A successfully discovered
>>>>>Zoned Namespace will be registered with the block layer as a host
>>>>>managed zoned block device with Zone Append command support. A
>>>>>namespace that does not support append is not supported by the driver.
>>>>
>>>>Why are we enforcing the append command? Append is optional on the
>>>>current ZNS specification, so we should not make this mandatory in
>>>>the implementation. See specifics below.
>>>
>>>Because Append is the way to go and we've moved the Linux zoned block
>>>I/O stack to require it, as should have been obvious to anyone
>>>following linux-block in the last few months.  I also have to say I'm
>>>really tired of the stupid politics that your company started in the
>>>NVMe working group, and will say that these do not matter for Linux
>>>development at all.  If you think it is worthwhile to support devices
>>>without Zone Append you can contribute support for them on top of
>>>this series by porting the SCSI Zone Append Emulation code to NVMe.
>>>
>>>And I'm not even going to read the rest of this thread as I'm on a
>>>vacation that I badly needed because of the Samsung TWG bullshit.
>>
>>My intention is to support some Samsung ZNS devices that will not
>>enable append. I do not think this is an unreasonable thing to do. How
>>/ why append ended up being an optional feature in the ZNS TP is
>>orthogonal to this conversation. Bullshit or not, it ends up on
>>devices that we would like to support one way or another.
>
>I do not believe any of us have said that it is unreasonable to
>support. We've only asked that you make the patches for it.
>
>All of us have communicated why Zone Append is a great addition to the
>Linux kernel. Also, as Christoph points out, this has not been a secret
>for the past couple of months, and as Martin pointed out, has been a
>wanted feature for the past decade in the Linux community.

>
>I do want to politely point out, that you've got a very clear signal
>from the key storage maintainers. Each of them is part of the planet's
>best of the best and most well-respected software developers, that
>literally have built the storage stack that most of the world depends
>on. The storage stack that recently sent manned rockets into space.
>They each unanimously said that the Zone Append command is the right
>approach for the Linux kernel to reduce the overhead of I/O tracking
>for zoned block devices. It may be worth bringing this information to
>your engineering organization, and also potentially consider Zone
>Append support for devices that you intend to use with the Linux
>kernel storage stack.

I understand and I have never said the opposite.

Append is a great addition that

One may have interpreted your SDC EMEA talk the opposite. It was not
very neutral towards Zone Append, but that is of course one of its least
problems. But I am happy to hear that you've changed your opinion.

As you are well aware, there are some cases where append introduces
challenges. This is well-documented in the literature around nameless
writes.

Zone Append is vastly different from the nameless writes idea and has few of the drawbacks of nameless writes, so the well-documented literature does not apply.

Part of the talk was on presenting an alternative for these
particular use cases.

This said, I am not afraid of changing my point of view when I am proven
wrong.


we also have been working on for several months (see patch additions from
today). We just have a couple of use cases where append is not required and I
would like to make sure that they are supported.

At the end of the day, the only thing I have disagreed on is that the NVMe
driver rejects ZNS SSDs that do not support append, as opposed to doing this
instead when an in-kernel user wants to utilize the drive (e.g., formatting a FS
with zoned support). This would allow ioctl() passthru to work _today_ for
normal writes.

I still believe the above would be a more inclusive solution with the current ZNS
specification, but I can see that the general consensus is different.

The comment from the community, including me, is that there is a
general requirement for Zone Append command when utilizing Zoned
storage devices. This is similar to implementing an API that one wants to
support. It is not a general consensus or opinion. It is hard facts and
how the Linux kernel source code is implemented at this point. One must
implement support for ZNS SSDs that do not expose the Zone Append
command natively. Period.

Again, I am not saying the opposite. Read the 2 lines below...

My point with the above paragraph was to clarify that we are not trying to be difficult or opinionated, but to point out that the reason we give you this specific feedback is that this is the way it is in the kernel as of today.



So we will go back, apply the feedback that we got and return with an
approach that better fits the ecosystem.

>
>Another approach is to use SPDK and bypass the Linux kernel. This
>might even be an advantage: your customers do not have to wait on the
>Linux distribution being released with a long term release, before they
>can even get started and deploy in volume. I.e., they will actually get
>faster to market, and your company will be able to sell more drives.

I think I will refrain from discussing our business strategy on an open mailing
list. Appreciate the feedback though. Very insightful.

I am not asking you to discuss your business strategy on the mailing list. My comment was meant as genuine advice that may save a lot of work, and might even get better results.


Thanks,
Javier




