Re: [PATCH 0/6] power_of_2 emulation support for NVMe ZNS devices

Damien Le Moal <damien.lemoal@xxxxxxxxxxxxxxxxxx> · Wed, 16 Mar 2022 09:00:27 +0900

On 3/15/22 22:05, Javier González wrote:
>>> The main constraint for (1) PO2 is removed in the block layer, we
>>> have (2) Linux hosts stating that unmapped LBAs are a problem,
>>> and we have (3) HW supporting size=capacity.
>>> 
>>> I would be happy to hear what else you would like to see for this
>>> to be of use to the kernel community.
>> 
>> (Added numbers to your paragraph above)
>> 
>> 1. The sysfs chunksize attribute was "misused" to also represent
>> zone size. What has changed is that RAID controllers now can use a
>> NPO2 chunk size. This wasn't meant to naturally extend to zones,
>> which as shown in the current posted patchset, is a lot more work.
> 
> True. But this was the main constraint for PO2.

And as I said, users asked for it.

>> 2. Bo mentioned that the software already manages holes. It took a
>> bit of time to get right, but now it works. Thus, the software in
>> question is already capable of working with holes. Thus, fixing
>> this would amount to a minor optimization overall. I'm not
>> convinced the work to do this in the kernel is proportional to the
>> change it'll make to the applications.
> 
> I will let Bo respond to this himself.
> 
>> 3. I'm happy to hear that. However, I'd like to reiterate the
>> point that the PO2 requirement has been known for years. That
>> there's a drive doing NPO2 zones is great, but a decision was made
>> by the SSD implementors to not support the Linux kernel given its
>> current implementation.
> 
> Zoned devices have been supported for years in SMR, and I think this
> is a strong argument. However, ZNS is still very new and customers have
> several requirements. I do not believe that a HDD stack should have
> such an impact in NVMe.
> 
> Also, we will see new interfaces adding support for zoned devices in
> the future.
> 
> We should think about the future and not the past.

Backward compatibility ? We must not break userspace...

>> 
>> All that said - if there are people willing to do the work and it
>> doesn't have a negative impact on performance, code quality,
>> maintenance complexity, etc. then there isn't anything saying
>> support can't be added - but it does seem like it’s a lot of work,
>> for little overall benefits to applications and the host users.
> 
> Exactly.
> 
> Patches in the block layer are trivial. This is running in
> production loads without issues. I have tried to highlight the
> benefits in previous emails and I believe you understand them.

The block layer is not the issue here. We all understand that one is easy.

> Support for ZoneFS seems easy too. We have an early POC for btrfs and
> it seems it can be done. We sign up for these 2.

zonefs can trivially support non power of 2 zone sizes, but as zonefs
creates a discrete view of the device capacity with its one file per
zone interface, an application's accesses to a zone are forcibly limited
to that zone, as they should be. With zonefs, pow2 and nonpow2 devices
will show the *same* interface to the application. Non power of 2 zone
sizes then have absolutely no benefits at all.
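
To make the point about the one file per zone view concrete, here is a
minimal sketch. It is not zonefs code: struct zdev and both helpers are
hypothetical names made up for illustration, and all values are assumed
to be in 512-byte sectors. It only shows that the per-zone file is
bounded by the zone capacity, so the gap between capacity and zone size
never reaches the application.

```c
#include <stdint.h>

/* Hypothetical description of a zoned device, in 512-byte sectors. */
struct zdev {
	uint64_t zone_size;     /* distance between two zone start sectors */
	uint64_t zone_capacity; /* usable sectors per zone, <= zone_size */
};

/* Maximum size of a per-zone file: the zone capacity, never the zone size. */
static uint64_t zone_file_max_sectors(const struct zdev *d)
{
	return d->zone_capacity;
}

/*
 * Map (zone number, offset within the zone file) to a device sector.
 * Returns UINT64_MAX when the offset falls outside the zone file.
 */
static uint64_t zone_file_to_sector(const struct zdev *d, uint64_t zno,
				    uint64_t off)
{
	if (off >= d->zone_capacity)
		return UINT64_MAX;
	return zno * d->zone_size + off;
}
```

With an assumed pow2 device (zone size 262144, capacity 196608) and an
assumed nonpow2 device (zone size = capacity = 196608), both helpers
accept exactly the same (zone, offset) pairs. Only the resulting device
sector differs, and that is hidden behind the file.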

> As for F2FS and dm-zoned, I do not think these are targets at the 
> moment. If this is the path we follow, these will bail out at mkfs
> time.

And what makes you think that this is acceptable ? What guarantees do
you have that this will not be a problem for users out there ?

-- 
Damien Le Moal
Western Digital Research