Re: [LSF/MM/BPF BoF] BoF for Zoned Storage

Javier González <javier@xxxxxxxxxxx> · Thu, 3 Mar 2022 22:08:47 +0100

> On 3 Mar 2022, at 21.18, Adam Manzanares <a.manzanares@xxxxxxxxxxx> wrote:
> 
> On Thu, Mar 03, 2022 at 07:51:36PM +0000, Matias Bjørling wrote:
>>> Sounds like you volunteered to teach zoned storage use 101. Can you teach me
>>> how to calculate an LBA offset given a zone number when zone capacity is not
>>> equal to zone size?
>> 
>> zonesize_pow = x; // e.g., x = 22 for a 2GiB zone size w/ 512B block size (2^31 / 2^9 = 2^22 LBAs)
>> zone_id = y; // valid zone id
>> 
>> struct blk_zone zone = zones[zone_id]; // zones is a linear array of blk_zone structs that holds per zone information.
>> 
>> With that, one can do the following
>> 1a) first_lba_of_zone =  zone_id << zonesize_pow;
>> 1b) first_lba_of_zone = zone.start;
> 
> 1b is interesting. What happens if I don't have struct blk_zone and zone size 
> is not equal to zone capacity?
> 
>> 2a) next_writeable_lba = (zone_id << zonesize_pow) + zone.wp;
>> 2b) next_writeable_lba = zone.start + zone.wp;
> 
> Can we modify 2b to not use zone.start?
> 
>> 3)   writeable_lbas_left = zone.len - zone.wp;
>> 4)   lbas_written = zone.wp - 1;
>> 
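
For reference, a minimal C sketch of calculations 1) to 4) above. It is a
sketch under stated assumptions, not kernel code: struct zone_info is a
simplified stand-in for struct blk_zone, and wp is taken to be an absolute
LBA, as in the Linux zone report, rather than the zone-relative offset the
quoted formulas imply. The helper names are made up for illustration, and
zone conditions (full, offline) are ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for the kernel's struct blk_zone (assumption). */
struct zone_info {
    uint64_t start;    /* first LBA of the zone */
    uint64_t len;      /* zone size in LBAs */
    uint64_t capacity; /* writable LBAs in the zone, capacity <= len */
    uint64_t wp;       /* write pointer as an absolute LBA (assumption) */
};

/* 1a) Zone start without a zone report: valid only for PO2 zone sizes. */
static inline uint64_t zone_start_po2(uint64_t zone_id, unsigned zonesize_pow)
{
    return zone_id << zonesize_pow;
}

/* 1a') Same, for any fixed zone size: a multiply replaces the shift. */
static inline uint64_t zone_start_any(uint64_t zone_id, uint64_t zone_size)
{
    return zone_id * zone_size;
}

/* 2) Next writable LBA: with an absolute wp this is the wp itself. */
static inline uint64_t zone_next_writable(const struct zone_info *z)
{
    return z->wp;
}

/* 3) Writable LBAs left: bounded by the capacity, not by the zone size. */
static inline uint64_t zone_lbas_left(const struct zone_info *z)
{
    uint64_t end = z->start + z->capacity;

    assert(z->capacity <= z->len && z->wp >= z->start);
    return z->wp < end ? end - z->wp : 0;
}

/* 4) LBAs written so far in the zone. */
static inline uint64_t zone_lbas_written(const struct zone_info *z)
{
    assert(z->wp >= z->start);
    return z->wp - z->start;
}
```

Note that the zone start depends only on the zone size. The zone capacity
only enters when computing how much of the zone can still be written.
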
>>> The second thing I would like to know is what happens when an application
>>> wants to map an object that spans multiple consecutive zones. Does the
>>> application have to be aware of the difference in zone capacity and zone size?
>> 
>> The zoned namespace command set specification does not allow variable zone size. The zone size is fixed for all zones in a namespace. Only the zone capacity can vary between zones. Usually, the zone capacity is fixed as well; I have not yet seen implementations that have variable zone capacities.
>> 
> 
> IDK where variable zone size came from. I am talking about the fact that the 
> zone size does not have to equal zone capacity. 
> 
>> Placing a single object across a set of zones has to be handled explicitly by the application. E.g., as well as being aware of a zone's capacity, the application should also be aware that it has to reset the whole set of zones and not a single zone. I.e., the application must always be aware of the zones it uses.
>> 
>> However, an end-user application should not (in my opinion) have to deal with this. It should use helper functions from a library that provides the appropriate abstraction to the application, such that the applications don't have to care about either specific zone capacity/size, or multiple resets. This is similar to how file systems work with file system semantics. For example, a file can span multiple extents on disk, but all an application sees is the file semantics. 
>> 
> 
> I don't want to go so far as to say what the end user application should and 
> should not do.

Adam, Matias, Damien,

Trying to bring us back to the original proposal. 

I believe we can all agree that applications and file systems that work in objects / extents / segments of power-of-two (PO2) sizes can benefit from defining the zone boundary at a PO2. Based on the code I have seen so far, these applications will still have to deal with the zone capacity. So if an application or FS needs to align to a certain size, it is the capacity that will have to be considered. Since there are plenty of users, I am sure there are examples where this does not apply. 

In my view, the point of removing this constraint is that there are users that can deal with !PO2 zone sizes, and imposing the unmapped LBAs on them creates unnecessary hassle. This hurts the zoned ecosystem and therefore adoption. 
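
To make the unmapped LBAs concrete, here is a rough C sketch of how an
object that spans consecutive zones maps to device LBAs. It assumes a fixed
zone size and a fixed zone capacity for the whole namespace. struct zone_geom
and both helpers are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical geometry descriptor; all values are in LBAs. */
struct zone_geom {
    uint64_t zone_size; /* distance between consecutive zone starts */
    uint64_t zone_cap;  /* writable LBAs per zone, zone_cap <= zone_size */
};

/*
 * Translate an offset inside an object that starts at the beginning of
 * first_zone and spans consecutive zones into a device LBA. The division
 * is by the capacity, so the gap at the end of each zone is skipped.
 */
static inline uint64_t object_off_to_lba(const struct zone_geom *g,
                                         uint64_t first_zone, uint64_t off)
{
    assert(g->zone_cap > 0 && g->zone_cap <= g->zone_size);

    uint64_t zone = first_zone + off / g->zone_cap;

    return zone * g->zone_size + off % g->zone_cap;
}

/* Unmapped LBAs at the end of each zone; 0 when size == capacity. */
static inline uint64_t zone_gap(const struct zone_geom *g)
{
    assert(g->zone_cap <= g->zone_size);
    return g->zone_size - g->zone_cap;
}
```

When the zone size equals the zone capacity, the gap is zero and the mapping
collapses to first_zone * zone_size + off, i.e. the object sees a contiguous
LBA range.
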

Even when we remove PO2 zone sizes, devices exposing PO2 zone sizes will of course be supported, and probably preferred for the use-cases that make sense. 

As we start to post patches, I hope these points become clearer.