RE: [LSF/MM/BPF BoF] BoF for Zoned Storage

Matias Bjørling <Matias.Bjorling@xxxxxxx> · Thu, 3 Mar 2022 21:33:06 +0000

> -----Original Message-----
> From: Adam Manzanares <a.manzanares@xxxxxxxxxxx>
> Sent: Thursday, 3 March 2022 21.19
> To: Matias Bjørling <Matias.Bjorling@xxxxxxx>
> Cc: Damien Le Moal <Damien.LeMoal@xxxxxxx>; Javier González
> <javier@xxxxxxxxxxx>; Luis Chamberlain <mcgrof@xxxxxxxxxx>; linux-
> block@xxxxxxxxxxxxxxx; linux-fsdevel@xxxxxxxxxxxxxxx; lsf-pc@lists.linux-
> foundation.org; Bart Van Assche <bvanassche@xxxxxxx>; Keith Busch
> <Keith.Busch@xxxxxxx>; Johannes Thumshirn
> <Johannes.Thumshirn@xxxxxxx>; Naohiro Aota <Naohiro.Aota@xxxxxxx>;
> Pankaj Raghav <pankydev8@xxxxxxxxx>; Kanchan Joshi
> <joshi.k@xxxxxxxxxxx>; Nitesh Shetty <nj.shetty@xxxxxxxxxxx>
> Subject: Re: [LSF/MM/BPF BoF] BoF for Zoned Storage
> 
> On Thu, Mar 03, 2022 at 07:51:36PM +0000, Matias Bjørling wrote:
> > > Sounds like you volunteered to teach zoned storage use 101. Can you
> > > teach me how to calculate an LBA offset given a zone number when
> > > zone capacity is not equal to zone size?
> >
> > zonesize_pow = x; // e.g., x = 22 if 2GiB zone size w/ 512B block size (2^31 / 2^9 = 2^22 LBAs)
> > zone_id = y; // valid zone id
> >
> > // zones is a linear array of blk_zone structs that hold per-zone information.
> > struct blk_zone zone = zones[zone_id];
> >
> > With that, one can do the following
> > 1a) first_lba_of_zone =  zone_id << zonesize_pow;
> > 1b) first_lba_of_zone = zone.start;
> 
> 1b is interesting. What happens if I don't have struct blk_zone and zone size is
> not equal to zone capacity?

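struct blk_zone can be whatever one likes it to be. It is just a data structure that captures key information about a zone. A zone's start address is independent of its writeable capacity: the start is determined by the zone size, while the capacity only limits how much of the zone can be written.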
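
As a minimal sketch of that point (struct zone_info and both helpers are hypothetical names for illustration, not the kernel's struct blk_zone, and a power-of-two zone size is assumed), the start LBA follows from the zone id alone, and the capacity only bounds the writeable range:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-zone descriptor; field names are assumptions for this sketch. */
struct zone_info {
    uint64_t start;    /* first LBA of the zone */
    uint64_t capacity; /* number of writeable LBAs, capacity <= zone size */
};

/* 1a) Start LBA derived from the zone id only.
 * zonesize_pow is log2 of the zone size in LBAs. */
static inline uint64_t zone_start_lba(uint64_t zone_id, unsigned int zonesize_pow)
{
    assert(zonesize_pow < 64);
    return zone_id << zonesize_pow;
}

/* Last writeable LBA of a zone: depends on the capacity, not on the zone size. */
static inline uint64_t zone_last_writeable_lba(const struct zone_info *z)
{
    assert(z->capacity > 0);
    return z->start + z->capacity - 1;
}
```

With zonesize_pow = 22, zone 3 starts at LBA 3 << 22 regardless of whether its capacity is the full zone size or something smaller.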

> 
> > 2a) next_writeable_lba = (zone_id << zonesize_pow) + zone.wp;
> > 2b) next_writeable_lba = zone.start + zone.wp;
> 
> Can we modify 2b to not use zone.start?

Yes - use 2a.
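
A rough C version of 2a, assuming the write pointer is tracked as an offset from the zone start, as in the formulas above (function and parameter names are made up for this sketch). Note that the kernel's struct blk_zone reports wp as an absolute sector, in which case wp itself is already the next writeable position:

```c
#include <assert.h>
#include <stdint.h>

/* 2a) Next writeable LBA from the zone id and a write-pointer offset.
 * No zone.start is needed. wp_off is assumed relative to the zone start. */
static inline uint64_t next_writeable_lba(uint64_t zone_id,
                                          unsigned int zonesize_pow,
                                          uint64_t wp_off)
{
    assert(zonesize_pow < 64);
    assert(wp_off <= (UINT64_C(1) << zonesize_pow));
    return (zone_id << zonesize_pow) + wp_off;
}

/* Writeable LBAs left in a zone, bounded by the zone capacity
 * rather than by the zone size. */
static inline uint64_t writeable_lbas_left(uint64_t capacity, uint64_t wp_off)
{
    assert(wp_off <= capacity);
    return capacity - wp_off;
}
```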

> 
> > 3)   writeable_lbas_left = zone.len - zone.wp;
> > 4)   lbas_written = zone.wp; // wp is the offset of the next writeable LBA, so it equals the number of LBAs written
> >
> > > The second thing I would like to know is what happens when an
> > > application wants to map an object that spans multiple consecutive
> > > zones. Does the application have to be aware of the difference in zone
> > > capacity and zone size?
> >
> > The Zoned Namespace Command Set specification does not allow variable
> > zone size. The zone size is fixed for all zones in a namespace. Only the
> > zone capacity can vary. Usually, the zone capacity is fixed as well; I have
> > not yet seen implementations that have variable zone capacities.
> >
> 
> IDK where variable zone size came from. I am talking about the fact that the
> zone size does not have to equal zone capacity.

Ok. Yes, an application should be aware of how it is managing a zone, in the same way that it needs logic that knows a zone must be reset before it can be rewritten.
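
For the multi-zone object case, the awareness can be as small as one translation helper. A sketch, assuming the same capacity for every zone of the object and a power-of-two zone size (struct zoned_object and object_offset_to_lba are hypothetical names, not an existing API):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout of an object placed across consecutive zones. */
struct zoned_object {
    uint64_t first_zone;       /* id of the first zone used by the object */
    uint64_t nr_zones;         /* number of consecutive zones used */
    unsigned int zonesize_pow; /* log2 of the zone size in LBAs */
    uint64_t zone_capacity;    /* writeable LBAs per zone, assumed identical */
};

/* Translate an offset within the object (in LBAs) to a device LBA,
 * skipping the unwriteable gap between zone capacity and zone size. */
static inline uint64_t object_offset_to_lba(const struct zoned_object *obj,
                                            uint64_t off)
{
    assert(obj->zonesize_pow < 64);
    assert(obj->zone_capacity > 0);
    assert(obj->zone_capacity <= (UINT64_C(1) << obj->zonesize_pow));

    uint64_t zone_idx = off / obj->zone_capacity; /* which zone of the object */
    uint64_t in_zone = off % obj->zone_capacity;  /* offset inside that zone */

    assert(zone_idx < obj->nr_zones);
    return ((obj->first_zone + zone_idx) << obj->zonesize_pow) + in_zone;
}
```

The division uses the capacity and the shift uses the zone size, which is the whole difference between the two. Resetting the object then means resetting all nr_zones zones, not just one.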

> 
> > Placing a single object across a set of zones has to be handled explicitly
> > by the application. For example, besides being aware of each zone's
> > capacity, the application must also know that it has to reset the whole
> > set of zones and not a single zone. I.e., the application must always be
> > aware of the zones it uses.
> >
> > However, an end-user application should not (in my opinion) have to deal
> > with this. It should use helper functions from a library that provides the
> > appropriate abstraction, such that the application doesn't have to care
> > about either the specific zone capacity/size or multiple resets. This is
> > similar to how file systems work with file system semantics. For example,
> > a file can span multiple extents on disk, but all an application sees is
> > the file semantics.
> >
> 
> I don't want to go so far as to say what the end user application should and
> should not do.

Consider it a best-practice example. Another typical example is that an application should avoid excessive flushes to disk if it doesn't need persistence for each I/O it issues.