Re: [PATCH 3/4] ioengines: add get_max_open_zones zoned block device operation

Niklas Cassel <Niklas.Cassel@xxxxxxx> · Mon, 17 May 2021 08:43:24 +0000

On Sat, May 15, 2021 at 01:16:00AM +0000, Damien Le Moal wrote:
> On Fri, 2021-05-14 at 12:05 +0000, Niklas Cassel wrote:
> > On Thu, May 13, 2021 at 12:23:59AM +0000, Damien Le Moal wrote:
> > > On 2021/05/13 7:37, Niklas Cassel wrote:
> > > > From: Niklas Cassel <niklas.cassel@xxxxxxx>
> > > > 
> > > > Define a new IO engine operation to get the maximum number of open zones.
> > > > Like the existing IO engine operations: .get_zoned_model, .report_zones,
> > > > and .reset_wp, this new IO engine operation is only valid for zoned block
> > > > devices.
> > > > 
> > > > Similarly to the other zbd IO engine operations, also provide a default
> > > > implementation inside oslib/linux-blkzoned.c that will be used if the
> > > > ioengine does not override it.
> > > > 
> > > > The default Linux oslib implementation is implemented similarly to
> > > > blkzoned_get_zoned_model(), i.e. it will return a successful error code
> > > > even when the sysfs attribute does not exist.
> > > > This is because the sysfs max_open_zones attribute was introduced first
> > > > in Linux v5.9.
> > > > All error handling is still there, so an ioengine that provides its own
> > > > implementation will still have its error code respected properly.
> > > > 
> > > > Signed-off-by: Niklas Cassel <niklas.cassel@xxxxxxx>
> > > > ---
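For reference, the "succeed even if the attribute is missing" behaviour described above could look roughly like the sketch below. The helper name read_sysfs_max_open_zones() is hypothetical and is not the function in oslib/linux-blkzoned.c. It only assumes that the attribute lives at /sys/block/<dev>/queue/max_open_zones, holds a decimal integer, and that 0 means "no limit".

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical helper: read a max_open_zones sysfs attribute.
 * A missing attribute (kernels before v5.9) is NOT an error; it is
 * reported as 0, i.e. "no limit". Other failures return -errno.
 */
static int read_sysfs_max_open_zones(const char *path,
				     unsigned int *max_open_zones)
{
	FILE *f;
	unsigned int val;
	int ret = 0;

	assert(path != NULL && max_open_zones != NULL);
	*max_open_zones = 0;	/* default: no limit */

	f = fopen(path, "r");
	if (!f) {
		/* Attribute does not exist: succeed, keep "no limit". */
		if (errno == ENOENT)
			return 0;
		/* Any other error is propagated to the caller. */
		return -errno;
	}

	if (fscanf(f, "%u", &val) == 1)
		*max_open_zones = val;
	else
		ret = -EINVAL;	/* attribute exists but is unparsable */

	fclose(f);
	return ret;
}
```

A caller would pass something like "/sys/block/nvme0n2/queue/max_open_zones", where the device name is only an example.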

(snip)

> That said, there is a refinement needed I think, which is to ignore the drive
> advertised max_open_zones if max_active_zones is 0.
> 
> The reason is that for SMR drives, the max_open_zones limit is only meaningful
> in the context of explicit zone open which fio does not do. For implicit zone
> open as used in fio, there will be no IO error for a write workload that
> simultaneously writes to more than max_open_zones since max_active_zones is
> always 0 (no limit) with SMR.
> 
> Having the ability to run workloads that write to more than max_open_zones is
> useful to measure the potential impact on performance of the drive implicit
> zone close & implicit zone open triggered by such workload.
> 
> So I would suggest we change to something like this:
> 
> 	if (!td->o.max_open_zones && f->zbd_info->max_active_zones) {
> 		/* User did not request a limit. Set limit to max supported */
> 		zbd->max_open_zones = max_open_zones;
> 	} else if (td->o.max_open_zones < max_open_zones) {
> 		/* User requested a limit, limit is not too large */
> 		zbd->max_open_zones = td->o.max_open_zones;
> 	} else if (f->zbd_info->max_active_zones) {
> 		/* User requested a limit, but limit is too large */
> 		...
> 		return -EINVAL;
> 	}
> 
> Thoughts ?

Even on a zoned block device with max active zones == 0, if the max open
zones limit is != 0, writing to more zones than that limit will, in addition
to the regular I/O, cause implicit open and implicit close operations.

These operations will of course take time, time that would otherwise be
spent on I/O, meaning that the results you get would not be representative
of a drive's performance.
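A toy model of that effect, under loudly stated assumptions: the device implicitly closes the least recently used open zone when it has to implicitly open a new one (the real victim choice is up to the device), zones never become full, and no write is preceded by an explicit open. The struct and function names are made up for illustration only.

```c
#include <assert.h>

#define SIM_MAX_OPEN 64

/* Hypothetical model of a device enforcing max_open_zones via implicit ops. */
struct zone_sim {
	unsigned int max_open;			/* device max_open_zones (> 0) */
	unsigned int nr_open;			/* currently open zones */
	unsigned int open[SIM_MAX_OPEN];	/* open zones, LRU first */
	unsigned long implicit_opens;
	unsigned long implicit_closes;
};

static void zone_sim_init(struct zone_sim *s, unsigned int max_open)
{
	assert(max_open > 0 && max_open <= SIM_MAX_OPEN);
	s->max_open = max_open;
	s->nr_open = 0;
	s->implicit_opens = 0;
	s->implicit_closes = 0;
}

/* Model one write to @zone without a prior explicit zone open. */
static void zone_sim_write(struct zone_sim *s, unsigned int zone)
{
	unsigned int i, j;

	for (i = 0; i < s->nr_open; i++) {
		if (s->open[i] == zone) {
			/* Already open: mark it as most recently used. */
			for (j = i; j + 1 < s->nr_open; j++)
				s->open[j] = s->open[j + 1];
			s->open[s->nr_open - 1] = zone;
			return;
		}
	}

	if (s->nr_open == s->max_open) {
		/* Limit reached: implicitly close the LRU open zone. */
		for (j = 0; j + 1 < s->nr_open; j++)
			s->open[j] = s->open[j + 1];
		s->nr_open--;
		s->implicit_closes++;
	}

	/* Implicitly open the target zone (most recently used slot). */
	s->open[s->nr_open++] = zone;
	s->implicit_opens++;
}
```

Writing round-robin to 3 zones with a limit of 2 turns every write after the second one into an implicit close plus an implicit open, whereas a limit of 3 causes only the 3 initial implicit opens.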

While I do agree that this test scenario could be beneficial, I think
that it is a very special type of test.

These patches have already been merged, but I don't see why you couldn't
write a patch that, e.g., adds an ioengine that just opens (implicitly or
explicitly) and closes zones.
(We recently had a filedelete ioengine merged, which tests how quickly
files can be unlinked.)
Or, if it is easier, you could add a new option, --ignore_zbd_limits, to
zbd.c, so that you can specify a max open zones limit that is greater than
what the drive supports, in order to facilitate your test scenario.

Reasoning for my suggestion:
1) As you know, fio currently has no accounting of active zones.
It seems a bit awkward to parse max active zones from sysfs when zbd.c
itself has no concept of active zones.
So it seems easier to, e.g., add a new parameter that simply ignores the
device limits than to implement full support for active zones.

2) You are relying on detailed knowledge of how fio handles zones.
fio currently happens to issue writes without first doing an explicit
zone open, but should you really take that for granted?
If fio adds support for active zones, perhaps that implementation will
choose to use explicit zone open, so that if a zone could be opened, it
will also be possible to write to that zone without I/O errors.
(This is what we have implemented in, e.g., zonefs.)
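For illustration, an explicit zone open on Linux can be issued with the BLKOPENZONE ioctl from <linux/blkzoned.h>, and the write would only be issued once the open has succeeded. The wrapper name is hypothetical, and the fallback for kernel headers that lack BLKOPENZONE simply reports -ENOTSUP.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/blkzoned.h>

/*
 * Hypothetical wrapper: explicitly open the zone starting at @start_sector
 * (512-byte sectors) spanning @nr_sectors. Returns 0 on success or a
 * negative errno value; only write to the zone if this succeeded.
 */
static int zone_open_explicit(int fd, uint64_t start_sector,
			      uint64_t nr_sectors)
{
	assert(nr_sectors > 0);
#ifdef BLKOPENZONE
	struct blk_zone_range range = {
		.sector = start_sector,
		.nr_sectors = nr_sectors,
	};

	if (ioctl(fd, BLKOPENZONE, &range) < 0)
		return -errno;
	return 0;
#else
	/* Kernel headers too old to know about explicit zone open. */
	(void)fd;
	(void)start_sector;
	return -ENOTSUP;
#endif
}
```

Opening before writing moves the "too many zones" failure to the open call, where it can be handled cleanly, instead of surfacing as a write I/O error.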

Kind regards,
Niklas