On 2020/03/11 15:25, Christoph Hellwig wrote:
> On Wed, Mar 11, 2020 at 12:37:33AM +0000, Damien Le Moal wrote:
>> I do not think we can get rid of it entirely as it is needed for applications
>> using regular writes on raw zoned block devices. But the zone write locking will
>> be completely bypassed for zone append writes issued by file systems.
>
> But applications that are aware of zones should not be sending multiple
> write commands to a zone anyway. We certainly can't use zone write
> locking for nvme if we want to be able to use multiple queues.

True, and that is the main use case I am seeing in the field. However, even for
this to work properly, we will also need a special bio_add_page() function for
regular writes to zones, similar to the one for zone append, to ensure that a
large BIO does not become multiple requests, won't we? Otherwise, submitting a
large write BIO will generate multiple requests that may get reordered on
dispatch and on requeue (on SAS or on SATA).

Furthermore, aio is already supported. Customers in the field use it with fio's
libaio engine to test drives and for application development. So I am afraid
that removing the zone write locking now would break user space, no?

For nvme, we want to allow the "none" elevator as the default rather than
mq-deadline, which is currently the default for all zoned block devices. This
is a very simple change to the default elevator selection, based on the nonrot
queue flag. Rough sketches of both ideas are appended below.

--
Damien Le Moal
Western Digital Research
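
The bio_add_page() variant mentioned above could look roughly like this. It is
an untested sketch only: bio_add_zone_write_page() is a made-up name, and
capping the BIO at queue_max_sectors() ignores segment limits, but it shows the
intent of never letting a regular zoned write BIO grow past what a single
request can carry.

#include <linux/bio.h>
#include <linux/blkdev.h>

/*
 * Hypothetical helper (not an existing kernel API): add a page to a
 * regular write BIO targeting a sequential zone, refusing to grow the
 * BIO beyond the queue max_sectors limit so that the BIO cannot be
 * split into multiple requests that could then be reordered on
 * dispatch or requeue.
 */
static int bio_add_zone_write_page(struct request_queue *q, struct bio *bio,
				   struct page *page, unsigned int len,
				   unsigned int offset)
{
	unsigned int max_bytes = queue_max_sectors(q) << SECTOR_SHIFT;

	/* Refuse any page that would make the BIO exceed one request. */
	if (bio->bi_iter.bi_size + len > max_bytes)
		return 0;

	return bio_add_page(bio, page, len, offset);
}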
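
And the default elevator change could boil down to something like the helper
below. Again only a rough sketch: zoned_default_elevator() is a made-up name
and the exact hook point in elevator_init_mq() is not shown; the point is
simply to key the mq-deadline default off both the zoned and nonrot queue
flags.

#include <linux/blkdev.h>

/*
 * Hypothetical sketch: keep mq-deadline as the default for zoned
 * devices that still rely on zone write locking (rotational drives
 * such as SMR HDDs), but default to "none" for non-rotational zoned
 * devices such as NVMe ZNS namespaces.
 */
static const char *zoned_default_elevator(struct request_queue *q)
{
	if (blk_queue_is_zoned(q) && !blk_queue_nonrot(q))
		return "mq-deadline";

	return "none";
}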