On 2020/03/11 15:25, Christoph Hellwig wrote:
> On Wed, Mar 11, 2020 at 12:37:33AM +0000, Damien Le Moal wrote:
>> I do not think we can get rid of it entirely as it is needed for applications
>> using regular writes on raw zoned block devices. But the zone write locking will
>> be completely bypassed for zone append writes issued by file systems.
>
> But applications that are aware of zones should not be sending multiple
> write commands to a zone anyway. We certainly can't use zone write
> locking for nvme if we want to be able to use multiple queues.

True, and that is the main use case I am seeing in the field. However, even for
this to work properly, we will also need a special bio_add_page() function for
regular writes to zones, similar to the one for zone append, to ensure that a
large BIO does not become multiple requests, won't we? Otherwise, submitting a
large write BIO will generate multiple requests that may get reordered on
dispatch and on requeue (on SAS or on SATA).

Furthermore, aio is already supported. Customers in the field use it with fio's
libaio engine to test drives and for application development. So I am afraid
that removing the zone write locking now would break user space, no?

For nvme, we want to allow the "none" elevator as the default rather than
mq-deadline, which is currently the default for all zoned block devices. This
is a very simple change to the default elevator selection, based on the nonrot
queue flag. Rough sketches of both ideas are appended below.

--
Damien Le Moal
Western Digital Research
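
The bio_add_page() variant mentioned above could look roughly like this. It is
an untested sketch only: bio_add_zone_write_page() is a made-up name, and
capping the BIO at queue_max_sectors() ignores segment limits, but it shows the
intent of never letting a regular zoned write BIO grow past what a single
request can carry.

#include <linux/bio.h>
#include <linux/blkdev.h>

/*
 * Hypothetical helper (not an existing kernel API): add a page to a
 * regular write BIO targeting a sequential zone, refusing to grow the
 * BIO beyond the queue max_sectors limit so that the BIO cannot be
 * split into multiple requests that could then be reordered on
 * dispatch or requeue.
 */
static int bio_add_zone_write_page(struct request_queue *q, struct bio *bio,
				   struct page *page, unsigned int len,
				   unsigned int offset)
{
	unsigned int max_bytes = queue_max_sectors(q) << SECTOR_SHIFT;

	/* Refuse any page that would make the BIO exceed one request. */
	if (bio->bi_iter.bi_size + len > max_bytes)
		return 0;

	return bio_add_page(bio, page, len, offset);
}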
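
And the default elevator change could boil down to something like the helper
below. Again only a rough sketch: zoned_default_elevator() is a made-up name
and the exact hook point in elevator_init_mq() is not shown; the point is
simply to key the mq-deadline default off both the zoned and nonrot queue
flags.

#include <linux/blkdev.h>

/*
 * Hypothetical sketch: keep mq-deadline as the default for zoned
 * devices that still rely on zone write locking (rotational drives
 * such as SMR HDDs), but default to "none" for non-rotational zoned
 * devices such as NVMe ZNS namespaces.
 */
static const char *zoned_default_elevator(struct request_queue *q)
{
	if (blk_queue_is_zoned(q) && !blk_queue_nonrot(q))
		return "mq-deadline";

	return "none";
}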