On 2020/04/18 10:01, Theodore Y. Ts'o wrote:
> On Fri, Apr 17, 2020 at 05:48:20PM +0000, Johannes Thumshirn wrote:
>> For "userspace's responsibility", I'd re-phrase this as "a consumer's
>> responsibility", as we don't have an interface which aims at user-space
>> yet. The only consumer this series implements is zonefs, although we did
>> have an AIO implementation for early testing and io_uring shouldn't be
>> too hard to implement.
>
> Ah, I had assumed that userspace interface exposed would be opening
> the block device with the O_APPEND flag. (Which raises interesting
> questions if the block device is also opened without O_APPEND and some
> other thread was writing to the same zone, in which case the order in
> which requests are processed would control whether the I/O would
> fail.)

O_APPEND has no effect for raw block device files since the file size is
always 0. While we did use this flag initially for quick tests of a user
space interface, it was a hack. Any proper implementation of a user space
interface will probably need a new RWF_ flag that can be passed to AIOs
(io_submit() and io_uring) and to preadv2()/pwritev2() calls.

As for the case of one application doing regular writes and another doing
zone append writes to the same zone, you are correct, there will be
errors. But not for the zone append writes: they will all succeed since,
by definition, they do not need the current zone write pointer and always
append at the zone's current write pointer, wherever it is (provided the
zone is not full). Most of the regular writes will likely fail: without
synchronization between the applications, the write pointer of the target
zone would constantly change under the issuer of the regular writes, even
if that issuer does a report zones before every write operation.

There is no automatic synchronization in the kernel for this and we do
not intend to add any: such a bad use case is similar to two
unsynchronized writers issuing regular writes to the same zone.
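
As an illustration of the difference between the two write types, here is
a toy user-space model of a single zone in C. It is only a sketch, not
kernel code: struct zone, zone_write(), zone_append() and zone_demo() are
hypothetical names, and the error codes (-EIO for a write that does not
start at the write pointer, -ENOSPC for a zone without enough room) are
modelling assumptions rather than what a given device or driver reports.

```c
/* Toy user-space model of one sequential-write zone. Not kernel code:
 * all names below are hypothetical and exist only for this sketch. */
#include <assert.h>
#include <errno.h>
#include <stdint.h>

struct zone {
	uint64_t start;	/* first sector of the zone */
	uint64_t len;	/* zone size in sectors */
	uint64_t wp;	/* current write pointer (absolute sector) */
};

/* Regular write: the issuer picks the sector, which must equal the wp. */
static int zone_write(struct zone *z, uint64_t sector, uint64_t nr)
{
	if (sector != z->wp)
		return -EIO;		/* not at the write pointer */
	if (nr > z->start + z->len - z->wp)
		return -ENOSPC;		/* would cross the end of the zone */
	z->wp += nr;
	return 0;
}

/* Zone append: the issuer only names the zone. The location is chosen
 * when the write executes and is reported back through *written. */
static int zone_append(struct zone *z, uint64_t nr, uint64_t *written)
{
	if (nr > z->start + z->len - z->wp)
		return -ENOSPC;		/* not enough room left in the zone */
	*written = z->wp;
	z->wp += nr;
	return 0;
}

/* Two unsynchronized issuers targeting the same zone. */
static int zone_demo(void)
{
	struct zone z = { .start = 0, .len = 1024, .wp = 0 };
	uint64_t seen_wp = z.wp;	/* issuer A: snapshot from "report zones" */
	uint64_t where;

	/* Issuer B appends before A gets to issue its write. */
	assert(zone_append(&z, 8, &where) == 0 && where == 0);
	/* A's regular write now uses a stale write pointer and fails. */
	assert(zone_write(&z, seen_wp, 8) == -EIO);
	/* B's next append still succeeds, wherever the wp is now. */
	assert(zone_append(&z, 8, &where) == 0 && where == 8);
	return 0;
}
```

Calling zone_demo() walks through the scenario discussed above: in this
model the appends always land, while the regular writer fails as soon as
the write pointer has moved since its last zone report, which is the
unsynchronized multi-writer case.
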
This cannot work correctly without mutual exclusion in the IO issuing
path, and that is the responsibility of the user, be it an application
process or an in-kernel component. As Johannes pointed out, once BIOs are
submitted, the kernel does guarantee ordered dispatching of writes per
zone with zone write locking (mq-deadline).

-- 
Damien Le Moal
Western Digital Research