On Tue, Jul 09, 2019 at 06:02:19PM +0900, Damien Le Moal wrote:
> Simultaneously writing to a sequential zone of a zoned block device
> from multiple contexts requires mutual exclusion for BIO issuing to
> ensure that writes happen sequentially. However, even for a well
> behaved user correctly implementing such synchronization, BIO plugging
> may interfere and result in BIOs from the different contexts being
> reordered if plugging is done outside of the mutual exclusion section,
> e.g. the plug was started by a function higher in the call chain than
> the function issuing BIOs.
>
>       Context A                           Context B
>
>    | blk_start_plug()
>    | ...
>    | seq_write_zone()
>      | mutex_lock(zone)
>      | submit_bio(bio-0)
>      | submit_bio(bio-1)
>      | mutex_unlock(zone)
>      | return
>    | ------------------------------> | seq_write_zone()
>                                        | mutex_lock(zone)
>                                        | submit_bio(bio-2)
>                                        | mutex_unlock(zone)
>    | <------------------------------ |
>    | blk_finish_plug()
>
> In the above example, despite the mutex synchronization resulting in the
> correct BIO issuing order 0, 1, 2, context A BIOs 0 and 1 end up being
> issued after BIO 2 when the plug is released with blk_finish_plug().

I am wondering how you guarantee that context B is always run after
context A.

>
> To fix this problem, introduce the internal helper function
> blk_mq_plug() to access the current context plug, returning the current
> plug only if the target device is not a zoned block device or if the
> BIO to be plugged is not a write operation. Otherwise, ignore the plug
> and return NULL, resulting in all writes to zoned block devices never
> being plugged.

Another candidate approach is to run the following code before
releasing the 'zone' lock:

	if (current->plug)
		blk_finish_plug(current->plug);

Then we can fix the zone-specific issue in zone code only, and avoid a
generic blk-core change for a zone issue.

Thanks,
Ming
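
For illustration only, the reordering in the diagram and the two fixes discussed above can be modeled in a small userspace sketch. This is not kernel code and does not use the real block layer API: toy_plug, toy_mq_plug(), toy_submit_bio(), toy_flush_plug() and run_scenario() are hypothetical names, the zone mutex is represented only by the serial order of the calls, and every BIO is assumed to be a write to a zoned device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_BIOS 8

/* Order in which BIO ids reach the simulated device. */
static int dispatched[MAX_BIOS];
static int nr_dispatched;

/* Toy per-context plug: holds BIO ids until it is flushed. */
struct toy_plug {
	int pending[MAX_BIOS];
	int nr_pending;
};

static void dispatch(int bio_id)
{
	assert(nr_dispatched < MAX_BIOS);
	dispatched[nr_dispatched++] = bio_id;
}

/* Send everything held in the plug to the device, in submission order. */
static void toy_flush_plug(struct toy_plug *plug)
{
	for (int i = 0; i < plug->nr_pending; i++)
		dispatch(plug->pending[i]);
	plug->nr_pending = 0;
}

/*
 * Models the idea behind the proposed blk_mq_plug() helper: when bypass
 * is enabled, a write to a zoned device never sees the plug.
 */
static struct toy_plug *toy_mq_plug(struct toy_plug *plug, bool zoned_write,
				    bool bypass)
{
	if (bypass && zoned_write)
		return NULL;
	return plug;
}

static void toy_submit_bio(struct toy_plug *cur, int bio_id, bool bypass)
{
	struct toy_plug *plug = toy_mq_plug(cur, true, bypass);

	if (plug) {
		assert(plug->nr_pending < MAX_BIOS);
		plug->pending[plug->nr_pending++] = bio_id;
	} else {
		dispatch(bio_id);
	}
}

/* Replays the timeline from the diagram as a serial sequence of calls. */
static void run_scenario(bool bypass, bool flush_before_unlock)
{
	struct toy_plug plug_a = { .nr_pending = 0 };

	nr_dispatched = 0;

	/* Context A: plug already started, then seq_write_zone(). */
	toy_submit_bio(&plug_a, 0, bypass);
	toy_submit_bio(&plug_a, 1, bypass);
	if (flush_before_unlock)	/* alternative: flush before unlock */
		toy_flush_plug(&plug_a);

	/* Context B: seq_write_zone() with no plug of its own. */
	toy_submit_bio(NULL, 2, bypass);

	/* Context A: the plug is finally released. */
	toy_flush_plug(&plug_a);
}
```

In this model, run_scenario(false, false) leaves the device order as 2, 0, 1, which is the reordering described in the patch. Both run_scenario(true, false) (never plugging zoned writes) and run_scenario(false, true) (flushing the plug before the zone lock is dropped) give 0, 1, 2.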