On 7/9/19 8:29 AM, Ming Lei wrote:
> On Tue, Jul 09, 2019 at 06:02:19PM +0900, Damien Le Moal wrote:
>> Simultaneously writing to a sequential zone of a zoned block device
>> from multiple contexts requires mutual exclusion for BIO issuing to
>> ensure that writes happen sequentially. However, even for a well
>> behaved user correctly implementing such synchronization, BIO plugging
>> may interfere and result in BIOs from the different contexts being
>> reordered if plugging is done outside of the mutual exclusion section,
>> e.g. the plug was started by a function higher in the call chain than
>> the function issuing BIOs.
>>
>>      Context A                           Context B
>>
>>   | blk_start_plug()
>>   | ...
>>   | seq_write_zone()
>>     | mutex_lock(zone)
>>     | submit_bio(bio-0)
>>     | submit_bio(bio-1)
>>     | mutex_unlock(zone)
>>     | return
>>   | ------------------------------> | seq_write_zone()
>>                                       | mutex_lock(zone)
>>                                       | submit_bio(bio-2)
>>                                       | mutex_unlock(zone)
>>   | <------------------------------ |
>>   | blk_finish_plug()
>>
>> In the above example, despite the mutex synchronization resulting in the
>> correct BIO issuing order 0, 1, 2, context A BIOs 0 and 1 end up being
>> issued after BIO 2 when the plug is released with blk_finish_plug().
>
> I am wondering how you guarantee that context B is always run after
> context A.
>
>>
>> To fix this problem, introduce the internal helper function
>> blk_mq_plug() to access the current context plug. It returns the current
>> plug only if the target device is not a zoned block device or if the
>> BIO to be plugged is not a write operation. Otherwise, it ignores the
>> plug and returns NULL, so that writes to a zoned block device are never
>> plugged.
>
> Another candidate approach is to run the following code before
> releasing the 'zone' lock:
>
> 	if (current->plug)
> 		blk_finish_plug(current->plug);
>
> Then we can fix the zone-specific issue in zone code only, and avoid a
> generic blk-core change for a zone issue.

I prefer that to the existing solution as well.

-- Jens Axboe