On 11/25/20 4:19 PM, Ming Lei wrote: > On Wed, Nov 25, 2020 at 04:05:10PM +0800, JeffleXu wrote: >> >> >> On 11/25/20 3:19 PM, Ming Lei wrote: >>> On Wed, Nov 25, 2020 at 02:41:47PM +0800, Jeffle Xu wrote: >>>> iopoll is initially for small size, latency sensitive IO. It doesn't >>>> work well for big IO, especially when it needs to be split to multiple >>>> bios. In this case, the returned cookie of __submit_bio_noacct_mq() is >>>> indeed the cookie of the last split bio. The completion of *this* last >>>> split bio done by iopoll doesn't mean the whole original bio has >>>> completed. Callers of iopoll still need to wait for completion of other >>>> split bios. >>>> >>>> Besides bio splitting may cause more trouble for iopoll which isn't >>>> supposed to be used in case of big IO. >>>> >>>> iopoll for split bio may cause potential race if CPU migration happens >>>> during bio submission. Since the returned cookie is that of the last >>>> split bio, polling on the corresponding hardware queue doesn't help >>>> complete other split bios, if these split bios are enqueued into >>>> different hardware queues. Since interrupts are disabled for polling >>>> queues, the completion of these other split bios depends on timeout >>>> mechanism, thus causing a potential hang. >>>> >>>> iopoll for split bio may also cause hang for sync polling. Currently >>>> both the blkdev and iomap-based fs (ext4/xfs, etc) support sync polling >>>> in direct IO routine. These routines will submit bio without REQ_NOWAIT >>>> flag set, and then start sync polling in current process context. The >>>> process may hang in blk_mq_get_tag() if the submitted bio has to be >>>> split into multiple bios and can rapidly exhaust the queue depth. The >>>> process are waiting for the completion of the previously allocated >>>> requests, which should be reaped by the following polling, and thus >>>> causing a deadlock. >>>> >>>> To avoid these subtle trouble described above, just disable iopoll for >>>> split bio. >>>> >>>> Suggested-by: Ming Lei <ming.lei@xxxxxxxxxx> >>>> Signed-off-by: Jeffle Xu <jefflexu@xxxxxxxxxxxxxxxxx> >>>> Reviewed-by: Christoph Hellwig <hch@xxxxxx> >>>> --- >>>> block/bio.c | 2 ++ >>>> block/blk-merge.c | 12 ++++++++++++ >>>> block/blk-mq.c | 3 +++ >>>> include/linux/blk_types.h | 1 + >>>> 4 files changed, 18 insertions(+) >>>> >>>> diff --git a/block/bio.c b/block/bio.c >>>> index fa01bef35bb1..7f7ddc22a30d 100644 >>>> --- a/block/bio.c >>>> +++ b/block/bio.c >>>> @@ -684,6 +684,8 @@ void __bio_clone_fast(struct bio *bio, struct bio *bio_src) >>>> bio_set_flag(bio, BIO_CLONED); >>>> if (bio_flagged(bio_src, BIO_THROTTLED)) >>>> bio_set_flag(bio, BIO_THROTTLED); >>>> + if (bio_flagged(bio_src, BIO_SPLIT)) >>>> + bio_set_flag(bio, BIO_SPLIT); >>>> bio->bi_opf = bio_src->bi_opf; >>>> bio->bi_ioprio = bio_src->bi_ioprio; >>>> bio->bi_write_hint = bio_src->bi_write_hint; >>>> diff --git a/block/blk-merge.c b/block/blk-merge.c >>>> index bcf5e4580603..a2890cebf99f 100644 >>>> --- a/block/blk-merge.c >>>> +++ b/block/blk-merge.c >>>> @@ -279,6 +279,18 @@ static struct bio *blk_bio_segment_split(struct request_queue *q, >>>> return NULL; >>>> split: >>>> *segs = nsegs; >>>> + >>>> + /* >>>> + * Bio splitting may cause subtle trouble such as hang when doing sync >>>> + * iopoll in direct IO routine. Given performance gain of iopoll for >>>> + * big IO can be trival, disable iopoll when split needed. We need >>>> + * BIO_SPLIT to identify bios need this workaround. Since currently >>>> + * only normal IO under mq routine may suffer this issue, BIO_SPLIT is >>>> + * only marked here. >>>> + */ >>>> + bio->bi_opf &= ~REQ_HIPRI; >>>> + bio_set_flag(bio, BIO_SPLIT); >>>> + >>>> return bio_split(bio, sectors, GFP_NOIO, bs); >>>> } >>>> >>>> diff --git a/block/blk-mq.c b/block/blk-mq.c >>>> index 55bcee5dc032..ce1f3628e4c2 100644 >>>> --- a/block/blk-mq.c >>>> +++ b/block/blk-mq.c >>>> @@ -2265,6 +2265,9 @@ blk_qc_t blk_mq_submit_bio(struct bio *bio) >>>> blk_mq_sched_insert_request(rq, false, true, true); >>>> } >>>> >>>> + if (bio_flagged(bio, BIO_SPLIT)) >>>> + return BLK_QC_T_NONE; >>>> + >>> >>> Not sure the new bio flag is really required for this case, just wondering >>> why not take the following simple way? BTW we are really going to run >>> out of bio flag. >>> >> >> Please consider the following case: >> >> One big bio got split into two split bios. At the first call of >> blk_mq_submit_bio(), the input @bio (actually the original big bio) >> indeed gets split. The split bio gets enqueued to hw queue and the >> returned cookie is BLK_QC_T_NONE, while the remained bio gets buffered >> in bio_list. So far so good. > > When this bio gets splitted, REQ_HIPRI is cleared for this bio, and > all splitted bios won't set this flag too. > That's true, at least in terms of my patch. >> >> Then when calling blk_mq_submit_bio() the second time, the input @bio is >> indeed the remained bio. At this time, it will not get split and you >> will get a *valid* cookie. And since the cookie of last split bio will >> actually overrides the previous cookie, you will get a *valid* cookie as >> a result. > > Then valid cookie can be returned only for bio with REQ_HIPRI. Sounds reasonable. The side effect is that original IO without REQ_HIPRI now also returns BLK_QC_T_NONE. But it may not be a big problem, since the returned cookie is not used at all for IO without REQ_HIPRI. -- Thanks, Jeffle