On Wed, Nov 25, 2020 at 05:17:07PM +0800, JeffleXu wrote:
>
>
> On 11/25/20 4:29 PM, Ming Lei wrote:
> > On Wed, Nov 25, 2020 at 02:41:47PM +0800, Jeffle Xu wrote:
> >> iopoll is initially for small size, latency sensitive IO. It doesn't
> >> work well for big IO, especially when it needs to be split to multiple
> >> bios. In this case, the returned cookie of __submit_bio_noacct_mq() is
> >> indeed the cookie of the last split bio. The completion of *this* last
> >> split bio done by iopoll doesn't mean the whole original bio has
> >> completed. Callers of iopoll still need to wait for completion of other
> >> split bios.
> >>
> >> Besides bio splitting may cause more trouble for iopoll which isn't
> >> supposed to be used in case of big IO.
> >>
> >> iopoll for split bio may cause potential race if CPU migration happens
> >> during bio submission. Since the returned cookie is that of the last
> >> split bio, polling on the corresponding hardware queue doesn't help
> >> complete other split bios, if these split bios are enqueued into
> >> different hardware queues. Since interrupts are disabled for polling
> >> queues, the completion of these other split bios depends on timeout
> >> mechanism, thus causing a potential hang.
> >>
> >> iopoll for split bio may also cause hang for sync polling. Currently
> >> both the blkdev and iomap-based fs (ext4/xfs, etc) support sync polling
> >> in direct IO routine. These routines will submit bio without REQ_NOWAIT
> >> flag set, and then start sync polling in current process context. The
> >> process may hang in blk_mq_get_tag() if the submitted bio has to be
> >> split into multiple bios and can rapidly exhaust the queue depth. The
> >> process are waiting for the completion of the previously allocated
> >> requests, which should be reaped by the following polling, and thus
> >> causing a deadlock.
> >>
> >> To avoid these subtle trouble described above, just disable iopoll for
> >> split bio.
> >>
> >> Suggested-by: Ming Lei <ming.lei@xxxxxxxxxx>
> >> Signed-off-by: Jeffle Xu <jefflexu@xxxxxxxxxxxxxxxxx>
> >> Reviewed-by: Christoph Hellwig <hch@xxxxxx>
> >> ---
> >>  block/bio.c               |  2 ++
> >>  block/blk-merge.c         | 12 ++++++++++++
> >>  block/blk-mq.c            |  3 +++
> >>  include/linux/blk_types.h |  1 +
> >>  4 files changed, 18 insertions(+)
> >>
> >> diff --git a/block/bio.c b/block/bio.c
> >> index fa01bef35bb1..7f7ddc22a30d 100644
> >> --- a/block/bio.c
> >> +++ b/block/bio.c
> >> @@ -684,6 +684,8 @@ void __bio_clone_fast(struct bio *bio, struct bio *bio_src)
> >>  	bio_set_flag(bio, BIO_CLONED);
> >>  	if (bio_flagged(bio_src, BIO_THROTTLED))
> >>  		bio_set_flag(bio, BIO_THROTTLED);
> >> +	if (bio_flagged(bio_src, BIO_SPLIT))
> >> +		bio_set_flag(bio, BIO_SPLIT);
> >>  	bio->bi_opf = bio_src->bi_opf;
> >>  	bio->bi_ioprio = bio_src->bi_ioprio;
> >>  	bio->bi_write_hint = bio_src->bi_write_hint;
> >> diff --git a/block/blk-merge.c b/block/blk-merge.c
> >> index bcf5e4580603..a2890cebf99f 100644
> >> --- a/block/blk-merge.c
> >> +++ b/block/blk-merge.c
> >> @@ -279,6 +279,18 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
> >>  	return NULL;
> >>  split:
> >>  	*segs = nsegs;
> >> +
> >> +	/*
> >> +	 * Bio splitting may cause subtle trouble such as hang when doing sync
> >> +	 * iopoll in direct IO routine. Given performance gain of iopoll for
> >> +	 * big IO can be trival, disable iopoll when split needed. We need
> >> +	 * BIO_SPLIT to identify bios need this workaround. Since currently
> >> +	 * only normal IO under mq routine may suffer this issue, BIO_SPLIT is
> >> +	 * only marked here.
> >> +	 */
> >> +	bio->bi_opf &= ~REQ_HIPRI;
> >> +	bio_set_flag(bio, BIO_SPLIT);
> >
> > You may need to put the above into one helper, and call the helper for
> > other splitted cases(discard, write zero and write same) too.
> It could be, though currently only normal IO could be marked with REQ_HIPRI.

You are right, so far RWF_HIPRI is only applied on RW IO, then no need
to do that for other non-RW split.

Thanks,
Ming
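
For illustration only, the helper discussed above could be modelled in
userspace roughly as follows. This is a sketch under stated assumptions,
not kernel code: the cut-down struct, the flag bit values, and the names
bio_disable_poll_for_split() and bio_clone_flags() are made up for the
example. Only REQ_HIPRI, BIO_SPLIT, BIO_CLONED, bi_opf and the logic of
the two quoted hunks come from the patch.

```c
#include <assert.h>

/*
 * Userspace stand-ins for the kernel definitions. The bit positions
 * are illustrative assumptions, not the values used by the kernel.
 */
#define REQ_HIPRI	(1u << 0)	/* request asks for polled completion */
#define BIO_CLONED	(1u << 0)	/* bio is a clone of another bio */
#define BIO_SPLIT	(1u << 1)	/* bio has been split at least once */

/* Cut-down model of struct bio: only the two fields the patch touches. */
struct bio {
	unsigned int bi_opf;	/* operation and request flags */
	unsigned int bi_flags;	/* bio state flags (stand-in field) */
};

/*
 * Hypothetical helper (the name is not from the patch): what the
 * blk_bio_segment_split() hunk does once a split is needed. It clears
 * REQ_HIPRI so nobody polls on the bio, and marks the bio so that
 * later stages can recognise it.
 */
static void bio_disable_poll_for_split(struct bio *bio)
{
	bio->bi_opf &= ~REQ_HIPRI;
	bio->bi_flags |= BIO_SPLIT;
}

/*
 * Model of the __bio_clone_fast() hunk: a clone inherits BIO_SPLIT
 * and the (already cleared) bi_opf from its source.
 */
static void bio_clone_flags(struct bio *bio, const struct bio *bio_src)
{
	bio->bi_flags |= BIO_CLONED;
	if (bio_src->bi_flags & BIO_SPLIT)
		bio->bi_flags |= BIO_SPLIT;
	bio->bi_opf = bio_src->bi_opf;
}
```

With such a helper, the discard, write zero and write same split paths
would each need only a single call. As agreed above, that is not needed
as long as RWF_HIPRI is only applied to RW IO.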