Re: [PATCH v6] block: disable iopoll for split bio

On Wed, Nov 25, 2020 at 05:17:07PM +0800, JeffleXu wrote:
> 
> 
> On 11/25/20 4:29 PM, Ming Lei wrote:
> > On Wed, Nov 25, 2020 at 02:41:47PM +0800, Jeffle Xu wrote:
> >> iopoll was initially intended for small, latency-sensitive IO. It doesn't
> >> work well for big IO, especially when it needs to be split to multiple
> >> bios. In this case, the returned cookie of __submit_bio_noacct_mq() is
> >> indeed the cookie of the last split bio. The completion of *this* last
> >> split bio done by iopoll doesn't mean the whole original bio has
> >> completed. Callers of iopoll still need to wait for completion of other
> >> split bios.
> >>
> >> Besides, bio splitting may cause more trouble for iopoll, which isn't
> >> supposed to be used for big IO.
> >>
> >> iopoll for split bio may cause potential race if CPU migration happens
> >> during bio submission. Since the returned cookie is that of the last
> >> split bio, polling on the corresponding hardware queue doesn't help
> >> complete other split bios, if these split bios are enqueued into
> >> different hardware queues. Since interrupts are disabled for polling
> >> queues, the completion of these other split bios depends on timeout
> >> mechanism, thus causing a potential hang.
> >>
> >> iopoll for split bio may also cause hang for sync polling. Currently
> >> both the blkdev and iomap-based fs (ext4/xfs, etc) support sync polling
> >> in direct IO routine. These routines will submit bio without REQ_NOWAIT
> >> flag set, and then start sync polling in current process context. The
> >> process may hang in blk_mq_get_tag() if the submitted bio has to be
> >> split into multiple bios and can rapidly exhaust the queue depth. The
> >> process is waiting for the completion of the previously allocated
> >> requests, which should be reaped by the following polling, thus
> >> causing a deadlock.
> >>
> >> To avoid the subtle troubles described above, just disable iopoll for
> >> split bios.
> >>
> >> Suggested-by: Ming Lei <ming.lei@xxxxxxxxxx>
> >> Signed-off-by: Jeffle Xu <jefflexu@xxxxxxxxxxxxxxxxx>
> >> Reviewed-by: Christoph Hellwig <hch@xxxxxx>
> >> ---
> >>  block/bio.c               |  2 ++
> >>  block/blk-merge.c         | 12 ++++++++++++
> >>  block/blk-mq.c            |  3 +++
> >>  include/linux/blk_types.h |  1 +
> >>  4 files changed, 18 insertions(+)
> >>
> >> diff --git a/block/bio.c b/block/bio.c
> >> index fa01bef35bb1..7f7ddc22a30d 100644
> >> --- a/block/bio.c
> >> +++ b/block/bio.c
> >> @@ -684,6 +684,8 @@ void __bio_clone_fast(struct bio *bio, struct bio *bio_src)
> >>  	bio_set_flag(bio, BIO_CLONED);
> >>  	if (bio_flagged(bio_src, BIO_THROTTLED))
> >>  		bio_set_flag(bio, BIO_THROTTLED);
> >> +	if (bio_flagged(bio_src, BIO_SPLIT))
> >> +		bio_set_flag(bio, BIO_SPLIT);
> >>  	bio->bi_opf = bio_src->bi_opf;
> >>  	bio->bi_ioprio = bio_src->bi_ioprio;
> >>  	bio->bi_write_hint = bio_src->bi_write_hint;
> >> diff --git a/block/blk-merge.c b/block/blk-merge.c
> >> index bcf5e4580603..a2890cebf99f 100644
> >> --- a/block/blk-merge.c
> >> +++ b/block/blk-merge.c
> >> @@ -279,6 +279,18 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
> >>  	return NULL;
> >>  split:
> >>  	*segs = nsegs;
> >> +
> >> +	/*
> >> +	 * Bio splitting may cause subtle trouble such as hang when doing sync
> >> +	 * iopoll in direct IO routine. Given performance gain of iopoll for
> >> +	 * big IO can be trivial, disable iopoll when split needed. We need
> >> +	 * BIO_SPLIT to identify bios that need this workaround. Since currently
> >> +	 * only normal IO under mq routine may suffer this issue, BIO_SPLIT is
> >> +	 * only marked here.
> >> +	 */
> >> +	bio->bi_opf &= ~REQ_HIPRI;
> >> +	bio_set_flag(bio, BIO_SPLIT);
> > 
> > You may need to put the above into a helper, and call that helper for
> > the other split cases (discard, write zeroes and write same) too.
> That could be done, though currently only normal IO can be marked with REQ_HIPRI.

You are right. So far RWF_HIPRI is only applied to read/write IO, so there
is no need to do that for the other, non-RW split cases.
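
For reference, if such a helper were ever needed, it could look roughly like
the userspace sketch below. This is only an illustration, not code from the
patch: struct mock_bio, the MOCK_* bit values and both function names are
hypothetical stand-ins, and the real kernel sets bio flags through
bio_set_flag() rather than by OR-ing a mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Placeholder bit values (assumptions). The real REQ_HIPRI and BIO_SPLIT
 * live in include/linux/blk_types.h and are encoded differently.
 */
#define MOCK_REQ_HIPRI	(1u << 0)
#define MOCK_BIO_SPLIT	(1u << 1)

/* Minimal stand-in for struct bio, keeping only the two fields used here. */
struct mock_bio {
	uint32_t bi_opf;	/* operation and request flags */
	uint32_t bi_flags;	/* bio state flags */
};

/*
 * Hypothetical helper: do both steps the patch performs at the "split:"
 * label, so that every split path could share one call.
 */
static void mock_bio_mark_split(struct mock_bio *bio)
{
	assert(bio != NULL);
	bio->bi_opf &= ~MOCK_REQ_HIPRI;		/* no polling for this bio */
	bio->bi_flags |= MOCK_BIO_SPLIT;	/* remember that it was split */
}

/* Hypothetical check: polling only makes sense for an unsplit HIPRI bio. */
static bool mock_bio_can_poll(const struct mock_bio *bio)
{
	return (bio->bi_opf & MOCK_REQ_HIPRI) &&
	       !(bio->bi_flags & MOCK_BIO_SPLIT);
}
```

Clearing the polling bit and setting the split flag in one place keeps the
two from getting out of sync if another split path ever has to handle
REQ_HIPRI.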

Thanks,
Ming



