Re: [PATCH] block: set NOWAIT for sync polling

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Oct 14, 2020 at 04:31:49PM +0800, JeffleXu wrote:
> What about just disabling HIPRI in preadv2(2)/pwritev2(2)? Christoph Hellwig
> disabled HIPRI for libaio in

Then people will complain that poll can't be used for sync dio, and it is
an regression.

> 
> commit 154989e45fd8de9bfb52bbd6e5ea763e437e54c5 ("aio: clear IOCB_HIPRI").
> What do you think, @Christoph?
> 
> (cc Christoph Hellwig)
> 
> 
> >   static inline void bio_set_polled(struct bio *bio, struct kiocb *kiocb)
> >   {
> > -	bio->bi_opf |= REQ_HIPRI;
> > -	if (!is_sync_kiocb(kiocb))
> > -		bio->bi_opf |= REQ_NOWAIT;
> > +	bio->bi_opf |= REQ_HIPRI | REQ_NOWAIT;
> >   }
> 
> The original patch indeed could not fix the problem. Though it could fix the
> potential deadlock,
> 
> the VFS code read(2)/write(2) is not ready by handling the returned -EAGAIN
> gracefully. Currently
> 
> read(2)/write(2) will just return -EAGAIN to user space.
> 
> 
> 
> On 10/13/20 8:09 PM, Ming Lei wrote:
> > On Tue, Oct 13, 2020 at 04:40:51PM +0800, Jeffle Xu wrote:
> > > Sync polling also needs REQ_NOWAIT flag. One sync read/write may be
> > > split into several bios (and thus several requests), and can used up the
> > > queue depth sometimes. Thus the following bio in the same sync
> > > read/write will wait for usable request if REQ_NOWAIT flag not set, in
> > > which case the following sync polling will cause a deadlock.
> > > 
> > > One case (maybe the only case) for above situation is preadv2/pwritev2
> > > + direct + highpri. Two conditions need to be satisfied to trigger the
> > > deadlock.
> > > 
> > > 1. HIPRI IO in sync routine. Normal read(2)/pread(2)/readv(2)/preadv(2)
> > > and corresponding write family syscalls don't support high-priority IO and
> > > thus won't trigger polling routine. Only preadv2(2)/pwritev2(2) supports
> > > high-priority IO by RWF_HIPRI flag of @flags parameter.
> > > 
> > > 2. Polling support in sync routine. Currently both the blkdev and
> > > iomap-based fs (ext4/xfs, etc) support polling in direct IO routine. The
> > > general routine is described as follows.
> > > 
> > > submit_bio
> > >    wait for blk_mq_get_tag(), waiting for requests completion, which
> > >    should be done by the following polling, thus causing a deadlock.
> > Another blocking point is rq_qos_throttle(),
> 
> What is the issue here in rq_qos_throttle()? More details?

Like allocating request, rq_qos_throttle() may wait until rq completion
is done, see wbt_wait() and wbt_done().

> 
> 
> > so I guess falling back to
> > REQ_NOWAIT may not fix the issue completely.
> 
> 
> 
> > Given iopoll isn't supposed to in case of big IO, another solution
> > may be to disable iopoll when bio splitting is needed, something
> > like the following change:
> > 
> > diff --git a/block/blk-merge.c b/block/blk-merge.c
> > index bcf5e4580603..8e762215660b 100644
> > --- a/block/blk-merge.c
> > +++ b/block/blk-merge.c
> > @@ -279,6 +279,12 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
> >   	return NULL;
> >   split:
> >   	*segs = nsegs;
> > +
> > +	/*
> > +	 * bio splitting may cause more trouble for iopoll which isn't supposed
> > +	 * to be used in case of big IO
> > +	 */
> > +	bio->bi_opf &= ~REQ_HIPRI;
> >   	return bio_split(bio, sectors, GFP_NOIO, bs);
> >   }
> 
> Actually split is not only from blk_mq_submit_bio->__blk_queue_split. In
> __blkdev_direct_IO,
> 
> one input iov_iter could be split to several bios.
> 
> ```
> 
> __blkdev_direct_IO:
> 
> for (;;) {
>         ret = bio_iov_iter_get_pages(bio, iter);
>         submit_bio(bio);
> }
> 
> for (;;) {
>         blk_poll()
> 
>         ...
> 
> }
> 
> ```
> 
> Since  one single bio can contain at most BIO_MAX_PAGES, i.e. 256 bio_vec in
> @bio->bi_io_vec,

As Jens mentioned, it is weird to poll on multiple bios, so we can
disable io poll simply in __blkdev_direct_IO(). And it is reasonable from
user's viewpoint to not poll on big bio cause io poll is often used
in latency sensitive cases.

> 
> if the @iovcnt parameter of preadv2(2)/pwritev2(2) is larger than 256, then
> one call of
> 
> preadv2(2)/pwritev2(2) can be split into several bios. These bios are
> submitted at once, and then
> 
> start sync polling in the process context.
> 
> 
> If the number of bios split from one call of preadv2(2)/pwritev2(2) is
> larger than the queue depth,
> 
> bios from single preadv2(2)/pwritev2(2) call can exhaust the queue depth and
> thus cause deadlock.
> 
> Fortunately the maximum of @iovcnt parameter of preadv2(2)/pwritev2(2) is
> UIO_MAXIOV, i.e. 1024,
> 
> and the minimum of queue depth is BLKDEV_MIN_RQ i.e. 4. That means one
> preadv2(2)/pwritev2(2)
> 
> call can submit at most 4 bios, which will fill up the queue depth exactly
> and there's no deadlock in this
> 
> case. I'm not sure if the setting of UIO_MAXIOV/BIO_MAX_PAGES/BLKDEV_MIN_RQ
> is coincident or
> 
> deliberately tuned. At least it will not cause deadlock currently , though
> the constraint may be a little fragile.
> 
> 
> By the way, this patch could fix the potential hang I mentioned in
> 
> https://patchwork.kernel.org/project/linux-block/patch/20200911032958.125068-1-jefflexu@xxxxxxxxxxxxxxxxx/

Right, I remembered that race.


Thanks,
Ming




[Index of Archives]     [Linux RAID]     [Linux SCSI]     [Linux ATA RAID]     [IDE]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Device Mapper]

  Powered by Linux