On Thu, Oct 15, 2020 at 03:40:31PM +0800, Jeffle Xu wrote: > Both blkdev fs and iomap-based fs (ext4, xfs, etc.) currently support > sync iopoll. One single bio can contain at most BIO_MAX_PAGES, i.e. 256 > bio_vec. If the input iov_iter contains more than 256 segments, then > the IO request described by this iov_iter will be split into multiple > bios, which may cause potential deadlock for sync iopoll. > > When it comes to sync iopoll, the bio is submitted without REQ_NOWAIT > flag set and the process may hang in blk_mq_get_tag() if the input > iov_iter has to be split into multiple bios and thus rapidly exhausts > the queue depth. The process has to wait for the completion of the > previously allocated requests, which should be done by the following > sync polling, and thus causing a deadlock. > > Actually there's subtle difference between the behaviour of handling > HIPRI IO of blkdev and iomap, when the input iov_iter need to split > into multiple bios. blkdev will set REQ_HIPRI for only the last split > bio, leaving the previous bio queued into normal hardware queues, which > will not cause the trouble described above though. iomap will set > REQ_HIPRI for all bios split from one iov_iter, and thus may cause the > potential deadlock decribed above. > > Disable iopoll when one request need to be split into multiple bios. > Though blkdev may not suffer the problem, still it may not make much > sense to iopoll for big IO, since iopoll is initially for small size, > latency sensitive IO. > > Signed-off-by: Jeffle Xu <jefflexu@xxxxxxxxxxxxxxxxx> > --- > fs/block_dev.c | 7 +++++++ > fs/iomap/direct-io.c | 9 ++++++++- > 2 files changed, 15 insertions(+), 1 deletion(-) > > diff --git a/fs/block_dev.c b/fs/block_dev.c > index 9e84b1928b94..a8a52cab15ab 100644 > --- a/fs/block_dev.c > +++ b/fs/block_dev.c > @@ -491,6 +491,13 @@ blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter) > if (is_sync_kiocb(iocb) && nr_pages <= BIO_MAX_PAGES) > return __blkdev_direct_IO_simple(iocb, iter, nr_pages); > > + /* > + * IOpoll is initially for small size, latency sensitive IO. > + * Disable iopoll if split needed. > + */ > + if (nr_pages > BIO_MAX_PAGES) > + iocb->ki_flags &= ~IOCB_HIPRI; more pages than BIO_MAX_PAGES don't imply a split because we can physically merge pages into a single vector (yes, BIO_MAX_PAGES is utterly misnamed now).