On 11/20/20 1:55 AM, Christoph Hellwig wrote:
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 933f234d5bec..396ac0f91a43 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -309,6 +309,16 @@ iomap_dio_bio_actor(struct inode *inode, loff_t pos, loff_t length,
copied += n;
nr_pages = iov_iter_npages(dio->submit.iter, BIO_MAX_PAGES);
+ /*
+ * The current dio needs to be split into multiple bios here.
+ * iopoll for a split bio will cause subtle trouble, such as a
+ * hang when doing sync polling, while iopoll is intended for
+ * small, latency-sensitive IO. Thus disable iopoll if a split
+ * is needed.
+ */
+ if (nr_pages)
+ dio->iocb->ki_flags &= ~IOCB_HIPRI;
I think this is confusing two things.
Indeed there are two levels of splitting involved in this issue when doing
sync iopoll.
The first is that one bio gets split in the block core; patch 1 of this
patch set fixes that.
The second is that one dio gets split into multiple bios in the fs layer;
patch 2 fixes that.
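For reference, the fs-layer split is just the submission loop in
iomap_dio_bio_actor(); a simplified sketch of it (most details elided):

	do {
		bio = bio_alloc(GFP_KERNEL, nr_pages);
		/* ... fill the bio from dio->submit.iter ... */
		nr_pages = iov_iter_npages(dio->submit.iter, BIO_MAX_PAGES);
		iomap_dio_submit_bio(dio, iomap, bio, pos);
		pos += n;
	} while (nr_pages);

So nr_pages being non-zero after filling a bio is exactly the signal that
the dio spans more than one bio.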
One is that we don't handle
polling well when there are multiple bios. For this I think we should
only call bio_set_polled when we know there is a single bio.
How about the following patch:
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -60,12 +60,12 @@ int iomap_dio_iopoll(struct kiocb *kiocb, bool spin)
EXPORT_SYMBOL_GPL(iomap_dio_iopoll);
static void iomap_dio_submit_bio(struct iomap_dio *dio, struct iomap *iomap,
- struct bio *bio, loff_t pos)
+ struct bio *bio, loff_t pos, bool split)
{
atomic_inc(&dio->ref);
if (dio->iocb->ki_flags & IOCB_HIPRI)
- bio_set_polled(bio, dio->iocb);
+ bio_set_polled(bio, dio->iocb, split);
dio->submit.last_queue = bdev_get_queue(iomap->bdev);
if (dio->dops && dio->dops->submit_io)
@@ -214,6 +214,7 @@ iomap_dio_bio_actor(struct inode *inode, loff_t pos, loff_t length,
int nr_pages, ret = 0;
size_t copied = 0;
size_t orig_count;
+ bool split = false;
if ((pos | length | align) & ((1 << blkbits) - 1))
return -EINVAL;
@@ -309,7 +310,17 @@ iomap_dio_bio_actor(struct inode *inode, loff_t pos, loff_t length,
copied += n;
nr_pages = iov_iter_npages(dio->submit.iter, BIO_MAX_PAGES);
- iomap_dio_submit_bio(dio, iomap, bio, pos);
+ /*
+ * The current dio needs to be split into multiple bios here.
+ * iopoll for a split bio will cause subtle trouble, such as a
+ * hang when doing sync polling, while iopoll is intended for
+ * small, latency-sensitive IO. Thus disable iopoll if a split
+ * is needed.
+ */
+ if (nr_pages)
+ split = true;
+
+ iomap_dio_submit_bio(dio, iomap, bio, pos, split);
pos += n;
} while (nr_pages);
diff --git a/include/linux/bio.h b/include/linux/bio.h
index c6d765382926..21f772f98878 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -806,9 +806,11 @@ static inline int bio_integrity_add_page(struct bio *bio, struct page *page,
* must be found by the caller. This is different than IRQ driven IO, where
* it's safe to wait for IO to complete.
*/
-static inline void bio_set_polled(struct bio *bio, struct kiocb *kiocb)
+static inline void bio_set_polled(struct bio *bio, struct kiocb *kiocb, bool split)
{
- bio->bi_opf |= REQ_HIPRI;
+ if (!split)
+ bio->bi_opf |= REQ_HIPRI;
+
if (!is_sync_kiocb(kiocb))
bio->bi_opf |= REQ_NOWAIT;
}
After this patch, a bio will be polled only when one dio maps to a single bio.
Note that this change applies to both sync and async IO, though the async
routine doesn't suffer the hang described in this patch set. Since the
performance gain of iopoll is likely trivial once a dio gets split, iopoll
is disabled for the async routine as well.
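To make the sync-side problem concrete: the submission path records only
the queue and cookie of the last submitted bio, and the sync wait loop
polls only those. Roughly, abridged from __iomap_dio_rw() as of this
kernel:

	for (;;) {
		set_current_state(TASK_UNINTERRUPTIBLE);
		if (!READ_ONCE(dio->submit.waiter))
			break;

		if (!(iocb->ki_flags & IOCB_HIPRI) ||
		    !dio->submit.last_queue ||
		    !blk_poll(dio->submit.last_queue,
			      dio->submit.cookie, true))
			io_schedule();
	}

Any earlier REQ_HIPRI bio of the same dio completes only when someone
polls for it; if its completion is never found via the recorded
last_queue/cookie, the waiter can sleep forever.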
We need to keep REQ_NOWAIT for the async routine even when the dio is split,
because io_uring doesn't expect blocking. Strictly speaking, REQ_NOWAIT
should be derived from iocb->ki_flags & IOCB_NOWAIT, but currently iomap
doesn't inherit bio->bi_opf's REQ_NOWAIT from iocb->ki_flags' IOCB_NOWAIT.
That bug should be fixed by
https://lore.kernel.org/linux-fsdevel/1605685931-207023-1-git-send-email-haoxu@xxxxxxxxxxxxxxxxx/T/#t
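The inheritance itself would be a one-liner at submission time; a minimal
sketch (not necessarily how the linked patch implements it):

	/* sketch only: e.g. in iomap_dio_submit_bio() */
	if (dio->iocb->ki_flags & IOCB_NOWAIT)
		bio->bi_opf |= REQ_NOWAIT;

That way every bio of the dio carries the iocb's nowait intent, whether
or not the dio gets split.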
Maybe we could include that fix (for the missing inheritance of IOCB_NOWAIT)
in this patch set and then rework the fix I mentioned in this patch?
--
Thanks,
Jeffle