On Thu, Jul 19, 2018 at 11:39:18AM +0200, Martin Wilck wrote: > bio_iov_iter_get_pages() returns only pages for a single non-empty > segment of the input iov_iter's iovec. This may be much less than the number > of pages __blkdev_direct_IO_simple() is supposed to process. Call > bio_iov_iter_get_pages() repeatedly until either the requested number > of bytes is reached, or bio.bi_io_vec is exhausted. If this is not done, > short writes or reads may occur for direct synchronous IOs with multiple > iovec slots (such as generated by writev()). In that case, > __generic_file_write_iter() falls back to buffered writes, which > has been observed to cause data corruption in certain workloads. > > Note: if segments aren't page-aligned in the input iovec, this patch may > result in multiple adjacent slots of the bi_io_vec array to reference the same > page (the byte ranges are guaranteed to be disjunct if the preceding patch is > applied). We haven't seen problems with that in our and the customer's > tests. It'd be possible to detect this situation and merge bi_io_vec slots > that refer to the same page, but I prefer to keep it simple for now. > > Fixes: 72ecad22d9f1 ("block: support a full bio worth of IO for simplified bdev direct-io") > Signed-off-by: Martin Wilck <mwilck@xxxxxxxx> > --- > fs/block_dev.c | 8 +++++++- > 1 file changed, 7 insertions(+), 1 deletion(-) > > diff --git a/fs/block_dev.c b/fs/block_dev.c > index 0dd87aa..41643c4 100644 > --- a/fs/block_dev.c > +++ b/fs/block_dev.c > @@ -221,7 +221,12 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter, > > ret = bio_iov_iter_get_pages(&bio, iter); > if (unlikely(ret)) > - return ret; > + goto out; > + > + while (ret == 0 && > + bio.bi_vcnt < bio.bi_max_vecs && iov_iter_count(iter) > 0) > + ret = bio_iov_iter_get_pages(&bio, iter); > + > ret = bio.bi_iter.bi_size; > > if (iov_iter_rw(iter) == READ) { > @@ -250,6 +255,7 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter, > put_page(bvec->bv_page); > } > > +out: > if (vecs != inline_vecs) > kfree(vecs); > You might put the 'vecs' leak fix into another patch, and resue the current code block for that. Looks all users of bio_iov_iter_get_pages() need this kind of fix, so what do you think about the following way? diff --git a/block/bio.c b/block/bio.c index f3536bfc8298..23dd4c163dfc 100644 --- a/block/bio.c +++ b/block/bio.c @@ -904,15 +904,7 @@ int bio_add_page(struct bio *bio, struct page *page, } EXPORT_SYMBOL(bio_add_page); -/** - * bio_iov_iter_get_pages - pin user or kernel pages and add them to a bio - * @bio: bio to add pages to - * @iter: iov iterator describing the region to be mapped - * - * Pins as many pages from *iter and appends them to @bio's bvec array. The - * pages will have to be released using put_page() when done. - */ -int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter) +static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter) { unsigned short nr_pages = bio->bi_max_vecs - bio->bi_vcnt; struct bio_vec *bv = bio->bi_io_vec + bio->bi_vcnt; @@ -951,6 +943,28 @@ int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter) iov_iter_advance(iter, size); return 0; } + +/** + * bio_iov_iter_get_pages - pin user or kernel pages and add them to a bio + * @bio: bio to add pages to + * @iter: iov iterator describing the region to be mapped + * + * Pins as many pages from *iter and appends them to @bio's bvec array. The + * pages will have to be released using put_page() when done. + */ +int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter) +{ + int ret; + unsigned int size; + + do { + size = bio->bi_iter.bi_size; + ret = __bio_iov_iter_get_pages(bio, iter); + } while (!bio_full(bio) && iov_iter_count(iter) > 0 && + bio->bi_iter.bi_size > size); + + return bio->bi_iter.bi_size > 0 ? 0 : ret; +} EXPORT_SYMBOL_GPL(bio_iov_iter_get_pages); static void submit_bio_wait_endio(struct bio *bio) Thanks, Ming