On Fri, 2018-07-20 at 10:41 +0800, Ming Lei wrote: > On Thu, Jul 19, 2018 at 11:01:58PM +0200, Martin Wilck wrote: > > When bio_iov_iter_get_pages() is called > > from __blkdev_direct_IO_simple(), > > we already know that the content of the input iov_iter fits into a > > single > > bio, so we expect iov_iter_count(iter) to drop to 0. But in a > > single invocation, > > bio_iov_iter_get_pages() may add less bytes then we expect. For > > iov_iters with > > multiple segments (generated e.g. by writev()), it behaves like an > > iterator's > > next() function, taking only one step (segment) at a time. > > Furthermore, MM may > > fail or refuse to pin all requested pages. The latter may signify > > an error condition > > (in which case eventually an error code will be returned), the > > former does not. > > > > Call bio_iov_iter_get_pages() repeatedly to avoid short reads or > > writes. > > Otherwise, __generic_file_write_iter() falls back to buffered > > writes, which > > has been observed to cause data corruption in certain workloads. > > > > Fixes: 72ecad22d9f1 ("block: support a full bio worth of IO for > > simplified bdev direct-io") > > Signed-off-by: Martin Wilck <mwilck@xxxxxxxx> > > --- > > fs/block_dev.c | 18 ++++++++++++++++++ > > 1 file changed, 18 insertions(+) > > > > diff --git a/fs/block_dev.c b/fs/block_dev.c > > index aba2541..561c34e 100644 > > --- a/fs/block_dev.c > > +++ b/fs/block_dev.c > > @@ -222,6 +222,24 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, > > struct iov_iter *iter, > > ret = bio_iov_iter_get_pages(&bio, iter); > > if (unlikely(ret)) > > goto out; > > + > > + /* > > + * bio_iov_iter_get_pages() may add less bytes than we > > expect: > > + * - for multi-segment iov_iters, as it only adds one > > segment at a time > > + * - if MM refuses or fails to pin all requested pages. In > > this case, > > + * an error is returned eventually if no progress can be > > made. > > + */ > > + while (iov_iter_count(iter) > 0 && bio.bi_vcnt < > > bio.bi_max_vecs) { > > Please use the helper bio_full(). > > > + ret = bio_iov_iter_get_pages(&bio, iter); > > + if (unlikely(ret)) > > + goto out; > > + } > > When ret returns, pages pinned already need to be put. Oops, sorry for missing that. > Also I think the way suggested in the following link may be more > generic: > > https://marc.info/?l=linux-block&m=153199830130209&w=2 > > in which it is retried until no progress is made, and there should > have > performance benefit for other users too, not only fix this partial > dio > issue. As there was no unequivocal support for that proposal, I propose to add a new helper without renaming the current one. Callers could then be migrated one-by-one. I will do so in v4. Thanks, Martin -- Dr. Martin Wilck <mwilck@xxxxxxxx>, Tel. +49 (0)911 74053 2107 SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton HRB 21284 (AG Nürnberg)