Re: [PATCH 2/2] blkdev: __blkdev_direct_IO_simple: make sure to fill up the bio

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 2018-07-19 at 13:56 +0200, Jan Kara wrote:
> On Thu 19-07-18 19:04:46, Ming Lei wrote:
> > On Thu, Jul 19, 2018 at 11:39:18AM +0200, Martin Wilck wrote:
> > > bio_iov_iter_get_pages() returns only pages for a single non-
> > > empty
> > > segment of the input iov_iter's iovec. This may be much less than
> > > the number
> > > of pages __blkdev_direct_IO_simple() is supposed to process. Call
> > > bio_iov_iter_get_pages() repeatedly until either the requested
> > > number
> > > of bytes is reached, or bio.bi_io_vec is exhausted. If this is
> > > not done,
> > > short writes or reads may occur for direct synchronous IOs with
> > > multiple
> > > iovec slots (such as generated by writev()). In that case,
> > > __generic_file_write_iter() falls back to buffered writes, which
> > > has been observed to cause data corruption in certain workloads.
> > > 
> > > Note: if segments aren't page-aligned in the input iovec, this
> > > patch may
> > > result in multiple adjacent slots of the bi_io_vec array to
> > > reference the same
> > > page (the byte ranges are guaranteed to be disjunct if the
> > > preceding patch is
> > > applied). We haven't seen problems with that in our and the
> > > customer's
> > > tests. It'd be possible to detect this situation and merge
> > > bi_io_vec slots
> > > that refer to the same page, but I prefer to keep it simple for
> > > now.
> > > 
> > > Fixes: 72ecad22d9f1 ("block: support a full bio worth of IO for
> > > simplified bdev direct-io")
> > > Signed-off-by: Martin Wilck <mwilck@xxxxxxxx>
> > > ---
> > >  fs/block_dev.c | 8 +++++++-
> > >  1 file changed, 7 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/fs/block_dev.c b/fs/block_dev.c
> > > index 0dd87aa..41643c4 100644
> > > --- a/fs/block_dev.c
> > > +++ b/fs/block_dev.c
> > > @@ -221,7 +221,12 @@ __blkdev_direct_IO_simple(struct kiocb
> > > *iocb, struct iov_iter *iter,
> > >  
> > >  	ret = bio_iov_iter_get_pages(&bio, iter);
> > >  	if (unlikely(ret))
> > > -		return ret;
> > > +		goto out;
> > > +
> > > +	while (ret == 0 &&
> > > +	       bio.bi_vcnt < bio.bi_max_vecs &&
> > > iov_iter_count(iter) > 0)
> > > +		ret = bio_iov_iter_get_pages(&bio, iter);
> > > +
> > >  	ret = bio.bi_iter.bi_size;
> > >  
> > >  	if (iov_iter_rw(iter) == READ) {
> > > @@ -250,6 +255,7 @@ __blkdev_direct_IO_simple(struct kiocb *iocb,
> > > struct iov_iter *iter,
> > >  		put_page(bvec->bv_page);
> > >  	}
> > >  
> > > +out:
> > >  	if (vecs != inline_vecs)
> > >  		kfree(vecs);
> > > 
> > 
> > You might put the 'vecs' leak fix into another patch, and resue the
> > current code block for that.
> > 
> > Looks all users of bio_iov_iter_get_pages() need this kind of fix,
> > so
> > what do you think about the following way?
> 
> No. AFAICT all the other users of bio_iov_iter_get_pages() are
> perfectly
> fine with it returning less pages and they loop appropriately.

We might consider to better fill the bios in __blkdev_direct_IO(), too,
it might be a performance gain.

Regards
Martin

-- 
Dr. Martin Wilck <mwilck@xxxxxxxx>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)




[Index of Archives]     [Linux RAID]     [Linux SCSI]     [Linux ATA RAID]     [IDE]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Device Mapper]

  Powered by Linux