Re: [PATCHv2 3/3] block: relax direct io memory alignment

On Wed, May 18, 2022 at 08:25:26PM -0600, Keith Busch wrote:
> On Wed, May 18, 2022 at 07:08:11PM -0700, Eric Biggers wrote:
> > On Wed, May 18, 2022 at 07:59:36PM -0600, Keith Busch wrote:
> > > I'm aware that spanning pages can cause bad splits on the bi_max_vecs
> > > condition, but I believe it's well handled here. Unless I'm terribly confused,
> > > which is certainly possible, I think you may have missed this part of the
> > > patch:
> > > 
> > > @@ -1223,6 +1224,8 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
> > >  	pages += entries_left * (PAGE_PTRS_PER_BVEC - 1);
> > > 
> > >  	size = iov_iter_get_pages(iter, pages, LONG_MAX, nr_pages, &offset);
> > > +	if (size > 0)
> > > +		size = ALIGN_DOWN(size, queue_logical_block_size(q));
> > >  	if (unlikely(size <= 0))
> > >  		return size ? size : -EFAULT;
> > > 
> > 
> > That makes the total length of each "batch" of pages be a multiple of the
> > logical block size, but individual logical blocks within that batch can still be
> > divided into multiple bvecs in the loop just below it:
> 
> I understand that, but the existing code conservatively assumes all pages are
> physically discontiguous and wouldn't have requested more pages if it didn't
> have enough bvecs for each of them:
> 
> 	unsigned short nr_pages = bio->bi_max_vecs - bio->bi_vcnt;
> 
> So with the segment alignment guarantee, and ensured available bvec space, the
> created bio will always be a logical block size multiple.
> 
> If we need to split it later due to some other constraint, we'll only split on
> a logical block size, even if it's in the middle of a bvec.
> 

So the bio ends up with a total length that is a multiple of the logical block
size, but the lengths of the individual bvecs in the bio are *not* necessarily
multiples of the logical block size.  That's the problem.
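To make this concrete, here is a small userspace sketch (not kernel code) of a per-page split that, like the loop following the ALIGN_DOWN() in __bio_iov_iter_get_pages(), gives each page min(PAGE_SIZE - offset, left) bytes, with only the first page carrying a non-zero offset. The names demo_bvec, demo_split() and demo_unaligned_bvecs() are hypothetical, and the 4096-byte page size and 512-byte logical block size are assumptions chosen for illustration. With a buffer that starts 8 bytes into a page and a size of 4096, the bvecs come out as 4088 and 8 bytes. The total is 8 logical blocks, but neither bvec length is a multiple of 512, and the last logical block straddles both bvecs.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_PAGE_SIZE 4096u /* assumed page size */
#define DEMO_LBS       512u  /* assumed logical block size */

/* Hypothetical stand-in for struct bio_vec: only offset and length. */
struct demo_bvec {
	unsigned int bv_offset;
	unsigned int bv_len;
};

/*
 * Split a buffer that starts 'offset' bytes into its first page into one
 * bvec per page: each entry takes min(PAGE_SIZE - offset, remaining) bytes
 * and every page after the first starts at offset 0.
 * Returns the number of bvecs filled in.
 */
static size_t demo_split(unsigned int offset, size_t size,
			 struct demo_bvec *bv, size_t max_vecs)
{
	size_t n = 0;

	while (size > 0 && n < max_vecs) {
		size_t len = DEMO_PAGE_SIZE - offset;

		if (len > size)
			len = size;
		bv[n].bv_offset = offset;
		bv[n].bv_len = (unsigned int)len;
		size -= len;
		offset = 0;
		n++;
	}
	return n;
}

/* Count bvecs whose length is not a multiple of the logical block size. */
static size_t demo_unaligned_bvecs(const struct demo_bvec *bv, size_t n)
{
	size_t bad = 0;

	for (size_t i = 0; i < n; i++)
		if (bv[i].bv_len % DEMO_LBS)
			bad++;
	return bad;
}
```

For comparison, demo_split(0, 8192, bv, 4) yields two 4096-byte bvecs and demo_unaligned_bvecs() reports 0. Only the sub-block starting offset breaks the per-bvec invariant.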

Note that there's also a lot of code that assumes that bio_vec::bv_len is a
multiple of 512.  That used to be implied by it being a multiple of the logical
block size.  But the DMA alignment can be much lower, like 8 bytes (see
nvme_set_queue_limits()).
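A hypothetical userspace sketch of why that assumption matters: a bvec can pass an 8-byte DMA alignment check (mask 7, assumed here to match the 8-byte example above) while code that converts bv_len to 512-byte sectors with a right shift by 9 silently drops the remainder. demo_dma_aligned() and demo_bvec_sectors() are illustrative names, not kernel helpers.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_SECTOR_SHIFT 9  /* 512-byte sectors, same value as SECTOR_SHIFT */
#define DEMO_DMA_MASK     7u /* assumed 8-byte DMA alignment (align - 1) */

/* True if both the starting offset and the length satisfy the DMA mask. */
static bool demo_dma_aligned(unsigned int offset, unsigned int len)
{
	return ((offset | len) & DEMO_DMA_MASK) == 0;
}

/*
 * Per-bvec sector count as computed by code that assumes bv_len is a
 * multiple of 512: any sub-sector remainder is truncated away.
 */
static unsigned int demo_bvec_sectors(unsigned int len)
{
	return len >> DEMO_SECTOR_SHIFT;
}
```

For bvecs of 4088 and 8 bytes, both pass the DMA check, yet summing demo_bvec_sectors() per bvec gives 7 sectors while the combined 4096 bytes are 8 sectors.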

- Eric